net.sf.okapi.common.resource
Class TextContainer

java.lang.Object
  extended by net.sf.okapi.common.resource.TextContainer
All Implemented Interfaces:
java.lang.Iterable<TextPart>

public class TextContainer
extends java.lang.Object
implements java.lang.Iterable<TextPart>

Provides methods for storing the content of a paragraph-type unit, and for handling its properties, annotations and segmentation.

The TextContainer is made of a collection of parts: Some are simple TextPart objects, others are special TextPart objects called Segment.

A TextContainer always has at least one Segment part.
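
To make the parts model concrete, here is a minimal sketch in plain Java. `PartsModelSketch` and its nested `Part` class are hypothetical stand-ins, not Okapi classes. They only illustrate the idea of an ordered list of parts, some flagged as segments, with at least one segment always present. Promoting the first part when no segment is supplied is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; not the Okapi implementation.
class PartsModelSketch {
    // A part holds some text and a flag telling whether it is a segment.
    static final class Part {
        final String text;
        final boolean segment;
        Part(String text, boolean segment) {
            this.text = text;
            this.segment = segment;
        }
    }

    private final List<Part> parts = new ArrayList<>();

    PartsModelSketch(Part... initial) {
        for (Part p : initial) {
            parts.add(p);
        }
        if (parts.isEmpty()) {
            // Empty container: start with one empty segment.
            parts.add(new Part("", true));
        } else if (segmentCount() == 0) {
            // Assumption of this sketch: promote the first part to a segment.
            parts.set(0, new Part(parts.get(0).text, true));
        }
    }

    // Number of parts (segments and non-segments).
    int count() {
        return parts.size();
    }

    // Number of parts that are segments.
    int segmentCount() {
        int n = 0;
        for (Part p : parts) {
            if (p.segment) n++;
        }
        return n;
    }

    // Text of all parts, in order.
    String joined() {
        StringBuilder sb = new StringBuilder();
        for (Part p : parts) {
            sb.append(p.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PartsModelSketch c = new PartsModelSketch(
            new Part("Hello.", true), new Part(" ", false), new Part("Bye.", true));
        System.out.println(c.count() + " parts, " + c.segmentCount() + " segments");
    }
}
```

In this model the part count and the segment count differ as soon as a non-segment part (such as the space between two sentences) is present, which is the distinction the part-index and segment-index methods rely on.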


Constructor Summary
TextContainer()
          Creates a new empty TextContainer object.
TextContainer(Segment segment)
          Creates a new TextContainer object with an initial segment.
TextContainer(java.lang.String text)
          Creates a new TextContainer object with some initial text.
TextContainer(TextFragment fragment)
          Creates a new TextContainer object with an initial TextFragment.
TextContainer(TextPart... parts)
          Creates a new TextContainer object with initial TextParts (segment or non-segment) appended.
 
Method Summary
 void append(java.lang.String text)
          Appends a part with a given text at the end of this container.
 void append(java.lang.String text, boolean collapseIfPreviousEmpty)
          Appends a part with a given text at the end of this container.
 void append(TextFragment fragment)
          Appends a part at the end of this container.
 void append(TextFragment fragment, boolean collapseIfPreviousEmpty)
          Appends a part at the end of this container.
 void append(TextPart part)
          Appends a TextPart (segment or non-segment) at the end of this container.
 void append(TextPart part, boolean collapseIfPreviousEmpty)
          Appends a TextPart (segment or non-segment) at the end of this container.
 void changePart(int partIndex)
          Changes the type of a given part.
 void clear()
          Clears this TextContainer, removes any existing segments.
 TextContainer clone()
          Clones this TextContainer, including the properties.
 TextContainer clone(boolean cloneProperties)
          Clones this container, with or without its properties.
 int compareTo(TextContainer cont, boolean codeSensitive)
          Compares this container with another one.
 boolean contentIsOneSegment()
          Indicates if this container is made of a single segment that holds the whole content (i.e. there are no other parts).
static java.lang.String[] contentToSplitStorage(TextContainer tc)
          Creates two storage strings to serialize a given TextContainer.
static java.lang.String contentToString(TextContainer tc)
          Creates a string that stores the content of a given container.
 int count()
          Gets the number of parts (segments and non-segments) in this container.
 TextPart get(int index)
          Gets the part (segment or non-segment) for a given part index.
 <A extends IAnnotation> A getAnnotation(java.lang.Class<A> type)
           
 Annotations getAnnotations()
           
 java.lang.String getCodedText()
          Gets the coded text of the whole content (segmented or not).
 TextFragment getFirstContent()
          Gets the content of the first part (segment or non-segment) of this container.
 Segment getFirstSegment()
          Returns the first Segment of this container.
 TextFragment getLastContent()
          Gets the content of the last part (segment or non-segment) of this container.
 Property getProperty(java.lang.String name)
           
 java.util.Set<java.lang.String> getPropertyNames()
           
 ISegments getSegments()
          Creates a new ISegments object to access the segments of this container.
 TextFragment getUnSegmentedContentCopy()
          Gets a new TextFragment representing the un-segmented content of this container.
 boolean hasBeenSegmented()
          Indicates if a segmentation has been applied to this container.
 boolean hasProperty(java.lang.String name)
           
 boolean hasText()
          Indicates if this container contains at least one character that is not a whitespace.
 boolean hasText(boolean whiteSpacesAreText)
          Indicates if this container contains at least one character that is not a whitespace.
 boolean hasText(boolean lookInSegments, boolean whiteSpacesAreText)
          Indicates if this container contains at least one character.
 void insert(int partIndex, TextPart part)
          Inserts a given part (segment or non-segment) at a given position.
 boolean isEmpty()
          Indicates if this container is empty (no text and no codes).
 java.util.Iterator<TextPart> iterator()
          Creates an iterator to loop through the parts (segments and non-segments) of this container.
 void joinAll()
          Merges back together all parts (segments and non-segments) of this container, and clears the list of segments.
 int joinWithNext(int partIndex, int partCount)
          Joins a given part with a specified number of its following parts.
 void remove(int partIndex)
          Removes the part at a given position.
 void removeProperty(java.lang.String name)
           
 void setAnnotation(IAnnotation annotation)
           
 void setContent(TextFragment content)
          Sets the content of this TextContainer.
 void setHasBeenSegmentedFlag(boolean hasBeenSegmented)
          Sets the flag indicating if the content of this container has been segmented.
 Property setProperty(Property property)
           
 void split(int partIndex, int start, int end, boolean spannedPartIsSegment)
          Splits a given part into two or three parts.
static TextContainer splitStorageToContent(java.lang.String ctext, java.lang.String codes)
          Creates a new TextContainer object from two strings generated with contentToSplitStorage(TextContainer).
static TextContainer stringToContent(java.lang.String data)
          Converts a string created by contentToString(TextContainer) back into a TextContainer.
 java.lang.String toString()
          Gets the string representation of this container.
 void unwrap(boolean trimEnds, boolean collapseMode)
          Unwraps the content of this container.
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

TextContainer

public TextContainer()
Creates a new empty TextContainer object.


TextContainer

public TextContainer(java.lang.String text)
Creates a new TextContainer object with some initial text.

Parameters:
text - the initial text.

TextContainer

public TextContainer(TextFragment fragment)
Creates a new TextContainer object with an initial TextFragment.

Parameters:
fragment - the initial TextFragment.

TextContainer

public TextContainer(TextPart... parts)
Creates a new TextContainer object with initial TextParts (segment or non-segment) appended.

Parameters:
parts - the given initial parts.

TextContainer

public TextContainer(Segment segment)
Creates a new TextContainer object with an initial segment. If the id of the segment is null it will be set automatically.

Parameters:
segment - the initial segment.

Method Detail

getSegments

public ISegments getSegments()
Creates a new ISegments object to access the segments of this container.

Returns:
a new ISegments object.

contentToString

public static java.lang.String contentToString(TextContainer tc)
Creates a string that stores the content of a given container. Use stringToContent(String) to create the container back from the string.

IMPORTANT: Only the content is saved (not the properties, annotations, etc.).

Parameters:
tc - the container holding the content to store.
Returns:
a string representing the content of the given container.

stringToContent

public static TextContainer stringToContent(java.lang.String data)
Converts a string created by contentToString(TextContainer) back into a TextContainer.

Parameters:
data - the string to process.
Returns:
a new TextContainer with the stored content re-created.

contentToSplitStorage

public static java.lang.String[] contentToSplitStorage(TextContainer tc)
Creates two storage strings to serialize a given TextContainer. Use splitStorageToContent(String, String) to create the container back from the strings.

IMPORTANT: Only the content is saved (not the properties, annotations, etc.).

Parameters:
tc - the text container to store.
Returns:
An array of two String objects: The first one contains the coded text parts, the second one contains the codes.
See Also:
splitStorageToContent(String, String)

splitStorageToContent

public static TextContainer splitStorageToContent(java.lang.String ctext,
                                                  java.lang.String codes)
Creates a new TextContainer object from two strings generated with contentToSplitStorage(TextContainer).

Parameters:
ctext - the string holding the coded text parts.
codes - the string holding the codes.
Returns:
a new TextContainer object created from the strings.
See Also:
contentToSplitStorage(TextContainer)

toString

public java.lang.String toString()
Gets the string representation of this container. If the container is segmented, the representation shows the merged segments. Inline codes are also included.

Overrides:
toString in class java.lang.Object
Returns:
the string representation of this container.

iterator

public java.util.Iterator<TextPart> iterator()
Creates an iterator to loop through the parts (segments and non-segments) of this container.

Specified by:
iterator in interface java.lang.Iterable<TextPart>
Returns:
a new iterator for all the parts of this container.

compareTo

public int compareTo(TextContainer cont,
                     boolean codeSensitive)
Compares this container with another one. Note: This is a costly operation if the two containers have segments and no text differences.

Parameters:
cont - the other container to compare this one with.
codeSensitive - true if the codes need to be compared as well.
Returns:
the value 0 if the objects are equal.

hasBeenSegmented

public boolean hasBeenSegmented()
Indicates if a segmentation has been applied to this container. Note that it does not mean there is more than one segment or one part. Use contentIsOneSegment() to check if the container contains only one segment (whether it is the result of a segmentation or simply the default single segment).

This method returns true if any method that may cause the content to be segmented has been called, and no operation has resulted in un-segmenting the content since that call, or if the content has more than one part.

Returns:
true if a segmentation has been applied to this container.
See Also:
setHasBeenSegmentedFlag(boolean)

setHasBeenSegmentedFlag

public void setHasBeenSegmentedFlag(boolean hasBeenSegmented)
Sets the flag indicating if the content of this container has been segmented.

Parameters:
hasBeenSegmented - true to flag the content as having been segmented, false to flag it as not having been segmented.
See Also:
hasBeenSegmented()

contentIsOneSegment

public boolean contentIsOneSegment()
Indicates if this container is made of a single segment that holds the whole content (i.e. there are no other parts).

When this method returns true, the methods getFirstContent(), ISegments.getFirstContent(), getLastContent() and ISegments.getLastContent() return the same result.

Returns:
true if the whole content of this container is in a single segment.
See Also:
count(), ISegments.count()

changePart

public void changePart(int partIndex)
Changes the type of a given part. If the part was a segment, this makes it a non-segment, except when it is the only part in the content, in which case it remains unchanged. If the part was not a segment, this makes it a segment (with its identifier set automatically).

Parameters:
partIndex - the index of the part to change. Note that even if the part is a segment this index must be the part index not the segment index.
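
The toggle rule described above can be sketched as follows. This is an illustration under stated assumptions, not the Okapi code: `ChangePartSketch` is a hypothetical helper that models each part as a single boolean (true for a segment, false for a non-segment) and ignores segment identifiers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not the Okapi implementation.
class ChangePartSketch {
    // Returns a copy of the flags with the type of the given part toggled.
    // Each flag is true for a segment and false for a non-segment.
    static List<Boolean> changePart(List<Boolean> isSegment, int partIndex) {
        List<Boolean> result = new ArrayList<>(isSegment);
        boolean wasSegment = result.get(partIndex);
        if (wasSegment && result.size() == 1) {
            // The only part of the content stays a segment.
            return result;
        }
        result.set(partIndex, !wasSegment);
        return result;
    }

    public static void main(String[] args) {
        List<Boolean> flags = Arrays.asList(true, false, true);
        // Turn the middle non-segment part into a segment.
        System.out.println(changePart(flags, 1));
    }
}
```

Note that the index is always a part index: in the usage above, index 1 designates the second part even though it is not the second segment.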

insert

public void insert(int partIndex,
                   TextPart part)
Inserts a given part (segment or non-segment) at a given position. If the position is already occupied, that part and all the parts to its right are shifted to the right.

If the part to insert is a segment, its id is validated.

Parameters:
partIndex - the position where to insert the new part.
part - the part to insert.

remove

public void remove(int partIndex)
Removes the part at a given position.

If the selected part is the last segment in the content, the part is only cleared, not removed.

Parameters:
partIndex - the position of the part to remove.
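
The "cleared, not removed" rule can be sketched as below. `RemoveSketch` and its `Part` class are hypothetical stand-ins, and the sketch assumes that "the last segment in the content" means the only segment still present in the container.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; not the Okapi implementation.
class RemoveSketch {
    static final class Part {
        String text;
        final boolean segment;
        Part(String text, boolean segment) {
            this.text = text;
            this.segment = segment;
        }
    }

    // Removes the part at partIndex, unless it is the only remaining segment,
    // in which case its text is cleared and the part itself is kept.
    static void remove(List<Part> parts, int partIndex) {
        Part target = parts.get(partIndex);
        int segments = 0;
        for (Part p : parts) {
            if (p.segment) segments++;
        }
        if (target.segment && segments == 1) {
            target.text = "";
        } else {
            parts.remove(partIndex);
        }
    }

    public static void main(String[] args) {
        List<Part> parts = new ArrayList<>();
        parts.add(new Part("A", true));
        parts.add(new Part(" ", false));
        remove(parts, 0); // only segment: cleared, so two parts remain
        System.out.println(parts.size());
    }
}
```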

append

public void append(TextFragment fragment,
                   boolean collapseIfPreviousEmpty)
Appends a part at the end of this container.

If collapseIfPreviousEmpty is true and the current last part (segment or non-segment) is empty, the text fragment is appended to the last part. Otherwise the text fragment is appended to the content as a new non-segment part.

Important: If the container is empty, the appended part becomes a segment, as the container always has at least one segment.

Parameters:
fragment - the text fragment to append.
collapseIfPreviousEmpty - true to collapse the previous part if it is empty.
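
The effect of collapseIfPreviousEmpty can be sketched on a plain list of strings. `AppendSketch` is a hypothetical helper, not an Okapi class: it models each part as a String and does not track which parts are segments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the Okapi implementation.
class AppendSketch {
    // Appends text to the list of parts. When collapsing is requested and the
    // current last part is empty, that part receives the text instead of a
    // new part being created.
    static void append(List<String> parts, String text, boolean collapseIfPreviousEmpty) {
        int last = parts.size() - 1;
        if (collapseIfPreviousEmpty && last >= 0 && parts.get(last).isEmpty()) {
            parts.set(last, text);
        } else {
            parts.add(text);
        }
    }

    public static void main(String[] args) {
        List<String> parts = new ArrayList<>();
        parts.add("");                   // a container starts with one empty part
        append(parts, "Hello", true);    // fills the empty part
        append(parts, " world", true);   // previous part not empty: new part
        System.out.println(parts.size());
    }
}
```

Collapsing avoids leaving an empty leading part behind when content is appended to a freshly created container.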

append

public void append(TextFragment fragment)
Appends a part at the end of this container.

This call is the same as calling append(TextFragment, boolean) with collapseIfPreviousEmpty set to true.

Parameters:
fragment - the text fragment to append.

append

public void append(java.lang.String text,
                   boolean collapseIfPreviousEmpty)
Appends a part with a given text at the end of this container.

If collapseIfPreviousEmpty is true and the current last part (segment or non-segment) is empty, the new text is appended to the last part. Otherwise the text is appended to the content as a new non-segment part.

Parameters:
text - the text to append.
collapseIfPreviousEmpty - true to collapse the previous part if it is empty.

append

public void append(java.lang.String text)
Appends a part with a given text at the end of this container.

This call is the same as calling append(String, boolean) with collapseIfPreviousEmpty set to true.

Parameters:
text - the text to append.

append

public void append(TextPart part,
                   boolean collapseIfPreviousEmpty)
Appends a TextPart (segment or non-segment) at the end of this container.

If collapseIfPreviousEmpty is true and the current last part (segment or non-segment) is empty, the new part replaces the last part. Otherwise the part is appended to the container as is. If the operation would leave the container without a segment, the first part is automatically converted to a segment.

Parameters:
part - the TextPart to append.
collapseIfPreviousEmpty - true to collapse the previous part if it is empty.

append

public void append(TextPart part)
Appends a TextPart (segment or non-segment) at the end of this container.

This call is the same as calling append(TextPart, boolean) with collapseIfPreviousEmpty set to true.

Parameters:
part - the TextPart to append.

getCodedText

public java.lang.String getCodedText()
Gets the coded text of the whole content (segmented or not). Use this method to compute segment boundaries that will be applied using ISegments.create(int, int) or ISegments.create(List) or other methods.

Returns:
the coded text of the whole content, to use as a template for segmentation.
See Also:
ISegments.create(int, int), ISegments.create(List)

split

public void split(int partIndex,
                  int start,
                  int end,
                  boolean spannedPartIsSegment)
Splits a given part into two or three parts.

Parameters:
partIndex - index of the part to split.
start - start of the middle part to create.
end - position just after the last character of the middle part to create.
spannedPartIsSegment - true if the new middle part should be a segment, false if it should be a non-segment.
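
The meaning of start and end can be sketched on a plain string. `SplitSketch` is a hypothetical helper, not the Okapi code: it works on a String rather than on the coded text of a part, and it does not model the segment flag. Returning a single part when the span covers the whole text is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the Okapi implementation.
class SplitSketch {
    // Splits text so that the range [start, end) becomes the middle part.
    // A leading part is produced only if start > 0, and a trailing part
    // only if end is before the end of the text.
    static List<String> split(String text, int start, int end) {
        List<String> parts = new ArrayList<>();
        if (start > 0) {
            parts.add(text.substring(0, start));
        }
        parts.add(text.substring(start, end));
        if (end < text.length()) {
            parts.add(text.substring(end));
        }
        return parts;
    }

    public static void main(String[] args) {
        // Three parts: the text before, the spanned middle, the text after.
        System.out.println(split("One. Two. Three.", 5, 9));
        // Two parts: the span starts at the beginning of the text.
        System.out.println(split("One. Two.", 0, 4));
    }
}
```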

unwrap

public void unwrap(boolean trimEnds,
                   boolean collapseMode)
Unwraps the content of this container.

This method replaces any sequence of white spaces with a single space character. It also removes leading and trailing white spaces if the parameter trimEnds is set to true.

White spaces in this context are #x9, #xA and #x20. #xD is not considered a whitespace as the content of a text container must have its line-breaks normalized to #xA.

If the container has more than one segment and collapseMode is set, non-segment parts are normalized and removed if they end up empty. If the option is not set, the method preserves at least one space between segments, even if the segments are empty.

Empty segments are always kept.

Parameters:
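trimEnds - true to remove leading and trailing white-spaces.
collapseMode - true to remove the non-segment parts that end up empty after normalization, false to preserve at least one space between segments.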
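
The white-space rule above can be sketched on a single plain string. `UnwrapSketch` is a hypothetical helper, not the Okapi code: it ignores parts, segments, inline codes and collapseMode, and only shows the normalization of #x9, #xA and #x20 with optional trimming.

```java
// Hypothetical helper for illustration only; not the Okapi implementation.
class UnwrapSketch {
    // Replaces each run of tab, line-feed and space characters with one space.
    // Carriage return (#xD) is deliberately not treated as white space.
    static String unwrap(String text, boolean trimEnds) {
        StringBuilder out = new StringBuilder();
        boolean inWhitespace = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\t' || c == '\n' || c == ' ') {
                if (!inWhitespace) {
                    out.append(' ');
                }
                inWhitespace = true;
            } else {
                out.append(c);
                inWhitespace = false;
            }
        }
        String result = out.toString();
        if (trimEnds) {
            // After normalization there is at most one space at each end.
            int start = 0;
            int end = result.length();
            if (start < end && result.charAt(start) == ' ') start++;
            if (end > start && result.charAt(end - 1) == ' ') end--;
            result = result.substring(start, end);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("[" + unwrap("  Line one\n\tline two  ", true) + "]");
    }
}
```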

getFirstContent

public TextFragment getFirstContent()
Gets the content of the first part (segment or non-segment) of this container.

This method always returns the same result as ISegments.getFirstContent() if contentIsOneSegment() is true.

Returns:
the content of the first part (segment or non-segment) of this container.
See Also:
ISegments.getFirstContent(), getLastContent(), ISegments.getLastContent()

getFirstSegment

public Segment getFirstSegment()
Returns the first Segment of this container.

Returns:
the first Segment of this container or null if there is no Segment

getLastContent

public TextFragment getLastContent()
Gets the content of the last part (segment or non-segment) of this container.

This method always returns the same result as ISegments.getLastContent() if contentIsOneSegment() is true.

Returns:
the content of the last part (segment or non-segment) of this container.
See Also:
ISegments.getLastContent(), getFirstContent(), ISegments.getFirstContent()

clone

public TextContainer clone()
Clones this TextContainer, including the properties.

Overrides:
clone in class java.lang.Object
Returns:
A new TextContainer object that is a copy of this one.

clone

public TextContainer clone(boolean cloneProperties)
Clones this container, with or without its properties.

Parameters:
cloneProperties - indicates if the properties should be cloned.
Returns:
A new TextContainer object that is a copy of this one.

getUnSegmentedContentCopy

public TextFragment getUnSegmentedContentCopy()
Gets a new TextFragment representing the un-segmented content of this container.

Important: This is an expensive method.

Returns:
an un-segmented copy of the content of this container.

setContent

public void setContent(TextFragment content)
Sets the content of this TextContainer. Any existing segmentation is removed. The content becomes a single segment content.

Parameters:
content - the new content to set.

clear

public void clear()
Clears this TextContainer, removes any existing segments. The content becomes a single empty segment content. Keeps annotations.


hasText

public boolean hasText(boolean lookInSegments,
                       boolean whiteSpacesAreText)
Indicates if this container contains at least one character. Inline codes and annotation markers do not count as characters.

Parameters:
lookInSegments - indicates if the possible segments in this container should be looked at. If this parameter is set to false, the segment markers are treated as codes.
whiteSpacesAreText - indicates if whitespaces should be considered text characters or not.
Returns:
true if this container contains at least one character according to the given options.

hasText

public boolean hasText(boolean whiteSpacesAreText)
Indicates if this container contains at least one character that is not a whitespace. All parts (segments and non-segments) are checked.

Parameters:
whiteSpacesAreText - indicates if whitespaces should be considered text characters or not.
Returns:
true if this container contains at least one character that is not a whitespace.

hasText

public boolean hasText()
Indicates if this container contains at least one character that is not a whitespace. This method has the same result as calling hasText(boolean, boolean) with the parameters true and false.

Returns:
true if this container contains at least one character that is not a whitespace.
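
The role of whiteSpacesAreText can be sketched on a plain string. `HasTextSketch` is a hypothetical helper, not the Okapi code: it does not model inline codes, annotation markers or segments, and it uses `Character.isWhitespace`, which is an assumption and may differ from the white-space definition Okapi applies.

```java
// Hypothetical helper for illustration only; not the Okapi implementation.
class HasTextSketch {
    // Returns true if text holds at least one character that counts as text.
    // White space counts as text only when whiteSpacesAreText is true.
    static boolean hasText(String text, boolean whiteSpacesAreText) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (whiteSpacesAreText || !Character.isWhitespace(c)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasText("   ", false)); // only white space
        System.out.println(hasText("   ", true));  // white space accepted as text
    }
}
```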

isEmpty

public boolean isEmpty()
Indicates if this container is empty (no text and no codes).

Returns:
true if this container is empty.

hasProperty

public boolean hasProperty(java.lang.String name)

getProperty

public Property getProperty(java.lang.String name)

setProperty

public Property setProperty(Property property)

removeProperty

public void removeProperty(java.lang.String name)

getPropertyNames

public java.util.Set<java.lang.String> getPropertyNames()

getAnnotations

public Annotations getAnnotations()

getAnnotation

public <A extends IAnnotation> A getAnnotation(java.lang.Class<A> type)

setAnnotation

public void setAnnotation(IAnnotation annotation)

get

public TextPart get(int index)
Gets the part (segment or non-segment) for a given part index.

Parameters:
index - the index of the part to retrieve. The first part has the index 0, the second has the index 1, etc.
Returns:
the part (segment or non-segment) for the given index.
Throws:
java.lang.IndexOutOfBoundsException - if the index is out of bounds.
See Also:
ISegments.get(int)

count

public int count()
Gets the number of parts (segments and non-segments) in this container. This method always returns at least 1.

Returns:
the number of parts (segments and non-segments) in this container.
See Also:
ISegments.count()

joinAll

public void joinAll()
Merges back together all parts (segments and non-segments) of this container, and clears the list of segments. The content becomes a single segment content.


joinWithNext

public int joinWithNext(int partIndex,
                        int partCount)
Joins a given part with a specified number of its following parts.

If the resulting part is the only part in the container and is not a segment, it is automatically changed into a segment.

joinWithNext(0, -1) has the same effect as joinAll().

Parameters:
partIndex - the index of the part where to append the following parts.
partCount - the number of parts to join. You can use -1 to indicate all the parts after the initial one.
Returns:
the number of parts joined to the given part (and removed from the list of parts).
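
The partCount convention and the return value can be sketched on a list of strings. `JoinSketch` is a hypothetical helper, not the Okapi code: parts are modeled as Strings without segment flags, and clamping partCount to the number of parts actually available is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not the Okapi implementation.
class JoinSketch {
    // Joins the part at partIndex with up to partCount following parts.
    // A partCount of -1 means all the parts after the given one.
    // Returns the number of parts that were merged in and removed.
    static int joinWithNext(List<String> parts, int partIndex, int partCount) {
        int available = parts.size() - partIndex - 1;
        int toJoin = (partCount == -1) ? available : Math.min(partCount, available);
        StringBuilder joined = new StringBuilder(parts.get(partIndex));
        for (int i = 0; i < toJoin; i++) {
            // The next part always sits right after partIndex once removed.
            joined.append(parts.remove(partIndex + 1));
        }
        parts.set(partIndex, joined.toString());
        return toJoin;
    }

    public static void main(String[] args) {
        List<String> parts = new ArrayList<>(Arrays.asList("A", "B", "C", "D"));
        int removed = joinWithNext(parts, 1, 2);
        System.out.println(removed + " parts joined, " + parts.size() + " parts left");
    }
}
```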