java.lang.Object
  net.sf.okapi.common.resource.TextFragment
public class TextFragment
Implements the methods for creating and manipulating a pre-parsed flat representation of content with in-line codes.
The model uses two objects to store the data: a coded text string and a list of Code objects.
The coded text string is composed of normal characters and markers.
A marker is a sequence of two special characters (in the Unicode PUA)
that indicate the type of underlying code (opening, closing, isolated), and an index
pointing to its corresponding Code object where more information can be found.
The value of the index is encoded as a Unicode PUA character. You can use the
toChar(int) and toIndex(char) methods to encode and decode
the index value.
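The marker layout described above can be sketched in a few lines of standalone Java. This is an illustration only: the numeric values given to MARKER_OPENING, MARKER_CLOSING, MARKER_ISOLATED and CHARBASE are assumptions chosen to sit in the Unicode PUA, and the class MarkerSketch is hypothetical.

```java
// Minimal sketch of the two-character marker layout.
// All numeric values below are assumptions, not taken from this page.
final class MarkerSketch {
    static final int MARKER_OPENING = 0xE101;  // assumed value
    static final int MARKER_CLOSING = 0xE102;  // assumed value
    static final int MARKER_ISOLATED = 0xE103; // assumed value
    static final int CHARBASE = 0xE110;        // assumed value

    // Encodes a code index as a single PUA character.
    static char toChar(int index) {
        return (char) (index + CHARBASE);
    }

    // Decodes the second character of a marker back into a code index.
    static int toIndex(char index) {
        return index - CHARBASE;
    }

    // Tells whether a character is one of the three marker types.
    static boolean isMarker(char ch) {
        return ch == MARKER_OPENING || ch == MARKER_CLOSING || ch == MARKER_ISOLATED;
    }

    public static void main(String[] args) {
        // "Hello <b>world</b>" with the two codes stored at indices 0 and 1.
        String coded = "Hello " + (char) MARKER_OPENING + toChar(0)
                + "world" + (char) MARKER_CLOSING + toChar(1);
        for (int i = 0; i < coded.length(); i++) {
            if (isMarker(coded.charAt(i))) {
                int markerPos = i;
                int codeIndex = toIndex(coded.charAt(++i)); // step onto the index character
                System.out.println("marker at " + markerPos + " -> code index " + codeIndex);
            }
        }
    }
}
```

Because the index characters start at CHARBASE, above the three marker values, an index character can never be mistaken for a marker in this sketch.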
To get the coded text of a TextFragment object use getCodedText(), and
to get its list of codes use getCodes().
You can directly modify the coded text or the codes and re-apply them to the
TextFragment object using setCodedText(String) and
setCodedText(String, List).
You can add a code to the coded text in two ways:
by appending a new code with append(TagType, String, String), or
by changing a section of existing text into a code with changeToCode(int, int, TagType, String).
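The two routes can be pictured with a simplified stand-in for a fragment. The sketch below is not the Okapi implementation: the class CodeInsertionSketch, its two methods, the marker values, and the choice to return the change in length are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a fragment: a coded text buffer plus a list of code data.
// Names, values and return conventions here are assumptions for illustration.
final class CodeInsertionSketch {
    static final char MARKER_ISOLATED = '\uE103'; // assumed value
    static final int CHARBASE = 0xE110;           // assumed value

    final StringBuilder text = new StringBuilder();
    final List<String> codes = new ArrayList<>(); // raw data of each code

    // Route 1: append a new isolated code at the end of the coded text.
    void appendCode(String data) {
        codes.add(data);
        text.append(MARKER_ISOLATED).append((char) (CHARBASE + codes.size() - 1));
    }

    // Route 2: turn the plain text in [start, end) into one isolated code.
    // Assumes the section holds no markers; returns the change in coded length.
    int changeToCode(int start, int end) {
        codes.add(text.substring(start, end));
        String placeholder = "" + MARKER_ISOLATED + (char) (CHARBASE + codes.size() - 1);
        text.replace(start, end, placeholder);
        return placeholder.length() - (end - start);
    }

    public static void main(String[] args) {
        CodeInsertionSketch frag = new CodeInsertionSketch();
        frag.text.append("Line one<br/>line two");
        frag.changeToCode(8, 13); // "<br/>" becomes a two-character placeholder
        frag.appendCode("<hr/>");
        System.out.println(frag.codes + " coded length=" + frag.text.length());
    }
}
```

In both routes the raw code leaves the text buffer and only a two-character placeholder remains, which is why positions in the coded text differ from positions in the original content.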
| Nested Class Summary | |
|---|---|
static class |
TextFragment.TagType
List of the types of tag usable for in-line codes. |
| Field Summary | |
|---|---|
static int |
CHARBASE
Special value used as the base of inline code indices. |
protected java.util.ArrayList<Code> |
codes
List of the inline codes for this fragment. |
protected boolean |
isBalanced
Flag indicating if the opening/closing inline codes of this fragment have been balanced or not. |
protected int |
lastCodeID
Value of the last inline code ID in this fragment. |
static int |
MARKER_CLOSING
Special character marker for a closing inline code. |
static int |
MARKER_ISOLATED
Special character marker for an isolated inline code. |
static int |
MARKER_OPENING
Special character marker for an opening inline code. |
static java.lang.String |
REFMARKER_END
Marker for end of reference. |
static java.lang.String |
REFMARKER_SEP
Marker for reference separator. |
static java.lang.String |
REFMARKER_START
Marker for start of reference. |
protected java.lang.StringBuilder |
text
Coded text buffer of this fragment. |
| Constructor Summary | |
|---|---|
TextFragment()
Creates an empty TextFragment. |
|
TextFragment(java.lang.String text)
Creates a TextFragment with a given text. |
|
TextFragment(java.lang.String text,
int lastCodeId)
Creates a TextFragment with a given text and an initial id value for codes. |
|
TextFragment(java.lang.String newCodedText,
java.util.List<Code> newCodes)
Creates a TextFragment with the content made of a given coded text and a list of codes. |
|
TextFragment(TextFragment fragment)
Creates a TextFragment with the content of a given TextFragment. |
|
| Method Summary | |
|---|---|
void |
alignCodeIds(TextFragment base)
Aligns the code IDs of this fragment with the ones of a given fragment. |
int |
annotate(int start,
int end,
java.lang.String type,
InlineAnnotation annotation)
Annotates a section of this text. |
java.lang.Appendable |
append(char value)
Appends a character to the fragment. |
java.lang.Appendable |
append(java.lang.CharSequence csq)
Appends the specified character sequence to this fragment. |
java.lang.Appendable |
append(java.lang.CharSequence csq,
int start,
int end)
Appends a subsequence of the specified character sequence to this fragment. |
Code |
append(Code code)
Appends an existing code to this fragment. |
void |
append(java.lang.String text)
Appends a string to the fragment. |
Code |
append(TextFragment.TagType tagType,
java.lang.String type,
InlineAnnotation annotation)
Appends an annotation-type code to this text. |
Code |
append(TextFragment.TagType tagType,
java.lang.String type,
java.lang.String data)
Appends a new code to the text. |
Code |
append(TextFragment.TagType tagType,
java.lang.String type,
java.lang.String data,
int id)
Appends a new code to the text, when the code has a defined identifier. |
TextFragment |
append(TextFragment fragment)
Appends a TextFragment object to this fragment. |
protected void |
balanceMarkers()
Balances the markers based on the tag type of the codes. |
int |
changeToCode(int start,
int end,
TextFragment.TagType tagType,
java.lang.String type)
Changes a section of the coded text into a single code. |
char |
charAt(int index)
Returns the character at the specified index in the coded text of this fragment. |
void |
clear()
Clears the fragment of all content. |
TextFragment |
clone()
Clones this TextFragment. |
void |
collapseWhitespace()
Collapse all whitespace to a single space character. |
int |
compareTo(java.lang.Object object)
Compares an object with this TextFragment. |
int |
compareTo(TextFragment frag,
boolean codeSensitive)
Compares a TextFragment with this one. |
boolean |
equals(java.lang.Object object)
|
java.util.List<AnnotatedSpan> |
getAnnotatedSpans(java.lang.String type)
Gets the list of all spans of text annotated with a given type of annotation. |
java.util.List<Code> |
getClonedCodes()
Gets a list of copies of the codes for this fragment. |
Code |
getCode(char indexAsChar)
Gets the code for a given index formatted as character (the second special character in a marker in a coded text string). |
Code |
getCode(int index)
Gets the code for a given index. |
java.lang.String |
getCodedText()
Gets the coded text representation of the fragment. |
java.lang.String |
getCodedText(int start,
int end)
Gets the portion of coded text for a given section of the coded text. |
java.util.List<Code> |
getCodes()
Gets the list of all codes for the fragment. |
java.util.List<Code> |
getCodes(int start,
int end)
Gets a copy of the list of the codes that are within a given section of coded text. |
int |
getIndex(int id)
Gets the index value for the first in-line code (in the codes list) with a given identifier. |
int |
getIndexForClosing(int id)
Gets the index value for the closing in-line code (in the codes list) with a given identifier. |
int |
getLastCodeId()
Gets the last value used for code id. |
static java.lang.Object[] |
getRefMarker(java.lang.StringBuilder text)
Helper method to retrieve a reference marker from a string. |
java.lang.String |
getText()
Gets the text of the fragment (all codes are removed). |
static java.lang.String |
getText(java.lang.String codedText)
Helper method that will take a coded string and return a text only version. |
boolean |
hasAnnotation()
Indicates if this text has at least one annotation. |
boolean |
hasAnnotation(java.lang.String type)
Indicates if this text has at least one annotation of a given type. |
boolean |
hasCode()
Indicates if the fragment contains at least one code. |
boolean |
hasReference()
Indicates if this TextFragment contains any in-line code with a reference. |
boolean |
hasText()
Indicates if this fragment contains at least one character other than a whitespace. |
boolean |
hasText(boolean whiteSpacesAreText)
Indicates if this fragment contains at least one character (inline codes, segment markers, and annotation markers do not count as characters). |
static int |
indexOfFirstNonWhitespace(java.lang.String codedText,
int fromIndex,
int untilIndex,
boolean openingMarkerIsWS,
boolean closingMarkerIsWS,
boolean isolatedMarkerIsWS,
boolean whitespaceIsWS)
Helper method to find the first non-whitespace character of a coded text, starting at a given position and no farther than another given position. |
static int |
indexOfLastNonWhitespace(java.lang.String codedText,
int fromIndex,
int untilIndex,
boolean openingMarkerIsWS,
boolean closingMarkerIsWS,
boolean isolatedMarkerIsWS,
boolean whitespaceIsWS)
Helper method to find, from the back, the first non-whitespace character of a coded text, starting at a given position and no farther than another given position. |
void |
insert(int offset,
TextFragment fragment)
Inserts a TextFragment object into this fragment. |
void |
invalidate()
Sets the fragment in a state where it has to be re-balanced before being used for output. |
boolean |
isEmpty()
Indicates if the fragment is empty (no text and no codes). |
static boolean |
isMarker(char ch)
Helper method that checks if a given character is an inline code marker. |
int |
length()
Returns the number of characters in the coded text of this fragment. |
void |
ltrim()
Removes leading whitespace from this fragment. |
static java.lang.String |
makeRefMarker(java.lang.String id)
Helper method to build a reference marker string from a given identifier. |
static java.lang.String |
makeRefMarker(java.lang.String id,
java.lang.String propertyName)
Helper method to build a reference marker string from a given identifier and a property name. |
void |
remove(int start,
int end)
Removes a section of the fragment (including its codes). |
void |
removeAnnotations()
Removes all annotations in this text. |
void |
removeAnnotations(java.lang.String type)
Removes all annotations of a given type in this text. |
void |
removeCode(Code code)
Removes the Code from this TextFragment. |
void |
renumberCodes()
Renumbers the IDs of the codes in the fragment. |
void |
rtrim()
Removes trailing whitespace from this fragment. |
void |
setCodedText(java.lang.String newCodedText)
Sets the coded text of the fragment, using its existing codes. |
void |
setCodedText(java.lang.String newCodedText,
boolean allowCodeDeletion)
Sets the coded text of the fragment, using its existing codes. |
void |
setCodedText(java.lang.String newCodedText,
java.util.List<Code> newCodes)
Sets the coded text of the fragment and its corresponding codes. |
void |
setCodedText(java.lang.String newCodedText,
java.util.List<Code> newCodes,
boolean allowCodeDeletion)
Sets the coded text of the fragment and its corresponding codes. |
TextFragment |
subSequence(int start,
int end)
Gets a copy of a sub-sequence of this object. |
static char |
toChar(int index)
Helper method to convert a marker index to its character value in the coded text string. |
static int |
toIndex(char index)
Helper method to convert the index-coded-as-character part of a marker into its index value. |
java.lang.String |
toString()
Gets the coded text for this fragment. |
java.lang.String |
toText()
Returns the content of this fragment, including the original codes whenever possible. |
void |
trim()
Trims white-spaces from the beginning and the end of this fragment. |
static void |
unwrap(TextFragment frag)
Unwraps the content of a TextFragment. |
| Methods inherited from class java.lang.Object |
|---|
finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Field Detail |
|---|
public static final int MARKER_OPENING
public static final int MARKER_CLOSING
public static final int MARKER_ISOLATED
public static final int CHARBASE
public static final java.lang.String REFMARKER_START
public static final java.lang.String REFMARKER_END
public static final java.lang.String REFMARKER_SEP
protected java.lang.StringBuilder text
protected java.util.ArrayList<Code> codes
protected boolean isBalanced
protected int lastCodeID
| Constructor Detail |
|---|
public TextFragment()
public TextFragment(java.lang.String text)
text - the text to use.
public TextFragment(java.lang.String text,
int lastCodeId)
text - the text to use.
lastCodeId - the value to use to start the code id. The id of the first new code will be this value + 1.
The value should be -1 or a positive number. Values below -1 are automatically reset to -1.
public TextFragment(TextFragment fragment)
fragment - the content to use.
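The id-seeding rule stated for TextFragment(String, int) above (first new code gets lastCodeId + 1, values below -1 are reset to -1) can be sketched as follows. The class CodeIdSketch and the method nextCodeId() are hypothetical names used only for this illustration.

```java
// Sketch of the id-seeding rule; class and method names are hypothetical.
final class CodeIdSketch {
    private int lastCodeId;

    CodeIdSketch(int lastCodeId) {
        // Values below -1 are reset to -1.
        this.lastCodeId = Math.max(lastCodeId, -1);
    }

    // The next new code gets the last id plus one.
    int nextCodeId() {
        return ++lastCodeId;
    }

    public static void main(String[] args) {
        System.out.println(new CodeIdSketch(-1).nextCodeId()); // prints 0
        System.out.println(new CodeIdSketch(4).nextCodeId());  // prints 5
        System.out.println(new CodeIdSketch(-7).nextCodeId()); // prints 0
    }
}
```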
public TextFragment(java.lang.String newCodedText,
java.util.List<Code> newCodes)
newCodedText - the new coded text.
newCodes - the list of codes.
| Method Detail |
|---|
public static char toChar(int index)
index - the index value to encode.
public static int toIndex(char index)
index - the character to decode.
public static java.lang.String makeRefMarker(java.lang.String id)
id - the identifier to use.
public static java.lang.String makeRefMarker(java.lang.String id,
java.lang.String propertyName)
id - the identifier to use.
propertyName - the name of the property to use.
public static java.lang.Object[] getRefMarker(java.lang.StringBuilder text)
text - the text to search for a reference marker.
public static int indexOfLastNonWhitespace(java.lang.String codedText,
int fromIndex,
int untilIndex,
boolean openingMarkerIsWS,
boolean closingMarkerIsWS,
boolean isolatedMarkerIsWS,
boolean whitespaceIsWS)
codedText - the coded text to process.
fromIndex - the first position to check (must be greater than or equal to
untilIndex). Use -1 to point to the last position of the text.
untilIndex - the last position to check (must be less than or equal to
fromIndex).
openingMarkerIsWS - indicates if opening markers count as whitespace.
closingMarkerIsWS - indicates if closing markers count as whitespace.
isolatedMarkerIsWS - indicates if isolated markers count as whitespace.
whitespaceIsWS - indicates if whitespace characters count as whitespace.
public static int indexOfFirstNonWhitespace(java.lang.String codedText,
int fromIndex,
int untilIndex,
boolean openingMarkerIsWS,
boolean closingMarkerIsWS,
boolean isolatedMarkerIsWS,
boolean whitespaceIsWS)
codedText - the coded text to process.
fromIndex - the first position to check (must be less than or equal to
untilIndex).
untilIndex - the last position to check (must be greater than or equal to
fromIndex). Use -1 to point to the last position of the text.
openingMarkerIsWS - indicates if opening markers count as whitespace.
closingMarkerIsWS - indicates if closing markers count as whitespace.
isolatedMarkerIsWS - indicates if isolated markers count as whitespace.
whitespaceIsWS - indicates if whitespace characters count as whitespace.
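A forward scan with these flags might look like the sketch below. It is a standalone approximation, not the library code: the marker values are assumptions, and returning -1 when nothing is found is an assumed convention that this page does not state.

```java
// Standalone sketch of a forward whitespace scan over coded text.
// Marker values and the "-1 when not found" result are assumptions.
final class WhitespaceScanSketch {
    static final char MARKER_OPENING = '\uE101';  // assumed value
    static final char MARKER_CLOSING = '\uE102';  // assumed value
    static final char MARKER_ISOLATED = '\uE103'; // assumed value

    static int indexOfFirstNonWhitespace(String codedText, int fromIndex, int untilIndex,
            boolean openingMarkerIsWS, boolean closingMarkerIsWS,
            boolean isolatedMarkerIsWS, boolean whitespaceIsWS) {
        if (untilIndex == -1) {
            untilIndex = codedText.length() - 1; // -1 means "up to the last position"
        }
        for (int i = fromIndex; i <= untilIndex; i++) {
            char ch = codedText.charAt(i);
            boolean marker = ch == MARKER_OPENING || ch == MARKER_CLOSING || ch == MARKER_ISOLATED;
            if (marker) {
                boolean countsAsWS = (ch == MARKER_OPENING && openingMarkerIsWS)
                        || (ch == MARKER_CLOSING && closingMarkerIsWS)
                        || (ch == MARKER_ISOLATED && isolatedMarkerIsWS);
                if (!countsAsWS) {
                    return i; // this marker is treated as content
                }
                i++; // skip the index character of the placeholder
            } else if (!(whitespaceIsWS && Character.isWhitespace(ch))) {
                return i; // ordinary non-whitespace character
            }
        }
        return -1; // nothing but whitespace in the range
    }

    public static void main(String[] args) {
        // Two spaces, an opening placeholder, a space, then "text".
        String coded = "  " + MARKER_OPENING + '\uE110' + " text";
        System.out.println(indexOfFirstNonWhitespace(coded, 0, -1, true, true, true, true));
        System.out.println(indexOfFirstNonWhitespace(coded, 0, -1, false, true, true, true));
    }
}
```

The backward variant, indexOfLastNonWhitespace, would walk from fromIndex down to untilIndex and has to step back over the index character before it can test the marker.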
public static void unwrap(TextFragment frag)
frag - the text fragment to unwrap.
public static boolean isMarker(char ch)
ch - the character to check.
public TextFragment clone()
Overrides: clone in class java.lang.Object
public boolean hasReference()
public void append(java.lang.String text)
text - the string to append.
public TextFragment append(TextFragment fragment)
fragment - the TextFragment to append.
public Code append(Code code)
code - the existing code to append.
public Code append(TextFragment.TagType tagType, java.lang.String type, InlineAnnotation annotation)
tagType - the tag type of the code (e.g. TagType.OPENING).
type - the type of the annotation (e.g. "protected").
annotation - the annotation to add (can be null).
public Code append(TextFragment.TagType tagType, java.lang.String type, java.lang.String data)
tagType - the tag type of the code (e.g. TagType.OPENING).
type - the type of the code (e.g. "bold").
data - the raw code itself (e.g. "<b>").
public Code append(TextFragment.TagType tagType, java.lang.String type, java.lang.String data, int id)
tagType - the tag type of the code (e.g. TagType.OPENING).
type - the type of the code (e.g. "bold").
data - the raw code itself (e.g. "<b>").
id - the identifier to use for this code.
public void insert(int offset,
TextFragment fragment)
offset - the position in the coded text at which to insert the new fragment.
You can use -1 to append at the end of the current content.
fragment - the TextFragment to insert.
InvalidPositionException - when offset points inside a marker.
public void clear()
public void trim()
public void ltrim()
public void rtrim()
public void collapseWhitespace()
public java.lang.String getText()
public static java.lang.String getText(java.lang.String codedText)
codedText - string with possible TextFragment codes.
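Producing a text-only version amounts to dropping every two-character placeholder. A minimal standalone sketch, assuming the three marker characters have the values shown (these values are not given on this page) and using a hypothetical class name:

```java
// Sketch: strip two-character placeholders from a coded string.
// The marker values are assumptions.
final class PlainTextSketch {
    static final char MARKER_OPENING = '\uE101';  // assumed value
    static final char MARKER_CLOSING = '\uE102';  // assumed value
    static final char MARKER_ISOLATED = '\uE103'; // assumed value

    static String getText(String codedText) {
        StringBuilder out = new StringBuilder(codedText.length());
        for (int i = 0; i < codedText.length(); i++) {
            char ch = codedText.charAt(i);
            if (ch == MARKER_OPENING || ch == MARKER_CLOSING || ch == MARKER_ISOLATED) {
                i++; // also skip the index character that follows the marker
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String coded = "Hello " + MARKER_OPENING + '\uE110' + "world" + MARKER_CLOSING + '\uE111';
        System.out.println(getText(coded)); // prints "Hello world"
    }
}
```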
public java.lang.String getCodedText()
public java.lang.String getCodedText(int start,
int end)
start - the position of the first character or marker of the section
(in the coded text representation).
end - the position just after the last character or marker of the section
(in the coded text representation).
You can use -1 for ending the section at the end of the fragment.
InvalidPositionException - when start or end points inside a marker.
public Code getCode(char indexAsChar)
indexAsChar - the index value coded as character.
public Code getCode(int index)
index - the index of the code.
public java.util.List<Code> getCodes()
public java.util.List<Code> getClonedCodes()
public java.util.List<Code> getCodes(int start, int end)
start - the position of the first character or marker of the section
(in the coded text representation).
end - the position just after the last character or marker of the section
(in the coded text representation).
InvalidPositionException - when start or end points inside a marker.
public int getIndex(int id)
id - the identifier to look for.
public int getIndexForClosing(int id)
id - the identifier of the closing tag to look for.
public boolean isEmpty()
public boolean hasText()
public boolean hasText(boolean whiteSpacesAreText)
whiteSpacesAreText - indicates if whitespaces should be considered
characters or not for the purpose of checking if this fragment is empty.
public boolean hasCode()
public void remove(int start,
int end)
start - the position of the first character or marker of the section
(in the coded text representation).
end - the position just after the last character or marker of the section
(in the coded text representation).
InvalidPositionException - when start or end points inside a marker.
public TextFragment subSequence(int start, int end)
Specified by: subSequence in interface java.lang.CharSequence
start - the position of the first character or marker of the section
(in the coded text representation).
end - the position just after the last character or marker of the section
(in the coded text representation).
You can use -1 for ending the section at the end of the fragment.
public void setCodedText(java.lang.String newCodedText)
newCodedText - the coded text to apply.
InvalidContentException - when the coded text is not valid, or does
not correspond to the existing codes.
public void setCodedText(java.lang.String newCodedText,
boolean allowCodeDeletion)
newCodedText - the coded text to apply.
allowCodeDeletion - true if in-line codes that are missing from the coded text
should also be deleted from the fragment.
InvalidContentException - When the coded text is not valid, or does
not correspond to the existing codes.
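The consistency check implied by these parameters can be sketched as below. This is an illustration under assumptions: the marker values are assumed (and assumed contiguous), IllegalArgumentException stands in for InvalidContentException, codes are reduced to their raw data strings, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of checking a new coded text against an existing list of codes.
// IllegalArgumentException is a stand-in for InvalidContentException.
final class CodedTextCheckSketch {
    static final char MARKER_OPENING = '\uE101';  // assumed value
    static final char MARKER_ISOLATED = '\uE103'; // assumed value, last of three contiguous markers
    static final int CHARBASE = 0xE110;           // assumed value

    // Returns the codes still referenced by newCodedText.
    static List<String> check(String newCodedText, List<String> codes, boolean allowCodeDeletion) {
        boolean[] used = new boolean[codes.size()];
        for (int i = 0; i < newCodedText.length(); i++) {
            char ch = newCodedText.charAt(i);
            if (ch < MARKER_OPENING || ch > MARKER_ISOLATED) {
                continue; // ordinary character
            }
            if (i + 1 >= newCodedText.length()) {
                throw new IllegalArgumentException("marker without index character");
            }
            int index = newCodedText.charAt(++i) - CHARBASE;
            if (index < 0 || index >= codes.size()) {
                throw new IllegalArgumentException("marker refers to a code that does not exist");
            }
            used[index] = true;
        }
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < codes.size(); i++) {
            if (used[i]) {
                kept.add(codes.get(i));
            } else if (!allowCodeDeletion) {
                throw new IllegalArgumentException("code " + i + " is missing from the coded text");
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> codes = List.of("<b>", "</b>");
        String onlyFirst = "x" + MARKER_OPENING + (char) CHARBASE + "y";
        System.out.println(check(onlyFirst, codes, true)); // keeps only the referenced code
    }
}
```

A complete implementation would also have to rewrite the index characters in the coded text after dropping codes, since the remaining codes shift position in the list; the sketch leaves that step out.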
public void setCodedText(java.lang.String newCodedText,
java.util.List<Code> newCodes)
newCodedText - the coded text to apply.
newCodes - the list of the corresponding codes.
InvalidContentException - when the coded text is not valid or does
not correspond to the new codes.
public void setCodedText(java.lang.String newCodedText,
java.util.List<Code> newCodes,
boolean allowCodeDeletion)
newCodedText - the coded text to apply.
newCodes - the list of the corresponding codes.
allowCodeDeletion - true if in-line codes that are missing from the coded text
should also be deleted from the fragment.
InvalidContentException - when the coded text is not valid or does
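not correspond to the new codes.
public java.lang.String toString()
Gets the coded text for this fragment. This is the same as calling getCodedText().
Each code is represented by a placeholder made of two special characters.
To get the content with the codes expanded as their original data use toText().
Specified by: toString in interface java.lang.CharSequence
Overrides: toString in class java.lang.Object
public java.lang.String toText()
Returns the content of this fragment, including the original codes whenever possible.
To get the coded text instead, use getCodedText() or toString().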
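The difference between the coded text and the expanded content can be shown with a small standalone sketch. It reduces each code to its raw data string and assumes the marker and base values shown; the class ExpandSketch is hypothetical and it ignores codes with references.

```java
import java.util.List;

// Sketch: expand two-character placeholders back into the raw code data.
// Marker and base values are assumptions; codes are reduced to plain strings.
final class ExpandSketch {
    static final char MARKER_OPENING = '\uE101';  // assumed value
    static final char MARKER_ISOLATED = '\uE103'; // assumed value, last of three contiguous markers
    static final int CHARBASE = 0xE110;           // assumed value

    static String toText(String codedText, List<String> codeData) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < codedText.length(); i++) {
            char ch = codedText.charAt(i);
            if (ch >= MARKER_OPENING && ch <= MARKER_ISOLATED) {
                // The next character carries the index of the code.
                out.append(codeData.get(codedText.charAt(++i) - CHARBASE));
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String coded = "Hello " + MARKER_OPENING + '\uE110' + "world" + '\uE102' + '\uE111';
        System.out.println(coded.length());                          // coded length: 15
        System.out.println(toText(coded, List.of("<b>", "</b>")));   // Hello <b>world</b>
    }
}
```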
public int compareTo(java.lang.Object object)
If the object is a TextFragment, the result is the same as calling compareTo(fragment, false)
(note that inline codes are not compared with this method).
If the object is not a TextFragment, the method returns the result of comparing the
toString() values of the two objects.
Specified by: compareTo in interface java.lang.Comparable<java.lang.Object>
object - the object to compare with this TextFragment.
public int compareTo(TextFragment frag, boolean codeSensitive)
frag - the TextFragment to compare with this one.
codeSensitive - true if the codes need to be compared as well.
public boolean equals(java.lang.Object object)
Overrides: equals in class java.lang.Object
public int changeToCode(int start,
int end,
TextFragment.TagType tagType,
java.lang.String type)
start - the position of the first character or marker of the section
(in the coded text representation).
end - the position just after the last character or marker of the section
(in the coded text representation).
tagType - the tag type of the new code.
type - the type of the new code.
InvalidPositionException - when start or end points inside a marker.
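A position points inside a marker when it falls between the marker character and its index character. One way to test for that, sketched under the assumption that the three marker characters have the values shown and that index characters never overlap with them, is to look at the character just before the position. The class and method names are hypothetical.

```java
// Sketch: detect a position that would split a two-character placeholder.
// Marker values are assumptions.
final class PositionCheckSketch {
    static final char MARKER_OPENING = '\uE101';  // assumed value
    static final char MARKER_CLOSING = '\uE102';  // assumed value
    static final char MARKER_ISOLATED = '\uE103'; // assumed value

    static boolean isInsideMarker(CharSequence codedText, int position) {
        if (position <= 0 || position > codedText.length()) {
            return false; // the start of the text, or out of range, cannot split a placeholder
        }
        char previous = codedText.charAt(position - 1);
        return previous == MARKER_OPENING || previous == MARKER_CLOSING || previous == MARKER_ISOLATED;
    }

    public static void main(String[] args) {
        String coded = "A" + MARKER_ISOLATED + '\uE110' + "B";
        for (int pos = 0; pos <= coded.length(); pos++) {
            System.out.println(pos + " inside marker: " + isInsideMarker(coded, pos));
        }
    }
}
```

Only position 2 is rejected in this example: positions 1 and 3 sit on either side of the placeholder and are valid section boundaries.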
public int annotate(int start,
int end,
java.lang.String type,
InlineAnnotation annotation)
start - the position of the first character or marker of the section
to annotate (in the coded text representation).
end - the position just after the last character or marker of the section
to annotate (in the coded text representation).
type - the type of annotation to set.
annotation - the annotation to set (can be null).
InvalidPositionException - when start or end points inside a marker.
public void removeAnnotations()
public void removeAnnotations(java.lang.String type)
type - the type of annotation to remove.
public boolean hasAnnotation()
public boolean hasAnnotation(java.lang.String type)
type - the type of annotation to look for.
public java.util.List<AnnotatedSpan> getAnnotatedSpans(java.lang.String type)
type - the type of annotation to look for.
public void renumberCodes()
public void removeCode(Code code)
protected void balanceMarkers()
public void alignCodeIds(TextFragment base)
For example, if the source is "%d equals %s" and the target is
"%s equals %d", where %s and %d are codes,
you want the IDs to match for the codes with the same content.
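The idea can be sketched with a reduced Code type that holds only an id and its data. The matching strategy below (first code in the base with equal data wins, unmatched codes keep their id) is an assumption made for illustration, as are the class and member names.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: copy ids from a base fragment onto codes that carry the same data.
// The Code type and the matching strategy are simplified assumptions.
final class AlignIdsSketch {
    static final class Code {
        int id;
        final String data;

        Code(int id, String data) {
            this.id = id;
            this.data = data;
        }
    }

    static void alignCodeIds(List<Code> target, List<Code> base) {
        for (Code t : target) {
            for (Code b : base) {
                if (b.data.equals(t.data)) {
                    t.id = b.id; // same content, so reuse the id from the base
                    break;
                }
            }
            // No match: the code keeps its current id (assumed behaviour).
        }
    }

    public static void main(String[] args) {
        // Source "%d equals %s": %d has id 1, %s has id 2.
        List<Code> source = Arrays.asList(new Code(1, "%d"), new Code(2, "%s"));
        // Target "%s equals %d": ids were assigned in order of appearance.
        List<Code> target = Arrays.asList(new Code(1, "%s"), new Code(2, "%d"));
        alignCodeIds(target, source);
        for (Code c : target) {
            System.out.println(c.data + " -> id " + c.id);
        }
    }
}
```

After the call, the target's %s carries id 2 and its %d carries id 1, matching the source.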
base - the fragment to use as the base for the synchronization.
public java.lang.Appendable append(char value)
Specified by: append in interface java.lang.Appendable
value - the character to append.
public java.lang.Appendable append(java.lang.CharSequence csq)
Specified by: append in interface java.lang.Appendable
csq - the character sequence to append.
If the parameter is null, the string "null" is appended.
public java.lang.Appendable append(java.lang.CharSequence csq,
int start,
int end)
Specified by: append in interface java.lang.Appendable
csq - the character sequence to append.
If csq is null, then characters will be appended as if csq contained the string "null".
start - the index of the first character in the subsequence.
end - the index of the character following the last character in the subsequence.
public char charAt(int index)
For example: If the fragment is "A[xy]B" and "[xy]" is a code, charAt(3) returns 'B' not 'x'.
If the specified index falls on a code placeholder, the character returned is either a marker
(first character of the placeholder) or a special index to access the underlying code (second
character of the placeholder). Markers can be identified using isMarker(char).
Specified by: charAt in interface java.lang.CharSequence
index - the index of the character to be returned.
java.lang.IndexOutOfBoundsException - if the index argument is negative or not less than the length
of the coded text.
See Also: isMarker(char)
public int length()
This is not the length of the content with all its codes. In the coded text, each code is represented by a placeholder made of two characters regardless of the size of the code. For example: If the fragment is "A[xy]B" and "[xy]" is a code, length() returns 4, not 6.
To get the length of the content including codes use toText().length().
Note that codes with references are not expanded by toText().
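The "A[xy]B" example used for charAt(int) and length() can be reproduced with plain strings. The marker and index character values are assumptions, and the class CodedLengthSketch is hypothetical.

```java
// Sketch of the "A[xy]B" example: coded length versus expanded length.
// The marker and index character values are assumptions.
final class CodedLengthSketch {
    static final char MARKER_ISOLATED = '\uE103'; // assumed value
    static final char INDEX_ZERO = '\uE110';      // assumed encoding of code index 0

    // Coded text: the code "[xy]" is replaced by a two-character placeholder.
    static String coded() {
        return "A" + MARKER_ISOLATED + INDEX_ZERO + "B";
    }

    // Expanded content: the placeholder is replaced by the original code data.
    static String expanded() {
        return "A" + "[xy]" + "B";
    }

    public static void main(String[] args) {
        System.out.println(coded().length());    // prints 4
        System.out.println(coded().charAt(3));   // prints B
        System.out.println(expanded().length()); // prints 6
    }
}
```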
Specified by: length in interface java.lang.CharSequence
public void invalidate()
public int getLastCodeId()