java.lang.Object
  net.sf.okapi.lib.segmentation.SRXSegmenter
public class SRXSegmenter
Implements the ISegmenter interface for SRX rules.
| Constructor Summary | |
|---|---|
| SRXSegmenter() | Creates a new SRXSegmenter object. |
| Method Summary | | |
|---|---|---|
| protected void | addRule(net.sf.okapi.lib.segmentation.CompiledRule compiledRule) | Adds a compiled rule to this segmenter. |
| boolean | cascade() | Indicates if cascading must be applied when selecting the rules for a given language pattern. |
| int | computeSegments(java.lang.String text) | Calculates the segmentation of a given plain text string. |
| int | computeSegments(TextContainer container) | Calculates the segmentation of a given TextContainer object. |
| LocaleId | getLanguage() | Gets the language used to apply the rules. |
| Range | getNextSegmentRange(TextContainer container) | Computes the range of the next segment for a given TextContainer object. |
| java.util.List<Range> | getRanges() | Gets the list of all segment ranges calculated when calling ISegmenter.computeSegments(String) or ISegmenter.computeSegments(TextContainer). |
| java.util.List<java.lang.Integer> | getSplitPositions() | Gets the list of all the split positions in the text that was last segmented. |
| boolean | includeEndCodes() | Indicates if end codes should be included (see SRX implementation notes). |
| boolean | includeIsolatedCodes() | Indicates if isolated codes should be included (see SRX implementation notes). |
| boolean | includeStartCodes() | Indicates if start codes should be included (see SRX implementation notes). |
| boolean | oneSegmentIncludesAll() | Indicates if, when there is a single segment in a text, it should include the whole text (no spaces or codes trimmed on the left or right). |
| void | reset() | Resets the options to their defaults, and the compiled rules to nothing. |
| boolean | segmentSubFlows() | Indicates if sub-flows must be segmented. |
| protected void | setCascade(boolean value) | Sets the flag indicating if cascading must be applied when selecting the rules for a given language pattern. |
| protected void | setLanguage(LocaleId languageCode) | Sets the language used to apply the rules. |
| protected void | setMaskRule(java.lang.String pattern) | Sets the pattern for the mask rule. |
| void | setOptions(boolean segmentSubFlows, boolean includeStartCodes, boolean includeEndCodes, boolean includeIsolatedCodes, boolean oneSegmentIncludesAll, boolean trimLeadingWS, boolean trimTrailingWS) | Sets the options for this segmenter. |
| boolean | trimLeadingWhitespaces() | Indicates if leading white-spaces should be left outside the segments. |
| boolean | trimTrailingWhitespaces() | Indicates if trailing white-spaces should be left outside the segments. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public SRXSegmenter()
| Method Detail |
|---|
public void reset()
public void setOptions(boolean segmentSubFlows,
boolean includeStartCodes,
boolean includeEndCodes,
boolean includeIsolatedCodes,
boolean oneSegmentIncludesAll,
boolean trimLeadingWS,
boolean trimTrailingWS)
Parameters:
segmentSubFlows - true to segment sub-flows, false to not segment them.
includeStartCodes - true to include start codes just before a break in the 'left' segment, false to put them in the next segment.
includeEndCodes - true to include end codes just before a break in the 'left' segment, false to put them in the next segment.
includeIsolatedCodes - true to include isolated codes just before a break in the 'left' segment, false to put them in the next segment.
oneSegmentIncludesAll - true to make a segment that is alone in its text include the whole text, without trimming spaces or codes on either side.
trimLeadingWS - true to trim leading white-spaces from the segments, false to keep them.
trimTrailingWS - true to trim trailing white-spaces from the segments, false to keep them.
public boolean oneSegmentIncludesAll()
public boolean segmentSubFlows()
public boolean cascade()
public boolean trimLeadingWhitespaces()
public boolean trimTrailingWhitespaces()
public boolean includeStartCodes()
public boolean includeEndCodes()
public boolean includeIsolatedCodes()
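
To make the two white-space options of setOptions more concrete, the following is a minimal sketch of how a segment range could shrink when leading and trailing white-spaces are left outside it. It does not use the Okapi classes: `TrimSketch` and `trim` are hypothetical names, the `int[]` pair stands in for a Range, and the end position is assumed to be exclusive. The real segmenter also has to handle inline codes, which this sketch ignores.

```java
import java.util.Arrays;

// Hypothetical stand-in, not part of Okapi: shows the effect of the
// trimLeadingWS / trimTrailingWS options on a single segment range.
final class TrimSketch {

    // Returns {start, end} (end exclusive, by assumption) after optionally
    // moving each boundary inward past white-space characters.
    static int[] trim(String text, int start, int end,
                      boolean trimLeadingWS, boolean trimTrailingWS) {
        if (trimLeadingWS) {
            while (start < end && Character.isWhitespace(text.charAt(start))) {
                start++;
            }
        }
        if (trimTrailingWS) {
            while (end > start && Character.isWhitespace(text.charAt(end - 1))) {
                end--;
            }
        }
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        String text = "One.  Two. ";
        // The untrimmed second segment spans positions 4 to 11: "  Two. "
        int[] range = trim(text, 4, 11, true, true);
        System.out.println(Arrays.toString(range)
            + " -> \"" + text.substring(range[0], range[1]) + "\"");
    }
}
```

With both flags set to false the range comes back unchanged, which corresponds to keeping the white-spaces inside the segment.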
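public int computeSegments(java.lang.String text)
Description copied from interface: ISegmenter
Calculates the segmentation of a given plain text string.
Specified by: computeSegments in interface ISegmenter
Parameters: text - plain text to segment.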
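
As background for plain-text segmentation, here is a rough sketch of how SRX-style rules can decide where a text is split. In SRX, each rule pairs a "before break" and an "after break" regular expression with a break or no-break flag, and the first rule that matches at a position decides it. The class `SrxRuleSketch`, its `Rule` type, and the two demo rules are assumptions made for illustration; this is not the Okapi implementation and says nothing about how CompiledRule is built.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not Okapi code: applies SRX-style break and
// no-break rules to plain text and collects the split positions.
final class SrxRuleSketch {

    static final class Rule {
        final boolean isBreak;      // true = break rule, false = no-break exception
        final Pattern beforeBreak;  // must match the text ending at the position
        final Pattern afterBreak;   // must match the text starting at the position

        Rule(boolean isBreak, String beforeBreak, String afterBreak) {
            this.isBreak = isBreak;
            // \z anchors the "before" pattern to the end of the left-hand text.
            this.beforeBreak = Pattern.compile("(?:" + beforeBreak + ")\\z");
            this.afterBreak = Pattern.compile(afterBreak);
        }

        boolean matchesAt(String text, int pos) {
            return beforeBreak.matcher(text.substring(0, pos)).find()
                && afterBreak.matcher(text.substring(pos)).lookingAt();
        }
    }

    // Tests every inter-character position; the first matching rule wins.
    static List<Integer> splitPositions(String text, List<Rule> rules) {
        List<Integer> positions = new ArrayList<>();
        for (int pos = 1; pos < text.length(); pos++) {
            for (Rule rule : rules) {
                if (rule.matchesAt(text, pos)) {
                    if (rule.isBreak) {
                        positions.add(pos);
                    }
                    break;
                }
            }
        }
        return positions;
    }

    // Demo rules (assumed): do not break after "Mr.", otherwise break
    // after sentence-ending punctuation followed by white-space.
    static List<Rule> demoRules() {
        return Arrays.asList(
            new Rule(false, "\\bMr\\.", "\\s"),
            new Rule(true, "[.?!]", "\\s"));
    }

    public static void main(String[] args) {
        System.out.println(splitPositions("Mr. Smith left. He was late.", demoRules()));
    }
}
```

The no-break rule is listed first so that it can veto the more general break rule. Re-matching on substrings at every position is simple to follow but quadratic in the text length, so this shape is only suitable as an illustration.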
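public int computeSegments(TextContainer container)
Description copied from interface: ISegmenter
Calculates the segmentation of a given TextContainer object.
Specified by: computeSegments in interface ISegmenter
Parameters: container - the object to segment.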
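public Range getNextSegmentRange(TextContainer container)
Description copied from interface: ISegmenter
Computes the range of the next segment for a given TextContainer object.
Specified by: getNextSegmentRange in interface ISegmenter
Parameters: container - the text container where to look for the next segment.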
public java.util.List<java.lang.Integer> getSplitPositions()
Description copied from interface: ISegmenter
Gets the list of all the split positions in the text that was last segmented. ISegmenter.computeSegments(TextContainer) or ISegmenter.computeSegments(String) must be called before calling this method.
A split position is the first character position of a new segment.
IMPORTANT: The positions returned here do not take into account the options for trimming leading and trailing white-spaces.
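
Because a split position is the first character position of a new segment, a list of split positions can be turned into untrimmed ranges by pairing each position with the next one. The sketch below illustrates that relationship only. `SplitToRangesSketch` and `toRanges` are hypothetical names, the `int[]` pairs stand in for Range objects, and the positions are assumed to be sorted in ascending order with exclusive end values. It is not a description of how getRanges() computes its result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Okapi code: derives untrimmed {start, end}
// pairs from a sorted list of split positions.
final class SplitToRangesSketch {

    static List<int[]> toRanges(int textLength, List<Integer> splitPositions) {
        List<int[]> ranges = new ArrayList<>();
        int start = 0;
        for (int split : splitPositions) {
            // Each split position closes the current range and opens the next.
            ranges.add(new int[] {start, split});
            start = split;
        }
        // The last range runs to the end of the text.
        ranges.add(new int[] {start, textLength});
        return ranges;
    }

    public static void main(String[] args) {
        String text = "Mr. Smith left. He was late.";
        for (int[] range : toRanges(text.length(), Arrays.asList(15))) {
            System.out.println(Arrays.toString(range)
                + " \"" + text.substring(range[0], range[1]) + "\"");
        }
    }
}
```

In this example the second range starts with a space, which shows why untrimmed positions can differ from ranges produced with white-space trimming enabled.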
Specified by: getSplitPositions in interface ISegmenter
public java.util.List<Range> getRanges()
Description copied from interface: ISegmenter
Gets the list of all segment ranges calculated when calling ISegmenter.computeSegments(String) or ISegmenter.computeSegments(TextContainer).
Specified by: getRanges in interface ISegmenter
Returns: a list of Range objects, where start is the start and end the end of the range. Returns null if no ranges have been defined yet.
public LocaleId getLanguage()
Description copied from interface: ISegmenter
Gets the language used to apply the rules.
Specified by: getLanguage in interface ISegmenter
protected void setLanguage(LocaleId languageCode)
Parameters: languageCode - Code of the language to use to apply the rules.
protected void setCascade(boolean value)
Parameters: value - true if cascading must be applied, false otherwise.
protected void addRule(net.sf.okapi.lib.segmentation.CompiledRule compiledRule)
Parameters: compiledRule - the compiled rule to add.
protected void setMaskRule(java.lang.String pattern)
Parameters: pattern - the new pattern to use for the mask rule.
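
The cascade flag reported by cascade() and set with setCascade(boolean) comes from the SRX format, where language maps associate a language pattern with a named set of rules. The sketch below shows the selection logic under that reading: with cascading, every map whose pattern matches the language code contributes its rules, in order; without it, only the first matching map is used. `CascadeSketch`, `LanguageMap`, and the demo maps are hypothetical, and the use of a full regular-expression match on the language code is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not Okapi code: selects rule sets for a language
// code with and without cascading.
final class CascadeSketch {

    static final class LanguageMap {
        final Pattern languagePattern;
        final String ruleSetName;

        LanguageMap(String languagePattern, String ruleSetName) {
            this.languagePattern = Pattern.compile(languagePattern);
            this.ruleSetName = ruleSetName;
        }
    }

    static List<String> selectRuleSets(String languageCode,
                                       List<LanguageMap> maps,
                                       boolean cascade) {
        List<String> selected = new ArrayList<>();
        for (LanguageMap map : maps) {
            // Assumption: the pattern must match the whole language code.
            if (map.languagePattern.matcher(languageCode).matches()) {
                selected.add(map.ruleSetName);
                if (!cascade) {
                    break; // without cascading, the first match is the only one
                }
            }
        }
        return selected;
    }

    // Demo maps (assumed): an English-specific set followed by a catch-all.
    static List<LanguageMap> demoMaps() {
        return Arrays.asList(
            new LanguageMap("[Ee][Nn].*", "English"),
            new LanguageMap(".*", "Default"));
    }

    public static void main(String[] args) {
        System.out.println(selectRuleSets("en-US", demoMaps(), true));
        System.out.println(selectRuleSets("en-US", demoMaps(), false));
    }
}
```

The order of the maps matters in both modes: it fixes which map wins without cascading and the order in which rule sets are combined with cascading.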