net.sf.okapi.common
Interface ISegmenter

All Known Implementing Classes:
SRXSegmenter

public interface ISegmenter

Common methods for segmenting extracted content.


Method Summary
 int computeSegments(java.lang.String text)
          Calculates the segmentation of a given plain text string.
 int computeSegments(TextContainer container)
          Calculates the segmentation of a given TextContainer object.
 LocaleId getLanguage()
          Gets the language used to apply the rules.
 Range getNextSegmentRange(TextContainer container)
          Computes the range of the next segment for a given TextContainer object.
 java.util.List<Range> getRanges()
          Gets the list of all segment ranges calculated by the last call to computeSegments(String) or computeSegments(TextContainer).
 java.util.List<java.lang.Integer> getSplitPositions()
          Gets the list of all the split positions in the text that was last segmented.
 

Method Detail

computeSegments

int computeSegments(java.lang.String text)
Calculates the segmentation of a given plain text string.

Parameters:
text - plain text to segment.
Returns:
the number of segments calculated.
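
The call pattern can be sketched as follows. This is a minimal, self-contained illustration, not Okapi code: ToySegmenter and Span are hypothetical stand-ins for an ISegmenter implementation and for Range, the break rule (a break after '.', '!' or '?' when a space follows) is invented for the example, and the end-exclusive range convention is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class ToySegmenterSketch {

    // Hypothetical stand-in for Range; end is treated as exclusive (an assumption).
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Toy segmenter, NOT SRXSegmenter: breaks after '.', '!' or '?' when a space follows.
    static final class ToySegmenter {
        private List<Span> ranges; // stays null until computeSegments is called

        int computeSegments(String text) {
            ranges = new ArrayList<>();
            int start = 0;
            for (int i = 0; i < text.length() - 1; i++) {
                char c = text.charAt(i);
                boolean terminator = c == '.' || c == '!' || c == '?';
                if (terminator && text.charAt(i + 1) == ' ') {
                    ranges.add(new Span(start, i + 1));
                    start = i + 1; // untrimmed: the next segment keeps its leading space
                }
            }
            if (start < text.length()) {
                ranges.add(new Span(start, text.length()));
            }
            return ranges.size();
        }

        List<Span> getRanges() {
            return ranges;
        }
    }

    public static void main(String[] args) {
        ToySegmenter segmenter = new ToySegmenter();
        String text = "First sentence. Second one! Third?";
        int count = segmenter.computeSegments(text);
        System.out.println("segments: " + count);
        for (Span s : segmenter.getRanges()) {
            System.out.println("[" + text.substring(s.start, s.end) + "]");
        }
    }
}
```

The return value is the segment count; the ranges themselves are read back through a separate call, which mirrors the split between computeSegments and getRanges in this interface.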

computeSegments

int computeSegments(TextContainer container)
Calculates the segmentation of a given TextContainer object. If the content is already segmented, it is un-segmented automatically before being processed.

Parameters:
container - the object to segment.
Returns:
the number of segments calculated.

getNextSegmentRange

Range getNextSegmentRange(TextContainer container)
Computes the range of the next segment for a given TextContainer object. The search starts at the first character after the last segment marker found in the container.

Parameters:
container - the text container in which to look for the next segment.
Returns:
a range corresponding to the start and end positions of the segment found, or null if no more segments are found.

getSplitPositions

java.util.List<java.lang.Integer> getSplitPositions()
Gets the list of all the split positions in the text that was last segmented. You must call computeSegments(TextContainer) or computeSegments(String) before calling this method. A split position is the first character position of a new segment.

IMPORTANT: The positions returned here do NOT take into account any option for trimming leading or trailing white space.

Returns:
a list of integers where each value is a split position in the coded text that was segmented.
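
Because a split position is the first character position of a new segment, untrimmed segment boundaries can be derived from the list. The sketch below is an illustration under stated assumptions, not part of the interface: toRanges is a hypothetical helper, the positions are assumed to be sorted in ascending order, a leading position of 0 is tolerated but not required, and each pair is {start, end} with an exclusive end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitPositionsSketch {

    // Hypothetical helper: turns split positions into untrimmed {start, end} pairs.
    // Assumes ascending positions and an exclusive end index.
    static List<int[]> toRanges(List<Integer> splitPositions, int textLength) {
        List<int[]> ranges = new ArrayList<>();
        int start = 0;
        for (int pos : splitPositions) {
            // Skip positions that would produce an empty or out-of-bounds range.
            if (pos > start && pos <= textLength) {
                ranges.add(new int[] {start, pos});
                start = pos;
            }
        }
        // The text after the last split position forms the final segment.
        if (start < textLength) {
            ranges.add(new int[] {start, textLength});
        }
        return ranges;
    }

    public static void main(String[] args) {
        String text = "First sentence. Second one! Third?";
        // 15 and 27 are example values, chosen to fall just after each terminator.
        for (int[] r : toRanges(Arrays.asList(15, 27), text.length())) {
            System.out.println(r[0] + "-" + r[1] + ": [" + text.substring(r[0], r[1]) + "]");
        }
    }
}
```

The pairs produced this way keep any leading or trailing white space, which matches the note above that split positions ignore trimming options.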

getRanges

java.util.List<Range> getRanges()
Gets the list of all segment ranges calculated by the last call to computeSegments(String) or computeSegments(TextContainer).

Returns:
the list of all segment ranges. Each range is stored in a Range object, where start is the start position and end is the end position of the range. Returns null if no ranges have been defined yet.
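
Since the method may return null, callers need a guard before iterating. A minimal sketch of that guard, using int[] {start, end} pairs as a stand-in for Range objects; segmentTexts is a hypothetical helper and the exclusive end index is an assumption:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RangeTextSketch {

    // Hypothetical helper: extracts the text of each range, treating a null list as "no segments".
    static List<String> segmentTexts(String text, List<int[]> ranges) {
        List<String> result = new ArrayList<>();
        if (ranges == null) {
            return result; // nothing has been computed yet
        }
        for (int[] r : ranges) {
            result.add(text.substring(r[0], r[1]));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Hi. Bye.";
        List<int[]> ranges = Arrays.asList(new int[] {0, 3}, new int[] {3, 8});
        System.out.println(segmentTexts(text, ranges));
        System.out.println(segmentTexts(text, null));
    }
}
```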

getLanguage

LocaleId getLanguage()
Gets the language used to apply the rules.

Returns:
the language code used to apply the rules, or null if none has been specified.