java.lang.Object
  net.sf.okapi.lib.segmentation.SRXDocument
public class SRXDocument
Provides facilities to load, save, and manage segmentation rules in SRX format. This class also implements several extensions to the standard SRX behavior.
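For readers new to SRX, the following sketch illustrates how a list of segmentation rules is usually read: each rule has a "before" and an "after" regular expression plus a break/no-break flag, and at each position the first rule whose two expressions match decides the outcome. This is a minimal, self-contained illustration under that assumption. The names BreakRuleSketch, SketchRule, and segment are hypothetical, and the code does not use or reproduce the Okapi implementation or its extensions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class BreakRuleSketch {

    /** Hypothetical stand-in for one SRX rule (not the Okapi Rule class). */
    static final class SketchRule {
        final boolean isBreak;
        final Pattern before; // must match the text that ends at the candidate position
        final Pattern after;  // must match the text that starts at the candidate position

        SketchRule(boolean isBreak, String before, String after) {
            this.isBreak = isBreak;
            // \z anchors the 'before' expression to the end of the preceding text.
            this.before = Pattern.compile("(?:" + before + ")\\z");
            this.after = Pattern.compile(after);
        }
    }

    /** Splits text at every position where the first matching rule is a break rule. */
    static List<String> segment(String text, List<SketchRule> rules) {
        List<String> segments = new ArrayList<>();
        int start = 0;
        for (int pos = 1; pos < text.length(); pos++) {
            for (SketchRule rule : rules) {
                boolean beforeOk = rule.before.matcher(text.substring(0, pos)).find();
                boolean afterOk = rule.after.matcher(text.substring(pos)).lookingAt();
                if (beforeOk && afterOk) {
                    if (rule.isBreak) {
                        segments.add(text.substring(start, pos));
                        start = pos;
                    }
                    break; // the first matching rule decides this position
                }
            }
        }
        segments.add(text.substring(start));
        return segments;
    }

    public static void main(String[] args) {
        List<SketchRule> rules = new ArrayList<>();
        rules.add(new SketchRule(false, "Mr\\.", "\\s")); // exception: no break after "Mr."
        rules.add(new SketchRule(true, "[.?!]", "\\s"));  // break after sentence punctuation
        for (String s : segment("Mr. Smith arrived. He sat down.", rules)) {
            System.out.println("[" + s + "]");
        }
        // prints "[Mr. Smith arrived.]" then "[ He sat down.]"
    }
}
```

Listing the no-break exception before the general break rule is what keeps "Mr." from ending a segment in this sketch.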
| Field Summary | | |
|---|---|---|
| static java.lang.String | ANYCODE | Marker for INLINECODE_PATTERN in the given pattern. |
| static java.lang.String | INLINECODE_PATTERN | Represents the pattern for an inline code (both special characters). |
| static java.lang.String | NOAUTO | Placed at the end of the 'after' expression, this marker indicates that the given pattern should not have auto-insertion of AUTO_INLINECODES. |
| Constructor Summary | |
|---|---|
| SRXDocument() | Creates an empty SRX document. |
| Method Summary | | |
|---|---|---|
| void | addLanguageMap(LanguageMap langMap) | Adds a language map to this document. |
| void | addLanguageRule(java.lang.String name, java.util.ArrayList<Rule> langRule) | Adds a language rule to this SRX document. |
| boolean | cascade() | Indicates if cascading must be applied when selecting the rules for a given language pattern. |
| ISegmenter | compileLanguageRules(LocaleId languageCode, ISegmenter existingSegmenter) | Compiles all the language rules applicable to a given language code and assigns them to a segmenter. |
| ISegmenter | compileSingleLanguageRule(java.lang.String ruleName, ISegmenter existingSegmenter) | Compiles a single language rule group and assigns it to a segmenter. |
| java.util.LinkedHashMap<java.lang.String,java.util.ArrayList<Rule>> | getAllLanguageRules() | Gets a map of all the language rules in this document. |
| java.util.ArrayList<LanguageMap> | getAllLanguagesMaps() | Gets the list of all the language maps in this document. |
| java.lang.String | getComments() | Gets the comments associated with this document. |
| java.lang.String | getHeaderComments() | Gets the comments associated with the header of this document. |
| java.util.ArrayList<Rule> | getLanguageRules(java.lang.String ruleName) | Gets the list of rules for a given <languagerule> element. |
| java.lang.String | getMaskRule() | Gets the current pattern of the mask rule. |
| java.lang.String | getSampleLanguage() | Gets the current sample language code. |
| java.lang.String | getSampleText() | Gets the current sample text. |
| java.lang.String | getVersion() | Gets the version of this SRX document. |
| java.lang.String | getWarning() | Gets the last warning that was issued while loading a document. |
| boolean | hasWarning() | Indicates if a warning was issued the last time a document was read. |
| boolean | includeEndCodes() | Indicates if end codes should be included (see SRX implementation notes). |
| boolean | includeIsolatedCodes() | Indicates if isolated codes should be included (see SRX implementation notes). |
| boolean | includeStartCodes() | Indicates if start codes should be included (see SRX implementation notes). |
| boolean | isModified() | Indicates if the document has been modified since the last load or save. |
| void | loadRules(java.lang.CharSequence data) | Loads an SRX document from a CharSequence object. |
| void | loadRules(java.io.InputStream inputStream) | Loads an SRX document from an input stream. |
| void | loadRules(java.lang.String pathOrURL) | Loads an SRX document from a file (given as a path or URL). |
| boolean | oneSegmentIncludesAll() | Indicates if, when there is a single segment in a text, it should include the whole text (no trimming of spaces or codes on the left or right). |
| void | resetAll() | Resets the document to its default empty initial state. |
| void | saveRules(java.lang.String rulesPath, boolean saveExtensions, boolean saveNonValidInfo) | Saves the current rules to an SRX rules document. |
| java.lang.String | saveRulesToString(boolean saveExtensions, boolean saveNonValidInfo) | Saves the current rules to an SRX string. |
| boolean | segmentSubFlows() | Indicates if sub-flows must be segmented. |
| void | setCascade(boolean value) | Sets the flag indicating if cascading must be applied when selecting the rules for a given language pattern. |
| void | setComments(java.lang.String text) | Sets the comments for this document. |
| void | setHeaderComments(java.lang.String text) | Sets the comments for the header of this document. |
| void | setIncludeEndCodes(boolean value) | Sets the indicator that tells if end codes should be included or not. |
| void | setIncludeIsolatedCodes(boolean value) | Sets the indicator that tells if isolated codes should be included or not. |
| void | setIncludeStartCodes(boolean value) | Sets the indicator that tells if start codes should be included or not. |
| void | setMaskRule(java.lang.String pattern) | Sets the pattern for the mask rule. |
| void | setModified(boolean value) | Sets the flag indicating if the document has been modified since the last load or save. |
| void | setOneSegmentIncludesAll(boolean value) | Sets the indicator that tells if, when there is a single segment in a text, it should include the whole text (no trimming of spaces or codes on the left or right). |
| void | setSampleLanguage(java.lang.String value) | Sets the sample language code. |
| void | setSampleText(java.lang.String value) | Sets the sample text. |
| void | setSegmentSubFlows(boolean value) | Sets the flag indicating if sub-flows must be segmented. |
| void | setTestOnSelectedGroup(boolean value) | Sets the indicator on how to apply rules for samples. |
| void | setTrimLeadingWhitespaces(boolean value) | Sets the indicator that tells if leading white-spaces should be left outside the segments. |
| void | setTrimTrailingWhitespaces(boolean value) | Sets the indicator that tells if trailing white-spaces should be left outside the segments. |
| boolean | testOnSelectedGroup() | Indicates that, when sampling the rules, the sample should be computed using only a selected group of rules. |
| boolean | trimLeadingWhitespaces() | Indicates if leading white-spaces should be left outside the segments. |
| boolean | trimTrailingWhitespaces() | Indicates if trailing white-spaces should be left outside the segments. |
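The trimLeadingWhitespaces() and trimTrailingWhitespaces() options listed above can be pictured as adjusting the start and end offsets of a segment so that edge white-space stays outside it. The sketch below is only an illustration of that idea on a plain string. The class TrimSketch and the helper trimRange are hypothetical names, not part of SRXDocument, and the sketch ignores inline codes entirely.

```java
class TrimSketch {

    /**
     * Returns {start, end} for the segment after optionally leaving
     * leading and/or trailing white-space outside of it.
     */
    static int[] trimRange(String text, int start, int end,
                           boolean trimLeading, boolean trimTrailing) {
        if (trimLeading) {
            while (start < end && Character.isWhitespace(text.charAt(start))) {
                start++;
            }
        }
        if (trimTrailing) {
            while (end > start && Character.isWhitespace(text.charAt(end - 1))) {
                end--;
            }
        }
        return new int[] { start, end };
    }

    public static void main(String[] args) {
        String text = " He sat down. ";
        int[] range = trimRange(text, 0, text.length(), true, true);
        System.out.println("[" + text.substring(range[0], range[1]) + "]");
        // prints "[He sat down.]"
    }
}
```

With both flags set to false the range is returned unchanged, which corresponds to keeping the white-space inside the segment.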
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final java.lang.String INLINECODE_PATTERN
public static final java.lang.String ANYCODE
public static final java.lang.String NOAUTO
| Constructor Detail |
|---|
public SRXDocument()
| Method Detail |
|---|
public java.lang.String getVersion()
public boolean hasWarning()
public java.lang.String getWarning()
public java.lang.String getHeaderComments()
public void setHeaderComments(java.lang.String text)
text - the new comments. Use null or an empty string to remove the comments.
public java.lang.String getComments()
public void setComments(java.lang.String text)
text - the new comments. Use null or an empty string to remove the comments.
public void resetAll()
public java.util.LinkedHashMap<java.lang.String,java.util.ArrayList<Rule>> getAllLanguageRules()
public java.util.ArrayList<Rule> getLanguageRules(java.lang.String ruleName)
ruleName - the name of the <languagerule> element to query.
public java.util.ArrayList<LanguageMap> getAllLanguagesMaps()
public boolean segmentSubFlows()
public void setSegmentSubFlows(boolean value)
value - true if sub-flows must be segmented, false otherwise.
public boolean cascade()
public void setCascade(boolean value)
value - true if cascading must be applied, false otherwise.
public boolean oneSegmentIncludesAll()
public void setOneSegmentIncludesAll(boolean value)
value - true if a text with a single segment should include the whole text.
public boolean trimLeadingWhitespaces()
public void setTrimLeadingWhitespaces(boolean value)
value - true if the leading white-spaces should be trimmed.
public boolean trimTrailingWhitespaces()
public void setTrimTrailingWhitespaces(boolean value)
value - true if the trailing white-spaces should be trimmed.
public boolean includeStartCodes()
public void setIncludeStartCodes(boolean value)
value - true if start codes should be included, false otherwise.
public boolean includeEndCodes()
public void setIncludeEndCodes(boolean value)
value - true if end codes should be included, false otherwise.
public boolean includeIsolatedCodes()
public void setIncludeIsolatedCodes(boolean value)
value - true if isolated codes should be included, false otherwise.
public java.lang.String getMaskRule()
public void setMaskRule(java.lang.String pattern)
pattern - the new pattern to use for the mask rule.
public java.lang.String getSampleText()
public void setSampleText(java.lang.String value)
value - the new sample text.
public java.lang.String getSampleLanguage()
public void setSampleLanguage(java.lang.String value)
value - the new sample language code.
public boolean testOnSelectedGroup()
public void setTestOnSelectedGroup(boolean value)
value - true to test using only a selected group of rules, false to test using all the rules matching a given language.
public boolean isModified()
public void setModified(boolean value)
value - true if the document has been changed, false otherwise.
public void addLanguageRule(java.lang.String name,
java.util.ArrayList<Rule> langRule)
name - name of the language rule to add.
langRule - language rule object to add.
public void addLanguageMap(LanguageMap langMap)
langMap - the language map object to add.
public ISegmenter compileLanguageRules(LocaleId languageCode,
ISegmenter existingSegmenter)
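Compiles all the language rules applicable to a given language code and assigns them to a segmenter. Rules from more than one matching language map are selected only when cascade() is true.
languageCode - the language code. The value should be a BCP-47 value (e.g. "de", "fr-ca").
existingSegmenter - optional existing SRXSegmenter object to re-use. Use null to not re-use anything.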
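The effect of cascade() on rule selection can be sketched as follows, assuming the usual SRX reading in which each language map pairs a regular expression on the language code with the name of a rule group, and the maps are tried in document order. The names CascadeSketch, LanguageMapEntry, and selectRuleGroups are hypothetical and stand in for the real LanguageMap handling, which may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class CascadeSketch {

    /** Hypothetical stand-in for a language map: a pattern plus a rule-group name. */
    static final class LanguageMapEntry {
        final String languagePattern;
        final String ruleName;

        LanguageMapEntry(String languagePattern, String ruleName) {
            this.languagePattern = languagePattern;
            this.ruleName = ruleName;
        }
    }

    /**
     * Returns the names of the rule groups selected for a language code.
     * With cascading, every matching map contributes, in order;
     * without it, only the first matching map is used.
     */
    static List<String> selectRuleGroups(List<LanguageMapEntry> maps,
                                         String languageCode, boolean cascade) {
        List<String> selected = new ArrayList<>();
        for (LanguageMapEntry entry : maps) {
            if (Pattern.matches(entry.languagePattern, languageCode)) {
                selected.add(entry.ruleName);
                if (!cascade) {
                    break; // stop at the first match when cascading is off
                }
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<LanguageMapEntry> maps = Arrays.asList(
            new LanguageMapEntry("[Ff][Rr].*", "French"),
            new LanguageMapEntry(".*", "Default"));
        System.out.println(selectRuleGroups(maps, "fr-ca", true));  // prints "[French, Default]"
        System.out.println(selectRuleGroups(maps, "fr-ca", false)); // prints "[French]"
    }
}
```

In this sketch a code such as "de" matches only the catch-all map, so the result is the same with or without cascading.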
public ISegmenter compileSingleLanguageRule(java.lang.String ruleName,
ISegmenter existingSegmenter)
ruleName - the name of the rule group to apply.
existingSegmenter - optional existing SRXSegmenter object to re-use. Use null to not re-use anything.
public void loadRules(java.lang.CharSequence data)
data - the string containing the SRX document to load.
public void loadRules(java.lang.String pathOrURL)
pathOrURL - the full path or URL of the document to load.
public void loadRules(java.io.InputStream inputStream)
inputStream - the input stream to read from.
public java.lang.String saveRulesToString(boolean saveExtensions,
boolean saveNonValidInfo)
saveExtensions - true to save Okapi SRX extensions, false otherwise.
saveNonValidInfo - true to save non-SRX-valid attributes, false otherwise.
public void saveRules(java.lang.String rulesPath,
boolean saveExtensions,
boolean saveNonValidInfo)
rulesPath - the full path of the file where to save the rules.
saveExtensions - true to save Okapi SRX extensions, false otherwise.
saveNonValidInfo - true to save non-SRX-valid attributes, false otherwise.