net.sf.okapi.common
Class StringUtil

java.lang.Object
  extended by net.sf.okapi.common.StringUtil

public final class StringUtil
extends java.lang.Object

Helper methods to manipulate strings.


Constructor Summary
StringUtil()
           
 
Method Summary
static java.lang.String charsToString(java.util.Set<java.lang.Character> set)
           
static boolean containsWildcards(java.lang.String string)
          Detects if a given string contains shell wildcard characters (e.g. * and ?).
static int getNumOccurrences(java.lang.String str, java.lang.String substr)
          Returns the number of occurrences of a given substring in a given string.
static java.lang.String getString(int length, char c)
          Returns a new string consisting of a given character repeated a given number of times.
static boolean isWhitespace(java.lang.String str)
          Checks if a given string contains only whitespace characters.
static float LcsEditDistance(java.lang.CharSequence seq1, java.lang.CharSequence seq2)
          Computes a score for two CharSequences based on their Longest Common Subsequence (LCS).
static boolean matchesWildcard(java.lang.String string, java.lang.String pattern)
          Detects if a given string matches a given pattern (not necessarily a regex) that may contain wildcards.
static boolean matchesWildcard(java.lang.String string, java.lang.String pattern, boolean filenameMode)
          Detects if a given string matches a given pattern (not necessarily a regex) that may contain wildcards, optionally treating the string as a file name.
static java.lang.String mirrorString(java.lang.String str)
          Returns the reversed version of a given string, e.g. "cba" for "abc".
static java.lang.String normalizeLineBreaks(java.lang.String string)
          Converts line breaks in a given string to the Unix standard (\n).
static java.lang.String normalizeWildcards(java.lang.String string)
          Converts shell wildcards (e.g. * and ?) in a given string to their Java regex representation.
static java.lang.String padString(java.lang.String string, int startPos, int endPos, char padder)
          Pads a range of a given string with a given character.
static java.lang.String removeQualifiers(java.lang.String st)
          Removes quotation marks around text in a given string.
static java.lang.String removeQualifiers(java.lang.String st, java.lang.String qualifier)
          Removes qualifiers (quotation marks etc.) around text in a given string.
static java.lang.String removeQualifiers(java.lang.String st, java.lang.String startQualifier, java.lang.String endQualifier)
          Removes qualifiers (quotation marks etc.) around text in a given string.
static java.lang.String[] split(java.lang.String string, java.lang.String delimRegex)
           
static java.lang.String[] split(java.lang.String string, java.lang.String delimRegex, int group)
           
static java.lang.String substring(java.lang.String string, int start, int end)
           
static java.lang.String titleCase(java.lang.String st)
          Returns a title-case representation of a given string.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

StringUtil

public StringUtil()

Method Detail

LcsEditDistance

public static float LcsEditDistance(java.lang.CharSequence seq1,
                                    java.lang.CharSequence seq2)
Computes a score for two CharSequences based on their Longest Common Subsequence (LCS).

Parameters:
seq1 - the first CharSequence
seq2 - the second CharSequence
Returns:
the score based on the length of the common subsequence and the input sequences
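The exact scoring formula is not stated above. The standalone sketch below computes the LCS length with the classic dynamic-programming recurrence and then, as an assumption, normalizes it to 2 * LCS / (len1 + len2) so that identical sequences score 1 and disjoint ones score 0. The class name LcsSketch and the normalization are hypothetical; this is not Okapi's implementation.

```java
/**
 * Hypothetical standalone sketch of an LCS-based score; not Okapi's code.
 * ASSUMPTION: the score is normalized as 2 * lcs / (len1 + len2).
 */
final class LcsSketch {

    /** Length of the longest common subsequence (classic O(n*m) dynamic programming). */
    static int lcsLength(CharSequence a, CharSequence b) {
        int m = b.length();
        int[] prev = new int[m + 1]; // DP row for the previous character of a
        int[] curr = new int[m + 1]; // DP row being filled; index 0 always stays 0
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= m; j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    curr[j] = prev[j - 1] + 1;                // extend a common subsequence
                } else {
                    curr[j] = Math.max(prev[j], curr[j - 1]); // skip a char of a or of b
                }
            }
            int[] tmp = prev; prev = curr; curr = tmp;        // finished row becomes prev
        }
        return prev[m];
    }

    /** Similarity in [0, 1]. ASSUMPTION: two empty sequences score 1. */
    static float score(CharSequence seq1, CharSequence seq2) {
        int total = seq1.length() + seq2.length();
        if (total == 0) {
            return 1.0f;
        }
        return 2.0f * lcsLength(seq1, seq2) / total;
    }

    public static void main(String[] args) {
        System.out.println(lcsLength("ABCBDAB", "BDCABA")); // 4
        System.out.println(score("abcd", "abed"));          // 0.75
    }
}
```

Only two DP rows are kept, so memory is O(m) rather than O(n*m); the time cost stays O(n*m).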

titleCase

public static java.lang.String titleCase(java.lang.String st)
Returns a title-case representation of a given string. The first character is capitalized; all following characters are converted to lower case.

Parameters:
st - the given string.
Returns:
a copy of the given string normalized to title case.
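As described, only the first character of the whole string is upper-cased, so "hELLO wORLD" becomes "Hello world" rather than "Hello World". The sketch below illustrates that rule; the class name TitleCaseSketch, the use of Locale.ROOT, and the handling of null or empty input are assumptions, not Okapi's implementation.

```java
import java.util.Locale;

/** Hypothetical standalone sketch of the documented title-case rule; not Okapi's code. */
final class TitleCaseSketch {

    static String titleCase(String st) {
        if (st == null || st.isEmpty()) {
            return st; // ASSUMPTION: null and "" are returned unchanged
        }
        // Measure the first code point so a surrogate pair is not split.
        int firstLen = Character.charCount(st.codePointAt(0));
        // ASSUMPTION: Locale.ROOT gives locale-neutral case mapping.
        return st.substring(0, firstLen).toUpperCase(Locale.ROOT)
                + st.substring(firstLen).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(titleCase("hELLO wORLD")); // Hello world
    }
}
```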

removeQualifiers

public static java.lang.String removeQualifiers(java.lang.String st,
                                                java.lang.String startQualifier,
                                                java.lang.String endQualifier)
Removes qualifiers (quotation marks etc.) around text in a given string.

Parameters:
st - the given string.
startQualifier - the qualifier to be removed from the start of the given string.
endQualifier - the qualifier to be removed from the end of the given string.
Returns:
a copy of the given string without qualifiers.

removeQualifiers

public static java.lang.String removeQualifiers(java.lang.String st,
                                                java.lang.String qualifier)
Removes qualifiers (quotation marks etc.) around text in a given string.

Parameters:
st - the given string.
qualifier - the qualifier to be removed before and after text in the string.
Returns:
a copy of the given string without qualifiers.

removeQualifiers

public static java.lang.String removeQualifiers(java.lang.String st)
Removes quotation marks around text in a given string.

Parameters:
st - the given string.
Returns:
a copy of the given string without quotation marks.
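The three overloads above can be read as one operation: the two-argument form uses the same qualifier on both sides, and the one-argument form uses a double quotation mark. The sketch below models that reading under stated assumptions: qualifiers are stripped only when both are present, only one layer is removed, and surrounding whitespace is not trimmed. The class name QualifierSketch is hypothetical; this is not Okapi's implementation.

```java
/** Hypothetical standalone sketch of the removeQualifiers overloads; not Okapi's code. */
final class QualifierSketch {

    static String removeQualifiers(String st, String startQualifier, String endQualifier) {
        if (st == null) {
            return null;
        }
        // ASSUMPTION: strip only when both qualifiers are present and do not overlap.
        int minLen = startQualifier.length() + endQualifier.length();
        if (st.length() >= minLen && st.startsWith(startQualifier) && st.endsWith(endQualifier)) {
            return st.substring(startQualifier.length(), st.length() - endQualifier.length());
        }
        return st; // unqualified text is returned as-is
    }

    static String removeQualifiers(String st, String qualifier) {
        return removeQualifiers(st, qualifier, qualifier);
    }

    static String removeQualifiers(String st) {
        return removeQualifiers(st, "\""); // ASSUMPTION: "quotation marks" means '"'
    }

    public static void main(String[] args) {
        System.out.println(removeQualifiers("\"abc\""));        // abc
        System.out.println(removeQualifiers("[x]", "[", "]")); // x
    }
}
```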

normalizeLineBreaks

public static java.lang.String normalizeLineBreaks(java.lang.String string)
Converts line breaks in a given string to the Unix standard (\n).

Parameters:
string - the given string.
Returns:
a copy of the given string in which all line breaks are \n.
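A minimal sketch of the conversion, assuming "line breaks" means Windows (\r\n) and classic Mac (\r) breaks only; Unicode separators such as U+2028 are not handled. The replacement order matters: \r\n must be collapsed first, otherwise each one would turn into two \n. The class name LineBreakSketch is hypothetical.

```java
/** Hypothetical standalone sketch of line-break normalization; not Okapi's code. */
final class LineBreakSketch {

    static String normalizeLineBreaks(String string) {
        if (string == null) {
            return null;
        }
        // Collapse CRLF first, then convert any remaining lone CR.
        return string.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        System.out.println(normalizeLineBreaks("a\r\nb\rc").equals("a\nb\nc")); // true
    }
}
```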

normalizeWildcards

public static java.lang.String normalizeWildcards(java.lang.String string)
Converts shell wildcards (e.g. * and ?) in a given string to their Java regex representation.

Parameters:
string - the given string.
Returns:
a copy of the given string with all wildcards converted into a Java regular expression. The result is checked for being a valid regex pattern; if it is not, the original string is returned unchanged, on the grounds that it is most likely already a regex pattern.
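The sketch below shows the conversion and the validity fallback described above. Which characters get escaped is an assumption: only '.' is escaped here, so that input containing other regex syntax can still fail to compile and fall back to the original string. The class name WildcardRegexSketch is hypothetical; this is not Okapi's implementation.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical standalone sketch of wildcard-to-regex conversion; not Okapi's code. */
final class WildcardRegexSketch {

    static String normalizeWildcards(String string) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < string.length(); i++) {
            char c = string.charAt(i);
            switch (c) {
                case '*': sb.append(".*"); break;  // any run of characters
                case '?': sb.append('.'); break;   // exactly one character
                case '.': sb.append("\\."); break; // ASSUMPTION: literal dot, as in file names
                default:  sb.append(c);
            }
        }
        String regex = sb.toString();
        try {
            Pattern.compile(regex); // validity check only
            return regex;
        } catch (PatternSyntaxException e) {
            return string; // invalid result: assume the input was already a regex
        }
    }

    public static void main(String[] args) {
        System.out.println(normalizeWildcards("*.txt")); // .*\.txt
        System.out.println(normalizeWildcards("[*"));    // [*  (fallback: "[.*" does not compile)
    }
}
```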

containsWildcards

public static boolean containsWildcards(java.lang.String string)
Detects if a given string contains shell wildcard characters (e.g. * and ?).

Parameters:
string - the given string.
Returns:
true if the string contains the asterisk (*) or question mark (?).

matchesWildcard

public static boolean matchesWildcard(java.lang.String string,
                                      java.lang.String pattern,
                                      boolean filenameMode)
Detects if a given string matches a given pattern (not necessarily a regex) that may contain wildcards.

Parameters:
string - the given string (without wildcards)
pattern - the pattern containing wildcards to match against
filenameMode - indicates if the given string should be considered a file name
Returns:
true if the given string matches the given pattern

matchesWildcard

public static boolean matchesWildcard(java.lang.String string,
                                      java.lang.String pattern)
Detects if a given string matches a given pattern (not necessarily a regex) that may contain wildcards.

Parameters:
string - the given string (without wildcards)
pattern - the pattern containing wildcards to match against
Returns:
true if the given string matches the given pattern
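To illustrate what containsWildcards and matchesWildcard test for, the sketch below implements plain glob matching, where * matches any run of characters (including none) and ? matches exactly one, using the common greedy algorithm that backtracks to the last *. It does not model the regex case hinted at by "not necessarily a regex", nor the filenameMode flag, whose exact effect is not documented here. The class name WildcardMatchSketch is hypothetical; this is not Okapi's implementation.

```java
/** Hypothetical standalone sketch of glob-style matching; not Okapi's code. */
final class WildcardMatchSketch {

    static boolean containsWildcards(String string) {
        return string.indexOf('*') >= 0 || string.indexOf('?') >= 0;
    }

    /** '*' matches any run (possibly empty), '?' matches one char. Regex and filenameMode not modeled. */
    static boolean matchesWildcard(String string, String pattern) {
        int s = 0, p = 0;
        int starP = -1; // index of the last '*' seen in the pattern
        int starS = 0;  // string index where that '*' started absorbing characters
        while (s < string.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                starP = p++;   // first let '*' match the empty string
                starS = s;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == string.charAt(s))) {
                s++;
                p++;
            } else if (starP >= 0) {
                p = starP + 1; // backtrack: let the last '*' absorb one more character
                s = ++starS;
            } else {
                return false;
            }
        }
        // Any pattern left over must consist only of '*'.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }

    public static void main(String[] args) {
        System.out.println(matchesWildcard("report.txt", "*.txt")); // true
        System.out.println(matchesWildcard("report.doc", "*.txt")); // false
    }
}
```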

split

public static java.lang.String[] split(java.lang.String string,
                                       java.lang.String delimRegex,
                                       int group)

split

public static java.lang.String[] split(java.lang.String string,
                                       java.lang.String delimRegex)

getNumOccurrences

public static int getNumOccurrences(java.lang.String str,
                                    java.lang.String substr)
Returns the number of occurrences of a given substring in a given string.

Parameters:
str - the given string.
substr - the given substring being sought.
Returns:
the number of occurrences of the substring in the string.
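Whether overlapping matches are counted is not stated above. The sketch below assumes non-overlapping counting, so "aa" occurs twice in "aaaa", not three times, and it assumes an empty substring yields 0. The class name OccurrenceSketch is hypothetical; this is not Okapi's implementation.

```java
/** Hypothetical standalone sketch of substring counting; not Okapi's code. */
final class OccurrenceSketch {

    static int getNumOccurrences(String str, String substr) {
        if (substr.isEmpty()) {
            return 0; // ASSUMPTION: an empty substring is not counted
        }
        int count = 0;
        int from = 0;
        while ((from = str.indexOf(substr, from)) != -1) {
            count++;
            from += substr.length(); // ASSUMPTION: non-overlapping, resume after the match
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(getNumOccurrences("banana", "an")); // 2
        System.out.println(getNumOccurrences("aaaa", "aa"));   // 2
    }
}
```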

isWhitespace

public static boolean isWhitespace(java.lang.String str)
Checks if a given string contains only whitespace characters.

Parameters:
str - the given string
Returns:
true if the given string consists only of whitespace characters
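A minimal sketch of the check, assuming "whitespace" follows Java's Character.isWhitespace. Note that this definition excludes the no-break space U+00A0, which matters for localization content. Treating the empty string as whitespace-only and null as not whitespace are also assumptions. The class name WhitespaceSketch is hypothetical.

```java
/** Hypothetical standalone sketch of a whitespace-only check; not Okapi's code. */
final class WhitespaceSketch {

    static boolean isWhitespace(String str) {
        if (str == null) {
            return false; // ASSUMPTION: null is not whitespace
        }
        for (int i = 0; i < str.length(); i++) {
            // Character.isWhitespace is false for U+00A0 (no-break space).
            if (!Character.isWhitespace(str.charAt(i))) {
                return false;
            }
        }
        return true; // ASSUMPTION: "" counts as whitespace-only
    }

    public static void main(String[] args) {
        System.out.println(isWhitespace(" \t\n")); // true
        System.out.println(isWhitespace(" a "));   // false
    }
}
```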

getString

public static java.lang.String getString(int length,
                                         char c)
Returns a new string consisting of a given character repeated a given number of times.

Parameters:
length - length of the new string
c - the character the new string is filled with
Returns:
the new string
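The sketch below fills a char array of the requested length and wraps it in a String; on Java 11 or later, String.valueOf(c).repeat(length) gives the same result. The class name RepeatSketch is hypothetical, and a negative length throws NegativeArraySizeException here, which may differ from Okapi's behavior.

```java
import java.util.Arrays;

/** Hypothetical standalone sketch of building a repeated-character string; not Okapi's code. */
final class RepeatSketch {

    static String getString(int length, char c) {
        char[] chars = new char[length]; // negative length throws NegativeArraySizeException
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println("[" + getString(3, 'x') + "]"); // [xxx]
    }
}
```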

padString

public static java.lang.String padString(java.lang.String string,
                                         int startPos,
                                         int endPos,
                                         char padder)
Pads a range of a given string with a given character.

Parameters:
string - the given string
startPos - start position of the pad range (inclusive)
endPos - end position of the pad range (exclusive)
padder - the character to pad the range with
Returns:
the given string with the given range padded with the given char
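With an inclusive start and an exclusive end, padding positions 1 to 4 of "abcdef" with '*' touches indices 1, 2 and 3. The sketch below assumes "pads" means the characters in the range are overwritten, so the string length is preserved, and it clamps an out-of-bounds range instead of throwing. Both points are assumptions, and the class name PadSketch is hypothetical.

```java
/** Hypothetical standalone sketch of range padding; not Okapi's code. */
final class PadSketch {

    static String padString(String string, int startPos, int endPos, char padder) {
        // ASSUMPTION: clamp the range to the string bounds rather than throw.
        int start = Math.max(0, startPos);
        int end = Math.min(string.length(), endPos);
        if (start >= end) {
            return string; // empty range: nothing to pad
        }
        StringBuilder sb = new StringBuilder(string);
        for (int i = start; i < end; i++) { // start inclusive, end exclusive
            sb.setCharAt(i, padder);        // ASSUMPTION: overwrite, do not insert
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(padString("abcdef", 1, 4, '*')); // a***ef
    }
}
```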

substring

public static java.lang.String substring(java.lang.String string,
                                         int start,
                                         int end)

charsToString

public static java.lang.String charsToString(java.util.Set<java.lang.Character> set)

mirrorString

public static java.lang.String mirrorString(java.lang.String str)
Returns the reversed version of a given string, e.g. "cba" for "abc".

Parameters:
str - the given string
Returns:
the reversed string
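A one-line way to produce the documented result ("cba" for "abc") is StringBuilder.reverse(), which treats surrogate pairs as single characters, so supplementary characters such as emoji are not split into invalid halves. The class name MirrorSketch is hypothetical, and whether Okapi's method handles surrogates the same way is not stated here.

```java
/** Hypothetical standalone sketch of string reversal; not Okapi's code. */
final class MirrorSketch {

    static String mirrorString(String str) {
        // StringBuilder.reverse() keeps each surrogate pair in its original order.
        return new StringBuilder(str).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(mirrorString("abc")); // cba
    }
}
```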