java.lang.Object
  de.aitools.ie.decomposition.word.WordTokenization
public class WordTokenization
This class decomposes a given String into words without removing
anything that is not a word, so even whitespace characters are returned as
tokens. Concatenating the list of tokens reproduces the original string.
For word extraction, have a look at the WordDecomposition* classes.
See Also: WordDecompositionICU4J, WordDecompositionDelim, WordDecompositionOpenNLP
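The round-trip property described above (every character ends up in some token, so joining the tokens restores the input) can be illustrated with a minimal sketch. This is not the WordTokenization implementation: the class name `LosslessTokenizerSketch` and its `tokenize` method are hypothetical, and the JDK's `java.text.BreakIterator` is assumed here as a stand-in for whatever boundary detection the library actually uses.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for illustration only; not part of de.aitools.
class LosslessTokenizerSketch {

    // Splits text at locale-specific word boundaries and keeps every segment,
    // including whitespace and punctuation, so that no character is dropped.
    static List<String> tokenize(String text, Locale locale) {
        BreakIterator boundaries = BreakIterator.getWordInstance(locale);
        boundaries.setText(text);
        List<String> tokens = new ArrayList<>();
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE;
                start = end, end = boundaries.next()) {
            tokens.add(text.substring(start, end));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Hello, world!";
        List<String> tokens = tokenize(text, Locale.ENGLISH);
        System.out.println(tokens);
        // Joining the tokens without a separator restores the input.
        System.out.println(String.join("", tokens).equals(text)); // prints "true"
    }
}
```

Because consecutive boundaries are used as both the end of one token and the start of the next, the tokens tile the input without gaps or overlaps, which is what makes the concatenation lossless.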
Constructor Summary | |
---|---|
WordTokenization(java.util.Locale language) | Decomposes a given String into words without removing anything that is not a word. |
Method Summary | |
---|---|
java.util.List<Span> | getSpans(java.lang.String text): Analyses a string and splits it into parts. |
java.util.List<java.lang.String> | getStrings(java.lang.String text, boolean asSubstring): Analyses a string and splits it into parts. |
static void | main(java.lang.String[] args): Test. |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public WordTokenization(java.util.Locale language)
Decomposes a given String into words without removing
anything that is not a word, so even whitespace characters are returned as
tokens. Concatenating the list of tokens reproduces the original string.
For word extraction, have a look at the WordDecomposition* classes.
See Also: WordDecompositionICU4J, WordDecompositionDelim, WordDecompositionOpenNLP
Method Detail |
---|
public java.util.List<Span> getSpans(java.lang.String text)
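Description copied from interface: Decomposition
Analyses a string and splits it into parts, returned as Spans with start/end index in the original string.
Specified by: getSpans in interface Decomposition
Parameters: text - The original text to decompose.
Returns: Spans with start/end index in the original string.
See Also: Decomposition#getStrings(String, boolean)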
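To make the idea of start/end indices concrete, here is a small sketch of index-based spans over the original string. Everything in it is an assumption for illustration: `IndexSpan` is a hypothetical stand-in for the library's Span type (whose fields and accessors are not documented on this page), the range is assumed to be half-open, and `java.text.BreakIterator` again substitutes for the library's boundary detection.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration; IndexSpan is NOT the library's Span class.
class SpanSketch {

    // Assumed representation: a half-open range [start, end) into the original text.
    static final class IndexSpan {
        final int start;
        final int end;

        IndexSpan(int start, int end) {
            this.start = start;
            this.end = end;
        }

        // Recovers the token text from the original string on demand.
        String coveredText(String text) {
            return text.substring(start, end);
        }
    }

    // Returns one span per segment between consecutive word boundaries.
    static List<IndexSpan> spans(String text, Locale locale) {
        BreakIterator boundaries = BreakIterator.getWordInstance(locale);
        boundaries.setText(text);
        List<IndexSpan> result = new ArrayList<>();
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE;
                start = end, end = boundaries.next()) {
            result.add(new IndexSpan(start, end));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Hello, world!";
        for (IndexSpan span : spans(text, Locale.ENGLISH)) {
            System.out.println(span.start + ".." + span.end + " '" + span.coveredText(text) + "'");
        }
    }
}
```

Returning indices instead of strings lets a caller map each token back to its exact position in the input, for example to highlight or annotate it, without storing the token text twice.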
public java.util.List<java.lang.String> getStrings(java.lang.String text, boolean asSubstring)
Description copied from interface: Decomposition
Analyses a string and splits it into parts.
Specified by: getStrings in interface Decomposition
Parameters:
text - The original text to decompose.
asSubstring - If true, the returned strings are substrings of the input text;
otherwise explicit copies are returned. A substring is a pointer to the
original string plus a start/end position. A string copy is an exact
copy of the part.
See Also: Decomposition#getSpans(String)
public static void main(java.lang.String[] args)
Parameters: args -