de.aitools.ie.decomposition.ngram
Class WordNGramDecomposition

java.lang.Object
  extended by de.aitools.ie.decomposition.ngram.WordNGramDecomposition
All Implemented Interfaces:
Decomposition

public class WordNGramDecomposition
extends java.lang.Object
implements Decomposition

This class decomposes a given String into n-grams of words. An n-gram is a sub-sequence of n consecutive words. Each n-gram in the result list is shifted by one word relative to the previous one.
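
The sliding-window behaviour described above can be illustrated with a minimal, self-contained sketch. It does not use this library: the class WordNGramSketch and the method wordNGrams are hypothetical names, and words are assumed to be separated by whitespace, whereas the real class delegates word detection to a Decomposition. How the real class treats texts with fewer than n words is not stated here; the sketch simply returns an empty list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WordNGramSketch {

    // Hypothetical helper, not part of the library: builds word n-grams
    // with a window that advances by one word per step.
    static List<String> wordNGrams(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        String trimmed = text.trim();
        // Assumption: words are separated by whitespace.
        String[] words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        List<String> ngrams = new ArrayList<>();
        // The window [i, i + n) must fit entirely inside the word array.
        for (int i = 0; i + n <= words.length; i++) {
            ngrams.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return ngrams;
    }

    public static void main(String[] args) {
        // prints [the quick, quick brown, brown fox]
        System.out.println(wordNGrams("the quick brown fox", 2));
    }
}
```

A text with w words yields w - n + 1 n-grams in this sketch, because the window moves one word at a time.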

Version:
Author:
Steffen Becker
See Also:
WordNGramDecomposition, CharacterChunkingDecomposition

Constructor Summary
WordNGramDecomposition(int n)
          This class decomposes a given String into n-grams of words.
WordNGramDecomposition(int n, Decomposition decomposition)
          This class decomposes a given String into n-grams of words.
 
Method Summary
 java.util.List<Span> getSpans(java.lang.String text)
          Analyses a string and splits it into parts.
 java.util.List<java.lang.String> getStrings(java.lang.String text, boolean asSubstring)
          Analyses a string and splits it into parts.
static void main(java.lang.String[] args)
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WordNGramDecomposition

public WordNGramDecomposition(int n)
Creates a decomposition that splits a given String into n-grams of words. An n-gram is a sub-sequence of n consecutive words. Each n-gram in the result list is shifted by one word relative to the previous one.
By default, WordDecompositionICU4J is used to detect words.

Parameters:
n - The number of words per n-gram.
See Also:
WordNGramDecomposition(int, Decomposition)

WordNGramDecomposition

public WordNGramDecomposition(int n,
                              Decomposition decomposition)
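Creates a decomposition that splits a given String into n-grams of words, using the given decomposition to detect words. An n-gram is a sub-sequence of n consecutive words. Each n-gram in the result list is shifted by one word relative to the previous one.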

Parameters:
n - The number of words per n-gram.
decomposition - The decomposition strategy to identify words.
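
The idea of passing in the word-detection strategy can be sketched as follows. This is an illustration only: PluggableNGramSketch is a hypothetical class, and a java.util.function.Function from text to a list of words stands in for the Decomposition parameter, whose real interface is documented separately.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class PluggableNGramSketch {

    private final int n;
    // Hypothetical stand-in for the Decomposition argument:
    // any function that turns a text into its list of words.
    private final Function<String, List<String>> wordDetector;

    PluggableNGramSketch(int n, Function<String, List<String>> wordDetector) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        this.n = n;
        this.wordDetector = wordDetector;
    }

    List<String> getStrings(String text) {
        // Word detection is delegated; only the windowing happens here.
        List<String> words = wordDetector.apply(text);
        List<String> ngrams = new ArrayList<>();
        for (int i = 0; i + n <= words.size(); i++) {
            ngrams.add(String.join(" ", words.subList(i, i + n)));
        }
        return ngrams;
    }

    public static void main(String[] args) {
        // Here "words" are comma-separated fields instead of whitespace tokens.
        PluggableNGramSketch byComma =
                new PluggableNGramSketch(2, t -> Arrays.asList(t.split(",")));
        // prints [a b, b c]
        System.out.println(byComma.getStrings("a,b,c"));
    }
}
```

Keeping word detection separate from windowing means the same n-gram logic can be reused with different notions of what a word is.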
Method Detail

getSpans

public java.util.List<Span> getSpans(java.lang.String text)
Description copied from interface: Decomposition
Analyses a string and splits it into parts. The return value is a list of Spans with the start/end index of each part in the original string.

Specified by:
getSpans in interface Decomposition
Parameters:
text - The original text to decompose.
Returns:
List of Span with start/end index in the original string.
See Also:
Decomposition.getStrings(String, boolean)
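
To show what start/end indices for word n-grams could look like, here is a minimal sketch that does not use this library. The nested class Range is a hypothetical stand-in for Span, the end index is assumed to be exclusive, and words are assumed to be runs of non-whitespace characters; none of these details are confirmed by this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NGramSpanSketch {

    // Hypothetical stand-in for Span: a half-open character
    // range [start, end) in the original text.
    static final class Range {
        final int start;
        final int end;

        Range(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    static List<Range> nGramRanges(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        // Step 1: locate every word (assumption: a run of non-whitespace).
        List<Range> words = new ArrayList<>();
        Matcher m = Pattern.compile("\\S+").matcher(text);
        while (m.find()) {
            words.add(new Range(m.start(), m.end()));
        }
        // Step 2: an n-gram spans from the start of its first word
        // to the end of its last word.
        List<Range> ngrams = new ArrayList<>();
        for (int i = 0; i + n <= words.size(); i++) {
            ngrams.add(new Range(words.get(i).start, words.get(i + n - 1).end));
        }
        return ngrams;
    }

    public static void main(String[] args) {
        String text = "the quick brown fox";
        for (Range r : nGramRanges(text, 3)) {
            // prints "0-15: the quick brown" and "4-19: quick brown fox"
            System.out.println(r.start + "-" + r.end + ": "
                    + text.substring(r.start, r.end));
        }
    }
}
```

Because the ranges index into the original text, any whitespace between the words of an n-gram is preserved when the range is turned back into a string.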

getStrings

public java.util.List<java.lang.String> getStrings(java.lang.String text,
                                                   boolean asSubstring)
Description copied from interface: Decomposition
Analyses a string and splits it into parts. The return value is a list of these parts as Strings, either as substrings or as string copies, depending on the asSubstring parameter.

Specified by:
getStrings in interface Decomposition
Parameters:
text - The original text to decompose.
asSubstring - If true, the strings in the returned list are substrings of the input text; otherwise explicit copies are returned. A substring is a pointer to the original string plus a start/end position. A string copy is an exact copy of the part.
If you are interested in only some parts of the text and don't want to hold the whole text in main memory, you might choose string copies.
Returns:
List of strings, as substrings or string copies, depending on the asSubstring parameter.
See Also:
Decomposition.getSpans(String)

main

public static void main(java.lang.String[] args)