de.aitools.ie.decomposition
Interface Decomposition

All Known Implementing Classes:
CharacterChunkingDecomposition, CharacterNGramDecomposition, SentenceDecompositionICU4J, SentenceDecompositionOpenNLP, WordChunkingDecomposition, WordDecompositionDelim, WordDecompositionICU4J, WordDecompositionOpenNLP, WordNGramDecomposition, WordTokenization

public interface Decomposition

Interface for splitting text into specific parts.

Version:
aitools 2.0 Created on 19.09.2008 $Id: Decomposition.java,v 1.7 2009-06-19 16:00:55 bege5932 Exp $
Author:
Steffen Becker, Fabian Loose
See Also:
SentenceDecompositionICU4J, SentenceDecompositionOpenNLP, WordDecompositionICU4J, WordDecompositionDelim, WordDecompositionOpenNLP, WordTokenization, CharacterChunkingDecomposition, WordChunkingDecomposition, CharacterNGramDecomposition, WordNGramDecomposition

Method Summary
 java.util.List<Span> getSpans(java.lang.String text)
          Analyses a string and splits it into parts, returned as Spans.
 java.util.List<java.lang.String> getStrings(java.lang.String text, boolean asSubstring)
          Analyses a string and splits it into parts, returned as Strings.
 

Method Detail

getStrings

java.util.List<java.lang.String> getStrings(java.lang.String text,
                                            boolean asSubstring)
Analyses a string and splits it into parts. The return value is a list of these parts as Strings, either as substrings or as string copies, depending on the asSubstring parameter.

Parameters:
text - The original text to decompose.
asSubstring - If true, the returned strings are substrings of the input text; otherwise explicit copies are returned. A substring is a reference to the original string together with a start/end position. A string copy is an independent, exact copy of the part.
If you are interested in only some parts of the text and do not want to hold the whole text in main memory, choose string copies.
Returns:
List of strings, either substrings or string copies, depending on the asSubstring parameter.
See Also:
Decomposition#getSpans(String)
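
Example:
The following is a minimal sketch of how the asSubstring flag could behave, not the library's implementation. The class GetStringsSketch and its space-delimited splitting are hypothetical stand-ins; only the method shape getStrings(String, boolean) is taken from this page.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in, not part of aitools: splits on single spaces. */
public class GetStringsSketch {

    /** Mirrors the shape of Decomposition.getStrings(String, boolean). */
    public static List<String> getStrings(String text, boolean asSubstring) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= text.length(); i++) {
            boolean atEnd = (i == text.length());
            if (atEnd || text.charAt(i) == ' ') {
                if (i > start) {
                    String part = text.substring(start, i);
                    // Assumption: an "explicit copy" is modelled with new String(...),
                    // which yields a String object independent of the substring result.
                    parts.add(asSubstring ? part : new String(part));
                }
                start = i + 1; // skip the delimiter
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(getStrings("split this text", true));
        System.out.println(getStrings("split this text", false));
    }
}
```

Both calls yield the same list contents; the flag only affects how the strings are backed in memory. Since Java 7 update 6, String.substring itself copies the characters, so on current JVMs the two modes are likely to differ little in memory behaviour.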

getSpans

java.util.List<Span> getSpans(java.lang.String text)
Analyses a string and splits it into parts. The return value is a list of Spans, each holding the start/end index of a part in the original string.

Parameters:
text - The original text to decompose.
Returns:
List of Spans with start/end indices in the original string.
See Also:
Decomposition#getStrings(String, boolean)
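
Example:
The sketch below illustrates the idea of returning offsets instead of strings. The nested Span class is a hypothetical stand-in for the library's Span type, whose actual fields and accessors are not shown on this page; the half-open [start, end) convention and the whitespace-based splitting are likewise assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in, not part of aitools: whitespace-delimited spans. */
public class GetSpansSketch {

    /** Assumed shape of a span: start inclusive, end exclusive. */
    public static final class Span {
        public final int start;
        public final int end;

        public Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Mirrors the shape of Decomposition.getSpans(String). */
    public static List<Span> getSpans(String text) {
        List<Span> spans = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a part"
        for (int i = 0; i < text.length(); i++) {
            boolean ws = Character.isWhitespace(text.charAt(i));
            if (!ws && start < 0) {
                start = i;                      // a new part begins here
            } else if (ws && start >= 0) {
                spans.add(new Span(start, i));  // the part ended before i
                start = -1;
            }
        }
        if (start >= 0) {
            spans.add(new Span(start, text.length())); // trailing part
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "one two";
        for (Span s : getSpans(text)) {
            // Offsets refer to the original text, so the part can be recovered on demand.
            System.out.println(s.start + "-" + s.end + ": " + text.substring(s.start, s.end));
        }
    }
}
```

Because the spans carry only indices, the caller decides whether and when to materialise each part as a String.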