java.lang.Object
  extended by de.aitools.ie.postagging.PosTagger
public class PosTagger
A class to tag tokenized sentences with their corresponding part-of-speech (POS) tags. This tagger currently supports the following languages:
Note that each language has its own set of POS tags, even if certain word classes occur in several languages. For example, English nouns are tagged NN, whereas German nouns are tagged SUB.
An overview of the POS tag sets used for different languages can be found on the TreeTagger website (search for "Tagsets"). Note that the tags produced by this library, which relies on OpenNLP, may differ from those listed there.
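Because tag sets differ per language, code that processes tagger output for several languages cannot compare tags directly. The following is a hypothetical helper, not part of this library, that assumes only the two noun tags named above (NN for English, SUB for German) and checks whether a tag marks a noun for a given locale. The class NounTags and its method isNoun are illustrative names.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical helper (not part of this library): checks whether a tag
 * denotes a noun, given the language-specific noun tag of each language.
 */
final class NounTags {
    // Only the two tags named in the documentation; a real mapping would
    // have to cover the complete tag set of every supported model.
    private static final Map<String, String> NOUN_TAG_BY_LANGUAGE = Map.of(
            "en", "NN",
            "de", "SUB");

    private NounTags() {}

    /** Returns true if the tag denotes a noun in the language of the locale. */
    static boolean isNoun(Locale locale, String tag) {
        String nounTag = NOUN_TAG_BY_LANGUAGE.get(locale.getLanguage());
        return nounTag != null && nounTag.equals(tag);
    }

    public static void main(String[] args) {
        System.out.println(isNoun(Locale.ENGLISH, "NN"));  // true
        System.out.println(isNoun(Locale.GERMAN, "NN"));   // false
        System.out.println(isNoun(Locale.GERMAN, "SUB"));  // true
    }
}
```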
TODO Wrap TreeTagger
TODO Check tag sets
Constructor Summary | |
---|---|
PosTagger(java.util.Locale locale) | Creates a new POS-tagger to tag sentences of the language denoted by the given locale. |
Method Summary | | |
---|---|---|
java.util.Locale | getLocale() | Gets the current localization of this POS-tagger. |
void | setLocale(java.util.Locale locale) | Sets the localization for this POS-tagger. |
java.util.List<java.lang.String[]> | tag(java.util.List<java.lang.String[]> tokenizedSentences) | Tags a list of tokenized sentences. |
java.lang.String[] | tag(java.lang.String[] tokenizedSentence) | Tags a tokenized sentence. |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public PosTagger(java.util.Locale locale) throws java.io.FileNotFoundException
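Creates a new POS-tagger to tag sentences of the language denoted by the given locale.
Parameters:
locale - the language to use for the internal model.
Throws:
java.io.FileNotFoundException - if the locale is not supported.

Method Detail |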
---|
public java.util.Locale getLocale()
Gets the current localization of this POS-tagger.
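public void setLocale(java.util.Locale locale) throws java.io.FileNotFoundException
Sets the localization for this POS-tagger. The internal model is selected according to Locale.getLanguage() of the given locale.
Parameters:
locale - the language to use for the internal model.
Throws:
java.io.FileNotFoundException - if the locale is not supported.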
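Since the model is chosen by language, locales that differ only in region (for example Locale.US and Locale.UK) should end up with the same model, while an unsupported language surfaces as a FileNotFoundException. The sketch below shows one way such a lookup could work. The class ModelResolver, the model directory and the file name pattern "<language>-pos.bin" are assumptions for illustration, not this library's actual layout.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Locale;

/**
 * Hypothetical model lookup keyed by Locale.getLanguage(). The file name
 * pattern "<language>-pos.bin" is an assumption made for this sketch.
 */
final class ModelResolver {
    private final File modelDirectory;

    ModelResolver(File modelDirectory) {
        this.modelDirectory = modelDirectory;
    }

    /**
     * Returns the model file for the language of the given locale.
     * Region and variant are ignored, so en_US and en_GB share "en-pos.bin".
     *
     * @throws FileNotFoundException if no model exists for that language
     */
    File resolve(Locale locale) throws FileNotFoundException {
        String language = locale.getLanguage();
        File model = new File(modelDirectory, language + "-pos.bin");
        if (!model.isFile()) {
            throw new FileNotFoundException(
                    "No POS model for language '" + language + "': " + model.getPath());
        }
        return model;
    }
}
```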
public java.lang.String[] tag(java.lang.String[] tokenizedSentence)
Tags a tokenized sentence.
Parameters:
tokenizedSentence - an array of words that are expected to form a sentence.
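The return value is not described here; presumably it holds one tag per token, in the same order as the input. The toy tagger below illustrates that assumed contract with a plain dictionary lookup. It is not how this class tags (the real tagger uses its internal model), and the class LookupTagger, its lexicon and the fallback tag are hypothetical.

```java
import java.util.Map;

/**
 * Toy stand-in for tag(String[]): a dictionary lookup instead of a trained
 * model. It only illustrates the assumed contract that the i-th returned tag
 * belongs to the i-th token.
 */
final class LookupTagger {
    private final Map<String, String> lexicon;
    private final String fallbackTag;

    LookupTagger(Map<String, String> lexicon, String fallbackTag) {
        this.lexicon = lexicon;
        this.fallbackTag = fallbackTag;
    }

    /** Returns an array of the same length as the input, one tag per token. */
    String[] tag(String[] tokenizedSentence) {
        String[] tags = new String[tokenizedSentence.length];
        for (int i = 0; i < tokenizedSentence.length; i++) {
            // Unknown words receive the fallback tag, keeping output aligned with input.
            tags[i] = lexicon.getOrDefault(tokenizedSentence[i], fallbackTag);
        }
        return tags;
    }

    public static void main(String[] args) {
        LookupTagger tagger = new LookupTagger(
                Map.of("The", "DT", "dog", "NN", "barks", "VBZ", ".", "."), "NN");
        String[] tokens = {"The", "dog", "barks", "."};
        System.out.println(String.join(" ", tagger.tag(tokens))); // DT NN VBZ .
    }
}
```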
public java.util.List<java.lang.String[]> tag(java.util.List<java.lang.String[]> tokenizedSentences)
Tags a list of tokenized sentences.
Parameters:
tokenizedSentences - a list of tokenized sentences to be tagged.
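Presumably the list variant tags each sentence independently and returns the results in input order. Under that assumption it reduces to a loop over the single-sentence method, sketched below with a UnaryOperator standing in for tag(String[]). The class SentenceBatch and the dummy tagger are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

/**
 * Sketch of tag(List<String[]>), assuming it applies the single-sentence
 * tagger to every sentence and keeps the original order.
 */
final class SentenceBatch {
    private SentenceBatch() {}

    static List<String[]> tagAll(List<String[]> tokenizedSentences,
                                 UnaryOperator<String[]> tagSentence) {
        List<String[]> result = new ArrayList<>(tokenizedSentences.size());
        for (String[] sentence : tokenizedSentences) {
            // One tag array per input sentence, in the same position.
            result.add(tagSentence.apply(sentence));
        }
        return result;
    }

    public static void main(String[] args) {
        // Dummy per-sentence tagger that tags every token as "X".
        UnaryOperator<String[]> dummy = tokens -> {
            String[] tags = new String[tokens.length];
            Arrays.fill(tags, "X");
            return tags;
        };
        List<String[]> tagged = tagAll(
                List.of(new String[] {"Hello", "world"}, new String[] {"Bye"}), dummy);
        for (String[] tags : tagged) {
            System.out.println(String.join(" ", tags));
        }
    }
}
```

Keeping the batch method a thin loop means both overloads necessarily agree on how a single sentence is tagged.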