de.aitools.ie.postagging
Class PosTagger

java.lang.Object
  extended by de.aitools.ie.postagging.PosTagger

public class PosTagger
extends java.lang.Object

A class to tag tokenized sentences with their corresponding POS types. This tagger currently supports the following languages:

Note that each language has its own set of POS tags, even if certain word classes occur in several languages. For example, the English noun is tagged with NN, whereas the German noun is tagged with SUB.
An overview of the POS tag sets used for different languages can be found on the TreeTagger page (search for "Tagsets"). The POS tags produced by the library used here (OpenNLP) may differ from those listed there.

TODO Wrap TreeTagger

TODO Check tag sets

Version:
$Id: PosTagger.java,v 1.1 2011/06/13 12:31:39 trenkman Exp $
Author:
martin.trenkmann@uni-weimar.de

Constructor Summary
PosTagger(java.util.Locale locale)
          Creates a new POS-tagger to tag sentences of the language denoted by the given locale.
 
Method Summary
 java.util.Locale getLocale()
          Gets the current localization of this POS-tagger.
 void setLocale(java.util.Locale locale)
          Sets the localization for this POS-tagger.
 java.util.List<java.lang.String[]> tag(java.util.List<java.lang.String[]> tokenizedSentences)
          Tags a list of tokenized sentences.
 java.lang.String[] tag(java.lang.String[] tokenizedSentence)
          Tags a tokenized sentence.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PosTagger

public PosTagger(java.util.Locale locale)
          throws java.io.FileNotFoundException
Creates a new POS-tagger to tag sentences of the language denoted by the given locale.

Parameters:
locale - the language to use for the internal model.
Throws:
java.io.FileNotFoundException - if the locale is not supported.
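
Because the constructor throws a checked FileNotFoundException for unsupported locales, callers may want a fallback language. The following is a minimal sketch of that pattern, not code from this library. The nested PosTagger class is a stand-in so the example compiles on its own, and the helper createWithFallback is hypothetical. The stub arbitrarily accepts only "en" and "de"; the real set of supported languages is not stated here.

```java
import java.io.FileNotFoundException;
import java.util.Locale;

public class ConstructTaggerSketch {

    /** Stand-in for de.aitools.ie.postagging.PosTagger; NOT the real class. */
    static class PosTagger {
        private final Locale locale;

        PosTagger(Locale locale) throws FileNotFoundException {
            // Assumption: the stub accepts only "en" and "de".
            String language = locale.getLanguage();
            if (!language.equals("en") && !language.equals("de")) {
                throw new FileNotFoundException("No model for language: " + language);
            }
            this.locale = locale;
        }

        Locale getLocale() {
            return locale;
        }
    }

    /** Hypothetical helper: tries the preferred locale, then the fallback. */
    static PosTagger createWithFallback(Locale preferred, Locale fallback)
            throws FileNotFoundException {
        try {
            return new PosTagger(preferred);
        } catch (FileNotFoundException e) {
            // The preferred language is unsupported; try the fallback instead.
            return new PosTagger(fallback);
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        PosTagger tagger = createWithFallback(Locale.JAPANESE, Locale.ENGLISH);
        System.out.println(tagger.getLocale().getLanguage());
    }
}
```

With the real class on the classpath, the nested stub would be removed and the import de.aitools.ie.postagging.PosTagger added instead. If the fallback is unsupported as well, the exception propagates to the caller.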
Method Detail

getLocale

public java.util.Locale getLocale()
Gets the current localization of this POS-tagger.

Returns:
the language of the internal model.

setLocale

public void setLocale(java.util.Locale locale)
               throws java.io.FileNotFoundException
Sets the localization for this POS-tagger. This method only evaluates Locale.getLanguage() of the given locale.

Parameters:
locale - the language to use for the internal model.
Throws:
java.io.FileNotFoundException - if the locale is not supported.
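
Since only Locale.getLanguage() is evaluated, locales that differ just in country should select the same model. The short sketch below uses only java.util.Locale to show which language codes such a lookup would see. The class name LocaleLanguageDemo and the helper languageOf are illustrative and not part of this library.

```java
import java.util.Locale;

public class LocaleLanguageDemo {

    /** Returns the language code that a language-only lookup would see. */
    static String languageOf(Locale locale) {
        return locale.getLanguage();
    }

    public static void main(String[] args) {
        // Both locales share the language code "en"; the country part is dropped.
        System.out.println(languageOf(Locale.US));      // en
        System.out.println(languageOf(Locale.UK));      // en
        System.out.println(languageOf(Locale.GERMANY)); // de
    }
}
```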

tag

public java.lang.String[] tag(java.lang.String[] tokenizedSentence)
Tags a tokenized sentence.

Parameters:
tokenizedSentence - an array of words supposed to form a sentence.
Returns:
an array of the corresponding POS-tags.
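
The returned array presumably holds one tag per input token, in the same order. Under that assumption, a caller can pair tokens and tags by index, as in the sketch below. The helper pair is hypothetical, and the tags in main are hand-written illustrative values, not actual output of this tagger.

```java
public class TokenTagPairs {

    /** Joins tokens and tags position by position as "token/TAG" pairs. */
    static String pair(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags differ in length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(tokens[i]).append('/').append(tags[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "dog", "barks"};
        // Illustrative tags only; a real call would be tagger.tag(tokens).
        String[] tags = {"DT", "NN", "VBZ"};
        System.out.println(pair(tokens, tags)); // The/DT dog/NN barks/VBZ
    }
}
```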

tag

public java.util.List<java.lang.String[]> tag(java.util.List<java.lang.String[]> tokenizedSentences)
Tags a list of tokenized sentences.

Parameters:
tokenizedSentences - a list of sentences to be tagged.
Returns:
a list of the corresponding POS-tag arrays.
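
The list overload plausibly behaves like applying the single-sentence overload to each element in order, although the documentation does not confirm this. The sketch below illustrates that reading with a hypothetical helper tagAll, where a UnaryOperator stands in for the single-sentence tag method and a dummy tagger labels every token "X".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

public class TagAllSketch {

    /** Applies a single-sentence tagger to every sentence, keeping the order. */
    static List<String[]> tagAll(List<String[]> tokenizedSentences,
                                 UnaryOperator<String[]> tagOne) {
        List<String[]> result = new ArrayList<>(tokenizedSentences.size());
        for (String[] sentence : tokenizedSentences) {
            result.add(tagOne.apply(sentence));
        }
        return result;
    }

    public static void main(String[] args) {
        // Dummy tagger that labels every token "X"; not a real POS model.
        UnaryOperator<String[]> dummy = tokens -> {
            String[] tags = new String[tokens.length];
            Arrays.fill(tags, "X");
            return tags;
        };
        List<String[]> input = Arrays.asList(
                new String[] {"Hello", "world"},
                new String[] {"Good", "morning", "everyone"});
        for (String[] tags : tagAll(input, dummy)) {
            System.out.println(Arrays.toString(tags));
        }
    }
}
```

The result list has one tag array per input sentence, so sentence i and tag array i can be read side by side.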