de.aitools.ie.keyphraseextraction
Class KeyphraseExtractor

java.lang.Object
  extended by de.aitools.ie.keyphraseextraction.KeyphraseExtractor
Direct Known Subclasses:
FrequentPhraseExtractor, HeadNounPhraseExtractor, RepeatedStringExtractor, StanfordNounPhraseExtractor, TextRankExtractor

public abstract class KeyphraseExtractor
extends java.lang.Object

An abstract base class for extracting key phrases from a given text.

Version:
$Id: KeyphraseExtractor.java,v 1.8 2011/12/15 15:09:07 hoppe Exp $
Author:
martin.trenkmann@uni-weimar.de
See Also:
HeadNounPhraseExtractor, RepeatedStringExtractor

Method Summary
 java.util.SortedSet<Phrase> extract(java.lang.String text)
          Extracts all key phrases from the given text with score normalization to the range of [0, 1].
 java.util.SortedSet<Phrase> extract(java.lang.String text, int k)
          Extracts at most the k highest rated key phrases from the given text with score normalization to the range of [0, 1].
 java.util.SortedSet<Phrase> extract(java.lang.String text, int k, boolean normalize)
          Extracts at most the k highest rated key phrases from the given text with or without score normalization to the range of [0, 1].
 java.util.SortedSet<Phrase> extract(java.lang.String text, int k, int phraseLength, boolean normalize)
          Extracts at most the k highest rated key phrases from the given text with or without score normalization to the range of [0, 1].
 java.util.Locale getLocale()
          Returns the current locale.
static java.util.SortedSet<Phrase> getTopPhrases(java.util.Set<Phrase> phrases, int k)
          Extracts the k highest rated phrases.
 void setLocale(java.util.Locale locale)
          Registers a new locale to this extractor.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

extract

public java.util.SortedSet<Phrase> extract(java.lang.String text)
Extracts all key phrases from the given text with score normalization to the range of [0, 1].

Parameters:
text - the text to extract key phrases from.
Returns:
a ranked set of all extracted key phrases.
See Also:
extract(String, int, int, boolean)

extract

public java.util.SortedSet<Phrase> extract(java.lang.String text,
                                           int k)
Extracts at most the k highest rated key phrases from the given text with score normalization to the range of [0, 1].

Parameters:
text - the text to extract key phrases from.
k - the maximal number of key phrases to extract.
Returns:
a ranked set of the highest scored key phrases.
See Also:
extract(String, int, int, boolean)

extract

public java.util.SortedSet<Phrase> extract(java.lang.String text,
                                           int k,
                                           boolean normalize)
Extracts at most the k highest rated key phrases from the given text with or without score normalization to the range of [0, 1].

Parameters:
text - the text to extract key phrases from.
k - the maximal number of key phrases to extract.
normalize - enables/disables score normalization.
Returns:
a ranked set of the highest scored key phrases.
See Also:
extract(String, int, int, boolean)

extract

public java.util.SortedSet<Phrase> extract(java.lang.String text,
                                           int k,
                                           int phraseLength,
                                           boolean normalize)
Extracts at most the k highest rated key phrases from the given text with or without score normalization to the range of [0, 1]. The phraseLength parameter limits the number of words in each extracted phrase.

Parameters:
text - the text to extract key phrases from.
k - the maximal number of key phrases to extract.
phraseLength - the maximal length of key phrases in words.
normalize - enables/disables score normalization to the range [0, 1].
Returns:
a ranked set of the highest scored key phrases.
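
This page does not say how scores are mapped to [0, 1]. As an illustration only, the following sketch assumes non-negative raw scores and divides each one by the maximum; the class name ScoreNormalizationSketch, the normalize helper, and the map-based representation are hypothetical and are not part of KeyphraseExtractor.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ScoreNormalizationSketch {

    // Scales raw scores into [0, 1] by dividing by the largest score.
    // Assumption: raw scores are non-negative. This scheme is a guess for
    // illustration, not the documented behaviour of KeyphraseExtractor.
    static Map<String, Double> normalize(Map<String, Double> rawScores) {
        double max = 0.0;
        for (double score : rawScores.values()) {
            max = Math.max(max, score);
        }
        Map<String, Double> normalized = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : rawScores.entrySet()) {
            // Guard against division by zero when every score is 0.
            normalized.put(entry.getKey(), max == 0.0 ? 0.0 : entry.getValue() / max);
        }
        return normalized;
    }

    public static void main(String[] args) {
        Map<String, Double> raw = new LinkedHashMap<>();
        raw.put("keyphrase extraction", 8.0);
        raw.put("noun phrase", 4.0);
        raw.put("locale", 2.0);
        // The best phrase ends up with score 1.0, the others proportionally lower.
        System.out.println(normalize(raw));
    }
}
```

With normalize set to false, a caller would presumably see the raw scores instead, which only compare meaningfully within a single extractor and text.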

getLocale

public java.util.Locale getLocale()
Returns the current locale. The locale determines how words are tagged with their parts of speech and how stopwords are recognized.

Returns:
the current locale.

setLocale

public void setLocale(java.util.Locale locale)
Registers a new locale with this extractor. This method may be overridden in subclasses.

Parameters:
locale - the locale to register.
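
A subclass that keeps locale-dependent resources might override setLocale roughly as sketched below. BaseExtractor is a hypothetical stand-in for KeyphraseExtractor that models only the two locale accessors; StopwordAwareExtractor, isStopword, the default locale, and the tiny stopword lists are likewise assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class LocaleOverrideSketch {

    // Hypothetical stand-in for KeyphraseExtractor (locale accessors only).
    static abstract class BaseExtractor {
        private Locale locale = Locale.ENGLISH; // assumed default, not documented

        public Locale getLocale() {
            return locale;
        }

        public void setLocale(Locale locale) {
            this.locale = locale;
        }
    }

    // Hypothetical subclass that swaps its stopword list when the locale changes.
    static class StopwordAwareExtractor extends BaseExtractor {
        private Set<String> stopwords = stopwordsFor(Locale.ENGLISH);

        @Override
        public void setLocale(Locale locale) {
            super.setLocale(locale);          // keep getLocale() consistent
            stopwords = stopwordsFor(locale); // reload locale-dependent resources
        }

        boolean isStopword(String word) {
            return stopwords.contains(word.toLowerCase(getLocale()));
        }

        // Tiny illustrative lists; a real extractor would load full resources.
        private static Set<String> stopwordsFor(Locale locale) {
            if ("de".equals(locale.getLanguage())) {
                return new HashSet<>(Arrays.asList("der", "die", "das", "und"));
            }
            return new HashSet<>(Arrays.asList("the", "a", "an", "and"));
        }
    }

    public static void main(String[] args) {
        StopwordAwareExtractor extractor = new StopwordAwareExtractor();
        System.out.println(extractor.isStopword("the"));
        extractor.setLocale(Locale.GERMAN);
        System.out.println(extractor.isStopword("und"));
    }
}
```

Calling super.setLocale first keeps getLocale() in step with whatever the subclass reloads afterwards.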

getTopPhrases

public static java.util.SortedSet<Phrase> getTopPhrases(java.util.Set<Phrase> phrases,
                                                        int k)
Extracts the k highest rated phrases. If the given set contains fewer than k phrases, the entire set is returned in sorted order.

Parameters:
phrases - a set of phrases.
k - the maximal number of phrases to extract.
Returns:
a sorted set of at most k phrases.
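
The top-k behaviour described above can be sketched as follows. The Phrase class here is a hypothetical stand-in with a text and a score field, since the library's Phrase type is not documented on this page; topK and the descending-score ordering are assumptions as well, not the library's implementation.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

class TopPhrasesSketch {

    // Hypothetical stand-in for the library's Phrase type.
    static final class Phrase {
        final String text;
        final double score;

        Phrase(String text, double score) {
            this.text = text;
            this.score = score;
        }

        @Override
        public String toString() {
            return text + "=" + score;
        }
    }

    // Highest score first; ties are broken by text so that distinct phrases
    // with equal scores are not collapsed by the TreeSet.
    static final Comparator<Phrase> BY_SCORE_DESC = (a, b) -> {
        int byScore = Double.compare(b.score, a.score);
        return byScore != 0 ? byScore : a.text.compareTo(b.text);
    };

    // Returns at most k phrases, best first; returns everything if fewer exist.
    static SortedSet<Phrase> topK(Set<Phrase> phrases, int k) {
        SortedSet<Phrase> sorted = new TreeSet<>(BY_SCORE_DESC);
        sorted.addAll(phrases);
        SortedSet<Phrase> top = new TreeSet<>(BY_SCORE_DESC);
        for (Phrase phrase : sorted) {
            if (top.size() >= k) {
                break;
            }
            top.add(phrase);
        }
        return top;
    }

    public static void main(String[] args) {
        Set<Phrase> phrases = new HashSet<>(Arrays.asList(
                new Phrase("stopword", 0.2),
                new Phrase("text rank", 0.9),
                new Phrase("noun phrase", 0.5)));
        System.out.println(topK(phrases, 2));         // two best-scored phrases, best first
        System.out.println(topK(phrases, 10).size()); // fewer than k available: all three
    }
}
```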