public class LanguageDetector
- extends java.lang.Object
This class is the main interface to the language detection package.
TODO(loose): The interface (the getLanguage() method) could just as well be
static. In the future, however, callers might want more information, for
example the second most probable language or the distance between the most
probable language and the next one.
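The richer result this TODO describes could look roughly like the sketch below. Everything in it is hypothetical and not part of this package: the RankedDetection type, its fromScores factory, and the assumption that each candidate language already has a numeric score where higher means a better match. The documented detect(String) method returns only a single Locale.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical result type, not part of the LanguageDetector API.
final class RankedDetection {
    final Locale best;       // most probable language
    final Locale runnerUp;   // second most probable language, or null if only one was scored
    final double margin;     // best score minus runner-up score, NaN if there is no runner-up

    private RankedDetection(Locale best, Locale runnerUp, double margin) {
        this.best = best;
        this.runnerUp = runnerUp;
        this.margin = margin;
    }

    /** Ranks per-language scores, assuming a higher score means a better match. */
    static RankedDetection fromScores(Map<Locale, Double> scores) {
        if (scores.isEmpty()) {
            throw new IllegalArgumentException("at least one score is required");
        }
        List<Map.Entry<Locale, Double>> ranked = new ArrayList<>(scores.entrySet());
        // Sort descending by score so the best candidate comes first.
        ranked.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        Map.Entry<Locale, Double> first = ranked.get(0);
        if (ranked.size() == 1) {
            return new RankedDetection(first.getKey(), null, Double.NaN);
        }
        Map.Entry<Locale, Double> second = ranked.get(1);
        return new RankedDetection(first.getKey(), second.getKey(),
                first.getValue() - second.getValue());
    }
}
```

A small margin would signal an ambiguous input, which is the kind of information a static method returning a single Locale cannot convey.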
TODO(loose): Fix the models for pl and lt. These (and probably a few other)
models become the best guess when the text contains a lot of whitespace,
special characters, etc., so the language (wiki) corpus still seems to have
problems. Vertical search results make a good test, as these texts somehow
randomly come from
- email@example.com, firstname.lastname@example.org
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
public static final java.lang.String SERIALIZATION_NAME
- See Also:
- Constant Field Values
public java.util.Locale detect(java.lang.String s)
- Detects the language of a string based on its character trigrams.
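For readers unfamiliar with the term, a character trigram is a window of three consecutive characters. The sketch below counts the overlapping trigrams of a string. TrigramSketch and its trigrams method are hypothetical names used only for illustration; how this class actually extracts, normalizes, and scores trigrams against its language models is not documented here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the LanguageDetector API.
final class TrigramSketch {
    private TrigramSketch() {}

    /**
     * Counts every overlapping three-character window of the input.
     * Windows are taken over Java chars (UTF-16 code units), so
     * supplementary characters are not handled specially.
     */
    static Map<String, Integer> trigrams(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 3 <= text.length(); i++) {
            // Increment the count for this window, starting at 1 if unseen.
            counts.merge(text.substring(i, i + 3), 1, Integer::sum);
        }
        return counts;
    }
}
```

For the input "banana" the windows are "ban", "ana", "nan", "ana", so "ana" is counted twice. A detector built on this idea would typically compare such counts against per-language profiles, which is an assumption about the general technique rather than a description of this implementation.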
public static void main(java.lang.String[] args)
- Required so that the language model index can be initialized by Ant.