de.aitools.ie.languagedetection
Class LanguageDetector
java.lang.Object
de.aitools.ie.languagedetection.LanguageDetector
public class LanguageDetector
- extends java.lang.Object
This class is the main interface to the language detection package.
TODO(loose): Could the interface (the getLanguage() method) be static instead?
In the future, however, callers may want more information, for example the
second most probable language or the distance between the most probable
language and the next one.
TODO(loose): Fix the models for pl and lt. These (and probably a few other)
models become the best guess when the text contains a lot of white space,
special characters, etc., so the language (wiki) corpus still seems to have
problems. Vertical search results make a good test, since these texts come
more or less randomly from the web.
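The first TODO above mentions exposing the second most probable language and the distance to it. A minimal sketch of what such a result could look like follows. The class `RankedDetectionSketch`, its nested `Candidate` type, and the methods `rank` and `margin` are hypothetical names introduced here for illustration; they are not part of this package, and the per-language scores are assumed to be "higher is more probable".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper, not part of de.aitools.ie.languagedetection. */
public class RankedDetectionSketch {

    /** One candidate language with its score (assumed: higher is more probable). */
    public static final class Candidate {
        public final Locale locale;
        public final double score;

        Candidate(Locale locale, double score) {
            this.locale = locale;
            this.score = score;
        }
    }

    /** Returns all candidates sorted by descending score. */
    public static List<Candidate> rank(Map<Locale, Double> scores) {
        List<Candidate> ranked = new ArrayList<>();
        for (Map.Entry<Locale, Double> e : scores.entrySet()) {
            ranked.add(new Candidate(e.getKey(), e.getValue()));
        }
        ranked.sort(Comparator.comparingDouble((Candidate c) -> c.score).reversed());
        return ranked;
    }

    /** Score gap between the best and second-best candidate; 0.0 if there are fewer than two. */
    public static double margin(List<Candidate> ranked) {
        return ranked.size() < 2 ? 0.0 : ranked.get(0).score - ranked.get(1).score;
    }
}
```

A small margin would indicate an ambiguous input, which is one way a caller could decide to distrust the best guess.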
- Author:
- fabian.loose@uni-weimar.de, martin.potthast@uni-weimar.de
Method Summary
java.util.Locale detect(java.lang.String s)
- Detects the language of a string based on its character trigrams.
static void main(java.lang.String[] args)
- Required so that the language model index can be initialized by Ant.
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
SERIALIZATION_NAME
public static final java.lang.String SERIALIZATION_NAME
- See Also:
- Constant Field Values
LanguageDetector
public LanguageDetector()
detect
public java.util.Locale detect(java.lang.String s)
- Detects the language of a string based on its character trigrams.
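To illustrate the general idea of trigram-based detection, here is a self-contained sketch. It is not this class's implementation: the class name `TrigramSketch`, the two tiny training sentences, the cosine-similarity scoring, and the `null` return for input without any matching trigram are all assumptions made for the example. A real model would be built from a large corpus.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative trigram-based detector; not the LanguageDetector implementation. */
public class TrigramSketch {

    // Hypothetical toy models; real models would be trained on a large corpus.
    private static final Map<Locale, Map<String, Double>> MODELS = new LinkedHashMap<>();

    static {
        MODELS.put(Locale.ENGLISH, profile(
            "the quick brown fox jumps over the lazy dog and then the cat sleeps in the house"));
        MODELS.put(Locale.GERMAN, profile(
            "der schnelle braune fuchs springt und der hund schlaeft nicht im haus weil die katze das nicht will"));
    }

    /** Builds a relative-frequency profile of character trigrams. */
    static Map<String, Double> profile(String text) {
        // Lowercase, collapse whitespace runs, and pad so word boundaries form trigrams.
        String t = " " + text.toLowerCase(Locale.ROOT).trim().replaceAll("\\s+", " ") + " ";
        Map<String, Double> counts = new HashMap<>();
        int total = 0;
        for (int i = 0; i + 3 <= t.length(); i++) {
            counts.merge(t.substring(i, i + 3), 1.0, Double::sum);
            total++;
        }
        final double n = Math.max(total, 1);
        counts.replaceAll((trigram, count) -> count / n);
        return counts;
    }

    /** Cosine similarity between two sparse trigram profiles; 0.0 if either is empty. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    /** Returns the locale of the most similar model, or null if no trigram matches. */
    public static Locale detect(String s) {
        Map<String, Double> p = profile(s);
        Locale best = null;
        double bestScore = 0.0;
        for (Map.Entry<Locale, Map<String, Double>> e : MODELS.entrySet()) {
            double score = cosine(p, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

With these toy models, `TrigramSketch.detect("the dog sleeps in the house")` selects `Locale.ENGLISH` and `TrigramSketch.detect("der hund schlaeft im haus")` selects `Locale.GERMAN`. Collapsing whitespace before extracting trigrams is one simple way to keep padding from dominating the profile.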
main
public static void main(java.lang.String[] args)
- Required so that the language model index can be initialized by Ant.