de.aitools.ie.languagedetection
Class TrigramStatistic

java.lang.Object
  extended by de.aitools.ie.languagedetection.TrigramStatistic

public class TrigramStatistic
extends java.lang.Object

Author:
fabian.loose@uni-weimar.de, martin.potthast@uni-weimar.de

Constructor Summary
TrigramStatistic()
           
 
Method Summary
static java.util.Map<java.lang.String,java.lang.Double> getTrigrams(java.lang.String s)
          Creates a map containing a fixed number of the most frequent trigrams in a given string.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TrigramStatistic

public TrigramStatistic()

Method Detail

getTrigrams

public static java.util.Map<java.lang.String,java.lang.Double> getTrigrams(java.lang.String s)
Creates a map containing a fixed number of the most frequent trigrams in a given string. While the trigrams are collected, the method also checks whether the string contains more non-Latin characters than Latin characters. If it does, all trigrams made of Latin characters are removed. This is necessary because English words and sentences can be found in texts of all languages. Writing systems such as Chinese use far more distinct characters than the Latin alphabet, so English trigrams may reach high frequencies even if the text contains only a little English.
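
The behaviour described above can be illustrated with a minimal sketch. It is not the source of TrigramStatistic. The class name TrigramSketch, the method topTrigrams, and the limit parameter are hypothetical. The sketch also makes assumptions that this page does not confirm: map values are treated as relative frequencies, "Latin" is decided by Unicode script, and a trigram is dropped if it contains at least one Latin letter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the actual TrigramStatistic implementation.
class TrigramSketch {

    // Assumption: "Latin" means the code point belongs to the Unicode LATIN script.
    static boolean isLatin(int codePoint) {
        return Character.UnicodeScript.of(codePoint) == Character.UnicodeScript.LATIN;
    }

    // Returns at most `limit` trigrams, most frequent first, mapped to their
    // relative frequency among the retained trigrams (an assumption).
    static Map<String, Double> topTrigrams(String s, int limit) {
        int[] cps = s.codePoints().toArray();

        // Count Latin versus non-Latin letters; non-letters are ignored here.
        int latin = 0;
        int nonLatin = 0;
        for (int cp : cps) {
            if (!Character.isLetter(cp)) {
                continue;
            }
            if (isLatin(cp)) {
                latin++;
            } else {
                nonLatin++;
            }
        }
        boolean dropLatin = nonLatin > latin;

        // Count every window of three code points, skipping trigrams that
        // contain a Latin letter when non-Latin letters dominate.
        Map<String, Integer> counts = new HashMap<>();
        int total = 0;
        for (int i = 0; i + 3 <= cps.length; i++) {
            if (dropLatin
                    && (isLatin(cps[i]) || isLatin(cps[i + 1]) || isLatin(cps[i + 2]))) {
                continue;
            }
            counts.merge(new String(cps, i, 3), 1, Integer::sum);
            total++;
        }

        // Sort by count (descending), breaking ties alphabetically so the
        // result is deterministic.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        // Keep only the first `limit` entries, in sorted order.
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : entries) {
            if (result.size() >= limit) {
                break;
            }
            result.put(e.getKey(), e.getValue() / (double) total);
        }
        return result;
    }
}
```

With this sketch, a call such as TrigramSketch.topTrigrams(text, 300) on a mostly Chinese text with a short English phrase would return only trigrams without Latin letters. The value 300 is an arbitrary example, since this page does not state the fixed number used by getTrigrams.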