de.aitools.ir.fingerprinting.refactored
Class PrefixDistribution

java.lang.Object
  extended by de.aitools.ir.fingerprinting.refactored.PrefixDistribution
All Implemented Interfaces:
java.io.Serializable
Direct Known Subclasses:
PrefixDistributionBNC, PrefixDistributionTiger

public abstract class PrefixDistribution
extends java.lang.Object
implements java.io.Serializable

A class to represent a prefix distribution. A prefix distribution is a collection of a-priori probabilities of word prefixes for a particular language and/or document corpus. This class is also responsible for serializing its content to an external file and reading it back.

Author:
martin.trenkmann@uni-weimar.de
See Also:
Serialized Form

Constructor Summary
PrefixDistribution(java.lang.String file)
          The explicit constructor.
 
Method Summary
 java.lang.Integer getIndex(java.lang.String prefix)
          Returns the index of the given prefix (case-insensitive).
 java.util.Locale getLocale()
          Returns the prefix distribution's localization information.
 double[] getProbabilities(int prefixLength)
          Returns the double vector of prefix probabilities.
 void load(java.io.InputStream in)
           
 void save(java.io.OutputStream out)
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PrefixDistribution

public PrefixDistribution(java.lang.String file)
                   throws java.lang.NumberFormatException,
                          java.io.IOException,
                          java.net.URISyntaxException
The explicit constructor. Tries to load a prefix distribution from the given text file. The file consists of a list of prefix-probability pairs. The resulting probability vector does not preserve the order of the input file; it is always sorted alphabetically by prefix.

Parameters:
file - a text file containing a prefix distribution.
Throws:
java.lang.NumberFormatException
java.io.IOException
java.net.URISyntaxException
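The file format is described only as "a list of prefix probability pairs". The following minimal sketch assumes one pair per line, separated by whitespace (an assumption, not the documented format). It shows how reading the pairs into a TreeMap yields an alphabetically sorted vector regardless of the order in the file. PrefixFileSketch and parse are hypothetical names, not part of this API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: parses "prefix probability" pairs (one per line,
 * whitespace-separated; this layout is an assumption) into index-aligned,
 * alphabetically sorted lists.
 */
public class PrefixFileSketch {

    public final List<String> prefixes = new ArrayList<>();
    public final List<Double> probabilities = new ArrayList<>();

    public static PrefixFileSketch parse(Reader source) throws IOException {
        // TreeMap keeps its keys sorted; a repeated prefix overwrites the earlier value.
        TreeMap<String, Double> sorted = new TreeMap<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] parts = line.split("\\s+");
                if (parts.length != 2) {
                    throw new IOException("Malformed line: " + line);
                }
                // Double.parseDouble throws NumberFormatException for a bad number.
                sorted.put(parts[0], Double.parseDouble(parts[1]));
            }
        }
        PrefixFileSketch result = new PrefixFileSketch();
        for (Map.Entry<String, Double> e : sorted.entrySet()) {
            result.prefixes.add(e.getKey());
            result.probabilities.add(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String file = "th 0.4\nab 0.1\nen 0.5\n";
        PrefixFileSketch d = parse(new StringReader(file));
        System.out.println(d.prefixes);      // [ab, en, th]
        System.out.println(d.probabilities); // [0.1, 0.5, 0.4]
    }
}
```

TreeMap orders String keys by UTF-16 code unit. This matches alphabetical order only for consistently cased ASCII prefixes. Language-aware ordering would need a java.text.Collator for the distribution's locale.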
Method Detail

getLocale

public java.util.Locale getLocale()
Returns the prefix distribution's localization information. Derived classes are responsible for overriding this method; otherwise the locale defaults to Locale.ROOT.

Returns:
the locale object of this distribution.
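The following minimal sketch illustrates the override pattern described above. It uses the hypothetical classes BaseDistribution, GermanDistribution and NeutralDistribution rather than the real PrefixDistributionTiger or PrefixDistributionBNC, whose actual locales are not stated on this page.

```java
import java.util.Locale;

/** Hypothetical stand-in for the documented default behaviour. */
abstract class BaseDistribution {
    /** Returned when a subclass does not override this method. */
    public Locale getLocale() {
        return Locale.ROOT;
    }
}

/** Hypothetical subclass for a German corpus; not the real PrefixDistributionTiger. */
class GermanDistribution extends BaseDistribution {
    @Override
    public Locale getLocale() {
        return Locale.GERMAN;
    }
}

/** Hypothetical subclass that keeps the default locale. */
class NeutralDistribution extends BaseDistribution {
}

public class LocaleSketch {
    public static void main(String[] args) {
        System.out.println(new NeutralDistribution().getLocale().equals(Locale.ROOT)); // true
        System.out.println(new GermanDistribution().getLocale());                      // de
    }
}
```

A locale can matter for case-insensitive handling because lower-casing rules differ between languages. For example, Turkish maps "I" to a dotless "ı". Whether PrefixDistribution uses its locale in this way is not documented here.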

getProbabilities

public double[] getProbabilities(int prefixLength)
Returns the double vector of prefix probabilities. The values in this vector are sorted alphabetically by their underlying prefixes and lie in the range [0,1]. If the distribution cannot provide probabilities for the given prefix length, an empty vector is returned. To obtain the index of a particular prefix, see getIndex(String).

Parameters:
prefixLength - the length of prefixes to get the distribution for.
Returns:
the vector of probabilities as double values.
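Because an unsupported prefix length yields an empty vector rather than null, callers can test `length == 0`. The following hedged sketch uses the hypothetical ProbabilitySource interface, which exposes only the documented getProbabilities(int) contract. It collects the supported lengths and checks the documented [0,1] range.

```java
import java.util.ArrayList;
import java.util.List;

public class SupportedLengthsSketch {

    /** Hypothetical stand-in exposing only getProbabilities(int). */
    interface ProbabilitySource {
        double[] getProbabilities(int prefixLength);
    }

    /** Returns the prefix lengths in [1, maxLength] for which a non-empty vector exists. */
    static List<Integer> supportedLengths(ProbabilitySource source, int maxLength) {
        List<Integer> lengths = new ArrayList<>();
        for (int k = 1; k <= maxLength; k++) {
            double[] probs = source.getProbabilities(k);
            if (probs.length == 0) {
                continue; // documented: an empty vector (not null) when unsupported
            }
            for (double p : probs) {
                if (p < 0.0 || p > 1.0) {
                    throw new IllegalStateException("probability outside [0,1]: " + p);
                }
            }
            lengths.add(k);
        }
        return lengths;
    }

    public static void main(String[] args) {
        // Fake source covering only prefix lengths 2 and 3.
        ProbabilitySource fake = k -> k == 2 ? new double[] {0.1, 0.5, 0.4}
                                    : k == 3 ? new double[] {0.7, 0.3}
                                    : new double[0];
        System.out.println(supportedLengths(fake, 5)); // [2, 3]
    }
}
```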

getIndex

public java.lang.Integer getIndex(java.lang.String prefix)
Returns the index of the given prefix (case-insensitive). This index points to the prefix's probability in the probability vector returned by getProbabilities(int). Note that the index is valid only in the vector obtained with the matching prefix length. If the prefix is unknown, i.e. no probability is available for it, the method returns null.

Parameters:
prefix - the string of the prefix to get the index for.
Returns:
the index of the prefix's probability, or null if the prefix is unknown.
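The following minimal sketch shows the null-safe lookup pattern implied above. It calls getIndex first, then reads the vector for the prefix's own length. Distribution, toy and probabilityOf are hypothetical names. The toy implementation assumes that stored prefixes are lower-case and sorted, and that case-insensitivity is achieved by lower-casing the query. Neither assumption is confirmed by this page.

```java
import java.util.Arrays;
import java.util.Locale;

public class PrefixLookupSketch {

    /** Hypothetical stand-in with the two documented accessor methods. */
    interface Distribution {
        double[] getProbabilities(int prefixLength);
        Integer getIndex(String prefix);
    }

    /** Returns the prefix's probability, or fallback when getIndex returns null. */
    static double probabilityOf(Distribution d, String prefix, double fallback) {
        Integer index = d.getIndex(prefix);
        if (index == null) {
            return fallback; // unknown prefix: no probability available
        }
        // The index is only valid in the vector for this prefix's length.
        return d.getProbabilities(prefix.length())[index];
    }

    /** Toy implementation for a single prefix length (hypothetical). */
    static Distribution toy(String[] sortedLowerPrefixes, double[] probs) {
        int len = sortedLowerPrefixes[0].length();
        return new Distribution() {
            @Override
            public double[] getProbabilities(int prefixLength) {
                return prefixLength == len ? probs.clone() : new double[0];
            }

            @Override
            public Integer getIndex(String prefix) {
                // Normalise the query, then binary-search the sorted prefixes.
                int i = Arrays.binarySearch(sortedLowerPrefixes, prefix.toLowerCase(Locale.ROOT));
                return i >= 0 ? i : null;
            }
        };
    }

    public static void main(String[] args) {
        Distribution d = toy(new String[] {"ab", "en", "th"}, new double[] {0.1, 0.5, 0.4});
        System.out.println(probabilityOf(d, "EN", 0.0)); // 0.5
        System.out.println(probabilityOf(d, "zz", 0.0)); // 0.0
    }
}
```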

load

public void load(java.io.InputStream in)
          throws java.lang.NumberFormatException,
                 java.io.IOException
Throws:
java.lang.NumberFormatException
java.io.IOException

save

public void save(java.io.OutputStream out)
          throws java.io.IOException
Throws:
java.io.IOException
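This page does not document the format used by load and save. Because the class implements java.io.Serializable, one standard way to round-trip such an object through streams is Java object serialization. The following sketch shows only that mechanism, applied to the hypothetical ToyDistribution class. It should not be read as the actual implementation of save(OutputStream) or load(InputStream).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

public class SerializationSketch {

    /** Hypothetical serializable stand-in holding a sorted prefix vector. */
    static class ToyDistribution implements Serializable {
        private static final long serialVersionUID = 1L;
        final String[] prefixes;
        final double[] probabilities;

        ToyDistribution(String[] prefixes, double[] probabilities) {
            this.prefixes = prefixes;
            this.probabilities = probabilities;
        }
    }

    /** Serializes the object into a byte array (standing in for any OutputStream). */
    static byte[] save(ToyDistribution d) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(d);
        } // closing flushes all buffered data into `bytes`
        return bytes.toByteArray();
    }

    /** Deserializes the object; ClassNotFoundException is wrapped for brevity. */
    static ToyDistribution load(byte[] data) throws IOException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (ToyDistribution) in.readObject();
        } catch (ClassNotFoundException e) {
            throw new IOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        ToyDistribution original = new ToyDistribution(
                new String[] {"ab", "en", "th"}, new double[] {0.1, 0.5, 0.4});
        ToyDistribution copy = load(save(original));
        System.out.println(Arrays.toString(copy.prefixes));      // [ab, en, th]
        System.out.println(Arrays.toString(copy.probabilities)); // [0.1, 0.5, 0.4]
    }
}
```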