

java.lang.Object
  de.aitools.ir.retrievalmodels.representer.AbstractRepresenter<java.lang.String,Vector>
    de.aitools.ir.retrievalmodels.representer.DivergenceFromRandomness
public class DivergenceFromRandomness
Divergence From Randomness (DfR) is a retrieval model developed by Amati et al. It builds on Harter's assumption that the significance of a term can be inferred from its distribution in the collection. Insignificant terms, such as stop words, are distributed randomly over the whole document collection and follow a Poisson distribution. Informative terms, by contrast, deviate from this Poisson hypothesis and are concentrated in a small subset of the document collection, the elite set. Within the elite set, terms are assumed to be Poisson distributed again. This probabilistic model is called the 2-Poisson model.
Amati et al. weight terms in the DfR model by two probability distributions. The first probability expresses that words carrying little information are randomly distributed over the whole set of documents. Consequently, the lower this probability, the higher the information gain. To describe the notion of randomness, they provide seven models, including a Poisson model. The second probability represents the risk of choosing a term as a good descriptor for a document. The higher the risk, the higher the gain in information if the assumptions were wrong. The risk can be modeled either by Laplace's "law of succession" or by a Bernoulli experiment.
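The two risk models named above can be sketched as follows. This is a minimal illustration based on the formulas as they are commonly stated for the cited paper, not the code of this class. The class name, method names, and parameter names are hypothetical, and whether this class computes the risk in exactly this way is an assumption.

```java
/** Illustrative sketch only; hypothetical names, not part of the aitools API. */
final class DfrAfterEffectSketch {

    /** Laplace's law of succession: the risk shrinks as the normalized term frequency grows. */
    static double laplace(double tfn) {
        return 1.0 / (tfn + 1.0);
    }

    /**
     * Ratio of two Bernoulli processes.
     *
     * @param tfn            normalized term frequency in the document
     * @param collectionFreq total number of occurrences of the term in the collection (F)
     * @param docFreq        number of documents that contain the term (n)
     */
    static double bernoulli(double tfn, double collectionFreq, double docFreq) {
        return (collectionFreq + 1.0) / (docFreq * (tfn + 1.0));
    }

    /** Combines the risk with the informative content -log2(prob1) of the randomness model. */
    static double weight(double risk, double prob1) {
        return risk * (-Math.log(prob1) / Math.log(2.0));
    }

    public static void main(String[] args) {
        double tfn = 3.0;
        System.out.println("Laplace risk:   " + laplace(tfn));
        System.out.println("Bernoulli risk: " + bernoulli(tfn, 19.0, 5.0));
        System.out.println("Weight:         " + weight(laplace(tfn), 0.125));
    }
}
```

In both variants the risk decreases as the term occurs more often in the document, so additional occurrences contribute less and less to the final weight.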
Term frequencies are normalized by document length. A first hypothesis (H1) assumes that all terms within a document are uniformly distributed. The second (H2) assumes that terms are denser in short documents than in long ones. In experiments, the second hypothesis was favored.
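A minimal sketch of the two length normalizations, using the formulas as they are commonly given for H1 and H2. The class and method names are hypothetical. The free parameter c in H2 is an assumption and is often set to 1. This page does not state which value, if any, this class uses.

```java
/** Illustrative sketch only; hypothetical names, not part of the aitools API. */
final class DfrLengthNormalizationSketch {

    private static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** H1: rescale the raw term frequency linearly to the average document length. */
    static double h1(double tf, double docLength, double avgDocLength) {
        return tf * avgDocLength / docLength;
    }

    /** H2: logarithmic rescaling; c is a free parameter (assumed here, commonly 1). */
    static double h2(double tf, double docLength, double avgDocLength, double c) {
        return tf * log2(1.0 + c * avgDocLength / docLength);
    }

    public static void main(String[] args) {
        // The same raw frequency in a document twice as long as the average.
        System.out.println("H1: " + h1(3.0, 200.0, 100.0));
        System.out.println("H2: " + h2(3.0, 200.0, 100.0, 1.0));
    }
}
```

For a document of exactly average length, both variants (with c = 1) return the raw term frequency unchanged. Longer documents are scaled down and shorter ones scaled up.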
A DfR model is described by a string of the form XYZ. X denotes the basic model, Y the first normalization factor (the risk), and Z the second normalization factor (H1 or H2). The combination that performed best in experiments was "In(e)B2". You can choose between the following types:
Basic models (to model the distribution of terms):
(1) Bose-Einstein Statistics [BE]
(2) Divergence Model [D]
(3) Geometric Model [G]
(4) INQUERY System F [F]
(5) Tf Model [In]
(6) Tf ExpectedIdf Model [Ine]
(7) Poisson Model [P]
First normalization (so-called after-effect models):
(1) Laplace Normalization [L]
(2) Bernoulli Normalization [B]
Second normalization (for term frequencies):
(1) Hypothesis 1 [H1]
(2) Hypothesis 2 [H2]
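To show how the three parts of a model string fit together, the following sketch computes a term weight for the "In(e)B2" combination: basic model [Ine], Bernoulli normalization [B], and Hypothesis 2 [H2]. It follows the formulas as they are commonly stated for the cited paper. All names are hypothetical, c = 1 is assumed for H2, and this is not taken from the implementation of this class.

```java
/** Illustrative sketch only; hypothetical names, not part of the aitools API. */
final class DfrIneB2Sketch {

    private static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /**
     * Expected number of documents containing the term if its collectionFreq
     * occurrences were scattered at random over numDocs documents.
     */
    static double expectedDocFreq(double numDocs, double collectionFreq) {
        return numDocs * (1.0 - Math.pow((numDocs - 1.0) / numDocs, collectionFreq));
    }

    /**
     * @param tf             raw frequency of the term in the document
     * @param docLength      length of the document
     * @param avgDocLength   average document length in the collection
     * @param numDocs        number of documents in the collection (N)
     * @param docFreq        number of documents containing the term (n)
     * @param collectionFreq total occurrences of the term in the collection (F)
     */
    static double weight(double tf, double docLength, double avgDocLength,
                         double numDocs, double docFreq, double collectionFreq) {
        // Z = H2: length-normalized term frequency, assuming c = 1.
        double tfn = tf * log2(1.0 + avgDocLength / docLength);
        // X = In(e): informative content based on the expected document frequency.
        double ne = expectedDocFreq(numDocs, collectionFreq);
        double basic = tfn * log2((numDocs + 1.0) / (ne + 0.5));
        // Y = B: Bernoulli risk factor.
        double risk = (collectionFreq + 1.0) / (docFreq * (tfn + 1.0));
        return risk * basic;
    }

    public static void main(String[] args) {
        // A rare term versus a frequent term in a collection of 1000 documents.
        System.out.println("rare:     " + weight(1, 100, 100, 1000, 5, 5));
        System.out.println("frequent: " + weight(1, 100, 100, 1000, 500, 800));
    }
}
```

A term that is rare in the collection receives a higher weight than a frequent one with the same in-document frequency, which matches the intuition that randomly distributed terms carry little information.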
Reference: Amati et al., "Probabilistic models of information retrieval based on measuring the divergence from randomness", 2002.
Nested Class Summary  

static class 
DivergenceFromRandomness.BasicModel

static class 
DivergenceFromRandomness.N1

static class 
DivergenceFromRandomness.N2

Constructor Summary  

DivergenceFromRandomness(DivergenceFromRandomness.BasicModel x,
DivergenceFromRandomness.N1 y,
DivergenceFromRandomness.N2 z,
TermFrequency tf)


DivergenceFromRandomness(java.util.Locale l)
This constructor initializes the Divergence from Randomness model with the default combination 'In(e)B2'.

DivergenceFromRandomness(java.util.Locale l,
DivergenceFromRandomness.BasicModel x,
DivergenceFromRandomness.N1 y,
DivergenceFromRandomness.N2 z)
Provides a DfR model with your preferred combination of a basic model x, a risk factor y, and a saturation function z to normalize term frequencies.
Method Summary  

boolean 
isTrained()

Vector 
represent(java.lang.String text)

void 
train(java.lang.Iterable<java.lang.String> texts,
boolean forceTraining)

Methods inherited from class de.aitools.ir.retrievalmodels.representer.AbstractRepresenter 

train 
Methods inherited from class java.lang.Object 

equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait 
Constructor Detail 

public DivergenceFromRandomness(java.util.Locale l)
Parameters:
l - used by the TermFrequency
See Also:
DivergenceFromRandomness(Locale, BasicModel, N1, N2), DivergenceFromRandomness(BasicModel, N1, N2, TermFrequency)
public DivergenceFromRandomness(java.util.Locale l, DivergenceFromRandomness.BasicModel x, DivergenceFromRandomness.N1 y, DivergenceFromRandomness.N2 z)
Provides a DfR model with your preferred combination of a basic model x, a risk factor y, and a saturation function z to normalize term frequencies.
Parameters:
l - used by the TermFrequency
x - one of the basic models defined by DivergenceFromRandomness.BasicModel
y - one of the risk functions defined by DivergenceFromRandomness.N1
z - one of the term frequency normalizations defined by DivergenceFromRandomness.N2
See Also:
DivergenceFromRandomness(BasicModel, N1, N2, TermFrequency)
public DivergenceFromRandomness(DivergenceFromRandomness.BasicModel x, DivergenceFromRandomness.N1 y, DivergenceFromRandomness.N2 z, TermFrequency tf)
Parameters:
x - one of the basic models defined by DivergenceFromRandomness.BasicModel
y - one of the risk functions defined by DivergenceFromRandomness.N1
z - one of the term frequency normalizations defined by DivergenceFromRandomness.N2
tf - used to represent documents
Method Detail 

public Vector represent(java.lang.String text)
public boolean isTrained()
public void train(java.lang.Iterable<java.lang.String> texts, boolean forceTraining)

