de.aitools.ir.retrievalmodels.representer
Class OkapiBM25
java.lang.Object
de.aitools.ir.retrievalmodels.representer.AbstractRepresenter<java.lang.String,Vector>
de.aitools.ir.retrievalmodels.representer.OkapiBM25
- All Implemented Interfaces:
- Representer<java.lang.String,Vector>, java.io.Serializable
public class OkapiBM25
- extends AbstractRepresenter<java.lang.String,Vector>
Okapi BM25 is a retrieval model developed by Robertson et al. in the
early 1990s. It is used here as a term weighting scheme. BM25 interpolates
between BM11 and BM15: BM11 normalizes fully by document length, whereas BM15
applies no document length normalization at all. Robertson argues that
normalization is essential because authors differ in verbosity. BM25 can be
seen as a tf-idf model in which the term frequency component is a
non-linear saturation function. BM25 can also be viewed as a
"Divergence from Randomness" (DfR) model. In that view, the idf component
corresponds to the DfR over the whole collection and the tf component is the
DfR contribution of the document itself.
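For orientation, the term weight in the Okapi at TREC-4 formulation, without relevance information and with k2 = 0, is commonly written as shown here. The symbols N (number of documents), n_t (document frequency of term t), tf (frequency of t in the document), qtf (frequency of t in the query), dl (document length) and avdl (average document length) are introduced for illustration and are not identifiers of this class. Whether this implementation matches the form term by term, for example in the constant factors (k1 + 1) and (k3 + 1), is not stated in this documentation.

```latex
\[
w(t, d, q) = \log\frac{N - n_t + 0.5}{n_t + 0.5}
  \cdot \frac{(k_1 + 1)\, tf_{t,d}}{K + tf_{t,d}}
  \cdot \frac{(k_3 + 1)\, qtf_{t,q}}{k_3 + qtf_{t,q}},
\qquad
K = k_1 \left( (1 - b) + b \, \frac{dl_d}{avdl} \right)
\]
```

The first factor is the idf component and the second is the saturating tf component. With b = 0 the length term drops out of K (BM15), and with b = 1 K scales linearly with dl/avdl (BM11).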
The weighting scheme has to be trained before use so that it can compute the
average document length and the df (document frequency) vector. BM25 depends
on several parameters, which are explained for each constructor. By default,
this implementation uses b = 0.75, k1 = 1.2 and k3 = 8. k2 is always zero.
No global correction of the document length is performed for queries.
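The two statistics that training has to provide can be sketched as follows. This is a minimal illustration, not the code of this class: `Bm25TrainingSketch` and its members are hypothetical names, and whitespace tokenization with lower-casing is an assumption (the real class delegates term extraction to TermFrequency).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of aitools: collects the average document
// length and the document frequency (df) of every term in a corpus.
final class Bm25TrainingSketch {
    final Map<String, Integer> documentFrequency = new HashMap<>();
    double averageDocumentLength;
    int documentCount;

    // Assumption: tokens are lower-cased and split on whitespace.
    static String[] tokenize(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    void train(Iterable<String> texts) {
        // Start from a clean state so that repeated calls do not mix corpora.
        documentFrequency.clear();
        documentCount = 0;
        long totalLength = 0;
        for (String text : texts) {
            String[] tokens = tokenize(text);
            totalLength += tokens.length;
            documentCount++;
            // df counts each term at most once per document.
            Set<String> seen = new HashSet<>();
            for (String token : tokens) {
                if (seen.add(token)) {
                    documentFrequency.merge(token, 1, Integer::sum);
                }
            }
        }
        averageDocumentLength =
                documentCount == 0 ? 0.0 : (double) totalLength / documentCount;
    }
}
```

Both values are corpus-wide, which is why a weight for a single text cannot be computed before the corpus has been seen once.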
Reference:
Okapi at TREC-4, by Robertson et al., 1995.
- Version:
- aitools 3.0 Created on Mar 28, 2010 $Id: OkapiBM25.java,v 1.1
2010/05/19 15:52:03 poma1006 Exp $
- Author:
- [email protected]
- See Also:
- Serialized Form
Field Summary
static double B
static double K1
static double K3
Constructor Summary
OkapiBM25(double k1, double k3, double b, TermFrequency tf)
OkapiBM25(java.util.Locale l)
Initializes the BM25 formula with k1 = 1.2 and b = 0.75.
OkapiBM25(java.util.Locale l, double b)
OkapiBM25(java.util.Locale l, double k1, double k3, double b)
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
B
public static final double B
- See Also:
- Constant Field Values
K1
public static final double K1
- See Also:
- Constant Field Values
K3
public static final double K3
- See Also:
- Constant Field Values
OkapiBM25
public OkapiBM25(java.util.Locale l)
- Initializes the BM25 formula with k1 = 1.2 and b = 0.75.
- Parameters:
l
- the locale to be used by TermFrequency
- See Also:
OkapiBM25(Locale, double)
,
OkapiBM25(Locale, double, double, double)
,
OkapiBM25(double, double, double, TermFrequency)
OkapiBM25
public OkapiBM25(java.util.Locale l,
double b)
- Parameters:
l
- the locale to be used by TermFrequency
b
- if b is 0, the best-match formula reduces to BM15, which applies no
length normalization. Setting b to 1 yields BM11 with full
length normalization. Any value between 0 and 1 is allowed
and gives a soft normalization between BM15 and BM11.
- See Also:
OkapiBM25(Locale, double, double, double)
,
OkapiBM25(double, double, double, TermFrequency)
OkapiBM25
public OkapiBM25(java.util.Locale l,
double k1,
double k3,
double b)
- Parameters:
l
- the locale to be used by TermFrequency
k1
- controls the shape of the saturation function
tf/(k1+tf). k1 must be greater than 0.
b
- if b is 0, the best-match formula reduces to BM15, which applies no
length normalization. Setting b to 1 yields BM11 with full
length normalization. Any value between 0 and 1 is allowed
and gives a soft normalization between BM15 and BM11.
- See Also:
OkapiBM25(double, double, double, TermFrequency)
OkapiBM25
public OkapiBM25(double k1,
double k3,
double b,
TermFrequency tf)
- Parameters:
tf
- an instance of TermFrequency
k1
- controls the shape of the saturation function
tf/(k1+tf). k1 must be greater than 0.
b
- if b is 0, the best-match formula reduces to BM15, which applies no
length normalization. Setting b to 1 yields BM11 with full
length normalization. Any value between 0 and 1 is allowed
and gives a soft normalization between BM15 and BM11.
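The effect of k1 and b described for the constructors can be made concrete with a small sketch. `Bm25TfSketch` and `tfComponent` are hypothetical names and not part of this class. The sketch assumes the saturation function tf/(k1+tf) from the parameter description, with k1 scaled by the usual BM25 length term (1 - b) + b * dl/avdl, and it leaves out any constant factor the actual implementation may apply.

```java
// Hypothetical helper, not part of aitools: length-normalized saturation
// tf / (K + tf) with K = k1 * ((1 - b) + b * docLength / avgDocLength).
final class Bm25TfSketch {
    static double tfComponent(double tf, double docLength, double avgDocLength,
                              double k1, double b) {
        if (k1 <= 0.0) {
            throw new IllegalArgumentException("k1 must be greater than 0");
        }
        if (b < 0.0 || b > 1.0) {
            throw new IllegalArgumentException("b must lie between 0 and 1");
        }
        // b = 0 gives lengthNorm = 1 (BM15); b = 1 gives dl / avdl (BM11).
        double lengthNorm = (1.0 - b) + b * (docLength / avgDocLength);
        return tf / (k1 * lengthNorm + tf);
    }
}
```

With b = 0 the result does not depend on the document length. With b = 1 a document twice as long as the average needs a higher term frequency to reach the same weight, and values of b in between move smoothly from one behaviour to the other.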
setState
public void setState(OkapiBM25.RepresentationState state)
- Changes how text is represented.
- Parameters:
state
- a new state which defines how the text is represented
represent
public Vector represent(java.lang.String text)
train
public void train(java.lang.Iterable<java.lang.String> texts,
boolean forceTraining)
isTrained
public boolean isTrained()
getK1
public final double getK1()
setK1
public final void setK1(double k1)
getK3
public final double getK3()
setK3
public final void setK3(double k3)
getB
public final double getB()
setB
public final void setB(double b)