Retrieval Models

Synopsis

Retrieval models are grounded in a linguistic theory and can be regarded as heuristics that operationalize the probability ranking principle: "Given a query q, the ranking of documents according to their probabilities of being relevant to q leads to the optimum retrieval performance." (Robertson, 1997). To give an overview of well-known retrieval models and to compare their characteristics, we have developed an interactive map.
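As a minimal sketch of what the probability ranking principle asks for, the following Python snippet orders documents by decreasing estimated probability of relevance. The function name, the document ids, and the probability values are hypothetical and serve illustration only; how such probabilities are estimated is precisely what distinguishes the individual retrieval models.

```python
from typing import Dict, List


def rank_by_relevance_probability(probabilities: Dict[str, float]) -> List[str]:
    """Return document ids ordered by decreasing estimated P(relevant | q, d)."""
    # Ties are broken by document id so that the ordering is deterministic.
    return sorted(probabilities, key=lambda doc_id: (-probabilities[doc_id], doc_id))


# Hypothetical probability estimates for a single query q (made-up values).
estimates = {"d1": 0.20, "d2": 0.85, "d3": 0.55}
ranking = rank_by_relevance_probability(estimates)
print(ranking)  # ['d2', 'd3', 'd1']
```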

Research

The literature often distinguishes between empirical models, probabilistic models, and language models. This distinction is rooted in the query-oriented understanding of retrieval tasks but also has historical reasons. Our map reflects this distinction.

Clicking on a model acronym in the map displays a short description of the respective retrieval model below the map. A retrieval model is either empirical, probabilistic, or of language model type. Below a model's acronym you find a code in the form of a quadruple, [1 2 3 4], which hints at the model's characteristics along four dimensions: (1) Feature type, which defines the basic principle used to capture a document's content; possible values include document terms [T], latent or explicit taxonomic concepts [C], or an (often NLP-based) method yielding special [S] features. (2) Foundation of the retrieval status value (RSV) computation; possible values include feature vector similarity [φ], relevance [ρ] assessment, or the ability of a document to generate [γ] a query. (3) Dependency on a closed world; possible values are open [∪], where the document collection need not be completely given, and closed [∩], where the collection must be completely given in order to compute global characteristics. (4) External knowledge, if used at all; possible values include none [∅], user feedback [✓], e.g., for relevance assessment purposes, and an additional [+] document collection, e.g., for computing collection-relative document similarities.

Our scheme is not intended to differentiate exactly between all particularities of a model, but is meant to pinpoint the strengths and weaknesses of retrieval models. If you find it useful, have hints for its improvement, or detect incorrect statements, please drop us a mail. Finally, we kindly ask you to refer to the overview using the related publication below.

[Interactive map of retrieval models: pLSI, MixtUnigram, LDA, LM, BeliefNet, BestMatch, Inquery, BII, BIM, 2-Poisson, ProbIndex, ESA, CL-ESA, WebGenre, DivRand, SuffixTree, Genre, LSI, GVSM, FuzzySet, VSM, Boolean]
Legend [1 2 3 4]
(1) Feature type
  • T terms
  • C concepts
  • S special
(2) RSV computation
  • φ similarity
  • ρ relevance
  • γ generation
(3) Closed world
  • ∪ open collection
  • ∩ closed collection
(4) External knowledge
  • ∅ none
  • ✓ user feedback
  • + additional collection
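
The four-dimensional code [1 2 3 4] can be represented as a small data structure. The following Python sketch is one possible encoding, not the implementation behind the map: the class names, the `parse_code` helper, and the example code string are assumptions made for illustration, and the example combination is arbitrary rather than the code of any particular model.

```python
from dataclasses import dataclass
from enum import Enum


class FeatureType(Enum):          # dimension (1)
    TERMS = "T"
    CONCEPTS = "C"
    SPECIAL = "S"


class RsvFoundation(Enum):        # dimension (2)
    SIMILARITY = "φ"
    RELEVANCE = "ρ"
    GENERATION = "γ"


class ClosedWorld(Enum):          # dimension (3)
    OPEN = "∪"
    CLOSED = "∩"


class ExternalKnowledge(Enum):    # dimension (4)
    NONE = "∅"
    USER_FEEDBACK = "✓"
    ADDITIONAL_COLLECTION = "+"


@dataclass(frozen=True)
class ModelCode:
    feature_type: FeatureType
    rsv: RsvFoundation
    closed_world: ClosedWorld
    external: ExternalKnowledge

    def __str__(self) -> str:
        parts = (self.feature_type, self.rsv, self.closed_world, self.external)
        return "[" + " ".join(p.value for p in parts) + "]"


def parse_code(text: str) -> ModelCode:
    """Parse a string such as '[T φ ∩ ∅]' into a ModelCode (hypothetical helper)."""
    symbols = text.strip().strip("[]").split()
    if len(symbols) != 4:
        raise ValueError(f"expected four symbols, got {len(symbols)}")
    # Enum lookup by value raises ValueError for symbols outside the legend.
    return ModelCode(
        FeatureType(symbols[0]),
        RsvFoundation(symbols[1]),
        ClosedWorld(symbols[2]),
        ExternalKnowledge(symbols[3]),
    )


# Arbitrary example combination; not the code of any specific model.
example = parse_code("[T φ ∩ ∅]")
print(example.closed_world.name, str(example))
```

Using one enumeration per dimension keeps invalid symbols from being accepted silently and makes it straightforward to filter models by a single characteristic, for example all models that require a closed collection.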

People

Publications