CAIR is a cooperative research project between the Information Engineering Group (Universität Duisburg-Essen) and our webis group. Cluster analysis combines an object model, a similarity measure, and a merging strategy. Though a good deal of existing research focuses on merging, it is clear that successful cluster analysis requires the integration of knowledge about the domain, the task, and the users. Such a "semantic cluster analysis" can produce solutions for relevant information retrieval (IR) tasks that are more effective than existing approaches. The objective of CAIR is the theoretical, methodological, and experimental study of cluster analysis in information retrieval, with semantics investigated in four respects: (1) in the form of specialized retrieval models that consider knowledge of the IR task, (2) for multi-objective and interactive analyses that employ an explicit user model, (3) within hybrid merging strategies that combine algorithms, and (4) for improved cluster labeling.
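To make the three ingredients named above concrete, here is a minimal Python sketch. It is not the project's method: it assumes a bag-of-words object model, cosine similarity, and single-link agglomerative merging with a similarity threshold. All function names, the toy documents, and the threshold value are illustrative choices.

```python
import math
from collections import Counter


def object_model(text):
    """Object model (assumed): a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Similarity measure (assumed): cosine of two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_link_clustering(vectors, threshold):
    """Merging strategy (assumed): repeatedly merge the two most similar
    clusters until no pair reaches the similarity threshold."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Single link: cluster similarity is the best pairwise similarity.
                sim = max(
                    cosine_similarity(vectors[i], vectors[j])
                    for i in clusters[x]
                    for j in clusters[y]
                )
                if best is None or sim > best[0]:
                    best = (sim, x, y)
        if best[0] < threshold:
            break
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Toy documents, invented for illustration only.
docs = [
    "cluster analysis of search results",
    "search results cluster labeling",
    "funding of research projects",
]
vectors = [object_model(d) for d in docs]
clusters = single_link_clustering(vectors, threshold=0.3)
print(clusters)
```

In this sketch, knowledge about the domain, the task, or the users would enter by replacing any of the three functions, for example with a task-specific object model in place of plain term frequencies.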

The project is funded by the German Research Foundation (DFG).


One of the project outcomes is the concept of "keyqueries" as document descriptors. Representing documents in terms of the search queries for which they are most relevant has natural applications in cluster analysis. Given a document collection, this representation allows the automatic generation of a hierarchical taxonomy with good cluster labels.
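The keyquery idea can be illustrated with a small Python sketch. It rests on several assumptions that are not taken from the project: a toy conjunctive retrieval model that scores by summed term frequency, candidate queries drawn from the document's own vocabulary, a top-k cutoff of 1 with ties counted against the document, and a minimality rule that skips any query containing a shorter keyquery. The names `score` and `keyqueries` and the example collection are hypothetical.

```python
from itertools import combinations


def score(query, doc_terms):
    """Toy retrieval model (assumed): a document matches only if it contains
    every query term; its score is the summed term frequency."""
    if not all(term in doc_terms for term in query):
        return 0
    return sum(doc_terms.count(term) for term in query)


def keyqueries(doc_id, collection, max_len=2, top_k=1):
    """Return the minimal queries (up to max_len terms) for which doc_id
    ranks within the top_k results of the toy retrieval model."""
    tokenized = {d: text.lower().split() for d, text in collection.items()}
    vocabulary = sorted(set(tokenized[doc_id]))
    found = []
    for length in range(1, max_len + 1):
        for query in combinations(vocabulary, length):
            # Minimality (assumed): skip queries that extend a known keyquery.
            if any(set(known) <= set(query) for known in found):
                continue
            own = score(query, tokenized[doc_id])
            # Pessimistic rank: ties with other documents count against us.
            rank = 1 + sum(
                1
                for other, terms in tokenized.items()
                if other != doc_id and score(query, terms) >= own
            )
            if own > 0 and rank <= top_k:
                found.append(query)
    return found


# Toy collection, invented for illustration only.
collection = {
    "d1": "cluster labeling for search results",
    "d2": "search results clustering",
    "d3": "cluster analysis for retrieval",
}
result = keyqueries("d1", collection)
print(result)
```

Under these assumptions, documents that share keyqueries could be grouped together, and the shared queries could serve as candidate cluster labels.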

As part of our project, we organized the following events:

The following projects are related to our project:

The following corpora were developed in our project:

Further information can be found on the project page of the Information Engineering Group.


Students: Johannes Kiesel