The Webis Known-Item Question Corpus 2013 (Webis-KIQC-13) contains annotations for 2,755 questions posted on Yahoo! Answers. For each question, two annotators were asked to decide whether it expresses a known-item information need, to identify a ClueWeb09 website representing the known item, and to judge whether the description of the need contains false memories. The corpus records the annotators' final decisions; the few questions on which they initially disagreed were resolved through discussion.
The corpus contains the IDs of the ClueWeb09 documents representing the known item and an annotated categorization and correction for questions with a false memory.
To download the corpus, use the following link:
(1.2 MB, MD5 sum: 459da425d3e0b3fe245d2d9335e6444a)
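The MD5 sum above can be used to check that the download arrived intact. The following is a minimal sketch in Python using only the standard library; the function names and the archive file name in the usage note are assumptions made for illustration, not part of the corpus distribution.

```python
import hashlib

# MD5 sum as published on this page.
EXPECTED_MD5 = "459da425d3e0b3fe245d2d9335e6444a"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected sum."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download("webis-kiqc-13.zip")` (a hypothetical file name; substitute the name of the file you actually downloaded) returns `True` only if the file matches the published sum. MD5 is adequate for detecting a corrupted transfer, but it is not a safeguard against deliberate tampering.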
If you use the dataset in your research, please send us a copy of your publication. We kindly ask you to refer to the corpus via [bib].
The original questions were crawled from the Yahoo! Answers platform to enable repeatable research on realistic known-item topics.
Each question entry in the corpus contains the information returned by the Yahoo! Answers API plus additional fields: the ClueWeb09 ID and the URL of the respective known item, and a Boolean field indicating whether the question contains false memories. If it does, the entry also has fields for the type of the false memory and a short correction notice.
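The annotation fields described above could be modeled roughly as follows. This is only an illustrative sketch: the class name, the field names, and the example values are hypothetical and do not reflect the actual names or file format used in the corpus, which should be taken from the downloaded files.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KnownItemAnnotation:
    """Annotation fields added to one question (all names are hypothetical)."""

    clueweb09_id: str                        # ID of the ClueWeb09 document
    url: str                                 # URL of the known item
    has_false_memory: bool                   # whether false memories are included
    false_memory_type: Optional[str] = None  # only set if has_false_memory
    correction: Optional[str] = None         # short correction notice, same condition

    def __post_init__(self):
        # Type and correction are only described for questions with a false memory.
        if not self.has_false_memory and (self.false_memory_type or self.correction):
            raise ValueError("false-memory details given without a false memory")


# Placeholder values for illustration only, not taken from the corpus.
example = KnownItemAnnotation(
    clueweb09_id="clueweb09-en0000-00-00000",
    url="http://example.org/",
    has_false_memory=True,
    false_memory_type="placeholder-type",
    correction="placeholder correction notice",
)
```

Making the two false-memory fields optional mirrors the description: they are present only when the Boolean field is true.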
For more information on the construction of the dataset, see the respective publication.
Students: Daniel Wägner