Webis-QInC-22
Synopsis
The Webis Query Interpretation Corpus 2022 (Webis-QInC-22) contains manually selected explicit entities, implicit entities, and entity-based interpretations for 2,800 web queries. The queries were either taken from existing entity linking datasets or are ambiguous queries collected from various sources.
The dataset is divided into a training and a test split and comes as a ZIP-compressed archive of JSON files. Each JSON file contains an array of query objects, each formatted as follows:
{
  "id": "webis-001",
  "query": "new york times square dance",
  "difficulty": 2,
  "categories": [
    "ConQ"
  ],
  "explicit_entities": [
    {
      "mention": "new york times",
      "entity": [
        "https://en.wikipedia.org/wiki/The_New_York_Times"
      ],
      "relevance": 2
    },
    {
      "mention": "new york",
      "entity": [
        "https://en.wikipedia.org/wiki/New_York_City"
      ],
      "relevance": 2
    },
    {
      "mention": "times square",
      "entity": [
        "https://en.wikipedia.org/wiki/Times_Square"
      ],
      "relevance": 2
    },
    ...
  ],
  "implicit_entities": [],
  "interpretations": [
    {
      "id": 0,
      "interpretation": [
        "https://en.wikipedia.org/wiki/The_New_York_Times",
        "square dance"
      ],
      "relevance": 3,
      "equivalent": null,
      "comment": "Articles about square dance in the New York Times newspaper"
    },
    {
      "id": 1,
      "interpretation": [
        "https://en.wikipedia.org/wiki/New_York_City",
        "https://en.wikipedia.org/wiki/Times_Square",
        "dance"
      ],
      "relevance": 3,
      "equivalent": null,
      "comment": "Dance happening at the Times Square in New York City"
    },
    ...
  ]
}
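The record format above can be read with any JSON parser. The following is a minimal Python sketch, not an official loader: the function names (load_queries, entity_urls, split_interpretation) are illustrative and not part of the corpus, and any file path passed to load_queries is a placeholder, since the file names inside the archive are not listed here. The sketch also assumes that entity parts of an interpretation always start with "https://", as they do in the example, and that everything else is plain query text.

```python
import json
from pathlib import Path


def load_queries(path):
    """Read one corpus JSON file and return its array of query records."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def entity_urls(query):
    """Collect the entity URLs of all explicit entities of one query."""
    return [url for ent in query["explicit_entities"] for url in ent["entity"]]


def split_interpretation(interpretation):
    """Separate entity URLs from remaining query text in one interpretation.

    Assumption: entity parts start with "https://", as in the example record.
    """
    parts = interpretation["interpretation"]
    entities = [p for p in parts if p.startswith("https://")]
    context = [p for p in parts if not p.startswith("https://")]
    return entities, context


# Abridged copy of the example record, used here instead of a real corpus file.
sample = {
    "id": "webis-001",
    "query": "new york times square dance",
    "difficulty": 2,
    "categories": ["ConQ"],
    "explicit_entities": [
        {"mention": "new york times",
         "entity": ["https://en.wikipedia.org/wiki/The_New_York_Times"],
         "relevance": 2},
        {"mention": "new york",
         "entity": ["https://en.wikipedia.org/wiki/New_York_City"],
         "relevance": 2},
        {"mention": "times square",
         "entity": ["https://en.wikipedia.org/wiki/Times_Square"],
         "relevance": 2},
    ],
    "implicit_entities": [],
    "interpretations": [
        {"id": 0,
         "interpretation": ["https://en.wikipedia.org/wiki/The_New_York_Times",
                            "square dance"],
         "relevance": 3,
         "equivalent": None,
         "comment": "Articles about square dance in the New York Times newspaper"},
        {"id": 1,
         "interpretation": ["https://en.wikipedia.org/wiki/New_York_City",
                            "https://en.wikipedia.org/wiki/Times_Square",
                            "dance"],
         "relevance": 3,
         "equivalent": None,
         "comment": "Dance happening at the Times Square in New York City"},
    ],
}

# Print each interpretation as its entity URLs plus the leftover query text.
for interp in sample["interpretations"]:
    entities, context = split_interpretation(interp)
    print(interp["id"], entities, context)
```

For a file extracted from the archive, load_queries(path) would return a list of such records, and the same helper functions could be applied to each element.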
Additionally, the dataset comprises line-based JSON files that document the evaluation of all analyzed entity linking tools on explicit entity retrieval and query interpretation.
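Line-based JSON files are typically read one line at a time, with each non-empty line parsed as a separate JSON value. The sketch below shows this in Python under that assumption. The function name read_json_lines is illustrative, and because the fields of the evaluation records are not described here, the code makes no assumption about their keys.

```python
import json


def read_json_lines(path):
    """Yield one parsed JSON value per non-empty line of a line-based JSON file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline at end of file
                yield json.loads(line)
```

Because the function is a generator, large evaluation files can be processed record by record without loading the whole file into memory.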
Download
You can access the Webis-QInC-22 corpus on Zenodo.
If you use the dataset in your research, please send us a copy of your publication. We kindly ask you to cite the corpus via [bib]. If you additionally want to link to the dataset, please use the dataset's [doi] for a stable link.