Webis-Context-sensitive-Word-Search-Queries-2022

Synopsis

This repository contains two datasets of word search queries. Each word search query is a token n-gram containing one wildcard token ([MASK]), and the answers to a query are the tokens most likely to replace the mask. All queries originate from either wikitext-103 or CLOTH, and the respective source is annotated for each query.
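The query structure described above can be sketched in Python as a minimal illustration. The class name `WordSearchQuery`, its field names, and the example n-gram are hypothetical. They are not the dataset's actual file format or content, which this page does not specify.

```python
from dataclasses import dataclass

MASK = "[MASK]"


@dataclass
class WordSearchQuery:
    """Hypothetical in-memory form of one query (not the dataset's file format)."""

    tokens: list[str]  # token n-gram containing exactly one MASK token
    source: str        # annotated origin, e.g. "wikitext-103" or "CLOTH"

    def __post_init__(self) -> None:
        # A word search query has exactly one wildcard token.
        if self.tokens.count(MASK) != 1:
            raise ValueError("a word search query must contain exactly one [MASK] token")

    def fill(self, answer: str) -> str:
        """Return the n-gram with the wildcard replaced by a candidate answer."""
        return " ".join(answer if t == MASK else t for t in self.tokens)


# Made-up example query, for illustration only.
q = WordSearchQuery(tokens=["a", "piece", "of", MASK], source="CLOTH")
print(q.fill("cake"))  # a piece of cake
```

Validating the single mask at construction time keeps every downstream step (filling, scoring candidates) free of the ambiguity of multiple wildcards.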

The original-token dataset lists exactly one top answer for each query. The ranked-answers dataset lists multiple sorted answers for each query, grouped into three relevance categories, where 3 is the most relevant. Please refer to the cited publication for more details.
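For the ranked-answers dataset, ordering candidates by relevance category can be sketched as follows. This assumes the categories are the integers 1 to 3 (3 being the most relevant) and that a query's answers are available as a mapping from answer token to category; the function name `rank_answers` and the example answers are hypothetical and not taken from the dataset.

```python
def rank_answers(graded: dict[str, int]) -> list[str]:
    """Order answers by relevance category, most relevant (3) first.

    Assumes categories are 1, 2, or 3. Python's sort is stable even with
    reverse=True, so answers in the same category keep their input order.
    """
    for answer, grade in graded.items():
        if grade not in (1, 2, 3):
            raise ValueError(f"unexpected relevance category {grade!r} for {answer!r}")
    return sorted(graded, key=lambda a: graded[a], reverse=True)


# Made-up answers for the query "a piece of [MASK]", for illustration only.
print(rank_answers({"cake": 3, "pie": 2, "bread": 3, "wood": 1}))
# ['cake', 'bread', 'pie', 'wood']
```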

Access

To cite the dataset, please refer to this publication. To link to the dataset, please use the dataset permalink [doi].

  • Download the dataset from Zenodo.

People

Publications