The Webis Comparative Web Search Questions 2020 (Webis-CompQuestions-20) corpus comprises 15,000 web questions collected from the public datasets MS Marco, Google Natural Questions, and Quora. Each question was manually annotated as comparative or not, and additionally with more fine-grained subclasses.


You can access the Webis Comparative Web Search Questions Corpus 2020 on Zenodo.

If you use the dataset in your research, please send us a copy of your publication. We kindly ask you to cite the corpus via [bib].


We analyze comparative questions, i.e., questions submitted to a search engine that ask to compare different items. Responses to such questions could differ considerably from the simple “ten blue links” and could, for example, aggregate the pros and cons of the different options as a direct answer. However, changing the result presentation is a delicate decision, so classifying comparative questions is a highly precision-oriented task.
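
To illustrate what "precision-oriented" can mean in practice, the following minimal sketch picks the lowest decision threshold at which a classifier still reaches a target precision, accepting lower recall in exchange. It is not the method used to build or evaluate the corpus. The function names, the scores, and the labels are hypothetical and made up purely for illustration.

```python
from typing import List, Optional, Tuple


def precision_recall(scores: List[float], labels: List[bool],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall when every score >= threshold is predicted comparative."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    # Convention (an assumption): with no positive predictions, precision is 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def lowest_threshold_for_precision(scores: List[float], labels: List[bool],
                                   target: float) -> Optional[float]:
    """Return the smallest observed score that, used as threshold, meets the target precision."""
    for t in sorted(set(scores)):
        precision, _ = precision_recall(scores, labels, t)
        if precision >= target:
            return t
    return None  # no threshold reaches the target precision


# Hypothetical classifier scores and gold labels (True = comparative question).
scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.30]
labels = [True, True, False, True, False, False]

threshold = lowest_threshold_for_precision(scores, labels, target=0.9)
if threshold is not None:
    precision, recall = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold:.2f} precision={precision:.2f} recall={recall:.2f}")
```

On this toy data, the threshold that reaches the target precision leaves one of the three comparative questions undetected, which is the trade-off a precision-oriented setting accepts: a missed comparative question simply gets the standard result page, whereas a false positive would change the presentation for a question that does not call for it.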