Webis MS MARCO Anchor Text 2022

Synopsis

The Webis MS MARCO Anchor Text 2022 dataset enriches Versions 1 and 2 of the MS MARCO document collection with anchor text extracted from six Common Crawl snapshots. The snapshots cover the years 2016 to 2021 and contain between 1.7 and 3.4 billion documents each. Overall, the dataset enriches 1,703,834 documents for Version 1 and 4,821,244 documents for Version 2 with up to 1,000 anchor texts each.

Access

To cite the dataset, please refer to the publications. To link to the dataset, please use the dataset permalink [doi].

  • Download the dataset from Zenodo

People

Publications