Webis MS MARCO Anchor Text 2022
The Webis MS MARCO Anchor Text 2022 dataset enriches Versions 1 and 2 of the MS MARCO document collection with anchor text extracted from six Common Crawl snapshots. The snapshots cover the years 2016 to 2021 and contain between 1.7 and 3.4 billion documents each. Overall, the dataset enriches 1,703,834 documents for Version 1 and 4,821,244 documents for Version 2 with up to 1,000 anchor texts each.
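To make the idea of "up to 1,000 anchor texts per document" concrete, here is a minimal sketch of how anchor texts pointing at a known set of target URLs could be collected from crawled HTML pages. It is an illustration only, not the pipeline used to build the dataset: the names `AnchorExtractor` and `collect_anchor_texts`, the exact-match URL comparison, and the first-come selection under the cap are all assumptions made for this example.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Cap taken from the dataset description; how the real pipeline picks
# which anchors to keep is not stated, so this sketch keeps the first ones.
MAX_ANCHORS_PER_DOC = 1000


class AnchorExtractor(HTMLParser):
    """Collect (href, anchor text) pairs from a single HTML page."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._href = None    # href of the <a> element currently open, if any
        self._chunks = []    # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._chunks = []

    def handle_data(self, data):
        if self._href is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Join fragments and collapse runs of whitespace.
            text = " ".join("".join(self._chunks).split())
            if text:
                self.pairs.append((self._href, text))
            self._href = None


def collect_anchor_texts(pages, target_urls, limit=MAX_ANCHORS_PER_DOC):
    """Map each target URL to at most `limit` anchor texts linking to it.

    Assumption: links match targets by exact string equality; a real
    pipeline would likely normalize URLs first.
    """
    anchors = defaultdict(list)
    for html in pages:
        parser = AnchorExtractor()
        parser.feed(html)
        parser.close()
        for href, text in parser.pairs:
            if href in target_urls and len(anchors[href]) < limit:
                anchors[href].append(text)
    return dict(anchors)
```

Calling `collect_anchor_texts(pages, target_urls)` with an iterable of HTML strings and a set of document URLs returns a dictionary from URL to a list of anchor texts, which mirrors the shape of the enrichment described above.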
The Webis MS MARCO Anchor Text 2022 dataset is available at:
Authors:
- Maik Fröbe
- Sebastian Günther
- Maximilian Probst
- Martin Potthast
- Matthias Hagen