Webis-Sentences-17

Name: Webis-Sentences-17
Published: 2017
License: https://creativecommons.org/licenses/by/4.0/deed.en

Synopsis

The Webis-Sentences-17 corpus is a collection of 3,369,618,811 sentences extracted from the ClueWeb12 web crawl. It is designed to allow for statistical analyses of human-written sentences. More details on the sentence extraction can be found in the associated publication.

Access

To cite the dataset, please refer to the associated publication. To link to the dataset, please use the dataset permalink [doi].

Download the dataset from Zenodo.
Find the related metadata at Google.

People

Johannes Kiesel
Benno Stein
Stefan Lucks
