Webis-Sentences-17

Synopsis

The Webis-Sentences-17 corpus is a collection of 3,369,618,811 sentences extracted from the ClueWeb12 web crawl. It is designed to allow for statistical analyses of human-written sentences. More details on the sentence extraction can be found in the associated publication.

Download

To download the corpus use the following links:

People

Publications