Webis-Gmane-19

Name: Webis-Gmane-19
Published: 2019
License: https://creativecommons.org/licenses/by/4.0/deed.en

Synopsis

A large-scale corpus of over 153 million fully segmented emails from 14,635 public mailing lists.

The Webis Gmane Email Corpus 2019 is a dataset of more than 153 million parsed and segmented emails, crawled from gmane.io between February and May 2019 and covering more than 20 years of public mailing lists. The dataset was published as a resource at ACL 2020.

Access

To cite the dataset, please refer to the accompanying ACL 2020 publication.

Download the dataset from Zenodo or from the Internet Archive.
Find the related metadata at Google.

People

Janek Bevendorff
Khalid Al-Khatib
Martin Potthast
Benno Stein
