Webis-TLDR-17

Name: Webis-TLDR-17
Published: 2017
License: https://creativecommons.org/licenses/by/4.0/deed.en

Synopsis

The Webis TLDR Corpus (2017) consists of approximately 4 million content-summary pairs for abstractive summarization, extracted from the Reddit dataset for the years 2006 to 2016. It is the first corpus of its kind from the social media domain in English. It was created to compensate for the lack of variety in the datasets used for abstractive summarization research with deep learning models.

Access

Please cite this publication when using the dataset. To link to the dataset, please use its permalink [doi].

Download the dataset from Zenodo.
Find the related metadata at Google.
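The corpus holds roughly 4 million pairs, so it is usually better to stream it record by record than to load it into memory all at once. The sketch below assumes that the Zenodo download unpacks to a JSON Lines file (one JSON object per line) whose records carry `content` and `summary` fields. The file format and the field names are assumptions, so check them against the actual download. `parse_pairs` and `iter_pairs` are hypothetical helpers and are not part of any official tooling.

```python
import json
from typing import Iterable, Iterator, Tuple


def parse_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (content, summary) tuples from JSON Lines text.

    Assumption: each non-blank line is one JSON object with "content"
    and "summary" keys. Any other keys are ignored.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        yield record["content"], record["summary"]


def iter_pairs(path: str) -> Iterator[Tuple[str, str]]:
    """Stream pairs lazily from a local file so the corpus never sits fully in memory."""
    with open(path, encoding="utf-8") as f:
        yield from parse_pairs(f)
```

The parsing is kept separate from file handling so that `parse_pairs` also works on any iterable of lines, such as a decompressed stream. Combining `iter_pairs` with `itertools.islice` lets you inspect the first few pairs without reading the whole file.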

People

Shahbaz Syed
Michael Völske
Martin Potthast
Benno Stein
