The Webis-Editorials-16 corpus is a novel corpus with 300 news editorials evenly selected from three diverse online news portals: Al Jazeera, Fox News, and The Guardian. The aim of the corpus is to study (1) the mining and classification of fine-grained types of argumentative discourse units and (2) the analysis of argumentation strategies pursued in editorials to achieve persuasion. To this end, each editorial contains manual type annotations of all units that capture the role that a unit plays in the argumentative discourse, such as assumption or statistics. The corpus consists of 14,313 units of six different types, each annotated by three professional annotators from the crowdsourcing platform


To download the corpus, use the following link:

If you use the dataset in your research, please send us a copy of your publication. We kindly ask you to refer to the corpus via [bib].


Regarding the distribution of argumentative discourse unit types in the corpus, some general tendencies as well as some insightful differences can be observed. Generally, more than two thirds of an editorial comprises assumptions. This is not surprising, as the type assumption covers both claims and any other propositions that may require justification. While The Guardian has the highest proportion of assumptions (71.7%), it represents the median for most other types. Fox News relies more strongly on common ground, with more than one unit of that type per editorial on average. Even more clearly, 8.7% of all units in Fox News editorials are testimony evidence, nearly twice as many on average as in The Guardian (4.55 vs. 2.53). In contrast, Al Jazeera seems to put more emphasis on anecdotes, or at least it spreads them across more units (21.0% of all units). Interestingly, all three portals behave very similarly in their use of statistics.

Unit Type       Total    Mean   Std. dev.   Median   Min   Max   Percent
Common ground     241    0.80        1.53        0     0    13      1.7%
Testimony        1089    3.63        5.42        2     0    44      7.6%
Statistics        421    1.40        2.76        0     0    19      2.9%
Other             167    0.56        1.64        0     0    24      1.2%
All units       14313   47.71       14.28       46    14   132      100%
The distribution of types of argumentative discourse units in the Webis-Editorials-16 corpus.
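
The columns of the table can be read as per-editorial statistics for each unit type. The following is a minimal sketch of how such a row could be computed from per-editorial counts. It is not the code used to build the corpus: the function name summarize, its parameters, and the toy counts are hypothetical, and the use of the sample standard deviation is an assumption, since the page does not say which formula was used. The last two lines only check that the Percent and Mean columns for common ground are consistent with the totals given above (241 of 14,313 units, 300 editorials).

```python
from statistics import mean, median, stdev


def summarize(counts, all_units_total):
    """Summarize per-editorial counts of one unit type.

    counts: one integer per editorial (units of that type in the editorial).
    all_units_total: number of units of all types in the whole collection.
    """
    total = sum(counts)
    return {
        "total": total,
        "mean": round(mean(counts), 2),
        # Assumption: sample standard deviation; the table may instead
        # use the population formula.
        "std_dev": round(stdev(counts), 2),
        "median": median(counts),
        "min": min(counts),
        "max": max(counts),
        "percent": round(100 * total / all_units_total, 1),
    }


# Made-up counts for five hypothetical editorials (not corpus data).
toy_counts = [0, 2, 1, 0, 5]
toy_summary = summarize(toy_counts, all_units_total=200)

# Consistency check against the table row for common ground.
common_ground_percent = round(100 * 241 / 14313, 1)
common_ground_mean = round(241 / 300, 2)
```

Applied to the real per-editorial counts of a unit type, the returned dictionary would correspond to one row of the table, up to the choice of standard deviation formula.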

The corpus is described in more detail in this publication: [bib]. The source code for the segmentation algorithm used in the corpus construction (Section 4.1) is available on our GitHub account.


Students: Olaoluwa Anifowose, Philip Drewes, Jonas Köhler