Webis-ArgQuality-20
Synopsis
The Webis Argument Quality Corpus 2020 (Webis-ArgQuality-20) contains 1271 arguments spanning 20 topics with scores for rhetorical, logical, dialectical, and overall quality as well as topical relevance. Quality scores are inferred from a total of 42k pairwise judgments. Arguments are sourced from the args.me corpus.
Download
To download the corpus and model use the following links:
Documentation
The Webis-ArgQuality-20 corpus consists of two sets of data: a processed version, where for each annotated argument, a scalar value for each argument quality dimension is derived; and the raw annotation data, providing the individual paired comparison labels. The structure of both datasets is described below.
Processed Data
The dataset is split into three different tables. Each key represents a column name, with details about the contained data in the explanation field. Primary keys are marked in bold. If a combined key is used, all entries that the combined key is composed of are marked. Foreign keys that can be used to reference other tables are marked in italics.
- Argument Dataset
| Key | Explanation |
| --- | --- |
| Topic ID | Unique identifier for the topic context |
| Argument ID | Unique identifier for the item within the discussion it is part of |
| Discussion ID | Unique identifier of the discussion the item is part of |
| Is Argument? | Boolean value indicating whether the item is an argument or not |
| Stance | Denotes the stance of the item; can be Pro, Con, or Not specified |
| Relevance | Relevance score, z-normalised |
| Logical Quality | Logical quality score, z-normalised |
| Rhetorical Quality | Rhetorical quality score, z-normalised |
| Dialectical Quality | Dialectical quality score, z-normalised |
| Combined Quality | Combined quality score, z-normalised |
| Premise | Text of the item's premise |
| Text Length | Word count of the premise |

- Ranking Dataset
| Key | Explanation |
| --- | --- |
| Topic ID | Unique identifier for the topic context |
| Model | Identifier of the retrieval model that produced the ranking |
| Rank | The rank of the argument in the respective engine's ranking |
| Argument ID | Unique identifier for the argument within the discussion it is part of |
| Discussion ID | Unique identifier of the discussion the argument is part of |

- Topic Dataset
| Key | Explanation |
| --- | --- |
| Topic ID | Unique identifier for the topic context |
| Category | Thematic category the topic belongs to |
| Long Query | Long query, used as input for the retrieval models |
| Short Query | Shortened form of the query |
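The three tables can be linked through their shared keys. The following is a minimal sketch, assuming the tables have already been loaded as lists of dictionaries keyed by the column names above, and assuming that an argument is identified by the pair (Discussion ID, Argument ID). The helper name `rank_with_scores` and every value in the toy rows are invented for illustration and are not taken from the corpus.

```python
# Toy rows; every value here is invented for illustration only.
topics = [
    {"Topic ID": 1, "Category": "Education", "Short Query": "school uniforms"},
]
arguments = [
    {"Topic ID": 1, "Discussion ID": "d1", "Argument ID": "a1", "Combined Quality": 0.8},
    {"Topic ID": 1, "Discussion ID": "d1", "Argument ID": "a2", "Combined Quality": -0.3},
]
rankings = [
    {"Topic ID": 1, "Model": "m1", "Rank": 1, "Discussion ID": "d1", "Argument ID": "a2"},
    {"Topic ID": 1, "Model": "m1", "Rank": 2, "Discussion ID": "d1", "Argument ID": "a1"},
]


def rank_with_scores(rankings, arguments, topics):
    """Attach the short query and the combined quality score to each ranking row."""
    # Assumption: (Discussion ID, Argument ID) uniquely identifies an argument.
    arg_index = {(a["Discussion ID"], a["Argument ID"]): a for a in arguments}
    topic_index = {t["Topic ID"]: t for t in topics}

    joined = []
    for r in rankings:
        arg = arg_index[(r["Discussion ID"], r["Argument ID"])]
        topic = topic_index[r["Topic ID"]]
        joined.append({
            "Model": r["Model"],
            "Rank": r["Rank"],
            "Short Query": topic["Short Query"],
            "Combined Quality": arg["Combined Quality"],
        })
    # Order by model first, then by rank within each model's ranking.
    return sorted(joined, key=lambda row: (row["Model"], row["Rank"]))


rows = rank_with_scores(rankings, arguments, topics)
```

Indexing the argument and topic tables once keeps each lookup constant-time, which matters little for toy data but avoids a nested scan over the full tables.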
Raw Data
Individual comparisons are given in a dedicated table for each argument quality dimension. Relevance annotations are included as well. Each key represents a column name, with details about the contained data in the explanation field. Primary keys are marked in bold. If a combined key is used, all entries that the combined key is composed of are marked. Foreign keys that can be used to reference other tables are marked in italics.
- Quality Annotations
| Key | Explanation |
| --- | --- |
| Argument ID A | Unique identifier for argument A within the discussion it is part of |
| Discussion ID A | Unique identifier of the discussion argument A is part of |
| Argument ID B | Unique identifier for argument B within the discussion it is part of |
| Discussion ID B | Unique identifier of the discussion argument B is part of |
| Comparison | Denotes the direction of the comparison; can be "A" if argument A is better, "B" if argument B is better, or "Tie" if both arguments are equal |

- Relevance Annotations
| Key | Explanation |
| --- | --- |
| Task ID | ID of the annotation task this annotation was part of |
| Argument ID | Unique identifier for the argument within the discussion it is part of |
| Discussion ID | Unique identifier of the discussion the argument is part of |
| Relevance | Denotes the relevance of this argument with regard to the topic on a scale of 0 (low) to 4 (high); -2 is used to mark irrelevant text/spam |
| Is Argument? | Boolean value indicating whether the item is an argument or not |
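Because -2 is a marker rather than a point on the 0 to 4 scale, it should be separated out before any arithmetic on the relevance labels. The sketch below shows one way to do that; averaging the remaining labels per argument is an assumption made for illustration and is not necessarily how the corpus's relevance scores were derived. The function name `mean_relevance` and the toy rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean

SPAM_LABEL = -2  # per the table above, marks irrelevant text/spam

# Toy annotations; every value here is invented for illustration only.
annotations = [
    {"Task ID": "t1", "Discussion ID": "d1", "Argument ID": "a1", "Relevance": 3},
    {"Task ID": "t2", "Discussion ID": "d1", "Argument ID": "a1", "Relevance": 4},
    {"Task ID": "t1", "Discussion ID": "d1", "Argument ID": "a2", "Relevance": -2},
    {"Task ID": "t2", "Discussion ID": "d1", "Argument ID": "a2", "Relevance": 1},
]


def mean_relevance(annotations):
    """Return {argument key: (mean of 0-4 labels or None, number of spam labels)}."""
    labels = defaultdict(list)
    spam = defaultdict(int)
    for row in annotations:
        key = (row["Discussion ID"], row["Argument ID"])
        if row["Relevance"] == SPAM_LABEL:
            spam[key] += 1  # count the marker instead of averaging it
        else:
            labels[key].append(row["Relevance"])
    keys = set(labels) | set(spam)
    return {k: (mean(labels[k]) if labels[k] else None, spam[k]) for k in keys}


relevance = mean_relevance(annotations)
```

Keeping the spam count next to the mean lets a caller decide later whether an argument with any spam label should be dropped entirely.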
Model Implementation
A Python implementation of the model is provided; see the code comments for additional implementation details. A usage example is also included, and applying it to the raw data derives the processed version.
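To make the step from paired comparison labels to z-normalised scalar scores concrete, here is a deliberately simple stand-in. It is not the provided model: it scores each argument by its win fraction (a tie counts as half a win for both sides) and then z-normalises the result. The function names `win_fractions` and `z_normalise` and all toy rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Toy comparisons using the column names of the quality annotation tables;
# every value here is invented for illustration only.
comparisons = [
    {"Discussion ID A": "d1", "Argument ID A": "a1",
     "Discussion ID B": "d1", "Argument ID B": "a2", "Comparison": "A"},
    {"Discussion ID A": "d1", "Argument ID A": "a2",
     "Discussion ID B": "d1", "Argument ID B": "a3", "Comparison": "Tie"},
    {"Discussion ID A": "d1", "Argument ID A": "a1",
     "Discussion ID B": "d1", "Argument ID B": "a3", "Comparison": "A"},
]


def win_fractions(comparisons):
    """Score each argument by points won divided by comparisons taken part in."""
    points = defaultdict(float)
    counts = defaultdict(int)
    for row in comparisons:
        a = (row["Discussion ID A"], row["Argument ID A"])
        b = (row["Discussion ID B"], row["Argument ID B"])
        counts[a] += 1
        counts[b] += 1
        if row["Comparison"] == "A":
            points[a] += 1.0
        elif row["Comparison"] == "B":
            points[b] += 1.0
        elif row["Comparison"] == "Tie":
            points[a] += 0.5
            points[b] += 0.5
        else:
            raise ValueError(f"unexpected label: {row['Comparison']!r}")
    return {k: points[k] / counts[k] for k in counts}


def z_normalise(scores):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:  # all scores identical: nothing to scale
        return {k: 0.0 for k in scores}
    return {k: (v - mu) / sigma for k, v in scores.items()}


fractions = win_fractions(comparisons)
z_scores = z_normalise(fractions)
```

A win fraction ignores the strength of the opponents each argument was compared against, which is one reason a dedicated model is preferable for the actual corpus scores.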