CompArg Comparative Sentences 2019


CompArg: Comparative Sentences 2019 dataset for comparative argument mining is composed of sentences annotated with BETTER / WORSE markers (the first object is better / worse than the second object) or NONE (the sentence does not contain a comparison of the target objects). The BETTER sentences stand for a pro-argument in favor of the first compared object and WORSE-sentences represent a con-argument and favor the second object.

We tackle the tasks of automatically identifying comparative sentences and categorizing the intended preference (e.g., "Python has better NLP libraries than MATLAB" ? Python, better, MATLAB). To this end, we manually annotate 7,199 sentences for 217 distinct target item pairs from several domains (27% of the sentences contain an oriented comparison in the sense of "better" or "worse"). A gradient boosting model based on pre-trained sentence embeddings reaches an F1 score of 85% in our experimental evaluation. The model can be used to extract comparative sentences for pro/con argumentation in comparative / argument search engines or debating technologies.


To download the corpus use the following links:


  • Chris Biemann
  • Alexander Bondarenko
  • Matthias Hagen
  • Alexander Panchenko