The corpus consists of 1,128 German news articles from the years 2003 to 2009, collected from 29 general and business news websites. In each article, statements about the revenue of companies or markets have been annotated manually; that is, the sentences and entities that refer to a statement are tagged and linked to each other.

Here is an example of a revenue statement from the corpus. Annotated entities are enclosed in square brackets and followed by their type:

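Loewe AG: Vorläufige Neun-Monats-Zahlen
Kronach, [6. November 2007]REF - Das Ergebnis vor Zinsen und Steuern (EBIT) des Loewe Konzerns konnte in den ersten 9 Monaten 2007 um 41% gesteigert werden. Vor diesem Hintergrund hebt die [Loewe AG]ORG ihre EBIT-Prognose für das laufende Geschäftsjahr auf 20 Mio. Euro an. Beim Umsatz strebt Konzernchef [Rainer Hecker]AUTH [für das Gesamtjahr]TIME ein höher als ursprünglich geplantes [Wachstum]TREND [von 10% auf ca. 380 Mio. Euro]MONEY an. (...)

English translation:
Loewe AG: Preliminary nine-month figures
Kronach, [6 November 2007]REF - The earnings before interest and taxes (EBIT) of the Loewe Group rose by 41% in the first 9 months of 2007. Against this background, [Loewe AG]ORG is raising its EBIT forecast for the current fiscal year to EUR 20 million. With regard to revenue, CEO [Rainer Hecker]AUTH is aiming [for the full year]TIME at higher [growth]TREND than originally planned, [of 10% to approx. EUR 380 million]MONEY. (...)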
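The bracket notation in the example can be turned into character-offset annotations with a small regular expression. This is a minimal sketch for illustration only: it assumes the `[span]LABEL` notation shown above, which is presumably just a rendering for this page (the articles themselves are stored as XMI). The helper `extract_entities` is hypothetical and not part of any corpus tooling.

```python
import re
from typing import List, Tuple

# Matches "[span text]LABEL" where LABEL is an upper-case tag such as ORG or MONEY.
# Assumes annotations are not nested and spans contain no "]" character.
ANNOTATION = re.compile(r"\[([^\]]+)\]([A-Z]+)")


def extract_entities(marked: str) -> Tuple[str, List[Tuple[str, str, int, int]]]:
    """Remove the bracket markup and collect the annotated entities.

    Returns the plain text and a list of (label, span, begin, end) tuples,
    where begin/end are character offsets into the plain text.
    """
    parts: List[str] = []
    entities: List[Tuple[str, str, int, int]] = []
    pos = 0     # read position in the marked-up input
    offset = 0  # length of the plain text produced so far
    for m in ANNOTATION.finditer(marked):
        before = marked[pos:m.start()]
        parts.append(before)
        offset += len(before)
        span, label = m.group(1), m.group(2)
        entities.append((label, span, offset, offset + len(span)))
        parts.append(span)
        offset += len(span)
        pos = m.end()
    parts.append(marked[pos:])  # trailing text after the last annotation
    return "".join(parts), entities


sample = (
    "Beim Umsatz strebt Konzernchef [Rainer Hecker]AUTH [für das Gesamtjahr]TIME "
    "ein höher als ursprünglich geplantes [Wachstum]TREND "
    "[von 10% auf ca. 380 Mio. Euro]MONEY an."
)
plain, entities = extract_entities(sample)
for label, span, begin, end in entities:
    print(f"{label:<6} {begin:>3}-{end:<3} {span}")
```

For the forecast sentence this yields the AUTH, TIME, TREND, and MONEY entities in order of appearance, each with offsets that index directly into the unmarked sentence.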

A revenue statement comprises seven attributes:

  • Forecast/Declaration: A sentence that represents a forecast or declaration on revenue.
  • Organization/Market: The subject of the statement, i.e., either an organization or a market.
  • Time Expression: The period of time referenced by the statement.
  • Reference Point: The point in time when the statement was issued (used to resolve relative time expressions).
  • Money Expression: The monetary value referenced by the statement.
  • Author: The holder of the statement.
  • Trend: A word that indicates how the monetary value develops (e.g., growth or decline).
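The seven attributes map naturally onto a record type. The sketch below is hypothetical: the class `RevenueStatement`, its field names, and the `resolve_period` helper are not part of the corpus or its tooling. It also illustrates why the reference point is needed: "für das Gesamtjahr" (for the full year) only denotes a concrete period once it is anchored to the issue date of the example, 6 November 2007.

```python
import calendar
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class RevenueStatement:
    """Hypothetical in-memory view of one annotated revenue statement.

    Field names are illustrative, not the annotation type names of the corpus.
    Fields other than the sentence and subject are Optional on the assumption
    that not every statement mentions every attribute.
    """
    sentence: str                    # Forecast/Declaration
    subject: str                     # Organization/Market
    time_expression: Optional[str]   # Time Expression
    reference_point: Optional[date]  # Reference Point (issue date)
    money_expression: Optional[str]  # Money Expression
    author: Optional[str]            # Author
    trend: Optional[str]             # Trend


def resolve_period(expr: str, ref: date) -> Tuple[date, date]:
    """Toy resolver for two German relative time expressions.

    Assumes the fiscal year equals the calendar year of the reference point
    and ignores any explicit year inside the expression.
    """
    year = ref.year
    if "Gesamtjahr" in expr:  # "full year"
        return date(year, 1, 1), date(year, 12, 31)
    m = re.search(r"ersten (\d+) Monat", expr)  # "first N months"
    if m:
        n = int(m.group(1))
        last_day = calendar.monthrange(year, n)[1]  # last day of month n
        return date(year, 1, 1), date(year, n, last_day)
    raise ValueError(f"unsupported time expression: {expr!r}")


stmt = RevenueStatement(
    sentence=("Beim Umsatz strebt Konzernchef Rainer Hecker für das Gesamtjahr "
              "ein höher als ursprünglich geplantes Wachstum "
              "von 10% auf ca. 380 Mio. Euro an."),
    subject="Loewe AG",
    time_expression="für das Gesamtjahr",
    reference_point=date(2007, 11, 6),
    money_expression="von 10% auf ca. 380 Mio. Euro",
    author="Rainer Hecker",
    trend="Wachstum",
)
print(resolve_period(stmt.time_expression, stmt.reference_point))
```

Note that the subject ("Loewe AG") comes from a different sentence than the forecast itself, which is why the corpus links entities to statements rather than relying on sentence boundaries. A real resolver would need to cover far more expressions and fiscal years that differ from the calendar year; the two patterns here only show the role of the reference point.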

A total of 2,075 statements have been annotated by domain experts. For more information on the construction of the dataset, see the corpus manual or [Wachsmuth et al., 2010].


Each article in the corpus is represented as an XML document in the XMI (XML Metadata Interchange) format used by Apache UIMA. The articles are split into training, validation, and test sets.
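Because UIMA XMI is plain XML, a file can be inspected with Python's standard library alone. The sketch below assumes the usual layout written by UIMA's XMI serializer: the document text in the `sofaString` attribute of a `cas:Sofa` element, and annotations as top-level elements with `begin`/`end` character offsets. The sample document and its `ex:Organization` type are made up; the corpus's actual annotation type names are not given here. `read_xmi` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

CAS_NS = "http:///uima/cas.ecore"  # namespace UIMA uses for cas:Sofa, cas:View, ...


def read_xmi(xml_text: str) -> Tuple[str, List[Tuple[str, int, int, str]]]:
    """Return the document text and (type, begin, end, covered_text) tuples.

    Offsets in UIMA are Java (UTF-16) string offsets; they coincide with Python
    string indices as long as the text has no characters outside the BMP.
    """
    root = ET.fromstring(xml_text)
    sofa = root.find(f"{{{CAS_NS}}}Sofa")
    if sofa is None:
        raise ValueError("no cas:Sofa element found")
    text = sofa.get("sofaString", "")
    annotations = []
    for elem in root:  # feature structures are direct children of xmi:XMI
        begin, end = elem.get("begin"), elem.get("end")
        if begin is None or end is None:
            continue  # not an annotation (e.g. cas:NULL, cas:View)
        type_name = elem.tag.rsplit("}", 1)[-1]  # drop the namespace URI
        if type_name == "DocumentAnnotation":
            continue  # spans the whole document; not a content annotation
        b, e = int(begin), int(end)
        annotations.append((type_name, b, e, text[b:e]))
    return text, annotations


SAMPLE = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
  xmlns:cas="http:///uima/cas.ecore" xmlns:tcas="http:///uima/tcas.ecore"
  xmlns:ex="http:///example/types.ecore" xmi:version="2.0">
  <cas:NULL xmi:id="0"/>
  <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text"
            sofaString="Die Loewe AG hebt ihre Prognose an."/>
  <tcas:DocumentAnnotation xmi:id="2" sofa="1" begin="0" end="35" language="de"/>
  <ex:Organization xmi:id="3" sofa="1" begin="4" end="12"/>
  <cas:View sofa="1" members="2 3"/>
</xmi:XMI>"""

text, annotations = read_xmi(SAMPLE)
print(annotations)
```

To inspect an actual corpus file, pass its contents, e.g. `read_xmi(Path(p).read_text(encoding="utf-8"))`. For anything beyond quick inspection, a UIMA-aware library that loads the type system (Apache UIMA itself, or dkpro-cassis in Python) is the more robust choice, since it also resolves feature references such as the links between entities and statements.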

To download the corpus, use the following link:

If you use the dataset in your research, please send us a copy of your publication. We kindly ask you to cite the corpus via [bib].