Our dataset contains works of fanfiction extracted from archiveofourown.org (AO3). Each work is between 50 and 6,000 words long and has one or more trigger warnings assigned. The label set contains 32 different trigger warnings with a long-tailed frequency distribution: a few labels are very common, while most are increasingly rare. The training split contains 307,102 examples, the validation split 17,104, and the test split 17,040.
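Because each work can carry several of the 32 warnings, it may help to see what a multi-label representation and a label-frequency count can look like. The following is a minimal sketch, not the dataset's actual loading code: the field names `work_id` and `labels`, the placeholder label names, and the helper functions `label_frequencies` and `multi_hot` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical records: the field names and the placeholder label names
# are illustrative assumptions, not the dataset's actual schema.
works = [
    {"work_id": 1, "labels": ["label_a"]},
    {"work_id": 2, "labels": ["label_a", "label_b"]},
    {"work_id": 3, "labels": ["label_a", "label_b", "label_c"]},
    {"work_id": 4, "labels": ["label_a"]},
]


def label_frequencies(records):
    """Count how many works carry each label, most common first."""
    counts = Counter(label for r in records for label in set(r["labels"]))
    return counts.most_common()


def multi_hot(labels, label_index):
    """Encode one work's labels as a 0/1 vector over the full label set."""
    vector = [0] * len(label_index)
    for label in labels:
        vector[label_index[label]] = 1
    return vector


# Sorting labels by frequency makes a long-tailed distribution visible.
frequencies = label_frequencies(works)

# Assign each label a fixed position, then encode every work.
label_index = {label: i for i, (label, _) in enumerate(frequencies)}
vectors = [multi_hot(w["labels"], label_index) for w in works]
```

With the real data, the same counting step over the training split would show which of the 32 labels dominate and which fall in the tail, which is worth knowing before choosing evaluation metrics or sampling strategies.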
To cite the dataset, please refer to the publications. To link to the dataset, please use the dataset permalink [doi].
- Download the dataset from Zenodo.
- Matti Wiegmann
- Magdalena Wolska
- Christopher Schröder
- Ole Borchardt
- Benno Stein
- Martin Potthast