The PAN plagiarism corpus 2011 (PAN-PC-11) is a corpus for the evaluation of automatic plagiarism detection algorithms. For research purposes the corpus can be used free of charge.
You can access the PAN-PC-11 corpus on Zenodo.
If you use the dataset in your research, please send us a copy of your publication. We kindly ask you to cite the corpus via [bib]. If you additionally want to link to the dataset, please use the dataset's [doi] for a stable link.
You might also be interested in the following items:
- The results of the 1st International Competition on Plagiarism Detection.
- The results of the 2nd International Competition on Plagiarism Detection.
- The results of the 3rd International Competition on Plagiarism Detection.
- The reference implementation of the plagiarism detection performance measures used in the above competitions.
The PAN-PC-11 can be used to evaluate the following retrieval tasks:
- External Plagiarism Detection. Given a set of suspicious documents and a set of source documents, the task is to find all plagiarized sections in the suspicious documents and their respective source sections in the source documents.
- Intrinsic Plagiarism Detection. Given only a set of suspicious documents, the task is to identify all plagiarized sections, e.g., by detecting writing style breaches. The comparison of a suspicious document with other documents is not allowed in this task.
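To make the external task more concrete, the following minimal sketch shortlists candidate source documents by counting the word n-grams they share with a suspicious document. It is an illustration only, not a method prescribed by the corpus or used in the competitions: the function names, the n-gram length, and the threshold are assumptions, and a real detector would also have to locate the exact boundaries of the plagiarized and source sections.

```python
from typing import Dict, List, Set, Tuple


def word_ngrams(text: str, n: int = 4) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase word n-grams in `text`."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def find_candidate_sources(
    suspicious: str,
    sources: Dict[str, str],
    n: int = 4,
    min_shared: int = 1,
) -> List[Tuple[str, int]]:
    """Rank source documents by the number of n-grams shared with `suspicious`.

    `sources` maps a (hypothetical) document name to its text. Only
    documents sharing at least `min_shared` n-grams are returned.
    """
    suspicious_grams = word_ngrams(suspicious, n)
    hits = []
    for name, text in sources.items():
        shared = len(suspicious_grams & word_ngrams(text, n))
        if shared >= min_shared:
            hits.append((name, shared))
    # Most overlapping source first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Exact n-gram matching of this kind only catches verbatim or lightly edited copies; heavily obfuscated passages need more tolerant comparison.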
The PAN-PC-11 contains documents in which plagiarism has been inserted automatically as well as documents in which plagiarism has been inserted manually. The former have been constructed using a so-called random plagiarist, a computer program which constructs plagiarism according to a number of parameters, while the latter have been obtained with crowdsourcing via Amazon's Mechanical Turk.
A detailed description of the corpus construction can be found in the associated publication.
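As a rough illustration of the idea behind a parameterized random plagiarist, the sketch below copies a random passage from a source text, optionally obfuscates it by swapping neighboring words, and inserts it at a random position in a host text. All names and parameters here (`passage_len`, `shuffle_prob`, `seed`) are hypothetical; the actual program's parameters and obfuscation strategies are documented in the associated publication, and this toy version should not be read as a reimplementation of it.

```python
import random
from typing import Tuple


def obfuscate(passage: str, shuffle_prob: float, rng: random.Random) -> str:
    """Swap each pair of neighboring words with probability `shuffle_prob`."""
    words = passage.split()
    for i in range(len(words) - 1):
        if rng.random() < shuffle_prob:
            words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)


def insert_plagiarism(
    suspicious: str,
    source: str,
    passage_len: int,
    shuffle_prob: float,
    seed: int = 0,
) -> Tuple[str, int, int]:
    """Insert an obfuscated source passage into `suspicious`.

    Returns the new text plus the character offset and length of the
    inserted passage, which serve as the ground-truth annotation.
    Whitespace in both texts is normalized to single spaces.
    """
    rng = random.Random(seed)

    # Pick a random passage of up to `passage_len` words from the source.
    source_words = source.split()
    start = rng.randrange(max(1, len(source_words) - passage_len + 1))
    passage = " ".join(source_words[start:start + passage_len])
    passage = obfuscate(passage, shuffle_prob, rng)

    # Pick a random word boundary in the host text and splice the passage in.
    host_words = suspicious.split()
    position = rng.randint(0, len(host_words))
    new_words = host_words[:position] + passage.split() + host_words[position:]

    # Character offset of the passage: the joined prefix plus one space.
    prefix = " ".join(host_words[:position])
    offset = len(prefix) + (1 if position > 0 else 0)
    return " ".join(new_words), offset, len(passage)
```

Recording the offset and length alongside the generated text is what makes such documents usable for evaluation: a detector's reported sections can be compared against the known inserted span.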
- Martin Potthast
- Benno Stein
- Alberto Barrón-Cedeño (NLEL at Universidad Politécnica de Valencia)
- Paolo Rosso (NLEL at Universidad Politécnica de Valencia)
Students: Andreas Eiselt