Synopsis

Picapica is a Web-based application for the algorithmic detection of text reuse. Its underlying analytics pipeline comprises two major steps: (1) Source retrieval, i.e., the retrieval of reference documents from the World Wide Web as well as from specially prepared plagiarism indexes. These searches yield a set of URLs of Web documents, which are then downloaded by a distributed server architecture. (2) Detailed analysis, i.e., the comparison of a suspicious document against the retrieved reference documents. The applied technologies cover hashing approaches such as fuzzy fingerprinting, text sequence alignment via multi-level cluster analyses, and writing style comparisons.

[demos: essay viewer, Wikipedia reuse, scientific reuse] [service] [video]
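To give a rough idea of how a hashing-based comparison in the detailed analysis step can work, the following minimal sketch hashes word n-grams of a suspicious document and of each reference document and ranks the references by fingerprint overlap. It is not Picapica's implementation and it is not fuzzy fingerprinting: plain n-gram hashing with Jaccard similarity is swapped in as a simpler stand-in. All function names, the n-gram length of 3, and the example texts are assumptions made for illustration.

```python
import hashlib
import re


def shingles(text, n=3):
    """Return the set of word n-grams (as tuples) of a text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def fingerprint(text, n=3):
    """Hash every word n-gram to a short hex digest (illustrative choice: MD5)."""
    return {
        hashlib.md5(" ".join(gram).encode("utf-8")).hexdigest()[:16]
        for gram in shingles(text, n)
    }


def overlap(suspicious, reference, n=3):
    """Jaccard similarity of the two fingerprints, a value in [0, 1]."""
    a, b = fingerprint(suspicious, n), fingerprint(reference, n)
    if not a or not b:
        return 0.0  # texts shorter than n words have no n-grams
    return len(a & b) / len(a | b)


def rank_references(suspicious, references, n=3):
    """Return (name, score) pairs sorted by descending overlap."""
    scores = [(name, overlap(suspicious, text, n))
              for name, text in references.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical example data, not taken from any real corpus.
suspicious_doc = "the quick brown fox jumps over the lazy dog"
reference_docs = {
    "ref1": "a quick brown fox jumps over a sleeping cat",
    "ref2": "completely unrelated text about something else entirely",
}
ranking = rank_references(suspicious_doc, reference_docs)
```

In this example, ref1 shares three of the eleven distinct word 3-grams with the suspicious text and is therefore ranked first, while ref2 shares none. A production system would additionally have to locate the reused passages, for which exact n-gram matching is typically too brittle under paraphrasing.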

The project has been supported by the EXIST program of the German Federal Ministry of Economics and Technology (BMWi).

People

EXIST scholarship students: Christof Bräutigam, Christina Eisenach, Jan Graßegger, Daniel Plath

Other students: Bjarne Sievers, Dennis Braunsdorf, Matthias Busse, Franz Coriand, Andreas Eiselt, Jan Hühne, Alexander Kleppe, Karsten Klüger, Alexander Kümmel, Marion Kulig, Christoph Lössnitz, Fabian Loose, Hagen-Christian Tönnies, Martin Trenkmann, Michael Völske, André Zölitz

Publications