The ability to reproduce and compare the results of other researchers is essential for scientific progress. In many research fields, however, it is often impossible to specify the complete experimental setting, e.g., within the scope of a scientific publication. As a consequence, a reliable comparison becomes difficult, if not impossible. The TIRA Integrated Research Architecture provides a means for evaluation as a service and is our approach to addressing this shortcoming. It focuses on hosting shared tasks and facilitates the submission of software, as opposed to the output of running a piece of software on a test dataset (a so-called run). TIRA encapsulates each submitted piece of software in a virtual machine. This way, even after a shared task is over, the submitted software can be re-evaluated at the click of a button, which considerably increases the reproducibility of the corresponding shared task.

TIRA is currently one of the few platforms (if not the only one) that support software submissions with little extra effort. We have used it to organize more than 25 shared tasks within PAN@CLEF, CoNLL, and WSDM Cup. All software submitted to date has been collected and archived for re-execution. This ensures replicability as well as reproducibility (e.g., by re-evaluating the collected software on new datasets). An overview of existing shared tasks is available at [service] [video]


Students: Anna Beyer, Matthias Busse, Clement Welsch, Arnd Oberländer, Johannes Kiesel, Adrian Teschendorf, Manuel Willem