Dear prospective PhD student, unsolicited applications to the Webis group are welcome. However, we cannot promise that open positions will be available at the time of your application.

The Webis Group is a tightly cooperating research network formed by computer science chairs at the universities of Halle, Leipzig, Paderborn, and Weimar. Our mission is to tackle challenges of the information society by conducting basic and applied research with the goal of prototyping and evaluating future information systems. We are an experienced research group in which team spirit and active collaboration have top priority. We are looking for open-minded graduates and PhDs who want to develop both as researchers and as people. The working language of our group is English; fluency in German is not required.

Interested candidates should have completed either a master's degree or a PhD in computer science, mathematics, or a related field with excellent or very good grades. A solid background in mathematics and statistics is expected, as are very good programming skills.

Benno Stein
Bauhaus-Universität Weimar
On behalf of the Webis group


Open Thesis Topics

Students who are eager to develop their skills by writing a research-oriented thesis in our group should email us their interests. Suitable topic candidates are shown in the following list, which is not meant to be exhaustive:

  • Advancing and Benchmarking Large-Scale Content Extraction from the Web
  • Authorship Attribution based on Supra-segmental Features
  • Semi-automatic Knowledge Graph Authoring to Facilitate Retrieval of Expert Knowledge
  • The Impact of Near-Duplicates on Bootstrapping in Information Retrieval
  • Verbalizing Entity-based Answers in Conversational QA-Systems
  • Adapting Sentence Embeddings to OCR-Erroneous Data
  • Assessing the Topic-dependence of Stance Classification

Ongoing Theses

  • Halle
    • Danik Hollatz. Precision-Oriented Argument Retrieval (supervised by Maik Fröbe and Alexander Bondarenko)
    • Adrien Klose. Multi-Task Learning with IR Axioms (supervised by Maik Fröbe and Alexander Bondarenko)
    • Christian Peters. Extracting Metadata from Scientific Articles via Transformer-based Text Classification, Exemplified by Data for Soil Descriptions (supervised by Alexander Hinneburg and Ferdinand Schlatt)
    • Ekaterina Schirschakova. Clarifying the Objects and Aspects to Answer Comparative Questions (supervised by Alexander Bondarenko and Matthias Hagen)
    • Wilhelm Beiche. Contextualized Term Weighting for Total Recall (supervised by Maik Fröbe, Ferdinand Schlatt and Matthias Hagen)
  • Leipzig
    • Henrik Bininda. Crawling and Analyzing the Novelupdates Corpus (supervised by Erik Körner)
    • Hannes Hansen. From Contextualized to Static Word Embeddings (supervised by Niklas Deckers, together with Clara Meister and Lukas Muttenthaler)
    • David Hanslischeck. Quantification of the Overton Window using Text Mining (supervised by Christian Kahmann)
    • Simon Kleine. How We Argue: A Study of Vocal Argument-seeking Conversations (supervised by Johannes Kiesel)
    • Kai Knappik. Simulation of Web Users for Online Discourse Analysis (supervised by Tim Gollub, Sebastian Günther, and Johannes Kiesel)
    • Lukas Gienapp. Separating Social Spheres on Wikipedia (supervised by Martin Potthast, Arno Simons, and Benno Stein)
    • Gabriel Huppenbauer. Context Dynamics of the Term Sustainability (supervised by Christian Kahmann)
    • Christian Staudte. Building a Large-scale Argumentation Graph (supervised by Khalid Al-Khatib)
    • Eric Schmidt. Identifying Debating Strategies on Wikipedia (supervised by Khalid Al-Khatib)
    • Lukas Göhlich. Detecting Bias in Media (supervised by Khalid Al-Khatib and Shahbaz Syed)
    • Lucy Betke. Detecting Bias in Summarization (supervised by Khalid Al-Khatib and Shahbaz Syed)
    • Philipp von Mengersen. Cohesive Text Generation from Multi-Document Summarization (supervised by Shahbaz Syed)
    • Nicolas Handke. What's your Point? Identifying Values in Arguments (supervised by Johannes Kiesel)
    • Yiwen Cao. Mapping Travel Routes Based on Travelogue Narratives (supervised by Andreas Niekler and Magdalena Wolska)
    • Jonas Richter. Knowledge Graph of the Resistance Network in Nazi Germany (supervised by Andreas Niekler and Christian Kahmann)
    • Clemens Schöne. Acquiring Corpora with Triggering Content (supervised by Andreas Niekler and Magdalena Wolska)
    • Roy Rodney. A Web-based Implementation of the Netspeak Wordgraph (supervised by Tariq Youssef)
    • Hannes Winkler. Digital Monitor of Saxony (supervised by Andreas Niekler)
    • Markus Kobold. Etymological Data from Wiktionary as a Graph (supervised by Thomas Efer)
    • Jakob Schwerter. Audio- and text-based Podcast Retrieval and Summarization (supervised by Marcel Gohsen, Johannes Kiesel, and Shahbaz Syed)
    • Wolfgang Kircheis. Analyzing the History Section of Wikipedia Articles (supervised by Martin Potthast)
    • Johannes Bräuer. Definition Generation (supervised by Christopher Schröder)
    • Robby Wagner. The Impact of Main Content Extraction on Near-Duplicate Detection (supervised by Maik Fröbe and Christopher Schröder)
    • Ole Borchardt. Language Models for the Correction of OCR-Errors in Historic Documents (supervised by Tim Gollub and Janek Bevendorff)
    • Justus Stahlhut. Tracing Innovations on Wikipedia (supervised by Wolfgang Kircheis)
    • Yannick Dannies. Investigating Stopping Criteria for Active Learning (supervised by Christopher Schröder)
    • Maximus Germer. Chess Report Generation with Data-to-text (supervised by Janos Borst and Andreas Niekler)
    • Robin Bergewski. Modelling Influence Networks on Twitter for Author Profiling (supervised by Matti Wiegmann)
    • Ferdinand Lange. Detecting Text Reuse from Books (supervised by Lukas Gienapp)
    • Cariem El Wakil. Training a TTS Model with Custom Speech Data for Galileofication (supervised by Andreas Niekler)
    • Mathias Halbauer. Bitter Medicine: Measuring the Sentiment Shift in German Tweets during Different Corona Countermeasures (supervised by Matti Wiegmann)
    • Bernhard Jung. Early Hype Detection: Detecting and Tracking Investment Hypes on Reddit (supervised by Matti Wiegmann, Erik Körner, and Michael Völske)
  • Weimar
    • Saif Khan. Learning to Tag Environmental Sounds in Nightlong Audio (supervised by Jens Kersten and Johannes Kiesel)
    • Oliver Singler. Quantifying Evidence of Poetry Perception Based on Physiological Response to Recital (supervised by Jan Ehlers and Magdalena Wolska)
    • Sebastian Laverde. Disentangling Aspects from Text Representations (supervised by Tim Gollub)
    • Mujtaba Ahmed Abbasi. Exploratory Analysis of Wikipedia Text Reuse (supervised by Michael Völske)
    • Vishal Khanna. Identifying Debating Strategies in Persuasive Discussions (supervised by Khalid Al-Khatib and Matti Wiegmann)
    • Thang Dung Nguyen. Learning to Paraphrase from Multi-Document Summarization Data (supervised by Shahbaz Syed and Michael Völske)
    • Hans Lienhop. Rapid Prototyping for the Digital Humanities (supervised by Tim Gollub)
    • Bibek Khadayat. Text Quality in Search-Supported Writing Tasks (supervised by Michael Völske and Magdalena Wolska)


  • First steps:
    1. Ask a staff member to onboard you.
    2. Staff only: Set up a CVS client (e.g., Eclipse with the CVS plugin; make sure to use Latin-1 encoding). Check out all modules. Ask a staff member to explain our organizational structure (especially our main files) to you.
    3. Install the Webis Command on your machine. Among other things, you can use this command to check out the Webis code base.
    4. Mount the Ceph storage on your machine. Obtain access to the Webis VPN for remote access.
    5. Ensure that your video chat setup works and that you can reach other staff members at the press of a button.

  • Meeting rules:
    • Please take notes.
    • Students should prepare an agenda as well, not only the supervising PhD.
    • Typical meeting duration: 30 minutes for meetings with student assistants, 60 minutes for project and PhD meetings.
    • Stay on topic. Respect the time of your students and fellow PhDs.
    • Don't meet if there is nothing to discuss.

  • Background: