Dear prospective PhD student, unsolicited applications to the Webis group are welcome. However, we cannot promise that open positions will be available at the time of your application.

The Webis group is a tightly cooperating research network formed by computer science chairs at the universities of Halle, Leipzig, Paderborn, and Weimar. Our mission is to tackle challenges of the information society by conducting basic and applied research with the goal of prototyping and evaluating future information systems. We are an experienced research group in which team spirit and active collaboration have top priority. We are looking for open-minded graduates and PhDs who want to develop both as researchers and as persons. The working language of our group is English; fluency in German is not required.

Interested candidates should hold either a master's degree or a PhD in computer science, mathematics, or a related field, with excellent or very good grades. A solid background in mathematics and statistics is expected, as are very good programming skills.

Benno Stein
Bauhaus-Universität Weimar
On behalf of the Webis group


Open Thesis Topics

Students who are eager to develop their skills by writing a research-oriented thesis in our group should email us their interests. Suitable topics are shown in the following list, which is not meant to be exhaustive:

  • Adversarial Learning of Writing Style Representations
  • An Exploratory Study on the Retrievability of Research Papers
  • Argumentation in Software Engineering
  • Automatic Updates of Relevance Labels for Web Search
  • Contextualized Term Weighting for Total Recall
  • Improving Causal Relation Extraction from the Web
  • Search Technology for the Analysis of Social Discourses
  • Separating Social Spheres on Wikipedia
  • Simulating Search Behavior
  • System Evaluation with Pairwise Relevance Judgements

Ongoing Theses

  • Halle
    • Maximilian Probst. Anchor Texts for Ranking of Web-Documents (supervised by Sebastian Günther, Maik Fröbe, and Matthias Hagen)
    • Ekaterina Schirschakova. Clarifying the Objects and Aspects to Answer Comparative Questions (supervised by Alexander Bondarenko and Matthias Hagen)
    • Artur Jurk. Clickbait Spoiling (supervised by Matthias Hagen and Martin Potthast)
    • Daniel Wächtler. Learning to Rank with Distant Supervision (supervised by Maik Fröbe and Matthias Hagen)
    • Johannes Huck. Relevance Feedback with Keyqueries (supervised by Maik Fröbe, Sebastian Günther, and Matthias Hagen)
    • Paul Alexander Cahn. Re-Ranking for Total Recall in Systematic Reviews (supervised by Maik Fröbe and Matthias Hagen)
  • Leipzig
    • Eric Schmidt. Identifying Debating Strategies on Wikipedia (supervised by Khalid Al-Khatib)
    • Lukas Göhlich. Detecting Bias in Media (supervised by Khalid Al-Khatib and Shahbaz Syed)
    • Lucy Betke. Detecting Bias in Summarization (supervised by Khalid Al-Khatib and Shahbaz Syed)
    • Philipp von Mengersen. Cohesive Text Generation from Multi-Document Summarization (supervised by Shahbaz Syed)
    • Nicolas Handke. What's your Point? Identifying Values in Arguments (supervised by Johannes Kiesel)
    • Yiwen Cao. Mapping travel routes based on travelogue narrative (supervised by Andreas Niekler, Martin Potthast, and Magdalena Wolska)
    • Clemens. Acquiring corpora with triggering content (supervised by Andreas Niekler, Martin Potthast, and Magdalena Wolska)
    • Roy Rodney. A Web-based Implementation of the Netspeak Wordgraph (supervised by Martin Potthast)
    • Jakob Schwerter. Aspect-based Podcast Retrieval and Summarization (supervised by Marcel Gohsen, Johannes Kiesel, and Shahbaz Syed)
    • David Reinartz. Authorship Identification with Phonological Features (supervised by Janek Bevendorff, Magdalena Wolska, Martin Potthast, and Benno Stein)
    • Philipp Sauer. Enabling Authorship Analysis in Scientific Documents (supervised by Janek Bevendorff, Lukas Gienapp, Wolfgang Kircheis, and Martin Potthast)
    • Wolfgang Kircheis. Crowdsourcing the Translation of a Book (supervised by Martin Potthast, Magdalena Wolska, and Lukas Gienapp)
    • Alexander Vopel. Dealing with False Memories in Web Search (supervised by Maik Fröbe, Martin Potthast, and Matthias Hagen)
    • Johannes Bräuer. Definition generation (supervised by Christopher Schröder and Martin Potthast)
    • Ole Borchardt. Language Models for the Correction of OCR-Errors in Historic Documents (supervised by Tim Gollub and Janek Bevendorff)
    • Justus Stahlhut. Tracing Innovations on Wikipedia (supervised by Martin Potthast and Wolfgang Kircheis)
    • Yannick Dannies. Reinforcement Learning for Active Learning (supervised by Christopher Schröder and Martin Potthast)
    • Robin Bergewski. Modelling Influence Networks on Twitter for Author Profiling (supervised by Matti Wiegmann)
    • Ferdinand Lange. Detecting Text Reuse from Books (supervised by Lukas Gienapp and Martin Potthast)
    • Cariem El Wakil. Training a TTS Model with Custom Speech Data for Galileofication (supervised by Andreas Niekler, Khalid Al-Khatib, and Martin Potthast)
  • Weimar
    • Oliver Singler. Quantifying evidence of poetry perception based on physiological response to recital (supervised by Jan Ehlers and Magdalena Wolska)
    • Ademola Eric Adewumi. Detecting Web Page Functions (supervised by Johannes Kiesel)
    • Mujtaba Ahmed Abbasi. Exploratory Analysis of Wikipedia Text Reuse (supervised by Michael Völske and Martin Potthast)
    • Nicola Libera. Gamification in Information Retrieval (supervised by Maik Fröbe and Vaibhav Kasturia)
    • Vishal Khanna. Identifying Debating Strategies in Persuasive Discussions (supervised by Khalid Al-Khatib and Matti Wiegmann)
    • Siva Bathala. Incident Linking: Assigning Tweets to Entries in a Disaster Database (supervised by Matti Wiegmann)
    • Thang Dung Nguyen. Learning to Paraphrase from Multi-Document Summarization Data (supervised by Shahbaz Syed and Michael Völske)
    • Lucky Chandrautama. Linking Argumentative Concepts in Argumentation Graphs (supervised by Khalid Al-Khatib)
    • Hans Lienhop. Rapid Prototyping for the Digital Humanities (supervised by Tim Gollub)
    • Vincent Söllner. Requirements engineering for natural-language annotation tasks (supervised by Janek Bevendorff, Magdalena Wolska, and Martin Potthast)
    • Dipendra Sharma Kafle. Style-based Analysis of Persuasive Strategies (supervised by Khalid Al-Khatib and Roxanne El-Baff)
    • Bibek Khadayat. Text Quality in Search-Supported Writing Tasks (supervised by Michael Völske and Magdalena Wolska)


  • First steps:
    1. Staff only: Obtain a user account on our CVS server. Set up a CVS client (e.g., Eclipse with the CVS plugin; ensure Latin1 encoding). Check out all modules. Have a staff member explain the organizational structure to you; they will be happy to do so.
    2. Obtain a user account on our GitLab. Install the Webis Command on your machine. Among other things, you can use this command to check out the Webis code base.
    3. Mount the CEPH data repository on your machine. Request VPN access for working remotely.
    4. Subscribe to the Webis communication channels listed below. Please ensure that your video chat setup works and that you can reach other staff members at the press of a button.

  • Communication:
    • Discord: ask staff for access
    • Google Calendar
    • Mailing list staff:
    • Mailing list students:
    • Skype: webis
    • Twitter: @webis_de

  • Meeting rules:
    • Please take notes.
    • Students should also prepare an agenda, not only the PhDs.
    • Typical meeting duration: 30 minutes for meetings with student assistants, 60 minutes for project and PhD meetings.
    • Don't stray off-topic. Respect each other's time, students and PhDs alike.
    • Don't meet if there is nothing to discuss.

  • Background: