Dear prospective PhD student, unsolicited applications to the Webis group are welcome. However, we cannot promise that open positions are available at the time of your application.

The Webis group is a tightly cooperating research network, formed by computer science chairs at the universities of Halle, Leipzig, Paderborn, and Weimar. Our mission is to tackle challenges of the information society by conducting basic and applied research with the goal of prototyping and evaluating future information systems. We are an experienced research group where team spirit and active collaboration have top priority. We are looking for open-minded graduates and PhDs who want to develop both as researchers and as people. The working language of our group is English; fluency in German is not required.

Interested students should have completed either a master's degree or a PhD in computer science, mathematics, or a related field with excellent or very good grades. A solid background in mathematics and statistics is expected, as are very good programming skills.

Benno Stein
Bauhaus-Universität Weimar
On behalf of the Webis group


Open Thesis Topics

Students who are eager to develop their skills by doing a research-oriented thesis in our group should mail us their interests. Suitable topic candidates are shown in the following list, which is not meant to be complete. Specially marked topics are particularly promoted by us because of their scientific importance, impact, or originality:

  • Adversarial Learning of Writing Style Representations
  • Argumentation in Software Engineering
  • Automatic Generation of Persuasive Tweets
  • Axiomatic Argumentative Web Scale Document Re-ranking
  • Detecting Bias in Media
  • Detecting Ideological Bias in Word Embeddings
  • Detecting Text Reuse from Books
  • Disraptor—A Discourse Plugin to Separate Business Logic and User Management
  • Entity Linking for Comparative Questions
  • Harvesting the Web for Building Evidence-based Knowledge Graphs
  • Identifying Successful Debating Strategies in Social Media
  • Improving HiCAL's Ranking for Systematic Reviews
  • Incident Linking: Assigning Tweets to Entries in a Disaster Database
  • Information Theory and Authorship
  • Language Models for the Correction of OCR-Errors in Historic Documents
  • Learning to Rank Using Only Outdated Training Data
  • Modelling Influence Networks on Twitter for Author Profiling
  • Paraphrasing Operations for Heuristic Author Obfuscation
  • Probing Neural-based Models for Same-side Stance Classification
  • Roles and Effects of Metainformation in Conversational Search
  • Search Technology for the Analysis of Social Discourses
  • Separating Social Spheres on Wikipedia
  • Simulating Search Behavior
  • System Evaluation with Pairwise Relevance Judgements

Ongoing Theses

  • Maximilian Probst. Anchor Texts for Ranking of Web-Documents (supervised by Sebastian Günther, Maik Fröbe, and Matthias Hagen)
  • Nico Reichenbach. Argumentative Image Search (supervised by Johannes Kiesel, Martin Potthast, and Benno Stein)
  • Till Werner. Argument Quality Assessment in Natural Language using Machine Learning (supervised by Henning Wachsmuth)
  • Xiaoni Cai. Building Complex Queries in Conversational Search (supervised by Johannes Kiesel and Roxanne El-Baff)
  • Artur Jurk. Clickbait Spoiling (supervised by Matthias Hagen and Martin Potthast)
  • Counterargument Generation via Premise Rebuttal (supervised by Milad Alshomary)
  • Wolfgang Kircheis. Crowdsourcing the Translation of a Book (supervised by Martin Potthast, Magdalena Wolska, and Lukas Gienapp)
  • Alexander Vopel. Dealing with False Memories in Web Search (supervised by Maik Fröbe, Martin Potthast, and Matthias Hagen)
  • Johannes Bräuer. Definition generation (supervised by Christopher Schröder and Martin Potthast)
  • Valerie Lemuth. Detecting Bias in Search Engines (supervised by Yamen Ajjour and Martin Potthast)
  • Ademola Eric Adewumi. Detecting Web Page Functions (supervised by Johannes Kiesel)
  • Alexander Rensch. Expertise Filtering for Social Media Timelines (supervised by Matthias Hagen and Martin Potthast)
  • Mujtaba Ahmed Abbasi. Exploratory Analysis of Wikipedia Text Reuse (supervised by Michael Völske and Martin Potthast)
  • Prem Kumar Tiwari. Facet Completion based on Term Embeddings (supervised by Tim Gollub and Anne Peter)
  • Shaour Haider. Few Shot Learning for Text Classification (supervised by Tim Gollub and Magdalena Wolska)
  • Nicola Libera. Gamification in Information Retrieval (supervised by Maik Fröbe and Vaibhav Kasturia)
  • Christian Dunkel. Generating Online Lectures (supervised by Martin Potthast and Lars Meyer)
  • Anh Phuong Le. Harvesting the Web for Building Large-scale Argumentation Graphs (supervised by Khalid Al-Khatib and Michael Völske)
  • Thang Dung Nguyen. Learning to Paraphrase from Multi-Document Summarization Data (supervised by Shahbaz Syed and Michael Völske)
  • Daniel Wächtler. Learning to Rank with Distant Supervision (supervised by Maik Fröbe and Matthias Hagen)
  • Lucky Chandrautama. Linking Argumentative Concepts in Argumentation Graphs (supervised by Khalid Al-Khatib and Vaibhav Kasturia)
  • Fabian Thies. Neural Netspeak – Exploring the Performance of Transformer Models as Idiomatic Writing Assistants (supervised by Matti Wiegmann and Martin Potthast)
  • Fatema Merchant. Neural Paraphrasing Methods for Augmented Writing Tools (supervised by Khalid Al-Khatib and Shahbaz Syed)
  • Johanna Sacher. Paraphrasing Texts for Conversational News (supervised by Johannes Kiesel, Khalid Al-Khatib, and Matthias Hagen)
  • Johannes Huck. Relevance Feedback with Keyqueries (supervised by Maik Fröbe, Sebastian Günther, and Matthias Hagen)
  • Salomo Pflugradt. Reproducing Text Alignment Algorithms from PAN (supervised by Shahbaz Syed)
  • Vincent Söllner. Requirements engineering for natural-language annotation tasks (supervised by Janek Bevendorff, Magdalena Wolska, and Martin Potthast)
  • Paul Alexander Cahn. Re-Ranking for Total Recall in Systematic Reviews (supervised by Maik Fröbe and Matthias Hagen)
  • Nick Düsterhus. Snippet Generation for Argument Search (supervised by Milad Alshomary)
  • Niklas Homann. Stance Classification for Answering Comparative Questions (supervised by Alexander Bondarenko and Matthias Hagen)
  • Dipendra Sharma Kafle. Style-based Analysis of Persuasive Strategies (supervised by Khalid Al-Khatib and Roxanne El-Baff)
  • Bibek Khadayat. Text Quality in Search-Supported Writing Tasks (supervised by Michael Völske and Magdalena Wolska)
  • The Said and the Unsaid: Analyzing Metaphors using Word Embeddings (supervised by Henning Wachsmuth)
  • Justus Stahlhut. Tracing Innovations on Wikipedia (supervised by Martin Potthast and Wolfgang Kircheis)
  • Unsupervised Metaphor Categorization (supervised by Henning Wachsmuth)
  • Theresa Elstner. What's missing? Visual Differences in Screenshots of Archived Web Pages (supervised by Lars Meyer and Johannes Kiesel)


  • First steps:
    1. Staff only: Obtain a user account on our CVS server. Set up a CVS client (e.g. Eclipse + CVS plugin, ensure Latin1 encoding). Check out all modules. Have a staff member explain the organizational structure to you; they will be happy to do so.
    2. Obtain a user account on our GitLab. Install the Webis Command on your machine. Among other things, you can use this command to check out the Webis code base.
    3. Mount the CEPH data repository on your machine. Obtain access to our VPN for remote access.
    4. Subscribe to the Webis communication channels listed below. Please ensure that your video chat setup works and that you can reach other staff members at the press of a button.

  • Communication:
    • Discord: ask staff for access
    • Google Calendar
    • Mailing list staff:
    • Mailing list students:
    • Skype: webis
    • Twitter: @webis_de
    • DFNconf: webis room
    • Whereby: webis room

  • Meeting rules:
    • Please take notes.
    • Students should also prepare an agenda (not only PhDs).
    • Typical meeting duration: 30 minutes with student assistants, 60 minutes for projects and PhD meetings.
    • Don't stray off-topic. Respect your students' and PhDs' time.
    • Don't meet if there is nothing to discuss.

  • Software-related practice: