Dear prospective PhD student, unsolicited applications to the Webis group are welcome. However, we cannot promise that open positions are available at the time of your application.

The Webis group is a tightly cooperating research network formed by computer science chairs at the universities of Halle, Leipzig, Paderborn, and Weimar. Our mission is to tackle challenges of the information society through basic and applied research, with the goal of prototyping and evaluating future information systems. We are an experienced research group in which team spirit and active collaboration have top priority. We are looking for open-minded graduates and PhDs who want to develop both as researchers and as people. The working language of our group is English; fluency in German is not required.

Interested students should have completed either a master's degree or a PhD in computer science, mathematics, or a related field with excellent or very good grades. A solid background in mathematics and statistics is expected, as are very good programming skills.

Benno Stein
Bauhaus-Universität Weimar
On behalf of the Webis group


Open Thesis Topics

Students who are eager to develop their skills in a research-oriented thesis in our group should mail us their interests. Suitable topics are, for example:

  • Adversarial Learning of Writing Style Representations
  • Authorship Identification with Phonological Features
  • Axiomatic Argumentative Web Scale Document Re-ranking
  • Crowdsourcing Web Page Segment Labels
  • Detecting Bias in Media
  • Detecting Bias in Search Engines
  • Detecting Ideological Bias in Word Embeddings
  • Detecting Text Reuse from Books
  • Detecting Web Page Functions
  • Developing a Collaborative Writing Tool for Wikipedia
  • Disraptor – A Discourse Plugin to Separate Business Logic and User Management
  • Entity Linking for Comparative Questions
  • Exploiting Argumentation Knowledge Graphs for Argument Generation
  • Exploratory Analysis of Wikipedia Text Reuse
  • Exploring Argumentation Strategies in Persuasive Texts
  • Gamification in Information Retrieval
  • Harvesting the Web for Building Evidence-based Knowledge Graphs
  • Improving HiCAL's Ranking for Systematic Reviews
  • Learning to Rank Using Only Outdated Training Data
  • Neural Netspeak – Exploring the Performance of Transformer Models as Idiomatic Writing Assistants
  • Paraphrasing Operations for Heuristic Author Obfuscation
  • Re-Ranking for Total Recall in Systematic Reviews
  • Roles and Effects of Metainformation in Conversational Search
  • Semantic Search Engines for the Analysis of Debates and Discourses
  • Semi-Automatically Supporting Crowdsourcing Approval Processes
  • Simulating Search Behavior
  • Text Mining Methods for Intelligent Writing Assistance

Ongoing Theses

  • Nico Reichenbach. Argumentative Image Search (supervised by Johannes Kiesel, Martin Potthast, and Benno Stein)
  • Till Werner. Argument Quality Assessment in Natural Language using Machine Learning (supervised by Henning Wachsmuth)
  • Xiaoni Cai. Building Complex Queries in Conversational Search (supervised by Johannes Kiesel and Roxanne El-Baff)
  • Artur Jurk. Clickbait Spoiling (supervised by Matthias Hagen and Martin Potthast)
  • Counterargument Generation via Premise Rebuttal (supervised by Milad Alshomary)
  • Wolfgang Kircheis. Crowdsourcing the Translation of a Book (supervised by Martin Potthast, Magdalena Wolska, and Lukas Gienapp)
  • Alexander Vopel. Dealing with False Memories in Web Search (supervised by Maik Fröbe, Martin Potthast, and Matthias Hagen)
  • Johannes Bräuer. Definition Generation (supervised by Christopher Schröder and Martin Potthast)
  • Alexander Rensch. Expertise Filtering for Social Media Timelines (supervised by Matthias Hagen and Martin Potthast)
  • Shaour Haider. Few Shot Learning for Text Classification (supervised by Tim Gollub and Magdalena Wolska)
  • Lukas Trautner. Graph-Based Synthesis of Counterfactuals (supervised by Khalid Al-Khatib and Benno Stein)
  • Lukas Gehrke. Gender Regression and Gender Prediction based on Writing Style (supervised by Matti Wiegmann and Martin Potthast)
  • Hannes Winkler. Geolocation of Social Media Posts (supervised by Matti Wiegmann, Martin Potthast, Magdalena Wolska, and Benno Stein)
  • Anh Phuong Le. Harvesting the Web for Building Large-scale Argumentation Graphs (supervised by Khalid Al-Khatib)
  • Daniel Wächtler. Learning to Rank with Distant Supervision (supervised by Maik Fröbe and Matthias Hagen)
  • Fan Fan. Mining High-ethos Evidence from Wikipedia (supervised by Khalid Al-Khatib and Yamen Ajjour)
  • Jan Philipp Bittner. Near-Duplicate-Detection of Webpages (supervised by Maik Fröbe and Matthias Hagen)
  • Fatema Merchant. Neural Paraphrasing Methods for Augmented Writing Tools (supervised by Khalid Al-Khatib and Shahbaz Syed)
  • Johanna Sacher. Paraphrasing Texts for Conversational News (supervised by Matthias Hagen and Johannes Kiesel)
  • Jan Heinrich Reimer. Popularity Bias in Learning-to-Rank (supervised by Maik Fröbe and Matthias Hagen)
  • Salomo Pflugradt. Reproducing Text Alignment Algorithms from PAN (supervised by Shahbaz Syed)
  • Saeed Entezari. Shared Task on Argument Retrieval (supervised by Michael Völske)
  • Nick Düsterhus. Snippet Generation for Argument Search (supervised by Milad Alshomary)
  • Niklas Homann. Stance Classification for Answering Comparative Questions (supervised by Alexander Bondarenko and Matthias Hagen)
  • The Said and the Unsaid: Analyzing Metaphors using Word Embeddings (supervised by Henning Wachsmuth)
  • Valentin Dittmar. Towards Answering Comparative Web Questions (supervised by Alexander Bondarenko and Matthias Hagen)
  • Unsupervised Metaphor Categorization (supervised by Henning Wachsmuth)
  • Prem Kumar Tiwari. Facet Completion based on Term Embeddings (supervised by Tim Gollub and Anne Peter)
  • Theresa Elstner. What's missing? Visual Differences in Screenshots of Archived Web Pages (supervised by Lars Meyer and Johannes Kiesel)


  • First steps:
    1. Obtain a user account on our CVS server. Set up a CVS client (e.g., Eclipse + CVS plugin, ensure Latin1 encoding). Check out all modules. Ask a staff member to explain the organizational structure to you; they will be happy to do so.
    2. Obtain a user account on our GitLab. Install the Webis Command on your machine. Among other things, you can use this command to check out the Webis code base.
    3. Mount the CEPH data repository on your machine. Obtain access to our VPN for remote access.
    4. Subscribe to the Webis communication channels listed below. Please ensure that your video chat setup works and that you can reach other staff members at the press of a button.

  • Communication:
    • Discord: ask staff for access
    • Google Calendar
    • Mailing list staff:
    • Mailing list students:
    • Skype: webis
    • Twitter: @webis_de
    • DFNconf: webis room
    • Whereby: webis room

  • Meeting rules:
    • Please take notes.
    • Students should also prepare an agenda (not only PhDs).
    • Typical meeting duration: 30 minutes with student assistants, 60 minutes for projects and PhD meetings.
    • Don't stray off-topic. Respect the time of your students and PhDs.
    • Don't meet if there is nothing to discuss.

  • Software-related practice: