TL;DR Challenge 2021: Abstractive Snippet Generation for Web Pages

Synopsis

  • Task: Given a query and a web page, generate an abstractive snippet for the page.
  • Input: [data]
  • Submission: [submit]
  • Register [here]

Task

We propose a shared task on abstractive snippet generation for web pages, a novel task of generating query-biased abstractive summaries for documents that are to be shown on a search results page. Conventional snippets are extractive in nature, which recently gave rise to copyright claims from news publishers as well as new copyright legislation passed in the European Union, limiting the fair use of web page contents for snippets. At the same time, abstractive summarization has matured considerably in recent years, potentially allowing for more personalization of snippets in the future. Taken together, these facts render further research into generating abstractive snippets both timely and promising.

Data

The Webis Abstractive Snippet Corpus has been mined from the ClueWeb09, the ClueWeb12, and the DMOZ Open Directory Project, yielding more than 3.5 million examples of the form (query, document, snippet). Participants are free to split the data into training and validation sets as they see fit.
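
A minimal sketch of one way to load and split such data, assuming the triples are distributed as JSON-lines records with query, document, and snippet fields; the file name and layout below are illustrative assumptions, not the official distribution format:

    # Illustrative only: file name and JSON-lines layout are assumptions.
    import json
    import random

    def load_triples(path):
        """Read (query, document, snippet) records from a JSON-lines file."""
        with open(path, encoding="utf-8") as f:
            return [json.loads(line) for line in f]

    def train_val_split(triples, val_fraction=0.05, seed=42):
        """Shuffle and split the triples into training and validation portions."""
        rng = random.Random(seed)
        shuffled = list(triples)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * (1 - val_fraction))
        return shuffled[:cut], shuffled[cut:]

    train, val = train_val_split(load_triples("webis-abstractive-snippets.jsonl"))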

Evaluation

Our evaluation follows that of Chen et al. (2020) as a two-step process involving intrinsic and extrinsic evaluation. The intrinsic evaluation assesses several properties of a snippet: text reuse, faithfulness (no hallucinations), and fluency. The extrinsic evaluation assesses a snippet's adequacy when used within a search engine. A combination of relevant automatic metrics and manual evaluation is used in both scenarios. You will be able to self-evaluate your software using the TIRA service; you can find the user guide here. While the results of the automatic metrics are shared on the leaderboard throughout the duration of the task, those of the manual evaluation will be shared later, since cost constraints allow only the top-performing models on the automatic metrics to proceed to manual evaluation via crowdsourcing.
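
As an illustration of the intrinsic criteria, the sketch below scores the "text reuse" property as the fraction of a snippet's word n-grams that occur verbatim in the source document; this is only a plausible proxy, not the official metric:

    # Illustrative only: a simple proxy for "text reuse", not the official metric.
    def ngrams(tokens, n):
        """Set of all contiguous n-grams in a token list."""
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    def reuse_score(snippet, document, n=4):
        """Fraction of the snippet's n-grams also found verbatim in the document.

        Lower values indicate a more abstractive (less copied) snippet.
        """
        snippet_ngrams = ngrams(snippet.lower().split(), n)
        document_ngrams = ngrams(document.lower().split(), n)
        if not snippet_ngrams:
            return 0.0
        return len(snippet_ngrams & document_ngrams) / len(snippet_ngrams)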

Submission

Will be updated shortly.

Task Committee