Web Archive

Synopsis

The World Wide Web is the single largest repository of digital culture and knowledge. This project analyzes this invaluable resource using an 8 PB dataset of both current and historical web content obtained from the Internet Archive's web archive. Our ongoing and planned large-scale data analyses will address selected scientific, social, and ethical challenges of the information society in general and of the web in particular.

As part of the project, we use large-scale cluster infrastructure for both storage and processing (facilities.webis.de » Hardware), as well as virtual workspaces for exploring the data. The project is funded by the German Federal Ministry of Education and Research (BMBF) as part of the Immersive Web Observatory project. Partners: Prof. M. Hagen (Friedrich Schiller University Jena), Jun.-Prof. M. Potthast (Leipzig University), Prof. B. Fröhlich, and Prof. B. Stein (Bauhaus-Universität Weimar). The Bauhaus-Universität Weimar has published an official announcement of the project (in German). The download of the data is currently in progress.

We are interested in joint research and partnerships on this data. Please contact us for ways to get access.