The Webis Search Mission Corpus 2012 (Webis-SMC-12) contains 8840 search engine interactions of 127 users. Two human annotators divided these interactions into 2881 logical sessions and 1378 missions. Cases where the annotators did not agree initially were discussed to reach a consensus.
You can access the Webis-SMC-12 corpus on Zenodo.
If you use the dataset in your research, please send us a copy of your publication. We kindly ask you to cite the corpus via [bib]. If you additionally want to link to the dataset, please use the dataset's [doi] for a stable link.
To reproduce the results in our OAIR 2013 paper on search session and mission detection, you can also download the feature values for Steps 3 to 6 of our cascading method (Steps 1 and 2 rely only on the time stamp or the query string, both of which are contained in the Webis-SMC-12):
(2.6 MB, MD5 sum: 21b7e111857f65cfc8e9237518a5e195)
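After downloading, the archive can be checked against the MD5 sum given above. The following is a minimal sketch using Python's standard `hashlib` module; the function names and the file path you pass in are illustrative assumptions, since the archive's file name is not stated here.

```python
import hashlib

# MD5 sum as published above for the feature value download.
EXPECTED_MD5 = "21b7e111857f65cfc8e9237518a5e195"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()
```

Calling `matches_expected` with the path of the downloaded archive returns `True` only if the file is byte-identical to the published one.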
The Webis-SMC-12 is based on the AOL log query sample used in Daniel Gayo-Avello's search session corpus (A survey on session detection methods in query logs and a proposal for future evaluation. Information Sciences, 179(12):1822–1843, 2009.). From the AOL log we extracted all queries of the 215 users contained in the Gayo-Avello sample. We removed the few queries that are empty or just a URL (probably submitted by users mixing up the search field with the address bar) and all queries from the 88 users who submitted fewer than 4 queries in total (too few queries for reasonable logical sessions), which leaves the 127 users in the corpus.
The corpus contains a line for each interaction of a user. The interactions are ordered by time. Each line contains a user ID, the query string, a time stamp, and the rank and domain of the clicked result in case of a click interaction. This data comes from the AOL query log. Furthermore, we have annotated each interaction with a mission ID and potentially a comment from our annotators. The values in a line are tab-separated. Empty lines denote physical session breaks (more than 90 minutes between consecutive interactions) or logical session breaks (different search intents in one physical session). The interactions of different users are additionally split by a dashed line.
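The format described above can be read with a few lines of Python. This is only a sketch under stated assumptions: the column order (user ID, query, time stamp, click rank, click domain, mission ID, comment) follows the order in which the fields are listed above and should be checked against the actual file, trailing empty fields are assumed to be possibly omitted, and the `Interaction` class and function names are made up for illustration. The sketch does not distinguish physical from logical session breaks, since both are marked by an empty line.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Interaction:
    user_id: str
    query: str
    timestamp: str
    click_rank: Optional[str]    # None if the interaction has no click
    click_domain: Optional[str]  # None if the interaction has no click
    mission_id: Optional[str]
    comment: Optional[str]       # annotator comment, if any


def is_user_separator(line: str) -> bool:
    """True for a dashed line that splits the interactions of two users."""
    stripped = line.strip()
    return len(stripped) >= 3 and set(stripped) == {"-"}


def parse_corpus(lines: Iterable[str]) -> List[List[Interaction]]:
    """Group interactions into sessions.

    A session ends at an empty line (physical or logical session break)
    or at a dashed line (change of user).
    """
    sessions: List[List[Interaction]] = []
    current: List[Interaction] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip() or is_user_separator(line):
            if current:
                sessions.append(current)
                current = []
            continue
        fields = line.split("\t")
        # Assumption: missing trailing columns are treated as empty.
        fields += [""] * (7 - len(fields))
        user_id, query, timestamp, rank, domain, mission, comment = fields[:7]
        current.append(
            Interaction(
                user_id,
                query,
                timestamp,
                rank or None,
                domain or None,
                mission or None,
                comment or None,
            )
        )
    if current:
        sessions.append(current)
    return sessions
```

Passing an open file handle of the corpus to `parse_corpus` yields one list per session; missions that span several sessions can then be recovered by grouping on `mission_id`.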
The feature value data for our OAIR 2013 paper is provided for all query pairs (or session-query pairs) in the Webis-SMC-12 on which our cascading method invokes the specific steps for session or mission detection. There are separate folders for session and mission detection and for the individual steps. Note that the mission detection test was run on the manually labeled sessions in the Webis-SMC-12 test set. The PDF of our OAIR 2013 paper is included for convenience. The notes file lists the user IDs used for the training and test sets.
Students: Jakob Gomoll, Anna Beyer