Crawled over two weeks in January 2014, the Webis TripAdvisor Corpus 2014 (Webis-Tripad-14) consists of 266 061 reviews of 12 044 hotels by 208 785 users. It also includes metadata about the hotels (such as location and overall ratings), the users (such as gender and age range), and the reviews themselves (such as date posted and rating). The corpus is available for download in JSON format: one file per hotel and one file containing all the user information.


You can access the Webis-Tripad-14 corpus on Zenodo.

If you use the dataset in your research, please send us a copy of your publication. If you additionally want to link to the dataset, please use the dataset's [doi] for a stable link.


The Webis TripAdvisor Corpus 2014 (Webis-Tripad-14) is designed to support several different tasks, such as sentiment analysis, author profiling, or usefulness detection.

The JSON corpus consists of 12 045 files: one contains all the user data, and each of the remaining 12 044 holds the data for a single hotel. A detailed description of the data and the key/value pairs is provided in README.txt in the download folder.
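As a rough illustration of how the one-file-per-hotel layout could be read, the following sketch loads every JSON file from a local copy of the corpus. The `*.json` filename pattern and the helper name `load_corpus` are assumptions made for this example; the actual file names and key/value pairs are those documented in README.txt.

```python
import json
from pathlib import Path


def load_corpus(corpus_dir):
    """Load every *.json file in corpus_dir into a dict keyed by file name.

    Assumption: the corpus files carry a .json extension. Check README.txt
    for the actual naming scheme and for which file holds the user data.
    """
    documents = {}
    for path in sorted(Path(corpus_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            documents[path.name] = json.load(handle)
    return documents
```

Called on the unpacked download folder, for example `load_corpus("webis-tripad-14")` with a placeholder directory name, a complete copy should yield 12 045 entries under these assumptions.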

Useful statistics are shown in the tables below.

| Age group | Gender | Number of entries |
|---|---|---|
| All users | Total | 208785 |
| | Female | 48229 |
| | Male | 41163 |
| | No gender given | 119393 |
| 13-17 years old | Total | 108 |
| | Female | 46 |
| | Male | 60 |
| | No gender given | 2 |
| 18-24 years old | Total | 2220 |
| | Female | 1478 |
| | Male | 710 |
| | No gender given | 32 |
| 25-34 years old | Total | 22286 |
| | Female | 13494 |
| | Male | 8356 |
| | No gender given | 436 |
| 35-49 years old | Total | 37610 |
| | Female | 19483 |
| | Male | 17392 |
| | No gender given | 735 |
| 50-64 years old | Total | 24595 |
| | Female | 12317 |
| | Male | 11861 |
| | No gender given | 417 |
| 65+ | Total | 4248 |
| | Female | 1411 |
| | Male | 2784 |
| | No gender given | 53 |
| No age given | Total | 117718 |

User Statistics


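| Attribute | Value |
|---|---|
| Total hotels | 12044 |
| Reviews per hotel, min | 1 |
| Reviews per hotel, max | 726 |
| Reviews per hotel, average | 22.09 |
| Reviews per hotel, median | 20 |

Hotel Statistics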


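| Attribute | Value |
|---|---|
| Total reviews | 266061 |
| Words per review, min | 0 |
| Words per review, max | 2526 |
| Words per review, average | 67.82 |
| Words per review, median | 49 |

Review Statistics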
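Summary figures of this kind can be recomputed from the review texts. The sketch below assumes whitespace tokenization, which may differ from the word-counting method used for the table, and it takes a plain list of review strings; how to extract those strings from the JSON files depends on the keys described in README.txt. The function name `word_count_stats` is hypothetical.

```python
from statistics import mean, median


def word_count_stats(review_texts):
    """Return min, max, average and median word counts for a list of review texts.

    Assumption: a "word" is any whitespace-separated token.
    """
    counts = [len(text.split()) for text in review_texts]
    if not counts:
        raise ValueError("at least one review text is required")
    return {
        "min": min(counts),
        "max": max(counts),
        "average": round(mean(counts), 2),
        "median": median(counts),
    }
```

Under this tokenization an empty review text counts as 0 words, which would be one way a minimum of 0 words per review could arise.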


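| Attribute | Value |
|---|---|
| Reviews with helpfulness votes | 154291 |
| Reviews without helpfulness votes | 111770 |
| Votes per review with votes, min | 1 |
| Votes per review with votes, max | 859 |
| Votes per review with votes, average | 3.88 |
| Votes per review with votes, median | 2 |

Statistics about Helpfulness Votes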


Students: Katharina Spiel