In this project we develop OpinionCloud, a new opinion summarization technology for Web comments in general, and for comments on YouTube and Flickr in particular. Popular Web items often attract thousands of comments, and getting an idea of the crowd's overall opinion would require reading all of them, which is impractical. Our summarization approach retrieves this information by generating an opinion word cloud for a given set of comments. We operationalize the technology in a browser add-on for Firefox that summarizes the comments on a YouTube video when the user starts watching it.


Our research on opinion summarization of Web comments boils down to two research areas: sentiment analysis and summary visualization. The former deals with the classification of words as positive, negative, or neutral, whereas the latter deals with the design of an accessible visual representation of a set of opinions.

In sentiment analysis, a word's polarity can be identified by measuring its co-occurrence with words whose polarity is known in advance: if a given word occurs with high probability in the vicinity of positive (negative) words, it can be considered positive (negative) as well. Neutral words, by contrast, tend to occur arbitrarily next to words of both polarities. We use this idea to train a dictionary of opinion words, which also contains slang terms that are often used in comments. The dictionary is then used to classify the words of comments as positive, negative, or neutral. By default, words that are not contained in the dictionary are considered neutral.
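
The co-occurrence idea can be illustrated with a minimal Python sketch. It is not the OpinionCloud implementation: the seed word lists, the window size, the thresholds `min_count` and `margin`, and the function names `train_polarity_dictionary` and `classify` are all assumptions chosen for illustration, and the whitespace tokenization is deliberately naive.

```python
from collections import Counter

# Hypothetical seed words with known polarity (assumption, not from the project).
POSITIVE_SEEDS = {"good", "great", "love"}
NEGATIVE_SEEDS = {"bad", "awful", "hate"}


def train_polarity_dictionary(comments, window=2, min_count=2, margin=0.6):
    """Label words by how often they occur near positive vs. negative seeds."""
    pos_hits = Counter()
    neg_hits = Counter()
    for comment in comments:
        tokens = comment.lower().split()  # naive whitespace tokenization
        for i, word in enumerate(tokens):
            if word in POSITIVE_SEEDS or word in NEGATIVE_SEEDS:
                continue
            # Words within `window` positions to the left and right.
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            pos_hits[word] += sum(n in POSITIVE_SEEDS for n in neighbours)
            neg_hits[word] += sum(n in NEGATIVE_SEEDS for n in neighbours)

    dictionary = {w: "positive" for w in POSITIVE_SEEDS}
    dictionary.update({w: "negative" for w in NEGATIVE_SEEDS})
    for word in set(pos_hits) | set(neg_hits):
        total = pos_hits[word] + neg_hits[word]
        if total < min_count:
            continue  # too little evidence; the word stays out of the dictionary
        share = pos_hits[word] / total
        if share >= margin:
            dictionary[word] = "positive"
        elif share <= 1 - margin:
            dictionary[word] = "negative"
        else:
            dictionary[word] = "neutral"  # occurs next to both polarities
    return dictionary


def classify(word, dictionary):
    """Words missing from the dictionary are neutral by default."""
    return dictionary.get(word.lower(), "neutral")
```

Slang terms such as "epic" or "lame" enter the dictionary in this sketch only because they repeatedly appear close to seed words in the training comments; a word never seen in training is classified as neutral.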

For the visualization of the opinions, the words are arranged in a cloud where the color of a word denotes its polarity and the size of a word its frequency in the comments. This visualization is comparable to the well-known tag clouds for folksonomies.
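
The mapping from words to cloud entries can be sketched as follows. This is an illustrative assumption rather than the add-on's actual rendering code: the concrete colors, the font size range, the linear scaling, and the name `build_cloud` are hypothetical, and the layout of the words on screen is left out.

```python
from collections import Counter

# Hypothetical color scheme (assumption): the text only states that color encodes polarity.
COLORS = {"positive": "green", "negative": "red", "neutral": "gray"}


def build_cloud(comments, dictionary, min_size=10, max_size=40):
    """Return one entry per word: color from polarity, font size from frequency."""
    counts = Counter(w for c in comments for w in c.lower().split())
    if not counts:
        return []
    top = max(counts.values())
    cloud = []
    for word, freq in counts.most_common():
        polarity = dictionary.get(word, "neutral")  # unknown words are neutral
        # Linear scaling: the most frequent word gets max_size.
        size = round(min_size + (max_size - min_size) * freq / top)
        cloud.append({"word": word, "color": COLORS[polarity], "size": size})
    return cloud
```

Each entry could then be handed to whatever renderer draws the cloud, for example as a styled text element per word.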


Students: Steffen Becker