PhD Colloquium on Web Information and Quality Evaluation

Escuela Técnica Superior de Ingeniería Informática
Universidad Politécnica de Valencia

13-15 September 2010

Program

Monday, September 13th, 2010

09:00-09:45	Detection of Text Plagiarism and Web Vandalism Benno Stein [slides]
09:45-10:30	Using Web n-grams to Help Second-Language Speakers Martin Potthast [slides]
10:30-11:15	Detection of Cross-language Text Reuse Alberto Barrón-Cedeño [slides]
11:15-11:45	Coffee-break
11:45-13:00	Keynote - Entropy and Semantic: A Mathematical Approach to Authorship Attribution, Plagiarism Detection and Key Words Extraction Mirko Degli Esposti [slides]
13:00-13:45	Paraphrasing: Potential Applications for Plagiarism Detection Marta Vila [slides]
13:45-14:15	Wikipedia Vandalism: A First Attempt Santiago Mola
14:15-16:00	Lunch
16:00-16:45	The Impact of Toponym Disambiguation in Geographical Information Retrieval and Question Answering Davide Buscaldi
16:45-17:30	Making the Most of a Web Search Session Matthias Hagen [slides]
17:30-18:00	Coffee-break
18:00-19:00	Discussion - Web Information and Quality Evaluation: On the Detection of Text Reuse, Plagiarism, Paraphrasing, and Wikipedia Vandalism

Tuesday, September 14th, 2010

09:30-10:45	Keynote - Visual Analysis of Unstructured Data Sets Michael Granitzer [slides]
10:45-11:30	Automatic Detection of Information Quality Flaws in Wikipedia Articles Maik Anderka [slides]
11:30-12:00	Coffee-break
12:00-12:45	On Filtering the Web Nedim Lipka [slides]
12:45-13:30	Networks, Crowds, and Markets: Reasoning about a Highly Connected World Tim Gollub [slides]
13:30-14:15	A General Bio-inspired Method to Improve the Short-text Clustering Task Diego Ingaramo, Marcelo Errecalde, Paolo Rosso [slides]
14:15-16:00	Lunch
16:00-16:45	Insight into Cluster Labeling Dennis Hoppe [slides]
16:45-17:15	A Semantic Role Labeling Application Lidia Moreno, Natividad Prieto
17:15-18:00	Feature Associations in Graph Structures for Unsupervised Entity Disambiguation Roman Kern [slides]
18:00-18:30	Drug-Drug Interaction Detection: A New Approach Based on Maximal Frequent Sequence Sandra García-Blasco [slides]
18:30-19:00	Coffee-break
19:00-20:00	Discussion - Web Information and Quality Evaluation: On Clustering and Labelling Information

Wednesday, September 15th, 2010

09:00-09:45	Cross-language Text Classification using Structural Correspondence Learning Peter Prettenhofer [slides]
09:45-10:30	Figurative Language Processing: Mining Underlying Knowledge from Social Media Antonio Reyes, Paolo Rosso [slides]
10:30-11:15	Assessing Information Quality Facets in Blogs and Web Pages Elisabeth Lex [slides]
11:15-11:45	Opinion Sharing via Ontology Matching Enrique Vallés
11:45-12:15	Coffee-break
12:15-13:15	Discussion - Web Information and Quality Evaluation: On the Classification of Objective and Subjective Information

Mission

The motivation for WIQE starts from the observation that today's information and data pools on the Web focus on the quantity of information rather than its quality. This is observable in the increasing size of the blogosphere, the growing amount of artificially created data, the well-established copy & paste syndrome, and the lack of semantically enriched data. Intentional and unintentional misuse of information, such as Wikipedia vandalism, spam blogs (splogs), and plagiarism, further adds to the decrease in information quality on the Web.

The resulting decentralized, low-quality information leads to several problems:

Information search requires robust methods for removing low-quality, non-credible information.
Judging the quality, credibility, and reliability of information remains a manual, labour-intensive task.
Users can hardly estimate the credibility of virtual persons to establish trusted relationships.
Separating contradicting facts or outdated information from valuable information assets becomes a major challenge for information systems on the Web.
Information is redundantly stored and highly scattered across different places.

Overall, the Web today lacks quality-dependent filter mechanisms, automatic identification of misuse patterns, and tools to establish user trust in information and authors. The aim of the workshop is to meet emerging challenges in our information-flooded society by conducting both basic and applied research in the areas of information retrieval, data mining, and knowledge processing. The talks will cover a diverse set of research topics in the respective fields, including document clustering, algorithmic approaches to information quality, plagiarism and text reuse, query formulation, and domain adaptation in natural language processing. The focus will be especially on assessing information quality on the Web: this assessment is an important task because decisions are often based on information from multiple and sometimes unknown sources whose reliability and accuracy are questionable.

Organizing Committee

Paolo Rosso
Alberto Barrón-Cedeño
Natural Language Engineering Lab. - ELiRF
Universidad Politécnica de Valencia, Spain
http://users.dsic.upv.es/grupos/nle

Benno Stein
Web Technology and Information Systems Group
Bauhaus-Universität Weimar, Germany
http://www.webis.de

WIQE-10