PhD Colloquium on Web Information and Quality Evaluation

Escuela Técnica Superior de Ingeniería Informática
Universidad Politécnica de Valencia

13-15 September 2010


Monday, September 13th, 2010

09:00-09:45 Detection of Text Plagiarism and Web Vandalism
Benno Stein [slides]
09:45-10:30 Using Web n-grams to Help Second-Language Speakers
Martin Potthast [slides]
10:30-11:15 Detection of Cross-language Text Reuse
Alberto Barrón-Cedeño [slides]
11:45-13:00 Keynote - Entropy and Semantic: A Mathematical Approach to Authorship Attribution, Plagiarism Detection and Key Words Extraction
Mirko Degli Esposti [slides]
13:00-13:45 Paraphrasing: Potential Applications for Plagiarism Detection
Marta Vila [slides]
13:45-14:15 Wikipedia Vandalism: A First Attempt
Santiago Mola
16:00-16:45 The Impact of Toponym Disambiguation in Geographical Information Retrieval and Question Answering
Davide Buscaldi
16:45-17:30 Making the Most of a Web Search Session
Matthias Hagen [slides]
18:00-19:00 Discussion - Web Information and Quality Evaluation: On the Detection of Text Reuse, Plagiarism, Paraphrasing, and Wikipedia Vandalism

Tuesday, September 14th, 2010

09:30-10:45 Keynote - Visual Analysis of Unstructured Data Sets
Michael Granitzer [slides]
10:45-11:30 Automatic Detection of Information Quality Flaws in Wikipedia Articles
Maik Anderka [slides]
12:00-12:45 On Filtering the Web
Nedim Lipka [slides]
12:45-13:30 Networks, Crowds, and Markets: Reasoning about a Highly Connected World
Tim Gollub [slides]
13:30-14:15 A General Bio-inspired Method to Improve the Short-text Clustering Task
Diego Ingaramo, Marcelo Errecalde, Paolo Rosso [slides]
16:00-16:45 Insight into Cluster Labeling
Dennis Hoppe [slides]
16:45-17:15 A Semantic Role Labeling Application
Lidia Moreno, Natividad Prieto
17:15-18:00 Feature Associations in Graph Structures for Unsupervised Entity Disambiguation
Roman Kern [slides]
18:00-18:30 Drug-Drug Interaction Detection: A New Approach Based on Maximal Frequent Sequence
Sandra García-Blasco [slides]
19:00-20:00 Discussion - Web Information and Quality Evaluation: On Clustering and Labelling Information

Wednesday, September 15th, 2010

09:00-09:45 Cross-language Text Classification using Structural Correspondence Learning
Peter Prettenhofer [slides]
09:45-10:30 Figurative Language Processing: Mining Underlying Knowledge from Social Media
Antonio Reyes, Paolo Rosso [slides]
10:30-11:15 Assessing Information Quality Facets in Blogs and Web Pages
Elisabeth Lex [slides]
11:15-11:45 Opinion Sharing via Ontology Matching
Enrique Vallés
11:45-12:15 Coffee break
12:15-13:15 Discussion - Web Information and Quality Evaluation: On the Classification of Objective and Subjective Information


The motivation for WIQE starts from the observation that today's information and data pools on the Web focus on the quantity of information rather than its quality. This is observable in the increasing size of the blogosphere, the growing amount of artificially created data, the well-established copy-and-paste syndrome, and the lack of semantically enriched data. Intentional and unintentional misuse of information, such as Wikipedia vandalism, spam blogs (splogs), and plagiarism, further decreases information quality on the Web.

The resulting decentralized, low-quality information leads to several problems:

  • Information search requires robust methods for removing low-quality, non-credible information.
  • Judging the quality, credibility, and reliability of information remains a manual, labour-intensive task.
  • Users can hardly estimate the credibility of virtual persons in order to establish trusted relationships.
  • Separating contradictory facts or outdated information from valuable information assets becomes a major challenge for information systems on the Web.
  • Information is stored redundantly and is highly scattered across different places.

Overall, the Web today lacks quality-dependent filter mechanisms, automatic identification of misuse patterns, and tools to establish user trust in information and authors. The aim of the workshop is to meet emerging challenges in our information-flooded society by conducting both basic and applied research in the areas of information retrieval, data mining, and knowledge processing. The talks will cover a diverse set of research topics in the respective fields, including document clustering, algorithmic approaches to information quality, plagiarism and text reuse, query formulation, and domain adaptation in natural language processing. The focus will be especially on assessing information quality on the Web. Assessing the quality of information is an important task because decisions are often based on information from multiple and sometimes unknown sources, even though the reliability and accuracy of that information are questionable.

Organizing Committee

Paolo Rosso
Alberto Barrón-Cedeño
Natural Language Engineering Lab. - ELiRF
Universidad Politécnica de Valencia, Spain

Benno Stein
Web Technology and Information Systems Group
Bauhaus-Universität Weimar, Germany