Synopsis

AIsearch offers a convenient interface for Web-based search and combines algorithms for the formation, labeling, and visualization of categories along with a smart spelling analysis.

AIsearch was a finalist in the European Academic Software Award (EASA) competition and received a special prize as a research tool.

Research

Searching with AIsearch

A search process with the AIsearch Web interface starts as usual: a query consisting of the search terms of interest is entered in a dialog field. The query is sent to several search engines and, for a syntactic analysis, to a SmartSpell® server. The query results, i.e., the HTML document snippets, are collected and analyzed with respect to the similarity of their contents. Based on this analysis, adequate categories are formed and labeled, and a tree of the categories is drawn in the hyperbolic plane; the tree places related categories closer together than unrelated ones. The following figure shows a snapshot of the AIsearch Web interface for the query "tea flavour". Aside from the hyperbolic category tree, the returned document snippets can also be browsed in a list format. The list groups all snippets of the same category together, and each sublist can be accessed immediately by clicking the corresponding leaf in the category tree.
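How the similarity of the collected snippets is computed is not specified here. As a rough illustration only, the following Python sketch compares snippets by the cosine similarity of their term-frequency vectors. The tokenization, the weighting scheme, and all function names are assumptions made for this example and are not taken from AIsearch.

```python
import math
import re
from collections import Counter


def term_vector(snippet):
    """Lower-case the snippet and count its word tokens (plain term frequency)."""
    return Counter(re.findall(r"[a-z]+", snippet.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors (0.0 if one is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(snippets):
    """Pairwise similarities; such a matrix could serve as input for a clustering step."""
    vectors = [term_vector(s) for s in snippets]
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]
```

Snippets that share many terms, such as two results about tea flavours, receive a high value, while snippets without common terms receive 0.0.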

Query Analysis with SmartSpell®

The terms of the query are checked for correct spelling and for similar terms. This job is done by the SmartSpell algorithm. SmartSpell analyzes spelling errors against a dictionary with regard to the edit distance, the Levenshtein distance, and the phonological distance. The phonological interpretation depends on a language's level of phonemicity and is realized with a phoneme-dependent word similarity measure. To efficiently find syntactically and phonetically similar words for a search term, SmartSpell operationalizes several paradigms of heuristic search: nogood-lemma generation, search space pruning based on over- and underestimation, iterative deepening search, and memoization. The following table shows some examples of misspelled words along with SmartSpell's proposals and similarity estimates.

Misspelled word    SmartSpell® proposal (similarity)
aksekjushon        execution (81%)
angenearing        engineering (92%)
blu                blue (93%), blew (92%)
buysikel           physical (85%), bicycle (82%)
shoor              shoal (88%), shoo (88%), sure (82%)

Examples of misspelled words (left column) and the SmartSpell proposals with similarity estimates (right column).

SmartSpell's proposals of similar search terms are integrated directly into the query field; they allow a query to be reformulated, extended, or corrected at the press of a button.
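The Levenshtein distance mentioned in the spelling analysis can be illustrated with the textbook dynamic-programming formulation. The sketch below is not SmartSpell: it ignores the phonological distance and the heuristic search techniques, it does not reproduce the similarity percentages in the table, and the function names and the small candidate list are assumptions for this example.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def rank_candidates(word, dictionary, limit=3):
    """Return the dictionary entries closest to word, nearest first."""
    scored = sorted((levenshtein(word, entry), entry) for entry in dictionary)
    return [entry for _, entry in scored[:limit]]
```

For instance, `rank_candidates("blu", ["blue", "blew", "bicycle"], limit=2)` ranks "blue" (one insertion) ahead of "blew" (one substitution and one insertion).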

Category Formation

AIsearch implements a new clustering algorithm (MajorClust) for the automatic categorization of document collections. Several analyses have shown the high quality of the categories found. To compare different clusterings of search results, AIsearch employs the strategy pattern to make term weighting schemes, similarity measures, clustering algorithms, and cluster validity measures interchangeable at runtime. For efficient text handling, the symbol processing algorithms for text parsing, text compression, and text comparison use specialized flyweight patterns.
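The basic idea of MajorClust, as it is usually described, is that every document starts in its own cluster and then repeatedly adopts the cluster label to which it is attracted most strongly, measured as the sum of the similarities to its neighbours carrying that label. The following Python sketch is a simplified illustration under stated assumptions: the similarity graph is passed in as a nested dictionary, ties are broken deterministically rather than randomly, and a round limit guards against oscillation. It is not the AIsearch implementation.

```python
from collections import defaultdict


def majorclust(weights, max_rounds=100):
    """Cluster a weighted, undirected similarity graph.

    weights maps each node to a dict {neighbour: positive edge weight}.
    Returns the clusters as a sorted list of sorted node lists.
    """
    label = {node: node for node in weights}  # each node starts as its own cluster
    for _ in range(max_rounds):
        changed = False
        for node in sorted(weights):
            # Sum the edge weights towards each cluster among the neighbours.
            attraction = defaultdict(float)
            for neighbour, weight in weights[node].items():
                attraction[label[neighbour]] += weight
            if not attraction:
                continue  # isolated node keeps its own label
            best = max(attraction.values())
            strongest = sorted(l for l, v in attraction.items() if v == best)
            # Assumption: keep the current label on ties, else take the smallest.
            new_label = label[node] if label[node] in strongest else strongest[0]
            if new_label != label[node]:
                label[node] = new_label
                changed = True
        if not changed:
            break
    clusters = defaultdict(list)
    for node, cluster_label in label.items():
        clusters[cluster_label].append(node)
    return sorted(sorted(members) for members in clusters.values())
```

Because the graph is only an input, the similarity measure that produces the edge weights can be exchanged without touching the clustering step, which is in the spirit of the interchangeable components described above.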

Software Architecture and Deployment

When a user enters the AIsearch URL in the browser, a Java applet that contains the AIsearch user interface is delivered from the Web server, which in turn communicates with the load balancing module. All requests from the client, such as a request for spelling or a request for search, are coded in a proprietary protocol that contains several commands. Whenever a command reaches the load balancing module, one of the AIsearch engines is chosen to perform the associated task. All commands are processed asynchronously. All computationally expensive tasks are performed as threads, which allows us to run several commands simultaneously on a single AIsearch engine. Moreover, the threading model is well suited to multiprocessor machines and, combined with the load balancing concept, makes the architecture simple to scale.
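The interplay of load balancing and threaded, asynchronous command processing can be sketched in a few lines of Python. Everything in this example is hypothetical: the class names, the command names, and the least-pending-tasks selection rule are assumptions chosen for illustration, since the proprietary protocol and the actual balancing strategy are not described here.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Engine:
    """Hypothetical stand-in for one AIsearch engine that runs commands as threads."""

    def __init__(self, name, workers=4):
        self.name = name
        self.pending = 0  # commands accepted but not yet finished
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def submit(self, handler, argument):
        """Start the command asynchronously and return a future for its result."""
        with self._lock:
            self.pending += 1
        future = self._pool.submit(handler, argument)
        future.add_done_callback(self._finished)
        return future

    def _finished(self, _future):
        with self._lock:
            self.pending -= 1

    def shutdown(self):
        self._pool.shutdown(wait=True)


class LoadBalancer:
    """Chooses an engine for each incoming command (assumed rule: fewest pending)."""

    def __init__(self, engines, handlers):
        self.engines = engines
        self.handlers = handlers  # maps a command name to a callable

    def dispatch(self, command, argument):
        engine = min(self.engines, key=lambda e: e.pending)
        return engine.name, engine.submit(self.handlers[command], argument)
```

A caller would register one handler per command, for example `LoadBalancer(engines, {"SEARCH": run_search, "SPELL": run_spellcheck})` with hypothetical handler functions, and collect the result later through the returned future. Adding capacity then amounts to appending another `Engine` to the list.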

People

Publications