A robust approach for finding conceptually related queries using feature selection and tripartite graph structure
Journal of Information Science
Published online on March 18, 2013
Abstract
The information explosion on the Internet has placed high demands on search engines. Despite improvements in search engine technology, the precision of current search engines is still unsatisfactory. Moreover, the queries submitted by users are short, ambiguous and imprecise. This causes several problems when dealing with similar queries: they may share no keywords, the search engine may return different documents for them, and they may share no clicked documents. These problems render traditional query clustering methods unsuitable for query recommendation. In this paper, we propose a new query recommendation system. It identifies conceptually related queries by capturing users’ preferences from the click-through graphs of web search logs and by extracting the features most relevant to the queries from the result snippets. The proposed system has an online feature extraction phase and an offline phase in which feature filtering and query clustering are performed. Query clustering is carried out by a new tripartite agglomerative clustering algorithm, Query-Document-Concept Clustering, in which the documents are used to decouple queries from features/concepts in a tripartite graph structure. This yields clusters of similar queries together with associated clusters of documents and clusters of features. We model the query recommendation problem in four different ways: two content-ignorant models (one non-personalized, one personalized) and two content-aware models (again one non-personalized and one personalized). Three similarity measures are introduced to estimate different kinds of similarity. Experimental results show that the proposed approach achieves better precision, recall and F-measure than existing approaches.
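As a rough illustration of why a document layer between queries and concepts helps, the following Python sketch links queries to clicked documents and, through those documents, to concepts, and then scores two queries that share neither keywords nor clicks. It is not the paper's Query-Document-Concept Clustering algorithm or any of its three similarity measures: the Jaccard overlap, the mixing weight `alpha`, the function names and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_tripartite(clicks, doc_concepts):
    """Link queries to clicked documents and, via documents, to concepts.

    clicks: iterable of (query, document) pairs from a click-through log.
    doc_concepts: dict mapping a document to its set of concepts/features.
    (Both input shapes are hypothetical, chosen for this sketch.)
    """
    q_docs = defaultdict(set)
    for query, doc in clicks:
        q_docs[query].add(doc)
    # A query inherits the concepts of every document clicked for it.
    q_concepts = {
        q: set().union(*(doc_concepts.get(d, set()) for d in docs))
        for q, docs in q_docs.items()
    }
    return dict(q_docs), q_concepts


def query_similarity(q1, q2, q_docs, q_concepts, alpha=0.5):
    """Blend click overlap and concept overlap (assumed weighting)."""
    doc_sim = jaccard(q_docs[q1], q_docs[q2])
    concept_sim = jaccard(q_concepts[q1], q_concepts[q2])
    return alpha * doc_sim + (1 - alpha) * concept_sim


# Toy data: the first two queries share no keywords and no clicked documents.
clicks = [
    ("jaguar speed", "d1"),
    ("big cat top speed", "d2"),
    ("jaguar price", "d3"),
]
doc_concepts = {
    "d1": {"animal", "speed"},
    "d2": {"animal", "speed"},
    "d3": {"car", "price"},
}

q_docs, q_concepts = build_tripartite(clicks, doc_concepts)
related = query_similarity("jaguar speed", "big cat top speed", q_docs, q_concepts)
unrelated = query_similarity("jaguar speed", "jaguar price", q_docs, q_concepts)
print(related, unrelated)
```

In this toy log, the two speed-related queries have no clicked document in common, yet they receive a non-zero score because their clicked documents carry the same concepts, whereas the keyword-sharing pair "jaguar speed" and "jaguar price" scores zero. An agglomerative clustering step could then merge the most similar pairs repeatedly, though the merge criterion used in the paper is not specified in this abstract.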