DiveR

Preference and Diversity-based Ranking in Network-Centric Information Management Systems

Diversity in Keyword Search

Keyword-based search is very popular because it allows users to express their information needs without either being aware of the underlying structure of the data or using a query language. In relational databases, existing keyword search approaches exploit the database schema or the given database instance to retrieve tuples relevant to the keywords of the query. Keyword search is intrinsically ambiguous. Given the abundance of available information, exploring the contents of a database is a complex procedure that may return a huge volume of data. Still, users would like to retrieve only a small piece of it, namely the most relevant to their interests.

In our model, preferences express a user choice that holds under a specific context, where both context and choice are specified through keywords. For example, consider the following two preferences: ({thriller}, G. Oldman > W. Allen) and ({comedy}, W. Allen > G. Oldman). The first preference denotes that in the context of thriller movies, the user prefers G. Oldman over W. Allen, whereas the latter, that in the context of comedies, the user prefers W. Allen over G. Oldman. Such preferences may be specified in an ad-hoc manner when the user submits a query or they may be stored in a general user profile.

Given a set of preferences, we would like to personalize a keyword query Q by ranking its results in an order compatible with the order expressed in the user choices for context Q. For example, in the results of the query Q = {thriller}, movies related to G. Oldman should precede those related to W. Allen. To formalize this requirement, we consider expansions of query Q with the set of keywords appearing in the user choices for context Q. For instance, for the query Q = {thriller}, we use the queries Q1 = {thriller, G. Oldman} and Q2 = {thriller, W. Allen}. We project the order induced by the user choices among the results of these queries to produce an order among the results of the original query Q.

User preferences

Example of user preferences.

Since keyword search is often best-effort, given a constraint k on the number of results, we would like to combine the order of results as indicated by the user preferences with their relevance to the query. Besides preferences and relevance, we also consider the set of the k results as a whole and seek to increase the overall value of this set to the users. Specifically, we aim at selecting the k most representative among the relevant and preferred results, i.e., these results that both cover different preferences and have different content. In general, such result diversification, i.e. selecting items that differ from each other, has been shown to increase user satisfaction.

Related Publications: