Research issues in context-aware retrieval: deriving the query

Peter Brown

Department of Computer Science, University of Exeter, Exeter EX4 4QF, UK
P.J.Brown@ex.ac.uk

ABSTRACT

In many places we talk about "deriving a query from a document". This paper tries to spell out the full meaning of the expression. It also outlines a feature, present in the Context Matcher, for fine-tuning the active fields from which a query is derived.

Introduction

Context-aware retrieval involves matching each document in the document collection against the current context (which we assume is also represented as a document of a similar form to the documents in the collection). All these documents are divided into fields. In its simplest form, retrieval works as follows:

(1) Certain fields are marked as active; often this is done globally by the field name, e.g. "all Time fields are to be active". See the field-specification document for further information.
(2) Matching is driven either by the current context (interactive case) or by the document from the collection (proactive case). We thus have a query document that drives the retrieval, and a target document. The active fields in the query document drive the matching process. If a query document contains no active fields (as may well happen in the proactive case if the data is inhomogeneous) then no matching activity occurs.
(3) The active fields are compared, pair by pair, between the two documents to be matched. If a particular field does not exist in the target document (e.g. in the proactive case we are trying to match Location with the current context as target, but the current context does not have a Location -- perhaps because a sensor is not working), then, depending on whether the Context Matcher has a default of "presence-compulsory", this may be considered as a non-match. See the field-specification document for further information.
(4) If all the fields match, or, assuming our preferred best-match strategy is used, if the accumulated score from the field matches exceeds some threshold, the two documents are considered to match, and are delivered to the user.
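
The retrieval steps above can be sketched in Python. This is only an illustrative sketch, not the Context Matcher's implementation: the function name documents_match, the representation of a document as a dictionary, the use of simple equality to compare fields, and the scoring rule (fraction of compared fields that match, tested with >= against the threshold) are all assumptions made for the example. It is also an assumption that a field missing from the target counts as a failed field comparison when "presence-compulsory" is set, and is simply ignored otherwise.

```python
from typing import Dict, Iterable

Document = Dict[str, str]  # hypothetical representation: field name -> value


def documents_match(
    query_doc: Document,
    target_doc: Document,
    active_fields: Iterable[str],
    presence_compulsory: bool = True,
    threshold: float = 1.0,
) -> bool:
    """Return True if target_doc matches query_doc on the active fields."""
    # Only active fields present in the query document drive the matching.
    driving = [name for name in active_fields if name in query_doc]
    if not driving:
        return False  # no active fields in the query document: no matching occurs

    compared = 0
    matched = 0
    for name in driving:
        if name not in target_doc:
            if presence_compulsory:
                compared += 1  # assumed: a missing field counts as a failed comparison
            continue  # otherwise the field is ignored
        compared += 1
        if query_doc[name] == target_doc[name]:
            matched += 1

    if compared == 0:
        return False  # assumed: nothing could be compared, so no match
    # Best-match: the fraction of matching fields must reach the threshold;
    # threshold=1.0 means that every compared field must match.
    return matched / compared >= threshold
```

For instance, in the proactive case a collection document {"Location": "Quay", "Orientation": "S"} matched against a current context of {"Location": "Quay"} fails when presence is compulsory and the threshold is 1.0, but succeeds when presence is not compulsory.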

Researchers in retrieval are used to talking in terms of explicit queries, rather than just the matching of fields (e.g. in terms of the query "does the date field match 1963-7?" rather than in terms of comparing a date field of 1963-7 in one document with the date field of another). To follow this terminology, we can imagine that, between stages (2) and (3) above, a query is derived from the query document and applied to the target document. For example, if Temperature and Orientation are the active fields, and if the query document has the fields:

  <Temperature> 32
  <Orientation> S

then the derived query is "Temperature is 32 and Orientation is S". Often numerical fields are ranges rather than single values, e.g. instead of 32 above, we might have the range 30..35. In this case the derived query would say "Temperature >= 30 and Temperature <= 35".
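
As a minimal sketch of this mechanical derivation, the Python function below builds the query text from the active fields of a query document. The name derive_query, the dictionary representation, and the use of a (low, high) tuple to stand for a range such as 30..35 are assumptions made for illustration only.

```python
from typing import Dict, Iterable, Tuple, Union

# Hypothetical representation: a field value is a single value or an
# inclusive (low, high) range.
Value = Union[int, str, Tuple[int, int]]


def derive_query(query_doc: Dict[str, Value], active_fields: Iterable[str]) -> str:
    """Derive a textual query from the active fields of the query document."""
    clauses = []
    for name in active_fields:
        if name not in query_doc:
            continue  # an active field absent from the query document adds nothing
        value = query_doc[name]
        if isinstance(value, tuple):
            low, high = value
            clauses.append(f"{name} >= {low} and {name} <= {high}")
        else:
            clauses.append(f"{name} is {value}")
    return " and ".join(clauses)


print(derive_query({"Temperature": 32, "Orientation": "S"},
                   ["Temperature", "Orientation"]))
# prints: Temperature is 32 and Orientation is S
```

Changing which fields are active changes the derived query without touching the document itself.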

I, in common with many other researchers in the context-awareness field, have gone through this loop:

(1) In the first prototype the active fields were explicitly marked in the documents, or alternatively the queries were not derived at all but explicitly associated with each document. To change the queries, the data needed to be changed.
(2) This turned out to be too inflexible. In data that has a rich set of fields, there is a great advantage in choosing dynamically which fields are to be active, i.e. used to derive the query. Moreover the same document may sometimes act as the query document and sometimes the target document.
(3) A more flexible scheme was introduced. This allows a dynamic setting of what the active fields are to be, and thus what the derived queries are.

The process of deriving the query

In our research we have assumed a simple mechanical rule for deriving the query from the values of active fields, as illustrated by the Temperature and Orientation example above. There is scope for investigating more sophisticated algorithms, but that is not our immediate priority. We are interested in history and prediction of the current context; our approach has, however, been to massage the values in the current context before the query is mechanically derived, rather than to alter the derivation process. For example, if the temperature is rising and its current value is 10, we may massage the current context by increasing the temperature slightly to 11 (or by turning it into the range 10..12), in the hope of anticipating the user's future needs.
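
The massaging step might be sketched as follows. The function name massage_context, the lookahead parameter, and the way the trend is detected (by comparing with one previous reading) are assumptions for illustration; the treatment of a falling temperature is likewise an assumed mirror image of the rising case described above.

```python
from typing import Dict, Tuple, Union

Value = Union[int, str, Tuple[int, int]]  # single value or inclusive (low, high) range


def massage_context(
    context: Dict[str, Value],
    previous_temperature: int,
    lookahead: int = 2,
) -> Dict[str, Value]:
    """Return a copy of the context with Temperature widened along its trend."""
    massaged = dict(context)  # leave the caller's context untouched
    current = context.get("Temperature")
    if not isinstance(current, int):
        return massaged  # no single numeric temperature to massage
    if current > previous_temperature:
        # Rising: e.g. a current value of 10 becomes the range 10..12.
        massaged["Temperature"] = (current, current + lookahead)
    elif current < previous_temperature:
        # Falling (assumed symmetric case).
        massaged["Temperature"] = (current - lookahead, current)
    return massaged
```

The query is then derived mechanically from the massaged context, so the derivation rule itself stays unchanged.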

An example

We assume there is a document collection that has been created by an author in order to provide proactive information for tourists. Each document has a Body field describing an attraction, and a set of up to four fields designed to match the user's current context and to trigger the document if there is a good match. These four fields are Location, Orientation, Time and Temperature. Not all the documents have all these fields: for instance, the only documents with a Temperature field are those that relate to outdoor attractions, and the only ones with an Orientation field are those that relate to views. By default all four of these fields might be specified as active; for a document as a whole to be active it must contain at least one of them, and it might even contain all four.

An implication of the retrieval process is that a document having an Orientation field (e.g. a document concerned with views) would only be triggered if this field matched the user's orientation. If the user had no orientation sensor, and thus no Orientation field in their current context, then, depending on whether the Matcher had a default of "presence-compulsory", these documents may or may not be triggered.

An application might choose -- perhaps as a result of a user's request -- to change the defaults on active fields. For example, Location might be made the only active field. The result would be that more documents would be triggered, some of them perhaps irrelevant (e.g. a nearby open-air restaurant on a cold day), but at least the user would have the information for future reference.
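
The effect of narrowing the active fields can be illustrated with a small sketch. Everything here is hypothetical: the function names, the sample documents and place names, the representation of a location as a plain string, and the use of a (low, high) tuple for a range. For brevity the sketch assumes the strict strategy (every active field in the document must match) with presence compulsory, rather than the best-match strategy.

```python
from typing import Dict, Iterable, List, Tuple, Union

Value = Union[int, str, Tuple[int, int]]
Document = Dict[str, Value]


def field_matches(query_value: Value, target_value: Value) -> bool:
    """A range in the query matches any target value inside it; otherwise test equality."""
    if isinstance(query_value, tuple):
        low, high = query_value
        return isinstance(target_value, int) and low <= target_value <= high
    return query_value == target_value


def triggered(collection: List[Document], context: Document,
              active_fields: Iterable[str]) -> List[Value]:
    """Proactive case: each collection document is the query, the context the target."""
    active = list(active_fields)
    hits = []
    for doc in collection:
        driving = [name for name in active if name in doc]
        if driving and all(
            name in context and field_matches(doc[name], context[name])
            for name in driving
        ):
            hits.append(doc["Body"])
    return hits


# Invented sample data for a tourist on a cold day.
collection = [
    {"Body": "Cathedral", "Location": "Cathedral Green"},
    {"Body": "Open-air restaurant", "Location": "Cathedral Green",
     "Temperature": (15, 35)},
    {"Body": "Quay view", "Location": "Quay", "Orientation": "S"},
]
context = {"Location": "Cathedral Green", "Temperature": 5}

print(triggered(collection, context,
                ["Location", "Orientation", "Time", "Temperature"]))
# prints: ['Cathedral']
print(triggered(collection, context, ["Location"]))
# prints: ['Cathedral', 'Open-air restaurant']
```

With only Location active, the open-air restaurant is triggered despite the cold, which is the kind of possibly irrelevant but still available result described above.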

A completely different use of the document collection would be to use it as a target for interactive retrieval, perhaps as part of a pipeline also involving proactive retrieval. In this case the current context may contain a field, set by the user, of:

  <Body> Drake

and Body might be the only active field. This would then retrieve documents in the collection whose Body contained the word "Drake".
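
A minimal sketch of this interactive case is given below. It assumes that matching on Body means that every word of the query's Body occurs as a whole word, ignoring case, in the target's Body; that rule, the function names and the sample documents are assumptions for illustration, not a description of the Context Matcher.

```python
import re
from typing import Dict, List

Document = Dict[str, str]


def body_contains(target_body: str, query_text: str) -> bool:
    """True if every word of query_text occurs as a word in target_body."""
    target_words = set(re.findall(r"\w+", target_body.lower()))
    return all(word in target_words for word in re.findall(r"\w+", query_text.lower()))


def interactive_retrieve(collection: List[Document], context: Document) -> List[Document]:
    """Interactive case: the current context is the query, Body the only active field."""
    query_text = context.get("Body")
    if not query_text:
        return []  # no active field in the query document: no matching occurs
    return [
        doc for doc in collection
        if "Body" in doc and body_contains(doc["Body"], query_text)
    ]


# Invented sample collection.
collection = [
    {"Body": "Statue of Sir Francis Drake, looking out to sea"},
    {"Body": "Cathedral with a fine west front"},
]
for doc in interactive_retrieve(collection, {"Body": "Drake"}):
    print(doc["Body"])
# prints: Statue of Sir Francis Drake, looking out to sea
```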

Some relevant papers

1. The manual for the Context Matcher.
2. P.J. Brown, G.J.F. Jones. Context-aware retrieval: exploring a new environment for information retrieval and information filtering. To be published in Personal Technologies, 2001.
3. Specifying the fields to be matched.