Peter Brown
Department of Computer Science, University of Exeter, Exeter EX4 4QF, UK
P.J.Brown@ex.ac.uk
ABSTRACT
In many places we talk about "deriving a query from a document". This paper tries to spell out the full meaning of the expression. It also outlines a feature, present in the Context Matcher, for fine-tuning the active fields from which a query is derived.
Context-aware retrieval involves matching each document in the document collection against the current context (which we assume is also represented as a document of a similar form to the documents in the collection). All these documents are divided into fields. In its simplest form, retrieval works as follows:
Researchers in retrieval are used to talking in terms of explicit queries rather than the matching of fields: for example, in terms of the query "does the date field match 1963-7?" rather than in terms of comparing the date field (1963-7) of one document with the date field of another. To follow this terminology, we can imagine that, between stages (2) and (3) above, a query is derived from the query document and applied to the target document. For example, if Temperature and Orientation are the active fields, and the query document has the fields:
<Temperature> 32 <Orientation> S
then the derived query is "Temperature is 32 and Orientation is S". Numerical fields are often ranges rather than single values: instead of 32 above, for example, we might have the range 30..35. In this case the derived query would be "Temperature >= 30 and Temperature <= 35".
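The mechanical derivation rule can be sketched in Python as follows. This is a minimal illustration under assumptions of our own, not the Context Matcher's implementation: a document is a dictionary from field names to values, a range such as 30..35 is a hypothetical `Range(30, 35)` object, and an active field absent from the query document contributes no clause. The names `derive_query` and `apply_query` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Range:
    """Hypothetical representation of a closed numeric range lo..hi, e.g. 30..35."""
    lo: float
    hi: float


# Assumption: a document is a mapping from field names to values.
Value = Union[str, int, float, Range]
Document = Dict[str, Value]


def derive_query(query_doc: Document, active_fields: List[str]) -> str:
    """Mechanically derive a conjunctive query from the active fields of the query document."""
    clauses = []
    for name in active_fields:
        if name not in query_doc:
            continue  # assumption: an absent active field contributes no clause
        value = query_doc[name]
        if isinstance(value, Range):
            clauses.append(f"{name} >= {value.lo} and {name} <= {value.hi}")
        else:
            clauses.append(f"{name} is {value}")
    return " and ".join(clauses)


def apply_query(query_doc: Document, target_doc: Document, active_fields: List[str]) -> bool:
    """Evaluate the derived query against a target document.

    Assumption: a target document lacking a queried field does not match.
    """
    for name in active_fields:
        if name not in query_doc:
            continue
        wanted = query_doc[name]
        actual = target_doc.get(name)
        if actual is None:
            return False
        if isinstance(wanted, Range):
            ok = isinstance(actual, (int, float)) and wanted.lo <= actual <= wanted.hi
        else:
            ok = actual == wanted
        if not ok:
            return False
    return True


query_doc = {"Temperature": 32, "Orientation": "S"}
print(derive_query(query_doc, ["Temperature", "Orientation"]))
# Temperature is 32 and Orientation is S
print(derive_query({"Temperature": Range(30, 35)}, ["Temperature"]))
# Temperature >= 30 and Temperature <= 35
```

Keeping derivation and evaluation as separate functions mirrors the distinction drawn above between the query and the field matching it stands for.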
I, in common with many other researchers in the context-awareness field, have gone through this loop:
In our research we have assumed a simple mechanical rule for deriving the query from the values of active fields, as illustrated by the Temperature and Orientation example above. There is scope for investigating more sophisticated algorithms, but that is not our immediate priority. We are interested in the history of the current context and in predicting it; our approach, however, has been to massage the values in the current context before the query is mechanically derived, rather than to alter the derivation process. For example, if the temperature is rising and its current value is 10, we may massage the current context by increasing the temperature slightly to 11 (or by turning it into the range 10..12), thus, we hope, anticipating the user's future needs.
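The massaging step can be sketched as a transformation applied to the current context, with the mechanical derivation left untouched. This is a minimal sketch under stated assumptions: a range lo..hi is written as a `(lo, hi)` tuple, "rising" simply means the latest reading exceeds the previous one, and `massage_temperature`, its `step` parameter and the reading history are hypothetical, not part of the Context Matcher.

```python
from typing import Dict, List, Tuple, Union

# Assumptions: a context maps field names to values, and a range lo..hi
# is written as a (lo, hi) tuple.
Value = Union[str, float, Tuple[float, float]]
Context = Dict[str, Value]


def massage_temperature(context: Context, history: List[float],
                        step: float = 1.0, as_range: bool = False) -> Context:
    """Return a copy of the context nudged towards the likely future temperature.

    Assumption: the temperature counts as rising when the latest reading in
    `history` exceeds the one before it.
    """
    massaged = dict(context)  # the sensed context itself is left untouched
    current = context.get("Temperature")
    if not isinstance(current, (int, float)) or len(history) < 2:
        return massaged  # nothing to massage
    if history[-1] > history[-2]:
        if as_range:
            massaged["Temperature"] = (current, current + 2 * step)  # e.g. 10 -> 10..12
        else:
            massaged["Temperature"] = current + step  # e.g. 10 -> 11
    return massaged


def _fmt(value: Union[str, float]) -> str:
    """Print 11.0 as 11, leaving strings alone."""
    return f"{value:g}" if isinstance(value, (int, float)) else str(value)


def derive_query(context: Context, active_fields: List[str]) -> str:
    """The simple mechanical derivation rule, unchanged; only its input differs."""
    clauses = []
    for name in active_fields:
        if name not in context:
            continue
        value = context[name]
        if isinstance(value, tuple):
            lo, hi = value
            clauses.append(f"{name} >= {_fmt(lo)} and {name} <= {_fmt(hi)}")
        else:
            clauses.append(f"{name} is {_fmt(value)}")
    return " and ".join(clauses)


sensed = {"Temperature": 10}
readings = [8, 9, 10]  # hypothetical recent temperature history
print(derive_query(massage_temperature(sensed, readings), ["Temperature"]))
# Temperature is 11
print(derive_query(massage_temperature(sensed, readings, as_range=True), ["Temperature"]))
# Temperature >= 10 and Temperature <= 12
```

Because all the prediction happens in `massage_temperature`, a different predictor could be substituted without touching the derivation.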
We assume there is a document collection that has been created by an author in order to provide proactive information for tourists. Each document has a Body field describing an attraction, and a set of four fields designed to match the user's current context and to trigger the document if there is a good match. These four fields are Location, Orientation, Time and Temperature. Not all documents have all four fields. For instance, the only documents with a Temperature field are those that relate to outdoor attractions, and the only ones with an Orientation field are those that relate to views. By default all four fields might be specified as active; for a document as a whole to be active it must contain at least one of them, and it might contain all four. One implication of the retrieval process is that a document with an Orientation field (e.g. a document concerned with a view) would be triggered only if that field matched the user's orientation. If the user had no orientation sensor, and thus no Orientation field in their current context, then such documents may or may not be triggered, depending on whether the Matcher had a default of "presence-compulsory". An application might choose, perhaps at the user's request, to change the defaults on active fields. For example, Location might be made the only active field. More documents would then be triggered, some of them perhaps irrelevant (e.g. a nearby open-air restaurant on a cold day), but at least the user would have the information for future reference.
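The effect of the active-field defaults on this tourist collection can be sketched as follows. This is a minimal sketch, not the Matcher's algorithm, and it rests on assumptions of our own: "presence-compulsory" is read as "a document fails if it has an active field that the current context lacks"; Location is matched by equality, a deliberate simplification of proximity matching; a range is a `(lo, hi)` tuple. The collection, titles and function names are hypothetical.

```python
from typing import Dict, List, Tuple, Union

# Assumptions: documents and the current context map field names to values;
# a range lo..hi is a (lo, hi) tuple; Location is matched by equality,
# a deliberate simplification of real proximity matching.
Value = Union[str, float, Tuple[float, float]]
Document = Dict[str, Value]

DEFAULT_ACTIVE = ["Location", "Orientation", "Time", "Temperature"]


def field_matches(doc_value: Value, context_value: Value) -> bool:
    """A range in the document matches a number inside it; otherwise require equality."""
    if isinstance(doc_value, tuple):
        lo, hi = doc_value
        return isinstance(context_value, (int, float)) and lo <= context_value <= hi
    return doc_value == context_value


def triggers(doc: Document, context: Document, active_fields: List[str],
             presence_compulsory: bool) -> bool:
    """Decide whether one document triggers in the current context."""
    present = [name for name in active_fields if name in doc]
    if not present:
        return False  # a document with no active fields is not itself active
    for name in present:
        if name not in context:
            if presence_compulsory:
                return False  # e.g. a view, when the user has no orientation sensor
            continue  # otherwise the unsensed field is ignored
        if not field_matches(doc[name], context[name]):
            return False
    return True


def retrieve(collection: Dict[str, Document], context: Document,
             active_fields: List[str], presence_compulsory: bool = True) -> List[str]:
    """Titles of all documents that trigger, in alphabetical order."""
    return sorted(title for title, doc in collection.items()
                  if triggers(doc, context, active_fields, presence_compulsory))


# Hypothetical tourist documents, all at the same location.
collection = {
    "museum": {"Location": "Quay", "Body": "An indoor maritime museum."},
    "river view": {"Location": "Quay", "Orientation": "S", "Body": "A view along the river."},
    "restaurant": {"Location": "Quay", "Temperature": (15, 35), "Body": "An open-air restaurant."},
}
cold_day = {"Location": "Quay", "Temperature": 5}  # no Orientation: no sensor

print(retrieve(collection, cold_day, DEFAULT_ACTIVE))  # ['museum']
print(retrieve(collection, cold_day, DEFAULT_ACTIVE, presence_compulsory=False))  # ['museum', 'river view']
print(retrieve(collection, cold_day, ["Location"]))  # ['museum', 'restaurant', 'river view']
```

Making Location the only active field brings back the open-air restaurant on a cold day, which is the trade-off described above: more documents, some of them currently irrelevant.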
A completely different use of the document collection would be as a target for interactive retrieval, perhaps as part of a pipeline that also involves proactive retrieval. In this case the current context may contain a field, set by the user, such as:
<Body> Drake
and Body might be the only active field. This would then retrieve documents in the collection whose Body contained the word "Drake".
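A minimal sketch of this interactive case, under assumptions of our own: Body values are free text, and a Body field in the current context matches a document whose Body contains all of its words, ignoring case. The collection, its titles and the `retrieve` function are hypothetical; nothing here describes how the Context Matcher itself matches text.

```python
import re
from typing import Dict, List, Set

# Assumption: text fields match when the document field contains every word
# of the corresponding context field, ignoring case.
Document = Dict[str, str]


def words(text: str) -> Set[str]:
    """Lower-cased alphabetic words of a piece of text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(collection: Dict[str, Document], context: Document,
             active_fields: List[str]) -> List[str]:
    """Titles of documents whose active text fields contain the context's words."""
    relevant = [name for name in active_fields if name in context]
    hits = []
    for title, doc in collection.items():
        if relevant and all(name in doc and words(context[name]) <= words(doc[name])
                            for name in relevant):
            hits.append(title)
    return sorted(hits)


# Hypothetical documents; Location is present but not active.
collection = {
    "statue": {"Location": "Hoe", "Body": "A statue of Drake looks out to sea."},
    "cathedral": {"Location": "Close", "Body": "Guided tours of the cathedral."},
}
print(retrieve(collection, {"Body": "Drake"}, ["Body"]))  # ['statue']
```

Fields that are not active, such as Location here, play no part in the match.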