
CONTEXT-AWARE

INFORMATION RETRIEVAL

Peter Brown and Gareth Jones

Dept of Computer Science

Univ. of Exeter, UK

Information retrieval and filtering

Best-match technology has replaced Boolean retrieval.
It can be applied to information that is partially structured into fields, e.g. Author, Title, Abstract.
Information retrieval (IR):
- Huge progress over last 30 years, e.g. current web search engines.
- Typically optimised by building surrogates (e.g. inverted indexes) that assume a (fairly) static document collection; queries are dynamic and one-off.
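The sketch below illustrates such a surrogate: an inverted index with best-match scoring. It is a minimal sketch that assumes whitespace tokenisation and a plain tf-idf weight. The function names and sample documents are hypothetical, and real engines use more refined weighting schemes.

```python
import math
from collections import defaultdict

def build_index(docs):
    """Inverted index built once over a static collection: term -> {doc_id: tf}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def search(index, n_docs, query):
    """Best-match retrieval: every document sharing a query term gets a tf-idf score."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

docs = {
    "d1": "cathedral in exeter city centre",
    "d2": "exeter quay and river walk",
    "d3": "dartmoor walk and tors",
}
index = build_index(docs)
print(search(index, len(docs), "exeter walk"))  # d2 ranks first: it matches both terms
```

The index is the expensive, static part and each query is cheap. This is the trade-off that suits a fixed collection queried by one-off queries.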

Information filtering (IF):
- Sample scenario: alerting researchers to new research papers. Each researcher has a profile representing their interests -- essentially a query that is applied to each new document; the input is a stream of documents (the new research papers).
- Each input document may need to be matched against thousands of profiles.
- Typically optimised on the assumption that queries are (fairly) static but documents are dynamic and one-off.
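In filtering the roles are reversed: the profiles are the standing queries, and each new document is matched against all of them. This is a minimal sketch with two simplifying assumptions. Each profile is a set of terms. The score is the fraction of profile terms present in the document, a deliberately simple stand-in for a real best-match function. `filter_stream` and its threshold are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

def filter_stream(
    profiles: Dict[str, Set[str]],
    stream: Iterable[Tuple[str, str]],
    threshold: float = 0.5,
) -> Iterator[Tuple[str, str, float]]:
    """Match each incoming (doc_id, text) against every profile and yield
    (user, doc_id, score) whenever the score reaches the threshold."""
    for doc_id, text in stream:
        terms = set(text.lower().split())
        for user, profile in profiles.items():
            if not profile:
                continue  # an empty profile matches nothing
            # Score: fraction of the profile's terms that occur in the document.
            score = len(profile & terms) / len(profile)
            if score >= threshold:
                yield user, doc_id, score

profiles = {
    "alice": {"retrieval", "context", "mobile"},
    "bob": {"compilers", "java"},
}
stream = [
    ("p1", "Context aware retrieval for mobile users"),
    ("p2", "A Java compilers survey"),
    ("p3", "Weather forecasting"),
]
for alert in filter_stream(profiles, stream):
    print(alert)  # ('alice', 'p1', 1.0) then ('bob', 'p2', 1.0)
```

Looping over every profile is fine for a handful of users. With thousands of profiles, a filtering engine would normally index the profiles themselves, so that each document only touches the profiles that share its terms.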

(In context-aware retrieval (CAR) we are interested in the user's context, not the context of the documents being retrieved.)
Reuters: 'retrieval by context is the key to taming the information explosion'.
An HP annual report highlights context-aware applications as a key application of the future.
This relates to Xerox work on streamlined interfaces.
The richer the context, the more likely applications are to succeed.

Tourism:
- proactive: a context is attached to each document describing a tourist site; when the user's context matches this, the document is retrieved.
- interactive: the user makes explicit requests to retrieve information about their present context.
Xerox memory prosthesis.
Rhodes/Maes: just-in-time information retrieval agents.
"Tell me the nearest McDonald's" services offered by cellphone service providers.
Exhibition guides, crowd management.
Most current applications deal with limited contexts (e.g. just location) and small amounts of data.
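A proactive application can be sketched as a trigger that is re-evaluated each time the user's location changes. The sketch makes three assumptions. The context attached to each document is just a point and a trigger radius on a flat km grid. Matching is a simple inside/outside test rather than a best-match score. Each document is delivered only once. All names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class SiteDoc:
    doc_id: str
    x: float       # site position (km, flat-grid approximation)
    y: float
    radius: float  # trigger radius (km): the context attached to the document

@dataclass
class ProactiveTrigger:
    docs: List[SiteDoc]
    delivered: Set[str] = field(default_factory=set)

    def update(self, ux: float, uy: float) -> List[str]:
        """Call whenever the user's location changes; returns newly triggered documents."""
        fired = []
        for d in self.docs:
            inside = math.hypot(ux - d.x, uy - d.y) <= d.radius
            if inside and d.doc_id not in self.delivered:
                self.delivered.add(d.doc_id)  # deliver each document only once
                fired.append(d.doc_id)
        return fired

trigger = ProactiveTrigger([SiteDoc("cathedral", 0.0, 0.0, 0.3),
                            SiteDoc("quay", 1.0, -0.5, 0.2)])
print(trigger.update(0.1, 0.1))    # ['cathedral']
print(trigger.update(0.2, 0.0))    # [] (still near the cathedral, not re-delivered)
print(trigger.update(0.9, -0.45))  # ['quay']
```

In the interactive mode, the same matching would run once, on request, instead of on every context change.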

CAR is related to both IR and IF.
Often used on a PDA by a mobile user.
Retrieval speed is crucial.
High precision is crucial, especially in proactive applications.
Often numeric values of context fields are ranges rather than single values, e.g. an area or a temperature range.
There is a need for experimentation in how matching fields are scored, weighted and aggregated.
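The sketch below shows one possible way to score a range-valued field and then aggregate the field scores. Both choices are assumptions made for illustration. The field score is the fraction of the user's range that the document's range covers. Fields are combined by a weighted mean. These are exactly the kinds of choice such experimentation would compare.

```python
from typing import Dict, Tuple

Range = Tuple[float, float]  # (low, high), inclusive

def range_score(user: Range, doc: Range) -> float:
    """Fraction of the user's range covered by the document's range (0..1)."""
    lo, hi = max(user[0], doc[0]), min(user[1], doc[1])
    width = user[1] - user[0]
    if width == 0:  # the user's value is a single point
        return 1.0 if doc[0] <= user[0] <= doc[1] else 0.0
    return max(0.0, hi - lo) / width

def aggregate(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of per-field scores; fields without a weight are ignored."""
    total = sum(weights[f] for f in scores if f in weights)
    if total == 0:
        return 0.0
    return sum(scores[f] * weights[f] for f in scores if f in weights) / total

user = {"temperature": (15.0, 20.0), "time": (14.0, 16.0)}
doc = {"temperature": (18.0, 30.0), "time": (9.0, 17.0)}
per_field = {f: range_score(user[f], doc[f]) for f in user}
print(per_field)  # {'temperature': 0.4, 'time': 1.0}
print(aggregate(per_field, {"temperature": 1.0, "time": 2.0}))  # about 0.8
```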

Our aim is for CAR to achieve:
- better retrieval speed
- higher precision
over rich contexts and large document collections.
We hope to achieve this by exploiting the success of IR/IF; we will use best-match technology, i.e. every document match has a score.
We want to find solutions that can be applied to existing IR/IF engines.
We have a hard problem: (1) the data is dynamic, and (2) there is a need to give the user the illusion that retrieval is continuous as their context changes.

Because we have a hard problem, we need to identify and exploit some unique advantages that CAR has; one such advantage is that the user's context changes gradually and semi-predictably. We can use this advantage in many ways.
To preserve a record of change we build a Context Diary containing:
- history of past values of each contextual field.
- forecast future values of some fields; these can be derived from weather forecasts, corporate or personal diary entries (e.g. "At XXX at 4 p.m. to discuss YYY", "Staff meeting at 5 p.m."), etc.
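A Context Diary can be sketched as a data structure that keeps timestamped observations and forecasts for each field. This is a minimal sketch. The class, its methods, and the use of the hour of the day as a timestamp are hypothetical choices for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple

Entry = Tuple[float, Any]  # (time as hour of day, value)

class ContextDiary:
    """Per-field record of past observations and forecast future values."""

    def __init__(self) -> None:
        self.history: Dict[str, List[Entry]] = defaultdict(list)
        self.forecast: Dict[str, List[Entry]] = defaultdict(list)

    def record(self, fld: str, t: float, value: Any) -> None:
        # Sensor observations are assumed to arrive in time order.
        self.history[fld].append((t, value))

    def add_forecast(self, fld: str, t: float, value: Any) -> None:
        # Forecasts (weather, diary entries) may arrive in any order; keep them sorted by time.
        self.forecast[fld].append((t, value))
        self.forecast[fld].sort(key=lambda e: e[0])

    def latest(self, fld: str) -> Optional[Any]:
        h = self.history.get(fld)
        return h[-1][1] if h else None

    def next_forecast(self, fld: str, now: float) -> Optional[Entry]:
        """Earliest forecast for the field that lies strictly after `now`."""
        for t, value in self.forecast.get(fld, []):
            if t > now:
                return t, value
        return None

diary = ContextDiary()
diary.record("location", 15.0, "Exeter station")
diary.record("location", 15.5, "Exeter cathedral")
diary.add_forecast("location", 17.0, "staff meeting room")  # "Staff meeting at 5 p.m."
print(diary.latest("location"))                   # Exeter cathedral
print(diary.next_forecast("location", now=15.5))  # (17.0, 'staff meeting room')
```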

Weights:
- Generally CAR involves matching several different fields, e.g. location, companions, document being read, etc. These fields can have different weights in their contribution to the overall score.
- Conjecture: changing fields are more important than static ones, especially if the change is sudden. Therefore they should be given higher weights (and weights should be changeable on-the-fly).
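The conjecture could be tried out by recomputing the weights on the fly from recent change. In this sketch, each field's base weight is multiplied by `1 + boost * rate`. Here `rate` is the change between the last two observations per unit time, normalised by a hypothetical per-field `scale`. This weighting is an assumption, not a tested scheme. Because only the last two observations are used, a sudden change produces an immediate boost.

```python
from typing import Dict, List, Tuple

def dynamic_weights(
    base: Dict[str, float],
    history: Dict[str, List[Tuple[float, float]]],
    scale: Dict[str, float],
    boost: float = 1.0,
) -> Dict[str, float]:
    """Raise the weight of fields that are changing.

    history[f] holds (time, numeric value) pairs, oldest first; scale[f] is a
    hypothetical 'typical change per unit time' used to normalise each field.
    """
    weights = {}
    for f, w in base.items():
        h = history.get(f, [])
        if len(h) >= 2:
            (t0, v0), (t1, v1) = h[-2], h[-1]
            rate = abs(v1 - v0) / max(t1 - t0, 1e-9) / scale[f]
        else:
            rate = 0.0  # static or unknown field: no boost
        weights[f] = w * (1.0 + boost * rate)
    return weights

print(dynamic_weights(
    {"temperature": 1.0, "altitude": 1.0},
    {"temperature": [(0.0, 20.0), (10.0, 20.0)],
     "altitude": [(0.0, 100.0), (10.0, 150.0)]},
    {"temperature": 0.1, "altitude": 5.0},
))  # {'temperature': 1.0, 'altitude': 2.0}
```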
Extrapolation:
- the user really wants to retrieve information about the context 'just ahead' of their current context. We can estimate the future context from the Context Diary and use it as the basis for retrieval.
- we can also extrapolate when sensors fail, e.g. GPS in a tunnel.
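The simplest form of extrapolation assumes that the user keeps moving at a constant velocity. This sketch assumes position fixes of the form (time in minutes, x km, y km) taken from the Context Diary. `extrapolate` is a hypothetical name, and a real system would likely need a better motion model.

```python
from typing import List, Tuple

Fix = Tuple[float, float, float]  # (time in minutes, x km, y km)

def extrapolate(fixes: List[Fix], t: float) -> Tuple[float, float]:
    """Estimate the position at time t, assuming constant velocity since the last two fixes."""
    if not fixes:
        raise ValueError("no position history")
    if len(fixes) == 1:
        return fixes[0][1], fixes[0][2]  # a single fix: assume the user is stationary
    (t0, x0, y0), (t1, x1, y1) = fixes[-2], fixes[-1]
    if t1 == t0:
        return x1, y1
    vx, vy = (x1 - x0) / (t1 - t0), (y1 - y0) / (t1 - t0)
    dt = t - t1
    return x1 + vx * dt, y1 + vy * dt

fixes = [(0.0, 0.0, 0.0), (2.0, 1.0, 0.5)]  # last GPS fixes before entering a tunnel
print(extrapolate(fixes, 4.0))  # (2.0, 1.0): the estimated position two minutes later
```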
Problem: how do we measure whether our approaches really do deliver higher precision?

We can forecast what the user's context is likely to be in one minute's time and perform the retrieval in advance (hopefully taking less than one minute); if the forecast is correct, the results are ready when the user asks for them.
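The forecast-then-retrieve idea can be sketched as a small wrapper around an existing retrieval function. The `Prefetcher` class is hypothetical. So is the distance `tolerance` used to decide whether a forecast proved correct. For brevity, the context is reduced to a 2-D location.

```python
import math
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]

class Prefetcher:
    """Retrieve for a forecast context ahead of time; reuse the results if the forecast was right."""

    def __init__(self, retrieve: Callable[[Point], List[str]], tolerance: float) -> None:
        self.retrieve = retrieve    # the (slow) underlying CAR retrieval
        self.tolerance = tolerance  # largest forecast error still treated as 'correct'
        self.forecast: Optional[Point] = None
        self.results: List[str] = []

    def prefetch(self, forecast: Point) -> None:
        # Intended to be called ahead of time, e.g. one minute before results are needed.
        self.forecast = forecast
        self.results = self.retrieve(forecast)

    def ask(self, actual: Point) -> List[str]:
        if self.forecast is not None and math.dist(actual, self.forecast) <= self.tolerance:
            return self.results       # forecast correct: results are already available
        return self.retrieve(actual)  # forecast wrong: retrieve now

def nearby_sites(p: Point) -> List[str]:  # stand-in for a real CAR retrieval call
    return [f"sites near {p}"]

pf = Prefetcher(nearby_sites, tolerance=0.2)
pf.prefetch((1.0, 1.0))    # done early, using the forecast location
print(pf.ask((1.1, 1.0)))  # close enough: the prefetched results are returned
```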
Can build a context-aware cache:
- assumption: a small change in the user's context will result in a small change in the documents that are retrieved and their scores.
- cache can cover the union of all contexts the user is likely to enter in the next (say) 15 minutes (retrieving the cache is like a normal retrieval, but with a wider context). For the next 15 minutes retrieve from the cache rather than the original document collection (i.e. cache is a surrogate). At end of 15 minutes, or when forecast proves wrong, update the cache.
- the cache is useful in client/server configurations where the connection is periodic and/or expensive.
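Here is a sketch of such a cache for a location-only context, under three assumptions. Documents are points. The "union of likely contexts" is approximated by a bounding box around the forecast positions, plus a margin. The cache is refilled when it expires or when the user leaves the box. `ContextCache` and its methods are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def covering_box(points: List[Point], margin: float) -> Box:
    """Widened context: a box enclosing all forecast positions plus a margin."""
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    return min(xs) - margin, min(ys) - margin, max(xs) + margin, max(ys) + margin

def in_box(p: Point, b: Box) -> bool:
    return b[0] <= p[0] <= b[2] and b[1] <= p[1] <= b[3]

class ContextCache:
    def __init__(self, collection: Dict[str, Point], lifetime: float) -> None:
        self.collection = collection  # stand-in for the full (possibly remote) collection
        self.lifetime = lifetime      # e.g. 15 minutes
        self.box: Box = (0.0, 0.0, -1.0, -1.0)  # empty box: nothing cached yet
        self.expires = float("-inf")
        self.cached: Dict[str, Point] = {}

    def refill(self, now: float, forecast: List[Point], margin: float) -> None:
        # One retrieval against the full collection, using the wider context.
        self.box = covering_box(forecast, margin)
        self.cached = {d: p for d, p in self.collection.items() if in_box(p, self.box)}
        self.expires = now + self.lifetime

    def valid(self, now: float, user: Point) -> bool:
        # Refill when the cache expires or the user leaves the forecast region.
        return now < self.expires and in_box(user, self.box)

    def query(self, user: Point, radius: float) -> List[str]:
        # A normal retrieval, but against the cache (a surrogate), not the collection.
        return sorted(d for d, (x, y) in self.cached.items()
                      if (x - user[0]) ** 2 + (y - user[1]) ** 2 <= radius ** 2)

cache = ContextCache({"cathedral": (0.0, 0.0), "quay": (1.0, -0.5), "moor": (20.0, 20.0)},
                     lifetime=15.0)
cache.refill(now=0.0, forecast=[(0.0, 0.0), (1.0, -0.5)], margin=0.5)
if cache.valid(now=5.0, user=(0.1, 0.0)):
    print(cache.query((0.1, 0.0), radius=0.3))  # ['cathedral']
```

The margin should be at least the query radius. Otherwise a query near the edge of the box could miss documents that lie just outside it.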

Some studies show that our movements are not really as regular and predictable as we remember them to be.
Caching may work for numeric fields, but it is harder to apply to texts and images (e.g. what is the union of likely future ones?).

(Great thanks to Leverhulme Foundation for helping to support this.)

We have implemented a CAR engine in Java, designed as a basis for experimentation; e.g. users can embed on-the-fly Java algorithms for matching, weighting, etc. in their data, and these override the built-in ones.
We have obtained a tourism database from South West Tourism and have produced some retrieval results from it.
We have two other collaborators.
We are exploiting WWW where possible to provide our infrastructure; currently using client/server model.
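The engine itself is written in Java, and its actual interface is not described here. The Python sketch below only illustrates the override idea, using a hypothetical per-field registry. A user-supplied matching function takes precedence over a built-in one, which in turn takes precedence over a default exact match.

```python
from typing import Any, Callable, Dict

Matcher = Callable[[Any, Any], float]

def exact_match(user_value: Any, doc_value: Any) -> float:
    """Fallback matcher: 1.0 on equality, else 0.0."""
    return 1.0 if user_value == doc_value else 0.0

class MatcherRegistry:
    def __init__(self) -> None:
        self.builtin: Dict[str, Matcher] = {}
        self.overrides: Dict[str, Matcher] = {}

    def register_builtin(self, fld: str, m: Matcher) -> None:
        self.builtin[fld] = m

    def override(self, fld: str, m: Matcher) -> None:
        # User-supplied algorithm (e.g. shipped with the data); takes precedence.
        self.overrides[fld] = m

    def matcher(self, fld: str) -> Matcher:
        return self.overrides.get(fld) or self.builtin.get(fld) or exact_match

    def score(self, fld: str, user_value: Any, doc_value: Any) -> float:
        return self.matcher(fld)(user_value, doc_value)

reg = MatcherRegistry()
reg.register_builtin("temperature", lambda u, d: 1.0 if abs(u - d) <= 5 else 0.0)
# A user-embedded algorithm: graded similarity instead of a hard cut-off.
reg.override("temperature", lambda u, d: max(0.0, 1.0 - abs(u - d) / 10.0))
print(reg.score("temperature", 18.0, 21.0))
print(reg.score("companion", "Ann", "Ann"))  # no matcher registered: exact match gives 1.0
```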

Retrieval is only one part of any context-aware application, but it is a crucial one; other crucial parts are context capture, HCI, distribution, feedback, etc. -- lots of research areas.
Lack of retrieval precision and poor retrieval speed are likely to be the main limitations of many real-world CAR applications; breakthroughs are needed.
We believe that a key to success is combining:
- existing IR/IF
- the special characteristics of CAR