CONTEXT-AWARE
INFORMATION RETRIEVAL
Peter Brown and Gareth Jones
Dept of Computer Science
Univ. of Exeter, UK
Information retrieval and filtering
-
Best-match technology has replaced Boolean retrieval.
-
Can be applied to information that is partially structured into fields, e.g.
Author, Title, Abstract.
-
Information retrieval (IR):
-
Huge progress over last 30 years, e.g. current web search engines.
-
Typically is optimised by building surrogates that assume a (fairly) static document collection; queries are dynamic, one-off.
Information retrieval and filtering (cont.)
-
Information filtering (IF):
-
Sample scenario: alerting researchers to new research papers;
each researcher has a profile representing their interests -- essentially this is a query that is applied to each new document; the input is a stream of documents (the new research papers).
-
Each input document may need to be matched against thousands of profiles.
-
Typically is optimised on the assumption that queries are (fairly) static but documents are dynamic, one-off.
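The filtering scenario above can be sketched as follows. This is only an illustration, not the method of any particular IF engine: the names `Profile`, `score` and `filter_stream`, the term-weight representation of a profile and the threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """A researcher's standing query: weighted interest terms (hypothetical form)."""
    owner: str
    terms: dict  # term -> weight


def score(profile, document):
    """Best-match score: sum of the weights of profile terms found in the document."""
    words = set(document.lower().split())
    return sum(w for term, w in profile.terms.items() if term in words)


def filter_stream(profiles, documents, threshold=1.0):
    """Match each incoming document against every profile (the static 'queries')."""
    for doc in documents:
        for p in profiles:
            s = score(p, doc)
            if s >= threshold:  # assumed cut-off for raising an alert
                yield p.owner, doc, s
```

Note the inner loop: with thousands of profiles, this per-document cost is what a real IF engine has to optimise away.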
Detecting the user's current context
-
Location: GPS, E911, bar-codes, Active Bat, etc.
-
Time-of-day, season, temperature
-
Images from cameras
-
Current activity (e.g. detected by Pepys)
-
Document currently being read/composed
-
Weather and weather forecasts
-
Companions
-
Share price thresholds
-
Traffic information
-
Fields set by users, e.g. current interest
-
Fields overridden by user, e.g. `pretend I am at XXX; what would be retrieved?'
Context-aware retrieval (CAR) prospects
-
(Here we are interested in the user's context, not the context of the documents being retrieved.)
-
Reuters: `retrieval by context is the key to taming the information explosion'.
-
HP annual report highlights context-aware applications as a key application of the future.
-
Relates to Xerox work on streamlined interfaces.
-
The richer the context, the more likely applications are to be
successful.
Context-aware retrieval applications
-
Tourism:
-
proactive: a context is attached to each document describing a tourist site; when the user's context matches
this, the document is retrieved.
-
interactive: the user makes explicit requests to retrieve information about their present context.
-
Xerox memory prosthesis.
-
Rhodes/Maes: just-in-time information retrieval agents.
-
`Tell me the nearest McDonald's' services offered by cellphone service providers.
-
Exhibition guides, crowd management.
-
Most current applications deal with limited contexts (e.g. just
location) and small amounts of data.
Attributes of context-aware retrieval
-
Related both to IR and IF.
-
Often used on a PDA by a mobile user.
-
Retrieval speed is crucial.
-
High precision is crucial, especially in proactive applications.
-
Often numeric values of context fields are ranges rather than
single values, e.g. an area or a temperature range.
-
There is a need for experimentation in how matching fields are scored, weighted and aggregated.
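One way to experiment with range-valued fields and with score aggregation is sketched below. The overlap-fraction score and the weighted mean are assumptions chosen for simplicity, and the function names are hypothetical; other scoring and aggregation schemes are equally plausible.

```python
def range_overlap_score(query, target):
    """Score in [0, 1]: fraction of the query range that lies inside the target range."""
    q_lo, q_hi = query
    t_lo, t_hi = target
    if q_hi == q_lo:  # query is a single value rather than a range
        return 1.0 if t_lo <= q_lo <= t_hi else 0.0
    overlap = min(q_hi, t_hi) - max(q_lo, t_lo)
    return max(0.0, overlap) / (q_hi - q_lo)


def aggregate(field_scores, weights):
    """Weighted mean of per-field scores; a field without a weight counts as 1.0."""
    total_w = sum(weights.get(f, 1.0) for f in field_scores)
    if total_w == 0:
        return 0.0
    return sum(s * weights.get(f, 1.0) for f, s in field_scores.items()) / total_w
```

For example, a user context with temperature range (10, 20) matched against a document context of (15, 30) scores 0.5 on that field; `aggregate` then combines it with the scores of the other fields.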
Our research aim
-
For CAR to achieve:
-
better retrieval speed
-
higher precision
-
over rich contexts and large document collections.
-
We hope to achieve this by exploiting the success of IR/IF; we will use best-match technology, i.e. every document match has a score.
-
We want to find solutions that can be applied to existing IR/IF engines.
-
We have a hard problem: (1) the data is dynamic, and
(2) there is a need to give the user the illusion that retrieval is continuous as their context changes.
Basic approach
-
Because we have a hard problem, we need to identify and
exploit some unique advantages that CAR has; one such advantage is
that the user's context changes gradually and semi-predictably.
We can use this advantage in many ways.
-
To preserve a record of change we build a Context Diary containing:
-
history of past values of each contextual field.
-
forecast future values of some fields; these can be derived
from weather forecasts, corporate or personal diary entries
(e.g. "At XXX at 4 p.m. to discuss YYY", "Staff meeting at 5 p.m."), etc.
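A Context Diary along these lines might be structured as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the class and method names are hypothetical, times are plain numbers, and a forecast is taken to hold from its start time until the next forecast entry.

```python
from bisect import bisect_right
from collections import defaultdict


class ContextDiary:
    """Per-field record of past (observed) values and forecast future values."""

    def __init__(self):
        self._history = defaultdict(list)   # field -> [(time, value)] in time order
        self._forecast = defaultdict(list)  # field -> [(time, value)] in time order

    def record(self, field, time, value):
        """Append an observed value; assumes observations arrive in time order."""
        self._history[field].append((time, value))

    def add_forecast(self, field, time, value):
        """Add a predicted value, e.g. from a diary entry or a weather forecast."""
        entries = self._forecast[field]
        entries.append((time, value))
        entries.sort(key=lambda e: e[0])

    def history(self, field, since=None):
        """Observed values of a field, optionally only those at or after `since`."""
        entries = self._history[field]
        if since is None:
            return list(entries)
        return [e for e in entries if e[0] >= since]

    def forecast_at(self, field, time):
        """Value of the latest forecast starting at or before `time`, else None."""
        entries = self._forecast[field]
        times = [t for t, _ in entries]
        i = bisect_right(times, time)
        return entries[i - 1][1] if i else None
```

A diary entry such as "At XXX at 4 p.m." would be stored with `add_forecast("location", <time>, "XXX")`, and sensor readings with `record`.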
Exploiting the context diary to improve precision
-
Weights:
-
Generally CAR involves matching several different fields, e.g. location, companions, document being read, etc.
These fields can have different weights in their contribution to the
overall score.
-
Conjecture: changing fields are more important than static ones, especially if the change is sudden.
Therefore they should be given higher weights (and weights should be changeable on-the-fly).
-
Extrapolation:
-
user really wants to retrieve information about the context `just ahead' of their current context.
Can estimate future context from the context diary and use this
as the basis for retrieval.
-
can extrapolate when sensors fail, e.g. GPS in a tunnel.
-
Problem:
how do we measure whether our approaches really do deliver higher
precision?
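The weighting conjecture and the extrapolation idea above could be prototyped roughly as follows, for a single numeric field whose history is a list of (time, value) pairs. Linear extrapolation from the last two samples and a rate-of-change weight capped at `base + gain` are assumptions made for this sketch; the function names and parameters are hypothetical.

```python
def extrapolate(history, at_time):
    """Estimate a numeric field at `at_time` by linear extrapolation of the last two samples."""
    if not history:
        raise ValueError("no samples to extrapolate from")
    if len(history) == 1:
        return history[-1][1]
    (t0, v0), (t1, v1) = history[-2], history[-1]
    if t1 == t0:
        return v1
    rate = (v1 - v0) / (t1 - t0)
    return v1 + rate * (at_time - t1)


def change_weight(history, base=1.0, gain=1.0, scale=1.0):
    """Weight for a field: `base` if static, rising towards `base + gain` as it changes faster.

    `scale` is the rate of change regarded as 'sudden' for this field (an assumed tuning knob).
    """
    if len(history) < 2:
        return base
    (t0, v0), (t1, v1) = history[-2], history[-1]
    if t1 == t0:
        return base
    rate = abs(v1 - v0) / (t1 - t0)
    return base + gain * min(1.0, rate / scale)
```

The same `extrapolate` call serves both uses mentioned above: looking `just ahead' of the current context, and filling in for a sensor that has failed. Whether such weights actually improve precision is exactly the open measurement problem.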
Exploiting the context diary to improve speed
-
Can forecast what the user's context is likely to be in one minute's time, and do the retrieval (hopefully taking
less than one minute); if forecast is correct the results are
ready when the user asks for them.
-
Can build a context-aware cache:
-
assumption: a small change in the user's context will result
in a small change in the documents that are retrieved and their scores.
-
cache can cover the union of all contexts the user is likely to enter in the next (say) 15 minutes (filling the
cache is like a normal retrieval, but with a wider context).
For the next 15 minutes retrieve from the cache rather than the
original document collection (i.e. the cache is a surrogate).
At the end of the 15 minutes, or when the forecast proves wrong, update the cache.
-
cache is useful in client/server configurations if the connection
is periodic and/or expensive.
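A context-aware cache of this kind can be sketched for a single numeric context field. Everything here is an assumption for illustration: the class name, the `retrieve` callback signature, and in particular the fixed `margin`, which stands in for the forecast union of likely future contexts.

```python
class ContextCache:
    """Cache of documents retrieved for a widened range of one numeric context field."""

    def __init__(self, retrieve, lifetime=15 * 60):
        self._retrieve = retrieve   # function (lo, hi) -> [(doc, doc_lo, doc_hi)]
        self._lifetime = lifetime   # seconds before a refresh is forced
        self._range = None          # widened context range the cache covers
        self._docs = []
        self._filled_at = None
        self.refreshes = 0          # number of retrievals from the full collection

    def _stale(self, lo, hi, now):
        if self._range is None:
            return True
        c_lo, c_hi = self._range
        outside = lo < c_lo or hi > c_hi                   # forecast proved wrong
        expired = now - self._filled_at >= self._lifetime  # e.g. 15 minutes elapsed
        return outside or expired

    def query(self, lo, hi, now, margin):
        """Return documents whose range overlaps [lo, hi], refilling the cache if stale."""
        if self._stale(lo, hi, now):
            self._range = (lo - margin, hi + margin)       # the wider context
            self._docs = self._retrieve(*self._range)
            self._filled_at = now
            self.refreshes += 1
        return [d for d, d_lo, d_hi in self._docs if d_lo <= hi and d_hi >= lo]
```

While the user's context stays inside the cached range and the lifetime has not expired, `query` never touches the original collection, which is the property that matters when the server connection is periodic or expensive.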
Forecasting: caveats
-
Some studies show that our movements are not really as regular and
predictable as we remember them to be.
-
Caching may work for numeric fields, but it is harder to apply to texts
and images (e.g. what is the union of likely future ones?).
Current state
(Many thanks to the Leverhulme Foundation for helping to support this.)
-
Have a CAR engine, implemented in Java and
designed to be a basis for experimentation; e.g. the user can
embed, in their data, on-the-fly Java algorithms for matching, weighting, etc., which override the built-in ones.
-
Have obtained a tourism database from South West Tourism, and have
some retrieval results from it.
-
Have two other collaborators.
-
We are exploiting WWW where possible to provide our
infrastructure; currently using client/server model.
Conclusions
-
Retrieval is only part of any context-aware application; however, it is a crucial part; other crucial parts are context capture, HCI, distribution, feedback, etc. -- lots of research areas.
-
Lack of retrieval precision and poor retrieval speed are likely to be the limiting factors for many real-world CAR applications; breakthroughs are needed.
-
We believe that a key to success is combining:
-
existing IR/IF
-
the special characteristics of CAR