Research issues in context-aware
retrieval: a working paper on overall research issues
Peter Brown
Department of Computer Science, University of Exeter, Exeter EX4 4QF, UK
P.J.Brown@ex.ac.uk
ABSTRACT
Context-aware retrieval throws up a host of research issues. In terms of planning
a research programme,
this is both a strength and a weakness. The strength is that it is a blossoming
area with potentially interesting research in a large number
of disparate areas. The weakness is that in a research programme you
cannot attack on every front, and you need to decide on a small subset of
problems to work on -- but still to be able to create realistic prototypes for
evaluation.
Here we try to enumerate the possible research issues that may be tackled.
The three basic areas
Context-aware retrieval (CAR) research can be divided into
three areas, albeit overlapping ones:
- A.
- the retrieval engine.
- B.
- deriving a query from a context.
- C.
- usability issues
We consider each in turn.
Area A: the retrieval engine
CAR is a mixture of traditional IR (Information Retrieval) and IF
(Information Filtering), but is certainly not the same as either. Thus we have
to rethink and redesign in areas such as:
- A1.
- the nature of the engine.
- A2.
- performance, particularly in the light of the continuous and immediate
needs inherent in CAR.
- A3.
- capturing and representing documents, and more particularly the derived
internal representations (the equivalent of inverted indexes in IR) to help
performance.
- A4.
- catering for fields of various datatypes, e.g. numeric (1D, 2D, 3D, ...),
strings, dates/times, etc. (See B2 below.)
- A5.
- algorithms for giving a score to the strength of a match (e.g. if match A
is twice as close as match B, should this be scored four times as
highly?), and a relative weighting to different fields of a match (e.g.
distance, time, match of textual content with current interests, etc.).
- A6.
- investigating how to index a document.
- A7.
- providing protection from abuse, in the same way as AltaVista uses
sophisticated methods to prevent authors from artificially promoting their pages so
that they get high matching-scores for a variety of retrieval queries.
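Item A5 leaves open how closeness should be turned into a score. The sketch below is an illustration, not a method proposed in this paper: it makes that choice an explicit parameter. With an inverse-power score, an `exponent` of 1 scores a match that is twice as close twice as highly, while an exponent of 2 (inverse-square) scores it four times as highly. Per-field scores are then combined by a weighted average. The function names, the `min_distance` floor and the field names are hypothetical.

```python
from typing import Dict


def field_score(distance: float, exponent: float = 2.0, min_distance: float = 1.0) -> float:
    """Closeness score in (0, 1] that falls off as an inverse power of distance.

    Distances at or below min_distance count as a perfect match (score 1.0).
    With exponent=1 a match twice as close scores twice as highly;
    with exponent=2 (inverse-square) it scores four times as highly.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    d = max(distance, min_distance)
    return (min_distance / d) ** exponent


def combined_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-field scores; fields with no weight count for nothing."""
    total_weight = sum(weights.get(name, 0.0) for name in scores)
    if total_weight == 0:
        return 0.0
    weighted = sum(weights.get(name, 0.0) * s for name, s in scores.items())
    return weighted / total_weight


if __name__ == "__main__":
    # Match A is at distance 5, match B at distance 10 (twice as far away).
    for exp in (1.0, 2.0):
        ratio = field_score(5, exp) / field_score(10, exp)
        print(f"exponent {exp}: A scores {ratio:.1f} times B")
    # Location is weighted three times as heavily as time.
    print(combined_score({"location": 1.0, "time": 0.5}, {"location": 3.0, "time": 1.0}))
```

Which exponent is right, and whether a weighted average is the right way to combine fields at all, are exactly the open questions A5 raises; the sketch only makes them tunable.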
Area B: deriving a query from a context
In CAR a query will typically be derived partly from a context and partly
from fields set by the user, as in normal IR/IF queries. The context used to set
a query is often the user's current context, and this will typically consist of
a large number of fields, most of them set by sensors. Research questions are:
- B1.
- If the user is carrying a large number of sensors, there will be a
correspondingly large number of contextual fields. At any one time some of
these will be important and some will be ignorable (or given a field-weight of
zero in a derived query). For example contextual fields could correspond to
the user's location, the current air temperature, the user's heartrate, and a
set of share prices. The relative importance of these is likely to change over
time, according to the user's activity and other parameters. What is the best
way of dealing with a profusion of fields that have different and changing
field-weights? How much extra weighting should be given to field values that
have changed, e.g. a sudden drop in temperature, an increase in heartrate?
- B2.
- In general the datatype of any contextual field may be a structured
type rather than a scalar type. For example there might be a datatype called
`event' consisting of a time, a location and a textual description. As a
further example a datatype corresponding to a vehicle fleet (or, for that
matter, a production line) might contain arrays of objects being tracked.
Within structured types there may be a need to match at the level of the
type (does one event match another?), at the level of subfields of the type
(do the locations match?), or both.
- B3.
- In general sensors give low-level readings, whereas the user -- and hence
a derived query that needs to reflect the user's needs -- works at a higher
level. How can the higher levels be synthesised from the lower levels? In
general, several levels of abstraction might be relevant to the user. For
example for location, we could have any of the following:
- at a certain x,y,z co-ordinate.
- in Room 206.
- in the Simmonds Building.
- in the Runcorn site of ICI.
- the place where my sister works (i.e. a name relevant to an individual,
rather than a universally-used name).
Any of these, except perhaps
the first, might be relevant to the user.
- B4.
- Finding the best match may involve some arithmetic combination of fields,
e.g. a field observer may record the width and length of an animal footprint,
but the key quantity in terms of trying to retrieve similar footprint
sightings may be the ratio of the length to the width, rather than the
absolute readings (which may be affected by the nature of the soil, the age of
the animal, etc.). Do we need to have a lot of ad hoc matching and
weighting algorithms to deal with these cases, or can they be systematised?
Even if the algorithms are ad hoc, can we represent the types of data as objects, each
of which has its own matching function? One possible approach along these lines
is to make `rhino footprint' a context field; this will be a
composite field, made up of sub-fields, which are themselves derived from
sensor values. This composite field will have its own matching algorithm.
- B5.
- Fields derived from sensors are inaccurate and sometimes unreliable (e.g.
bogus readings, missed readings). How can we best deal with this (e.g.
extrapolation, relating a sensor reading to others that cover the same/similar
information -- such as current apparent location and current apparent phone
cell)? Is there scope for `fuzzy matching' or is this a normal consequence of
the retrieval algorithm anyway? One approach, based on the idea of a field
being a composite set of values, is to have a probability (or even a
probability distribution) as one attribute of a field; this attribute could be
used by the matching algorithm. (Probabilities also apply to synthesised
fields such as `user is busy'; perhaps the synthesising algorithm might say
there is a 70% probability that the user is busy.) A refinement of this
approach is to have a synthesised field called, say, `accurate location',
which fuses other location fields (and perhaps other fields too) to derive a
more accurate location; then we have two sorts of synthesis, one for
abstraction (see B3) and one for fusion.
- B6.
- If we assume that a query derived from the current context refers to
location X, then the corresponding documents to be matched may have:
- an explicit location, e.g. a location field in some metadata
associated with an HTML document or a CC/PP XML document.
- an implicit location: perhaps X is some x,y co-ordinates that
relate to a location in Canterbury, and therefore it would be valuable to
search for documents that contain the word "Canterbury".
Clearly
implicit matching is a significant area for research.
- B7.
- In graphics one has the concepts of a world co-ordinate system (the
co-ordinate system used by the application as a whole) and a co-ordinate
system corresponding to the user's current view of the world; there are rules
for converting between the two systems. A similar situation can occur in CAR.
For example the user's view might be based on a map displayed on their PDA
screen, and the co-ordinate system could be the x,y co-ordinates of the window
in which the map was displayed. There is a need for rules, to be applied in
both directions, to convert between the user's view and the world view used by the
application. A typical sequence in which the rules are used is: (1) the user points
at a location on the map; (2) the rule converts the x,y co-ordinates on the
map to world co-ordinates; (3) the application retrieves documents that match
these co-ordinates (and perhaps other contextual fields too) -- these
documents might relate to tourist sights near the user; (4) the location
associated with each retrieved document is converted back to the user's
co-ordinates; (5) the application represents each document as a hot-spot on
the map corresponding to its location. Interestingly, there are parallels
between the matching of rules and the matching of queries.
- B8.
- Real applications will not generally be able to rely on a single context
service covering the whole world; instead context services will be
distributed, and might be shared on a dynamic basis. Thus when a user boards a
bus, their context server might join that of the bus, with their location
inherited from the bus's location. As a second example, when a person is close
to a printer, they might incorporate the printer's context as part of their
own (the information being transmitted by Bluetooth, or whatever).
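Item B1 asks how field-weights should track a changing context, and how much extra weight changed values deserve. A minimal sketch, assuming numeric fields and a simple rule that is our illustration rather than anything this paper settles on: fields with a base weight of zero are left out of the derived query, and any field whose value has moved by more than a relative `change_threshold` since the previous reading has its weight multiplied by `boost`. The threshold, the boost factor and the field names are hypothetical.

```python
from typing import Dict, Optional


def _changed(old: Optional[float], new: float, threshold: float) -> bool:
    """True if new differs from old by more than `threshold` as a fraction of old."""
    if old is None:
        return False  # no previous reading, so nothing to compare against
    if old == 0:
        return new != 0
    return abs(new - old) / abs(old) > threshold


def derive_query_weights(
    base_weights: Dict[str, float],
    previous: Dict[str, float],
    current: Dict[str, float],
    change_threshold: float = 0.1,
    boost: float = 2.0,
) -> Dict[str, float]:
    """Field-weights for a query derived from the current context."""
    weights: Dict[str, float] = {}
    for name, weight in base_weights.items():
        if weight == 0 or name not in current:
            continue  # ignorable or missing field: omit it from the query
        if _changed(previous.get(name), current[name], change_threshold):
            weight *= boost  # emphasise fields whose values have just changed
        weights[name] = weight
    return weights


if __name__ == "__main__":
    base = {"temperature": 1.0, "heartrate": 1.0, "share_price": 0.0}
    before = {"temperature": 20.0, "heartrate": 70.0, "share_price": 101.0}
    now = {"temperature": 12.0, "heartrate": 72.0, "share_price": 101.5}
    print(derive_query_weights(base, before, now))
    # {'temperature': 2.0, 'heartrate': 1.0}
```

Here the sudden drop in temperature (40%) doubles that field's weight, the small rise in heartrate does not, and the share price is ignored. In a real application the base weights would themselves change with the user's activity, as B1 notes; the sketch simply takes them as input.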
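Item B3 asks how higher-level descriptions can be synthesised from low-level sensor readings. A minimal sketch, assuming the simplest possible source of abstraction: a hand-built table of rooms, each an axis-aligned rectangle in a hypothetical local grid and tagged with its building and site, plus a per-user table of personal names. Only the place names come from the list in B3; the coordinates, the `Room` class and the data layout are invented for illustration, and real synthesis would need far richer models.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Room:
    """A room with its enclosing building and site (hypothetical data layout)."""
    name: str
    building: str
    site: str
    # Axis-aligned bounding box in a hypothetical local grid, in metres.
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def abstract_location(
    x: float, y: float, rooms: List[Room], personal_names: Dict[str, str]
) -> List[str]:
    """Describe a sensed position at several levels, most specific first."""
    levels = [f"at ({x}, {y})"]
    room: Optional[Room] = next((r for r in rooms if r.contains(x, y)), None)
    if room is None:
        return levels  # no higher-level description is known for this point
    public = [room.name, room.building, room.site]
    levels.extend(public)
    # Names that are meaningful only to this user, attached to any public level.
    levels.extend(personal_names[p] for p in public if p in personal_names)
    return levels


if __name__ == "__main__":
    rooms = [Room("Room 206", "Simmonds Building", "Runcorn site of ICI", 0.0, 0.0, 10.0, 8.0)]
    personal = {"Simmonds Building": "the place where my sister works"}
    for level in abstract_location(3.0, 4.0, rooms, personal):
        print(level)
```

A derived query could then use whichever of these levels the current activity makes relevant, rather than the raw co-ordinates.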
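Item B4 suggests representing a composite field as an object with its own matching algorithm. A minimal sketch of that idea for the footprint example, assuming the match depends only on the length-to-width ratio and using a linear fall-off controlled by a hypothetical `tolerance`; the class name, the units and the scoring rule are illustrative, not taken from this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RhinoFootprint:
    """Hypothetical composite context field derived from two sensor readings."""
    length_cm: float
    width_cm: float

    @property
    def ratio(self) -> float:
        # Derived sub-field used for matching: unaffected by overall footprint size.
        return self.length_cm / self.width_cm

    def match(self, other: "RhinoFootprint", tolerance: float = 0.2) -> float:
        """This field's own matching algorithm.

        Returns 1.0 for identical length/width ratios, falling linearly to 0.0
        once the ratios differ by `tolerance` or more.
        """
        diff = abs(self.ratio - other.ratio)
        return max(0.0, 1.0 - diff / tolerance)


if __name__ == "__main__":
    sighting = RhinoFootprint(length_cm=30.0, width_cm=25.0)  # ratio 1.2
    archive = [
        RhinoFootprint(24.0, 20.0),  # smaller print, same shape: ratio 1.2
        RhinoFootprint(26.0, 20.0),  # ratio 1.3
        RhinoFootprint(30.0, 20.0),  # ratio 1.5
    ]
    for stored in archive:
        print(stored, round(sighting.match(stored), 2))
```

The first stored print matches fully although its absolute size differs, which is the behaviour B4 argues for; the retrieval engine only needs to call `match`, so other composite field types can supply different algorithms behind the same interface.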
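Item B5 distinguishes synthesis for abstraction from synthesis for fusion. For the fusion case, one standard technique (our choice here; the paper does not name one) is inverse-variance weighting, which combines independent estimates so that more accurate sensors count for more and the fused estimate has a lower variance than any single input. The sketch assumes one-dimensional positions, independent errors with known variances, and hypothetical sensor figures. It also shows the simplest way a probability attribute might enter matching: by scaling the raw match score.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Reading:
    """A 1-D position estimate (metres along a route) with its error variance."""
    source: str
    position: float
    variance: float  # larger variance means a less accurate sensor


def fuse(readings: List[Reading]) -> Reading:
    """Synthesise an `accurate location' by inverse-variance weighting.

    Assumes independent errors with known variances; each reading is weighted
    by 1/variance, and the fused variance is 1 / sum(1/variance).
    """
    if not readings:
        raise ValueError("need at least one reading")
    weights = [1.0 / r.variance for r in readings]
    total = sum(weights)
    position = sum(w * r.position for w, r in zip(weights, readings)) / total
    return Reading("fused", position, 1.0 / total)


def confident_match(score: float, probability: float) -> float:
    """Discount a raw match score by the probability that the field value is correct."""
    return score * probability


if __name__ == "__main__":
    gps = Reading("gps", 100.0, 25.0)           # standard deviation 5 m
    cell = Reading("phone cell", 160.0, 400.0)  # standard deviation 20 m
    best = fuse([gps, cell])
    print(f"fused position {best.position:.1f} m, variance {best.variance:.1f}")
    # A synthesised field such as `user is busy' held with probability 0.7.
    print(confident_match(0.9, 0.7))
```

The fused position stays close to the more accurate GPS reading but is pulled slightly towards the phone-cell estimate. Bogus readings, which B5 also mentions, violate the independence assumption and would need detecting before fusion.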
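Item B7 describes rules that convert between the user's map view and the application's world co-ordinates in both directions. A minimal sketch of steps (2) and (4), assuming a north-up map with no rotation, so that each rule is a scale plus an offset; `MapView` and its parameters are hypothetical names, and the retrieval in step (3) is not shown.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MapView:
    """A north-up map shown in a PDA window, with no rotation (an assumption)."""
    world_left: float  # world x-coordinate at window x = 0
    world_top: float   # world y-coordinate at window y = 0
    metres_per_pixel: float

    def to_world(self, px: float, py: float) -> Tuple[float, float]:
        """Step (2): window co-ordinates to world co-ordinates."""
        # Window y grows downwards; world y (northing) grows upwards.
        return (self.world_left + px * self.metres_per_pixel,
                self.world_top - py * self.metres_per_pixel)

    def to_window(self, wx: float, wy: float) -> Tuple[float, float]:
        """Step (4): world co-ordinates back to window co-ordinates."""
        return ((wx - self.world_left) / self.metres_per_pixel,
                (self.world_top - wy) / self.metres_per_pixel)


if __name__ == "__main__":
    view = MapView(world_left=1000.0, world_top=5000.0, metres_per_pixel=2.0)
    query_point = view.to_world(100, 50)  # (1) the user taps pixel (100, 50)
    print("query location:", query_point)
    # (3) retrieval is not shown; suppose it returns a sight at world (1300, 4800).
    print("hot-spot at:", view.to_window(1300.0, 4800.0))
```

Because `to_window` is the exact inverse of `to_world`, a point converted to world co-ordinates and back lands on the same pixel. A rotated or projected map would need a more general transform, but the two-way structure stays the same.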
Area C: usability issues
For most CAR applications the user is mobile. Thus many research areas for mobility
carry over into CAR. The various elements of a CAR application (documents,
fields, sensors, processing engines) may be distributed in numerous ways: e.g.
consider as fields a weather forecast from the web, a share price indicator from
a dedicated feed, a location derived from a GPS sensor attached to the user's
PDA. Following on from this we have several research areas derived from
mobility, and from the general usability of a CAR application:
- C1.
- General mobility mechanisms, such as caching and minimising information
flow across expensive communication lines, will need variants tailored to CAR. In
particular, caching in CAR can be very effective when the user's future
contexts can be anticipated. Clearly there are problems, as with any sort of
caching, if the cached information can change rapidly, e.g. cached traffic
information, cached information on the state of a coffee machine or cached
information on the whereabouts of a colleague.
- C2.
- Typically CAR will be a background activity for the mobile user, and
output from the CAR application will interrupt her normal activity. Thus
retrieving information of marginal relevance is unwelcome. Precision may be
more important than recall: an issue related to the areas in
Area A above.
- C3.
- The user interface.
- C4.
- Relevance feedback from the user.
- C5.
- Implementing all of the above within the constraint that the client
software must be of minimal size, and, more crucially, that the display is
small. This constraint is most severe in applications designed for use while
actively moving, e.g. walking; here the display must be really small -- or
perhaps it might be replaced by an audio interface. On the other hand most
mobile applications are designed for people who are generically mobile but
temporarily static, e.g. sitting down (though perhaps on a moving vehicle);
here constraints are less severe. There is some hope that these constraints
will ease in the future: for example, a Delphi group has forecast that
rollable displays will be a mass-market item by 2008. At the moment, however,
the output format of traditional IR software such as retrieval engines is
almost useless on small displays -- though here audio output is potentially a
good answer.
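Item C1 notes that caching pays off when the user's future contexts can be anticipated, but that rapidly changing information goes stale. A minimal sketch, assuming results are keyed by a kind of information plus a named context (such as a predicted location), and that each kind has its own hypothetical time-to-live: slowly changing tourist information is kept for a day, traffic information for only a minute. The `fetch` callable stands in for whatever remote retrieval the application uses; how future contexts are predicted is outside the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ContextCache:
    """Cache of retrieval results keyed by (kind of information, context)."""
    fetch: Callable[[str, str], List[str]]  # hypothetical remote retrieval call
    ttl_seconds: Dict[str, float]           # how long each kind of result stays fresh
    _entries: Dict[Tuple[str, str], Tuple[float, List[str]]] = field(default_factory=dict)

    def get(self, kind: str, context: str, now: float) -> List[str]:
        key = (kind, context)
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] <= self.ttl_seconds[kind]:
            return hit[1]  # still fresh: no traffic over the expensive link
        results = self.fetch(kind, context)
        self._entries[key] = (now, results)
        return results

    def prefetch(self, kind: str, predicted_contexts: List[str], now: float) -> None:
        """Fill the cache for contexts the user is expected to reach next."""
        for context in predicted_contexts:
            self.get(kind, context, now)


if __name__ == "__main__":
    calls: List[Tuple[str, str]] = []

    def fake_fetch(kind: str, context: str) -> List[str]:
        calls.append((kind, context))
        return [f"{kind} information for {context}"]

    cache = ContextCache(fake_fetch, {"tourist": 86400.0, "traffic": 60.0})
    cache.prefetch("tourist", ["station", "cathedral"], now=0.0)  # e.g. while on a cheap link
    cache.get("tourist", "cathedral", now=3600.0)  # an hour later: served from cache
    cache.get("traffic", "ring road", now=0.0)
    cache.get("traffic", "ring road", now=120.0)   # two minutes later: stale, re-fetched
    print(len(calls), "remote fetches")
```

Choosing the time-to-live per kind of information is one simple answer to the staleness problem C1 raises; information such as a colleague's whereabouts might warrant a limit so short that caching is hardly worthwhile.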