Research issues in context-aware retrieval: a working paper on overall research issues

Peter Brown

Department of Computer Science, University of Exeter, Exeter EX4 4QF, UK
P.J.Brown@ex.ac.uk

ABSTRACT

Context-aware retrieval throws up a host of research issues. In terms of planning a research programme, this is both a strength and a weakness. The strength is that it is a blossoming area with potentially interesting research in a large number of disparate areas. The weakness is that a research programme cannot attack on every front: you need to choose a small subset of problems to work on, while still being able to create realistic prototypes for evaluation.

Here we try to enumerate the possible research issues that may be tackled.

The three basic areas

Basically context-aware retrieval (CAR) research can be divided into three areas, albeit overlapping ones:

A.: the retrieval engine.
B.: deriving a query from a context.
C.: usability issues.

We consider each in turn.

Area A: the retrieval engine

CAR is a mixture of traditional IR (Information Retrieval) and IF (Information Filtering), but is certainly not the same as either. Thus we have to rethink and redesign in areas such as:

A1.: the nature of the engine.
A2.: performance, particularly in the light of the continuous and immediate needs inherent in CAR.
A3.: capturing and representing documents, and more particularly the derived internal representations (the equivalent of inverted indexes in IR) to help performance.
A4.: catering for fields of various datatypes, e.g. numeric (1D, 2D, 3D, ...), strings, dates/times, etc. (See B2 below.)
A5.: algorithms for giving a score to the strength of a match (e.g. if match A is twice as close as match B, should this be scored four times as highly?), and a relative weighting to different fields of a match (e.g. distance, time, match of textual content with current interests, etc.).
A6.: investigating how to index a document.
A7.: providing protection from abuse, in the same way as Altavista uses sophisticated methods to prevent authors artificially promoting their pages so that they get high matching-scores for a variety of retrieval queries.
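
To make the question in A5 concrete, the following is a minimal sketch of one possible scoring scheme, not a description of any existing engine. The functions `closeness` and `match_score`, the `floor` and `exponent` parameters, and the field names are all assumptions made for illustration. With `exponent=2`, a match that is twice as close is scored four times as highly; the per-field weights give the relative weighting between fields.

```python
def closeness(distance, floor=1.0, exponent=1.0):
    """Score in (0, 1]: 1.0 at or below `floor`, falling as (floor/distance)**exponent."""
    return (floor / max(distance, floor)) ** exponent


def match_score(distances, weights, exponent=1.0):
    """Weighted average of per-field closeness; fields with weight 0 are ignored."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(w * closeness(distances[name], exponent=exponent)
               for name, w in weights.items() if w) / total


# Hypothetical query-to-document distances, each already normalised to a
# unitless scale (for example multiples of 100 m and of 10 minutes).
near = {"distance": 2.0, "time": 1.0}
far = {"distance": 4.0, "time": 1.0}
weights = {"distance": 3.0, "time": 1.0}

near_score = match_score(near, weights)
far_score = match_score(far, weights)
```

Whether the exponent should be 1, 2 or something else, and whether it should differ between fields, is exactly the open question raised in A5.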

Area B: deriving a query from a context

In CAR a query will typically be derived partly from a context and partly from fields set by the user, as in normal IR/IF queries. The context used to set a query is often the user's current context, and this will typically consist of a large number of fields, most of them set by sensors. Research questions are:

B1.

If the user is carrying a large number of sensors, there will be a correspondingly large number of contextual fields. At any one time some of these will be important and some will be ignorable (or given a field-weight of zero in a derived query). For example contextual fields could correspond to the user's location, the current air temperature, the user's heartrate, and a set of share prices. The relative importance of these is likely to change over time, according to the user's activity and other parameters. What is the best way of dealing with a profusion of fields that have different and changing field-weights? How much extra weighting should be given to field values that have changed, e.g. a sudden drop in temperature, an increase in heartrate?
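
One simple way to experiment with changing field-weights is sketched below. It is only an illustration under stated assumptions: the function `field_weights`, the relative-change `threshold`, the additive `bonus` and the field names are all hypothetical. A field whose value has changed sharply since the last reading receives extra weight, so that a normally ignored field (base weight zero) can become temporarily important.

```python
def field_weights(base, previous, current, threshold=0.2, bonus=0.5):
    """Return per-field weights, adding `bonus` to any field whose value
    changed by more than `threshold` (as a fraction of its previous value)."""
    weights = {}
    for name, weight in base.items():
        old, new = previous.get(name), current.get(name)
        if old is not None and new is not None:
            relative_change = abs(new - old) / max(abs(old), 1e-9)
            if relative_change > threshold:
                weight += bonus
        weights[name] = weight
    return weights


# Hypothetical readings: temperature is normally ignored (weight 0).
base = {"temperature_c": 0.0, "heart_rate_bpm": 0.3}
before = {"temperature_c": 20.0, "heart_rate_bpm": 70.0}
after = {"temperature_c": 12.0, "heart_rate_bpm": 72.0}

weights = field_weights(base, before, after)
```

Here the sudden drop in temperature lifts that field's weight from zero, while the small change in heartrate leaves its weight untouched. How large the bonus should be, and how quickly it should decay, remains an open question.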

B2.

In general the datatype of any contextual field may be a structured type rather than a scalar type. For example there might be a datatype called `event' that consisted of a time, a location and a textual description. As a further example a datatype corresponding to a vehicle fleet (or, for that matter, a production line) might contain arrays of objects being tracked. Within structured types there may be a need to match either at the level of the type (does one event match another?) or at the level of subfields of the type (do the locations match?).
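
The two levels of matching can be sketched as follows. This is a minimal illustration, assuming a hypothetical `Event` type with a time in seconds, a planar location in metres and a free-text description; the thresholds and the rule that descriptions match when they share a word are arbitrary choices.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Event:
    time_s: float        # seconds since some agreed epoch
    location: tuple      # (x, y) in metres
    description: str


def locations_match(a, b, radius_m=50.0):
    """Subfield-level match: are the two locations within `radius_m`?"""
    dx = a.location[0] - b.location[0]
    dy = a.location[1] - b.location[1]
    return hypot(dx, dy) <= radius_m


def times_match(a, b, window_s=3600.0):
    """Subfield-level match: are the two times within `window_s`?"""
    return abs(a.time_s - b.time_s) <= window_s


def events_match(a, b):
    """Type-level match: all subfields match and the descriptions share a word."""
    shared = set(a.description.lower().split()) & set(b.description.lower().split())
    return locations_match(a, b) and times_match(a, b) and bool(shared)


lecture = Event(0.0, (0.0, 0.0), "IR lecture")
seminar = Event(1800.0, (30.0, 40.0), "CAR seminar")
```

In this example the locations and times of `lecture` and `seminar` match at the subfield level, but the events do not match at the type level because their descriptions have no word in common.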

B3.

In general sensors give low-level readings, whereas the user -- and hence a derived query that needs to reflect the user's needs -- works at a higher level. How can the higher levels be synthesised from the lower levels? In general, several levels of abstraction might be relevant to the user. For example for location, we could have any of the following:

at a certain x,y,z co-ordinate.
in Room 206.
in the Simmonds Building.
in the Runcorn site of ICI.
the place where my sister works (i.e. a name relevant to an individual, rather than a universally-used name).

Any of these, except perhaps the first, might be relevant to the user.
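
The levels of abstraction listed above can be synthesised from a low-level reading by a lookup followed by a walk up a containment hierarchy. The sketch below assumes hypothetical tables `ROOMS`, `PARENT` and `PERSONAL`; the bounding box for Room 206 is invented purely for illustration.

```python
# Hypothetical tables; the co-ordinates are invented for illustration.
ROOMS = {"Room 206": ((0.0, 0.0, 6.0), (10.0, 8.0, 9.0))}   # name -> (min corner, max corner)
PARENT = {"Room 206": "the Simmonds Building",
          "the Simmonds Building": "the Runcorn site of ICI"}
PERSONAL = {"the Runcorn site of ICI": "the place where my sister works"}


def abstractions(x, y, z):
    """Return the named places containing (x, y, z), from most to least specific."""
    levels = []
    for room, (low, high) in ROOMS.items():
        if all(lo <= v <= hi for lo, v, hi in zip(low, (x, y, z), high)):
            place = room
            while place is not None:
                levels.append(place)
                if place in PERSONAL:          # a name relevant to one individual
                    levels.append(PERSONAL[place])
                place = PARENT.get(place)
            break
    return levels


levels = abstractions(5.0, 4.0, 7.0)
```

A derived query could then carry any or all of the returned levels, leaving the matching algorithm to decide which level is relevant to a given document.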

B4.

Finding the best match may involve some arithmetic combination of fields, e.g. a field observer may record the width and length of an animal footprint, but the key quantity in terms of trying to retrieve similar footprint sightings may be the ratio of the length to the width, rather than the absolute readings (which may be affected by the nature of the soil, the age of the animal, etc.). Do we need to have a lot of ad hoc matching and weighting algorithms to deal with these cases, or can they be systematised? Even if ad hoc, can we represent the types of data as objects, each of which has its own matching function? As an example of the latter a possible approach is to make `rhino footprint' a context field; this will be a composite field, made up of sub-fields, which are themselves derived from sensor values. This composite field will have its own matching algorithm.
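
The idea of a composite field with its own matching function might be sketched as below. The class `Footprint`, its sub-field names and the `tolerance` value are assumptions for illustration; the only point taken from the discussion above is that matching uses the ratio of length to width rather than the absolute readings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    """Composite context field built from two sensor-derived sub-fields."""
    length_cm: float
    width_cm: float

    @property
    def ratio(self):
        return self.length_cm / self.width_cm

    def match(self, other, tolerance=0.1):
        """Score in [0, 1]: 1 for identical ratios, 0 once the ratios
        differ by `tolerance` or more."""
        return max(0.0, 1.0 - abs(self.ratio - other.ratio) / tolerance)


sighting = Footprint(length_cm=24.0, width_cm=20.0)
larger_animal = Footprint(length_cm=30.0, width_cm=25.0)   # same shape, bigger print
different_shape = Footprint(length_cm=24.0, width_cm=16.0)
```

Here `sighting.match(larger_animal)` gives a perfect score even though every absolute reading differs, whereas `sighting.match(different_shape)` scores zero. If every field type exposes a `match` method of this kind, the engine itself need not know about footprints at all.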

B5.

Fields derived from sensors are inaccurate and sometimes unreliable (e.g. bogus readings, missed readings). How can we best deal with this (e.g. by extrapolation, or by relating a sensor reading to others that cover the same or similar information -- such as current apparent location and current apparent phone cell)? Is there scope for `fuzzy matching' or is this a normal consequence of the retrieval algorithm anyway? One approach, based on the idea of a field being a composite set of values, is to have a probability (or even a probability distribution) as one attribute of a field; this attribute could be used by the matching algorithm. (Probabilities also apply to synthesised fields such as `user is busy'; perhaps the synthesising algorithm might say there is a 70% probability that the user is busy.) A refinement of this approach is to have a synthesised field called, say, `accurate location', which fuses other location fields (and perhaps other fields too) to derive a more accurate location; then we have two sorts of synthesis, one for abstraction (see B3) and one for fusion.
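
As one illustration of fusion, the sketch below combines several estimates of the same quantity by inverse-variance weighting. This is a standard textbook technique that we have substituted here as an example; it is not claimed to be the right answer for CAR. The function name `fuse` and the sample readings are hypothetical, and each estimate is assumed to come with a variance that expresses how unreliable its sensor is.

```python
def fuse(estimates):
    """Fuse (value, variance) pairs into one (value, variance) pair.

    Each estimate is weighted by 1/variance, so reliable sensors dominate.
    Missed readings should simply be left out of the list.
    """
    precisions = [1.0 / variance for _, variance in estimates]
    total = sum(precisions)
    value = sum(v * p for (v, _), p in zip(estimates, precisions)) / total
    return value, 1.0 / total


# Hypothetical easting of the user, in metres, from two sources:
gps = (100.0, 25.0)          # fairly precise
phone_cell = (130.0, 100.0)  # much coarser

easting, variance = fuse([gps, phone_cell])
```

The fused variance is smaller than that of either input, and it could serve as the probability-like attribute of the synthesised `accurate location' field that a matching algorithm then consults.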

B6.

If we assume that a query derived from the current context refers to location X, then the corresponding documents to be matched may have:

an explicit location, e.g. a location field in some metadata associated with an HTML document or a CC/PP XML document.
an implicit location: perhaps X is some x,y co-ordinates that relate to a location in Canterbury, and therefore it would be valuable to search for documents that contain the word "Canterbury".

Clearly implicit matching is a significant area for research.
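
A very simple form of implicit matching is sketched below: the co-ordinates in the query are turned into place names using a gazetteer, and documents are then searched for those names. The `GAZETTEER` table, its approximate bounding box and the sample documents are all invented for illustration; a real system would need a proper gazetteer and a proper text-retrieval engine rather than substring search.

```python
# Hypothetical gazetteer: name -> (south, west, north, east) in degrees.
# The bounding box is a rough illustrative value, not surveyed data.
GAZETTEER = {"Canterbury": (51.25, 1.03, 51.31, 1.13)}


def place_names(lat, lon):
    """Names of all gazetteer places whose bounding box contains the point."""
    return [name for name, (south, west, north, east) in GAZETTEER.items()
            if south <= lat <= north and west <= lon <= east]


def implicit_matches(lat, lon, documents):
    """Documents whose text mentions a place containing the point."""
    names = [name.lower() for name in place_names(lat, lon)]
    return [doc for doc in documents
            if any(name in doc.lower() for name in names)]


documents = ["A guide to Canterbury cathedral", "Walks around Exeter"]
hits = implicit_matches(51.28, 1.08, documents)
```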

B7.

In graphics one has the concepts of a world co-ordinate system (the co-ordinate system used by the application as a whole) and a co-ordinate system corresponding to the user's current view of the world; there are rules for converting between the two systems. A similar situation can occur in CAR. For example the user's view might be based on a map displayed on their PDA screen, and the co-ordinate system could be the x,y co-ordinates of the window in which the map was displayed. There is a need for rules, to be applied in both directions, to convert the user's view to the world view used by the application. The sequence of use of rules may be as follows: (1) user points at a location on the map; (2) the rule converts the x,y co-ordinates on the map to world co-ordinates; (3) the application retrieves documents that match these co-ordinates (and perhaps other contextual fields too) -- these documents might relate to tourist sights near the user; (4) the location associated with each retrieved document is converted back to the user's co-ordinates; (5) the application represents each document as a hot-spot on the map corresponding to its location. Interestingly, there are parallels between the matching of rules and the matching of queries.
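
The five steps above can be sketched with a pair of inverse conversion rules. The class `MapView`, the function `hot_spots` and the sample documents are hypothetical, and the sketch assumes the simplest case: a map that is an unrotated, uniformly scaled window onto planar world co-ordinates, with window y increasing downwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapView:
    origin_x: float   # world x at the left edge of the window
    origin_y: float   # world y at the top edge of the window
    scale: float      # world units per pixel

    def to_world(self, px, py):
        """Rule for step 2: window co-ordinates to world co-ordinates."""
        return (self.origin_x + px * self.scale, self.origin_y - py * self.scale)

    def to_view(self, wx, wy):
        """Rule for step 4: world co-ordinates back to window co-ordinates."""
        return ((wx - self.origin_x) / self.scale, (self.origin_y - wy) / self.scale)


def hot_spots(view, click, documents, radius):
    """Steps 1 to 5: map a click to world co-ordinates, retrieve documents
    within `radius` of it, and return their positions in the window."""
    wx, wy = view.to_world(*click)
    return {name: view.to_view(x, y)
            for name, (x, y) in documents.items()
            if (x - wx) ** 2 + (y - wy) ** 2 <= radius ** 2}


view = MapView(origin_x=1000.0, origin_y=2000.0, scale=2.0)
documents = {"cathedral": (1110.0, 1790.0), "castle": (5000.0, 5000.0)}
spots = hot_spots(view, (50, 100), documents, radius=50.0)
```

Retrieval here is by location only; in a fuller application step 3 would hand the converted co-ordinates, together with the other contextual fields, to the retrieval engine.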

B8.

Real applications will not generally be able to deal with a single context service covering the whole world; instead context services will be distributed, and might be shared on a dynamic basis. Thus when a user boards a bus, their context server might join that of the bus, with their location inherited from the bus's location. As a second example, when a person is close to a printer, they might incorporate the printer's context as part of their own (the information being transmitted by Bluetooth, or whatever).
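
Dynamic sharing of this kind can be sketched as a chain of lookups. The class `ContextServer` and its methods are hypothetical; the sketch uses Python's standard `collections.ChainMap` so that a user's own fields take priority and any missing field is inherited from a joined server.

```python
from collections import ChainMap


class ContextServer:
    def __init__(self, **fields):
        self.own = dict(fields)
        self.joined = []            # most recently joined server first

    def join(self, other):
        self.joined.insert(0, other)

    def leave(self, other):
        self.joined.remove(other)

    def view(self):
        """Current context: own fields first, then those of joined servers."""
        return ChainMap(self.own, *(server.view() for server in self.joined))


bus = ContextServer(location=(10, 20))
user = ContextServer(heart_rate=70)

user.join(bus)                          # the user boards the bus
inherited = user.view()["location"]     # location comes from the bus
bus.own["location"] = (11, 20)          # the bus moves on
updated = user.view()["location"]
user.leave(bus)                         # the user gets off
```

The same mechanism would cover the printer example: the user's server joins the printer's while the two are close, and leaves it when they part. How joined servers discover each other, and what each is willing to share, is left open.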

Area C: usability issues

For most CAR applications the user is mobile. Thus many research areas for mobility carry over into CAR. The various elements of a CAR application (documents, fields, sensors, processing engines) may be distributed in numerous ways: e.g. consider as fields a weather forecast from the web, a share price indicator from a dedicated feed, a location derived from a GPS sensor attached to the user's PDA. Following on from this we have several research areas derived from mobility, and from the general usability of a CAR application:

C1.: General mobility mechanisms, such as caching and minimising information flow across expensive lines, will need to be adapted to CAR. In particular, CAR caching can be especially effective when the user's future contexts can be anticipated. Clearly there are problems, as with any sort of caching, if the cached information can change rapidly, e.g. cached traffic information, cached information on the state of a coffee machine or cached information on the whereabouts of a colleague.
C2.: Typically CAR will be a background activity for the mobile user, and output from the CAR application will be interrupting her normal activity. Thus retrieving information of marginal relevance is unwelcome. Precision may be more important than recall: an issue that relates to the topics in Area A above.
C3.: The user interface.
C4.: Relevance feedback from the user.
C5.: Implementing all of the above within the constraint that the client software must be of minimal size, and, more crucially, that the display is small. This constraint is most severe in applications designed for use while actively moving, e.g. walking; here the display must be really small -- or perhaps it might be replaced by an audio interface. On the other hand most mobile applications are designed for people who are generically mobile but temporarily static, e.g. sitting down (though perhaps on a moving vehicle); here constraints are less severe. There is perhaps hope that these constraints will be less in the future: for example, a Delphi group has forecast that rollable displays will be a mass-market item by 2008. At the moment, however, the output format of traditional IR software such as retrieval engines is almost useless on small displays -- though here audio output is potentially a good answer.
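
Returning to C1, the interplay between anticipating contexts and fast-changing information can be sketched as a cache with a per-entry lifetime. The class `ContextCache`, the function `prefetch`, the lifetimes and the sample data are assumptions for illustration; times are passed in explicitly, in seconds, so that the behaviour is easy to follow.

```python
class ContextCache:
    def __init__(self):
        self._entries = {}      # key -> (value, expiry time)

    def put(self, key, value, now, ttl_s):
        self._entries[key] = (value, now + ttl_s)

    def get(self, key, now):
        """Return the cached value, or None if it is missing or stale."""
        entry = self._entries.get(key)
        if entry is None or now >= entry[1]:
            self._entries.pop(key, None)
            return None
        return entry[0]


def prefetch(cache, fetch, predicted_contexts, now, ttl_s):
    """Fill the cache for contexts the user is expected to enter."""
    for context in predicted_contexts:
        cache.put(context, fetch(context), now, ttl_s)


def fetch_sights(place):        # stand-in for a retrieval over an expensive line
    return f"tourist sights near {place}"


cache = ContextCache()
prefetch(cache, fetch_sights, ["cathedral", "castle"], now=0.0, ttl_s=3600.0)
cache.put("coffee machine", "ready", now=0.0, ttl_s=60.0)   # changes rapidly
```

Slow-changing information such as tourist sights can be given a long lifetime and fetched well ahead of need, while the state of a coffee machine expires after a minute and must then be fetched again.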

Some relevant papers

1.: Newman, W.M., Eldridge, M.A. and Lamming, M.G., `Pepys: Generating Autobiographies by Automatic Tracking', Proc. ECSCW `91, Amsterdam, September 1991.
2.: Cooperstock, J., Fels, S., Buxton, W. and Smith, K.C. `Reactive environments: Throwing away your keyboard and mouse', Comm. ACM, 40(9), pp. 65-73, 1997.
3.: Rhodes, B.J. `The Wearable Remembrance Agent: A system for augmented memory', Personal Technologies, 1 pp. 218-224, 1997.
4.: Oard, D.W. and Marchionini, G. A conceptual framework for text filtering, Report EE-TR-96-25, Univ. of Maryland, 1996.
5.: Brown, P.J. and Jones, G.J.F. `Context-aware retrieval: exploring a new environment for information retrieval and information filtering', to be submitted for publication, 2000.
6.: Das, R.E. and Sen, S.K. `Adaptive location prediction based on a hierarchical network model in a cellular mobile environment', Computer Journal, 42, 6, pp. 474-486, 1999.
7.: Abowd, G.D. and Dey, A.K. `Towards a better understanding of context and context-awareness', panel statement in Gellersen, H.-W. (Ed.) Handheld and Ubiquitous Computing, Springer, pp. 304-5, 1999.