Peter Brown
Department of Computer Science, University of Exeter, Exeter EX4 4QF, UK
P.J.Brown@ex.ac.uk
ABSTRACT
Todo
A central task in context-aware retrieval is to match the user's present context against the contexts associated with each document in the document collection, and to retrieve those documents that (best) match. Pilot applications have often used Boolean matching of fields, because this is simple. It is, however, likely that a best-match strategy will generally yield better results in terms of relevance and precision. We say "generally" because there are some cases -- essentially those cases where the application deals with absolutes -- where Boolean matching still has a place: e.g. to retrieve information that is only relevant to a particular room, to a particular temperature range (below freezing-point, say) or to a particular range of opening hours. In such absolute cases, one can argue that if the present context does not match the context associated with a document, then the document should not be retrieved: being close is not good enough. (Actually the example of opening hours is asymmetric in this respect: just after is a killer, but just before is no great problem.) Overall, however, we stick by our claim that best-match retrieval is generally the preferred approach, and this paper explores issues surrounding this.
We will make a number of assumptions for the sake of simplicity:
A key characteristic of CAR is that history is an important aid to improving performance. This is for two reasons:
History is probably most important in the user's current context, but it can also be important in the content of the document collection, in cases where this is not static. The main uses of history are (a) for recording change, and (b) for prediction of future field values. The way a CAR application uses history is likely to be very different to traditional IR and IF applications. We discuss history, and the way it might be exploited in pre- and post-processors to the retrieval engine in [7].
Traditionally the disciplines of databases and information retrieval have been entirely separate, though some technologies, such as fuzzy databases, begin to bridge the gap. CAR applications will sometimes lie uneasily between databases and information retrieval. Numeric fields or multiple-choice fields are often best treated with a database approach, whereas textual fields are likely to need a retrieval engine (for example, a user-preferences field for hotels may contain strings such as "fishing", "garden", "countryside" and "old manor house", and the best way of matching these is likely to involve using a textual description of each hotel, rather than expecting a database to relate to each of the user's preferences).
Often it will be necessary to exploit an existing body of information, such as existing web pages or existing database content. Obviously this will affect the decision on what retrieval technology to use. In contrast, there are some applications where the document collection is created with CAR in mind. This applies to "memory aid" systems that capture events in the user's life with a view to later CAR, and to CAR systems for conference attendees [4]. In the latter case, all the conference information may be prepared with mark-up designed for CAR. For example the conference streams may be marked with their rooms, times and perhaps the audience they are designed for. Indeed it may be that CAR is the only way to retrieve the conference information: someone away from the conference site may need to pretend, using a suitable interface, that they are at the conference time and place in order to retrieve information. Arguably this is a good and natural metaphor anyway. Todo: relate to Rooms interface from Xerox PARC.
Obviously applications that can decide the form of the document collection have a potential advantage for achieving good retrieval performance.
Overall we have a preference for an information retrieval approach, because it is likely to be more flexible, provided that some of the associated research issues can be solved. We discuss this issue further in [8].
Todo: discuss existing retrieval applications where scores are combined. We assume that the scoring mechanism is such that an individual score is calculated for each field, and then these field scores are combined in some way to yield an overall score for a document match. A previous paper [2], which explores an example where there is just one contextual field (location), suggests that scores be multiplied together, and we will follow that approach here. For more details of scoring see [9].
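As a rough illustration of combining field scores by multiplication, the following minimal sketch multiplies per-field scores into one document score. The function name `combine_field_scores`, the field names and the example values are hypothetical and not taken from [2] or [9].

```python
from math import prod

def combine_field_scores(field_scores):
    """Multiply the per-field scores (a dict of field name -> score) together."""
    return prod(field_scores.values())

# Hypothetical field scores for one document match.
scores = {"location": 0.8, "time": 0.5, "temperature": 1.0}
overall = combine_field_scores(scores)
```

One consequence of multiplying is that a zero score on any single field gives the whole document a zero score, which is why the treatment of non-matches matters.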
It is worth saying a little more about non-matches. There are three possible cases, as illustrated by the following examples:
Case 2 above is less serious than case 1, and perhaps should be given a low score but not a zero score.
The scoring of matches of individual fields will depend on the nature of each field. For textual fields, information retrieval has produced a large body of research and effective algorithms, and this can be exploited. There is much more need for research into scoring of numeric matches, since such fields are likely to form the bulk of contextual fields. Numeric values can be one-dimensional, two-dimensional (as used for many location fields), three-dimensional (as used for locations that include height), or more. Reference [2] has suggested some approaches to scoring the matching of two two-dimensional locations. We suggest that there be generic algorithms that can be used for matching any numeric field of a certain dimension, but recognise that there will be special cases where the generic algorithms need to be overridden. An example of a special case is a time field representing the opening hours of a tourist attraction: assume this has a value of 14.00 to 17.00. If the current time is 14.05, or even 13.55, then this is a good match since the attraction is just opening, whereas if the current time is 16.55 it is a bad match; thus there is an asymmetry about matching, with a bias towards the start of a range, and a special algorithm will need to be used to reflect this.
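To make the asymmetry concrete, here is one possible shape for such a special algorithm. It is only a sketch under several assumptions that are not in the text: times are given as decimal hours (so 14.05 on the clock is 14 + 5/60), the score runs from 0 to 1, a hypothetical `lead_in` window of a quarter of an hour before opening counts as a full match, and the score falls linearly to zero at closing time.

```python
def opening_hours_score(now, opens, closes, lead_in=0.25):
    """Score in 0..1, biased towards the start of the opening range.

    All times are decimal hours; lead_in is an assumed tolerance before opening.
    """
    if opens - lead_in <= now < opens:
        return 1.0  # just before opening: treated as a good match
    if opens <= now < closes:
        # Falls linearly from 1 at opening time towards 0 at closing time.
        return (closes - now) / (closes - opens)
    return 0.0  # well before opening, or at/after closing

just_open = opening_hours_score(14 + 5 / 60, 14, 17)
about_to_open = opening_hours_score(13 + 55 / 60, 14, 17)
nearly_closed = opening_hours_score(16 + 55 / 60, 14, 17)
```

With these assumptions the 14.05 and 13.55 cases score close to the maximum while 16.55 scores close to zero, which is the bias towards the start of the range described above.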
As indicated already, the values of numeric fields are often likely to be ranges rather than single values. For example a document may relate to a certain area, which is represented by a circle, rectangle or polygon; opening hours will typically be a range of times, or even several disjoint ranges. Fields of the user's current context may also be represented as ranges, especially when the recording method is inaccurate or uncertain. For example a user whose location is set via a GPS sensor may have the location represented as a circle, with the current GPS reading as centre, whereas another user whose location is recorded more precisely using a short-range beacon may have their location recorded as a point.
In this Section we discuss some details of the generic algorithms that can be used for matching, and we will start with some general considerations.
Firstly, matching will involve two values, V1 and V2, with V1 derived from the query and V2 derived from the document being matched; we suggest that the generic algorithms be commutative, i.e. matching V1 with V2 yields the same score as matching V2 with V1. (However some possible scoring systems go against this: if V1 and V2 are ranges, one possible scoring system is to give extra credit if V2 completely includes V1.)
Secondly, we must remember that not all the matches will involve the same fields, e.g. some tourist attractions may have an associated Temperature field that needs to be matched, whereas others might have a Time-of-day field. As a result there is a need for fairness between the scores on different fields; for example if Temperature matching tended always to get a low score and Time-of-day a high score, this would distort what was delivered to the user.
Thirdly consider the two cases:
Fourthly we assume each match is independent, both of the matching of other fields, and matching of the same field but with a different query/document. For example in a tourist application if an attraction is one of 20 that are within a mile of the user, then the score would be just the same as if it had been the only one. In the former case we assume the thresholding algorithm (e.g. only deliver the 5 best matches to the user) will act as a discriminator.
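The separation between independent scoring and a later thresholding step might be sketched as follows. The function name `best_matches`, the default limit of 5 and the example scores are illustrative assumptions only.

```python
import heapq

def best_matches(doc_scores, limit=5):
    """Return the `limit` highest-scoring (document, score) pairs, best first.

    doc_scores maps a document identifier to its already-computed score;
    each score is assumed to have been calculated independently of the others.
    """
    return heapq.nlargest(limit, doc_scores.items(), key=lambda item: item[1])

# Hypothetical scores for seven nearby attractions.
nearby = {"a": 0.9, "b": 0.1, "c": 0.7, "d": 0.4, "e": 0.8, "f": 0.3, "g": 0.6}
delivered = best_matches(nearby)
```

The score of each attraction is unaffected by how many others are nearby; only the thresholding step decides which are delivered.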
Fifthly we assume (at least until experience dictates otherwise) that scores are the same irrespective of whether matching is proactive or interactive.
Sixthly, we just discuss basic matching scores here: these may be changed by looking at wider considerations, such as history of change and prediction. Such changes are achieved by separate pre- and post- processors.
In order to illustrate the properties of an ideal generic scoring system, we will give some examples of suggested scores. The first set of examples applies to two-dimensional locations, and assumes an interactive query from a user who has given as their location an area (i.e. a range of locations) covering Devon. The list below gives suggested scores for documents with the given associated locations:
Our next list assumes proactive matching (though we have provisionally assumed this gives the same as interactive matching) to a user's location that is a point in the small village of Lustleigh.
Our next list assumes matching a temperature of 10 -- a one-dimensional case:
We suggest the default algorithms score a match in the centre of a range more highly than a match at the edge (or in matching two ranges, they score higher if their centres are close). There is a possible counter-example to this: a person located at Dover (on the edge of England) is more likely to be interested in a page about England than a user at Birmingham (near the centre of England); this is because a person who has just entered an area is likely to be most interested in it. We are inclined to reject this counter-example: we believe that it lies in the realm of pre-processing and post-processing algorithms that may take account of history and of newly-triggered pages that have not been previously triggered. This is a separate (though important) concern, and should be kept apart from the basic scoring algorithms: it would be wrong to have two sets of algorithms each trying to do the same thing.
It is convenient to allow infinite ranges. There are three possible cases:
Two desirable properties of our algorithms for scoring a match of V1 and V2 are:
We suggest as a prototype the following scoring algorithm. It applies to numeric values of any dimensionality. We assume a point is treated as a range whose low and high values are the same. Thus the algorithm always matches two ranges. It uses the concept of spread introduced earlier. The spread is calculated on a pre-pass at the start of a session (or alternatively could be declared by the author, e.g. in a special document, labelled `spread', within the document collection), but it needs to be updated if any dynamic value, e.g. a value from a sensor, extends the spread. There is a spread for each numerical field; if the value is multi-dimensional then there is a spread for each dimension. The spread-size is the size of the spread: thus if the spread is -20..60, the spread-size is 80. The maximum-distance for a value is the square root of the sum of the squares of the spread-sizes for each dimension. For a one-dimensional value the maximum-distance will be the spread-size. The proportional-overlap of two ranges is the proportion of the smaller range that overlaps the larger (?plus halo). If the smaller range is infinite.. todo.
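The spread-size and maximum-distance calculations might be sketched as below. This assumes the spread of a dimension is the smallest interval covering every value seen for that dimension, with each value held as a (low, high) pair; that reading, the function names and the example values are assumptions made for illustration.

```python
from math import sqrt

def spread(ranges):
    """Smallest (low, high) interval covering every (low, high) range given."""
    lows = [low for low, _ in ranges]
    highs = [high for _, high in ranges]
    return min(lows), max(highs)

def spread_size(spread_interval):
    """Size of a spread given as a (low, high) pair."""
    low, high = spread_interval
    return high - low

def maximum_distance(spread_sizes):
    """Square root of the sum of the squares of the per-dimension spread-sizes."""
    return sqrt(sum(size * size for size in spread_sizes))

# One-dimensional example: a point value (15, 15) is a range with equal ends.
temperature_spread = spread([(0, 10), (15, 15), (20, 40)])
md_1d = maximum_distance([spread_size(temperature_spread)])
# Two-dimensional example with hypothetical spread-sizes of 3 and 4.
md_2d = maximum_distance([3, 4])
```

In the one-dimensional case the maximum-distance equals the spread-size; in the two-dimensional case it is the diagonal of the rectangle formed by the two spreads.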
Assumptions: notation: range of x to y is written x..y; score is in range 0..2, with 2 a perfect score; if a value-tuple is (e.g.) a pair of values, then this represents a geometric point, not two independent values -- therefore we need to treat values as a whole, rather than say as a sequence of 1D values; ranges are rectangular.
Possible algorithm for cases where at least one is a point (MD=maximum-distance):
score = sqrt(((MD - distance-between-centres) / MD) ^ 2) + sqrt(((MD - distance-outside-edge) / MD) ^ 2)
If a point is within a range, its distance-outside-edge is 0; hence if a point is within a range its score is at least 1. (If, bizarrely, either of the distances in the formula is greater than MD, then that distance is set to MD.)
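A sketch of point-against-range scoring follows. It assumes the formula is read as the sum of two closeness terms, each in 0..1, since that reading gives the stated 0..2 score range; it also assumes Euclidean distances and a rectangular range given by its low and high corners. The function names are hypothetical.

```python
from math import sqrt

def euclidean(a, b):
    """Euclidean distance between two points of the same dimensionality."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def point_range_score(point, low, high, md):
    """Score in 0..2 for a point against the rectangular range low..high.

    md is the maximum-distance; both distances are capped at md.
    """
    centre = [(lo + hi) / 2 for lo, hi in zip(low, high)]
    # Nearest point of the rectangle to `point`; equals `point` when inside.
    nearest = [min(max(p, lo), hi) for p, lo, hi in zip(point, low, high)]
    d_centre = min(euclidean(point, centre), md)
    d_edge = min(euclidean(point, nearest), md)
    return (md - d_centre) / md + (md - d_edge) / md

at_centre = point_range_score((5, 5), (0, 0), (10, 10), 100)
at_corner = point_range_score((0, 0), (0, 0), (10, 10), 100)
outside = point_range_score((13, 14), (0, 0), (10, 10), 100)
```

Under this reading a point at the centre of the range scores 2, any point inside the range scores at least 1, and the score declines gradually as the point moves outside the range.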
If comparing two ranges, we take two opposite corners of each range, e.g. the NE corner and the SW corner if in 2D. The distance-outside-edge is then the average of the distance between the two NE corners and the distance between the two SW corners. Another algorithm for ranges:
score = (2 - (distance-between-range-centres)/maximum-distance - (size-difference-of-ranges)/(greater-size)) * proportional-overlap
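The range formula above can be tried out in one dimension with the following sketch. It assumes size means the length of a range, that proportional-overlap is the overlapping length divided by the length of the smaller range (no halo), that both ranges have non-zero size, and that the centre distance is capped at maximum-distance as in the point case. The function name `range_score` is hypothetical.

```python
def range_score(r1, r2, md):
    """Score in 0..2 for two one-dimensional (low, high) ranges; md is maximum-distance."""
    (lo1, hi1), (lo2, hi2) = r1, r2
    size1, size2 = hi1 - lo1, hi2 - lo2
    greater, smaller = max(size1, size2), min(size1, size2)
    if smaller <= 0:
        raise ValueError("both ranges must have non-zero size")
    centre_distance = min(abs((lo1 + hi1) / 2 - (lo2 + hi2) / 2), md)
    # Length shared by the two ranges, zero if they are disjoint.
    overlap = max(0.0, min(hi1, hi2) - max(lo1, lo2))
    proportional_overlap = overlap / smaller
    return (2 - centre_distance / md - (greater - smaller) / greater) * proportional_overlap

identical = range_score((10, 20), (10, 20), 80)
partial = range_score((0, 10), (5, 25), 80)
disjoint = range_score((0, 5), (10, 20), 80)
```

Every term is symmetric in the two ranges, so this version is commutative. Without a halo, disjoint ranges always score zero however close they are, because the proportional-overlap is zero.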
Some possible heuristics for matching ranges of values are:
As our experience with a field trial of location-based stick-e notes showed [6], absolute distance is an unsatisfactory measure of relevance. This is because the terrain and its access are important, e.g. roads, private areas, cliffs, rivers, walls. Similar issues doubtless apply to other types of field.
There is interesting research to be done in this area, but, given all the other issues, we do not plan to do it. Instead we will ignore discontinuity problems.
Interestingly, the application widely held to be the first context-aware system, the Olivetti active badge system [5], presented contexts in terms of probabilities. For example a display shows the people in a building, each with a room and a probability that they are really there (which is based on the nature and time of the sighting). Later CAR systems have often failed to follow this good example.
When a CAR application supplies a field value, it might also usefully supply a probability that the value is correct. The information is particularly useful to a retrieval engine that naturally works in a probabilistic way. (For numeric fields, an alternative to probabilities is to use a range of possible values.)
In spite of its possible attractions, we do not plan to follow a probabilistic approach in our implementation: we are already experimenting in many areas and we do not want to compound this with an experimental probabilistic retrieval engine.
Wrong assumption: an author might assume that a range of, e.g., 10..20, is a Boolean concept, and anything outside gets a score of 0!
Do we need to have a symmetrical algorithm?