Peter Brown
Department of Computer Science, University of Exeter, Exeter EX4 4QF, UK
P.J.Brown@ex.ac.uk
ABSTRACT
Todo
We believe that recording and using history has great potential for improving the performance of CAR systems. In particular it relates to the key issue of improving the relevance of the documents delivered to the user. Three possible types of history of interest are:
We discuss each of these in turn. The first is our primary interest, since it is specific to CAR.
Some authors have said that what is of interest is not the current context itself, but change in the current context. Most of us agree that there is a lot of truth in this. Obviously keeping a history is a pre-requisite for detecting change. In general the current context will consist of several fields, and some may be changing fast and some may be static. (One field, time, is perhaps unique in that it changes continually in a predictable way -- assuming the user does not wish to change the time by pretending that they are at some time in the past or future.) We would like to investigate ideas about weighting fields that are changing, perhaps giving special priority to fields that have suddenly changed after a long static period, or, more generally, fields that suddenly have an increased rate of change.
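As a rough illustration of weighting by change, the following sketch compares a field's recent rate of change with its earlier rate and raises the weight of fields whose rate has recently increased. The names (Sample, change_rate, field_weight), the 60-second window and the weighting formula are all assumptions made for this example; they are one possible choice, not a strategy we have settled on.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    time: float   # seconds since the start of the session (assumed unit)
    value: float  # value of a numerical context field at that time


def change_rate(samples, start, end):
    """Mean absolute change per second over samples with start <= time <= end."""
    window = [s for s in samples if start <= s.time <= end]
    if len(window) < 2:
        return 0.0
    total = sum(abs(b.value - a.value) for a, b in zip(window, window[1:]))
    span = window[-1].time - window[0].time
    return total / span if span > 0 else 0.0


def field_weight(samples, now, recent=60.0, base=1.0, boost=2.0, eps=1e-9):
    """Weight a field by how strongly its rate of change has recently increased.

    A static field gets `base`; a steadily changing field gets roughly
    base + boost / 2; a field that changes after a long static period
    gets close to base + boost.
    """
    recent_rate = change_rate(samples, now - recent, now)
    earlier_rate = change_rate(samples, float("-inf"), now - recent)
    acceleration = recent_rate / (recent_rate + earlier_rate + eps)
    return base + boost * acceleration
```

With these assumptions a field that has been static throughout keeps the base weight, which matches the intuition that unchanging fields deserve no special priority.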
Such uses of change mean that the application should keep a complete history of change, at least for the current session. (We could even keep histories of past sessions too -- perhaps even relating to different users -- in order to try to detect patterns, but at present we are not interested in such deep and complex analysis.)
History is most obviously useful for numerical fields, but may also be useful for fields of other data types, in particular textual fields. Change and prediction of separate fields may be interdependent: if the air pressure is dropping, the likelihood of rain is increasing, i.e. two fields Pressure and Rain-likelihood may be interdependent; if the time is approaching 1 p.m., the user preferences field may be likely to relate to eating, i.e. the Time and Preferences fields may be interdependent.
Interestingly, history can sometimes be generalised to include the future as well as the past. For example if the user's diary says they are planning to be at a meeting at a certain location in two hours' time, then this can be recorded as a "future" item in the history of their location field. Indeed some fields may naturally relate to the future rather than the past, e.g. a temperature field derived from a weather forecast; the "historical" information for such fields may be largely or even totally related to the future. Hence when we use the word "history" in this paper, we include the future too: in other words our history is looking back from a viewpoint long in the future. (An alternative term would be "time record".) Of course as time goes by, future events can become past events -- but perhaps only if they are detected as really happening, e.g. if sensor values indicate that a user really did attend a scheduled meeting.
In addition history can be used for prediction: predicting the values of fields in the future. The rationale for this is that users are likely to be more interested in information relating to contexts that lie ahead of them than to contexts that lie behind them. For example a location ahead of the user might be more interesting than a location behind. Of course predictions can be made invalid by a sudden change (the temperature suddenly drops after a period of rise, or the user veers from their previous path): in such cases the application may wish to take fast remedial action by cancelling the previous retrieval operation if it is still running, and initiating a new one.
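The simplest form of such prediction can be sketched as linear extrapolation from the two most recent values of a numerical field, together with a check that tells the application when an earlier prediction has been invalidated. The function names and the tolerance parameter are assumptions made for this example; a real predictor would probably use more of the history.

```python
def extrapolate(history, at_time):
    """Linearly extrapolate a field from its last two (time, value) pairs."""
    if len(history) < 2:
        raise ValueError("need at least two remembered values to extrapolate")
    (t0, v0), (t1, v1) = history[-2], history[-1]
    if t1 == t0:
        raise ValueError("the last two values must have different times")
    slope = (v1 - v0) / (t1 - t0)
    return v1 + slope * (at_time - t1)


def prediction_invalidated(predicted, observed, tolerance):
    """True if the observed value has departed from the prediction by more
    than `tolerance`, in which case a running retrieval based on the
    prediction could be cancelled and a new one started."""
    return abs(predicted - observed) > tolerance
```

For instance, a temperature predicted to keep rising but then observed to drop sharply would fail this check, which is the signal for the remedial action described above.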
Prediction is also useful in other circumstances, though these are not of priority to us at present:
We will use the collective term history exploitation strategies to describe the kinds of strategy we have indicated above.
Implementing the history of the current context is not a major task. It is just necessary to remember each current context at one of the following occasions:
We call the set of remembered current contexts the history archive. [[NB: Name now changed to CONTEXT DIARY, since it covers the future as well as history.]] Each remembered context needs to have a time attached to it. (Alternatively we could abandon the traditional concept of a history based on time, and instead organize history around some other field, such as location; as an added refinement, the concept of remembering user trails is a history based on two fields: time and location.) A possible complication is that there might be several histories: e.g. the history of the real current context, as detected by sensors, etc., and a history of pretended fields created by the user. The latter might be completely different from the former, so it is useful to keep them separate. Alternatively the system might only create a history archive for real values detected by sensors, and might only record the history of a field when that field was known to have a real value. (We assume that the pretended worlds are much less predictable and less continuous than the real world, and thus the value of their history is much less -- or even worthless.)
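To make the idea concrete, here is a minimal sketch of a time-ordered history archive in which each remembered context carries a time and a tag saying where it came from, so that sensed and pretended histories can be kept apart and planned (future) items can be recorded too. The class and method names, and the three source tags, are assumptions made for this example rather than a committed design.

```python
from bisect import insort
from dataclasses import dataclass, field


@dataclass(order=True)
class Entry:
    time: float                                   # only the time is used for ordering
    context: dict = field(compare=False)          # field name -> value
    source: str = field(default="sensed", compare=False)  # "sensed", "pretended" or "planned"
    confirmed: bool = field(default=True, compare=False)  # False until sensors confirm it happened


class ContextDiary:
    """Hypothetical history archive: remembered contexts kept in time order."""

    def __init__(self):
        self._entries = []

    def remember(self, time, context, source="sensed", confirmed=True):
        insort(self._entries, Entry(time, dict(context), source, confirmed))

    def past(self, now, source="sensed"):
        """Entries from one source at or before `now`, oldest first."""
        return [e for e in self._entries if e.time <= now and e.source == source]

    def future(self, now):
        """Entries after `now`, e.g. meetings taken from the user's diary."""
        return [e for e in self._entries if e.time > now]

    def field_history(self, name, now, source="sensed"):
        """(time, value) pairs for a single field, oldest first."""
        return [(e.time, e.context[name])
                for e in self.past(now, source) if name in e.context]
```

Tagging entries by source is only one way of keeping the real and pretended histories separate; two separate archives would serve equally well.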
A much bigger task is how to incorporate into the retrieval engine some hooks for history exploitation strategies. This is in addition to the need for general hooks that are not connected with history, e.g. a hook to insert an algorithm to match two temperatures and return a score that records how well the two matched. We assume we have the following architecture:
We assume that the history archive is available to each of the above, if required.
One approach is to use pre-processing. A pre-processor is a natural place to incorporate field-weighting strategies, but it can also be used for wider strategies such as prediction. In the latter case a possible principle is to assume that the user does not want information relating to their current context as such, but instead that they want information relating to their context-of-interest, which is a context slightly ahead of their current context. With this principle the task of the pre-processor is to take a current context and a history archive and to derive a context-of-interest on the basis of prediction. The context-of-interest is then passed to the retrieval engine in place of the current context. If a pre-processor is almost totally committed to prediction, it is convenient to call it a prediction engine, so that its purpose is clear.
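A prediction engine of this kind might be sketched as follows. The sketch assumes that a context is a dictionary with a numerical "time" field, that the archive is a list of earlier contexts with the oldest first, and that numerical fields are projected forward at their most recent rate of change while other fields are carried over unchanged. These are simplifying assumptions for illustration, and the function names are hypothetical.

```python
def _is_number(x):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def context_of_interest(current, archive, lookahead):
    """Derive a context slightly ahead of `current`.

    current   -- dict of field name -> value, including a "time" field
    archive   -- list of earlier context dicts, oldest first
    lookahead -- how far ahead (in the units of "time") to project
    """
    interest = dict(current)
    interest["time"] = current["time"] + lookahead
    if not archive:
        return interest
    previous = archive[-1]
    elapsed = current["time"] - previous["time"]
    if elapsed <= 0:
        return interest
    for name, value in current.items():
        if name == "time":
            continue
        old = previous.get(name)
        if _is_number(value) and _is_number(old):
            rate = (value - old) / elapsed
            interest[name] = value + rate * lookahead
    return interest
```

The result has the same shape as a current context, so it can be passed to the retrieval engine in place of the current context without the engine needing to know that prediction took place.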
An alternative is to incorporate history exploitation strategies into the retrieval engine itself, or more specifically into the matching algorithms that are plugged into it. For example the algorithm for matching two field values would look at the history of the field that represented the present context, and would use this as a factor in calculating the score. I have a certain reluctance to use this approach, since it has a flavour of being monolithic rather than modular, but perhaps we should allow it. The only cost appears to be making the history archive available to algorithms plugged in by the user. If the retrieval engine is a separate server, an alternative approach to our history archive may be attractive: each field of the current context supplied by the client consists of the current value together with a history of past values. However we will assume the history archive approach since it appears to be more widely useful.
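A plugged-in matching algorithm that consults history might look like the sketch below. It scores a numerical document value by its closeness to the current value, and adds a bonus when the field's most recent movement is heading towards the document's value. The function name, the scale and the size of the bonus are assumptions chosen purely for illustration.

```python
def match_with_history(doc_value, current_value, field_history,
                       scale=10.0, trend_bonus=0.2):
    """Return a score in [0, 1] for one numerical field.

    field_history -- earlier values of the current-context field, oldest first
    scale         -- difference at which closeness falls to zero (assumed)
    trend_bonus   -- extra score when the field is moving towards doc_value
    """
    closeness = max(0.0, 1.0 - abs(doc_value - current_value) / scale)
    if not field_history:
        return closeness
    trend = current_value - field_history[-1]
    heading_toward = trend * (doc_value - current_value) > 0
    score = closeness + (trend_bonus if heading_toward else 0.0)
    return min(1.0, score)
```

Note that the only extra input compared with a history-free matcher is field_history, which reflects the cost mentioned above of making the archive available to plugged-in algorithms.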
The third alternative is to incorporate history exploitation strategies into a post-processor. We assume the retrieval engine returns a set of matched documents with an overall matching-score on each document, and an individual matching-score on each field that was matched within the document. The uses of history in a post-processor mirror the uses of a post-processor in general:
In some applications, the document collection may be dynamic, with documents continually being added, deleted or altered. An example would be a collection relating to traffic information. In such cases the history of change may be useful to the retrieval process: for example recent or frequently changing documents may be given a higher score.
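One simple way of turning a document's change history into a score adjustment is sketched below: the more often a document has changed within a recent window, the larger the boost, up to a cap. The function name, the window and the constants are assumptions made for this example.

```python
def change_boost(change_times, now, window=3600.0, per_change=0.05, cap=0.3):
    """Score boost for a document based on how often it changed recently.

    change_times -- times at which the document was added or altered
    window       -- how far back a change still counts as recent (assumed)
    """
    recent = [t for t in change_times if 0 <= now - t <= window]
    return min(cap, per_change * len(recent))
```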
Even with static collections, another piece of history may be important: the history of which documents have been already passed to the application. A document never before retrieved may be given a higher score than one that was retrieved on the last retrieval request. These pieces of history may also be important if a document is deleted: for example if a document relating to a traffic problem has now been deleted, and if this document has recently been passed to the application, then the application might like to be told of the deletion. Again, this kind of history may incorporate the future. It may happen, for example, that a document's content is updated regularly every hour, and this knowledge may be used to refine scores (e.g. if the content was updated 59 minutes previously, its score may be low).
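The use of delivery history can be sketched as follows, assuming the system keeps a record of when each document was last passed to the application. A document never delivered before gets a bonus, a recently delivered one is damped, and deleted documents that were recently delivered are reported. All names and constants here are hypothetical.

```python
def adjust_for_delivery(score, doc_id, last_delivered, now,
                        novelty_bonus=0.2, cooldown=300.0):
    """Adjust a matching score using the delivery history.

    last_delivered -- dict mapping document id -> time it was last delivered
    cooldown       -- period after delivery during which the score is damped
    """
    if doc_id not in last_delivered:
        return score + novelty_bonus            # never retrieved before
    age = now - last_delivered[doc_id]
    if age < cooldown:
        return score * (age / cooldown)         # delivered very recently
    return score


def deletions_to_report(deleted_ids, last_delivered, now, window=600.0):
    """Deleted documents that were delivered within `window` of `now`,
    and whose deletion the application might therefore like to hear about."""
    return [d for d in deleted_ids
            if d in last_delivered and now - last_delivered[d] <= window]
```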
All these uses of the history of the document collection are not in principle tied to context-aware retrieval, but could apply in any situation where the document collection is dynamic. However the strategies used may be specific to context-aware retrieval, mainly because in CAR the user is generally issuing a continuous stream of slowly-changing queries (thus, for example, a document that is to be updated in one minute's time may be given a low score because it would be better to deliver it at the next request).
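The example in parentheses can be sketched as a small rule: if a document's next scheduled update is expected before the user's next request, its score is lowered so that delivery is deferred until the content is fresh. The function name, the fixed penalty factor and the assumption that requests arrive at a known interval are all made up for this illustration.

```python
def defer_if_update_due(score, last_update, update_period, now,
                        request_interval, penalty=0.5):
    """Lower the score of a document that is due to be updated before the
    next expected request.

    update_period    -- time between regular updates of the document
    request_interval -- expected time until the next retrieval request
    """
    next_update = last_update + update_period
    if now <= next_update <= now + request_interval:
        return score * penalty
    return score
```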
Todo
Part B and Part C above have many parallels with traditional IR/IF, and it will probably be possible to use similar approaches. Part A, however, is almost entirely specific to CAR, and involves many new research issues. There is, however, one minor parallel: in IF, the query, i.e. the profile, usually changes very slowly (in contrast to the CAR current context, change may be over months rather than over seconds). In principle, however, it may be treated in a similar manner to change in CAR. For example one could conjecture that if the user has made a small change to their profile, then the changed part of the profile should have a higher weighting than the rest.
Todo: analysis of other similarities, if any, between the two sorts of history.
todo: more citations