P.J. Brown
Department of Computer Science, University of Exeter, Exeter EX4 4QF, UK
e-mail: P.J.Brown@exeter.ac.uk
ABSTRACT
There is a plethora of approaches to retrieval. At one extreme is a web search engine, which provides the user with complete freedom to search a collection of perhaps over a billion documents. At the opposite extreme is a web page where the author has supplied a small number of links to outside documents, chosen in advance by the author. Many practical retrieval needs lie between these two extremes. This paper aims to look at the multi-dimensional spectrum of retrieval methods; at the end, it presents some hypothetical tools, hopefully blueprints for the future, that aim to cover wider parts of the spectrum than current tools do.
Hall (1) highlights two extremes in methods of retrieving information: (a) the traditional hypertext fixed link, which leads to a single document, and (b) the web search engine. Hall, together with several other researchers, has prophesied that the future lies between these extremes. This applies both on fixed terminals with large screens and, even more, on personal devices: since these have small screens and generally sparse computational and input/output resources, it is imperative that any retrieval approach be optimised to the user's real needs. The purpose of this paper is to explore the spectrum of possibilities between the two retrieval extremes; not surprisingly, it is a multi-dimensional spectrum. In essence each dimension of the spectrum goes from complete freedom (the user is on their own) to high levels of constraint (decisions have been made on behalf of the user by a human -- such as the author of a hypertext page -- and/or a computer program).
We will start by giving some examples of applications that represent points on the spectrum, beginning with the most general forms of retrieval and moving towards more focussed retrieval.
These applications just represent points on a wide, multi-dimensional spectrum; we discuss these dimensions below.
The set of examples presented above tried to highlight the differences between applications by showing `pure' examples of each. There are, however, several examples of applications that combine the properties of two or more of the pure examples. In particular there are numerous applications that bring together browsing and searching. We will mention just two of them here in order to give a flavour: each encompasses a well-designed combination of the two facilities, rather than a simplistic cobbling together. Firstly, ScentTrails (4) is a browser where the user supplies a set of search terms indicating their current interests. The user might update these terms as the search proceeds. When the browser presents a hypertext page, it looks at the links within the page, and gives them different weightings according to their apparent relevance to the user's search terms. The more highly weighted a link the more it is highlighted, e.g. by using an increasingly large font. ScentTrails uses a sophisticated algorithm to calculate weights, taking account of the linking structure and of the occurrences of search terms within each page.
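To give a concrete flavour of the kind of weighting ScentTrails performs, the following sketch scores each link by the overlap between the user's search terms and the text describing the link, and maps the weight to a font size for highlighting. The data structures and the simple overlap score are invented for the example; ScentTrails itself uses a more sophisticated spreading-activation algorithm over the linking structure.

```python
def weight_links(links, search_terms):
    """Score each link (anchor text -> short description of the target
    page) by the fraction of the user's search terms that appear in the
    anchor or description.  A deliberately simple stand-in for
    ScentTrails' spreading-activation weighting."""
    terms = {t.lower() for t in search_terms}
    weights = {}
    for anchor, description in links.items():
        words = {w.lower() for w in (anchor + " " + description).split()}
        weights[anchor] = len(terms & words) / len(terms)
    return weights

def font_size(weight, base=10, extra=8):
    """Map a 0..1 weight to a display font size: the more highly
    weighted a link, the larger the font used to highlight it."""
    return base + round(extra * weight)
```

A link whose description mentions all of the user's current search terms would thus be rendered noticeably larger than an apparently irrelevant one.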
A second interesting hybrid application is that of Cunliffe, Taylor and Tudhope (5), and is designed for museums. The application has a supporting data structure that contains measures of the semantic closeness of items. The application supports both browsing and search queries, and the semantic data structure is used to help integrate the two, and to suggest alternative approaches when the user comes to a dead end on one approach.
The discussion in this paper concentrates on existing and potential applications. A more fundamental approach is to look at human information-seeking behaviour in general. A comprehensive treatment of this can be found in the book by Marchionini (6). Marchionini's analysis covers not only computer applications but cases where the information provider is a human, e.g. asking a colleague rather than performing a search using a computer. Marchionini separates out two different strategies in information seeking: analytical strategies and browse strategies. The former, of which IR is an example, are planned, goal-driven, deterministic, formal and consisting of discrete steps; in contrast the latter tend to be opportunistic, data-driven (e.g. the data in one hypertext page provides links to the next), heuristic, informal, and consisting of a continuous sequence of steps. Obviously there is scope, in most information-seeking tasks, to use a mixture of browsing and analytical strategies. There is also scope for finding new approaches that come mid-way between analytical strategies and browse strategies.
An earlier paper, pre-dating the web, presented human information-seeking needs in a three-dimensional model (7), where the dimensions are structural responsibility, target orientation and interaction method; the paper gives eight exploration paradigms that represent the vertices of the model.
Different retrieval technologies often use different terminology. For consistency we will here choose one terminology: the terminology of Information Retrieval. Thus we have a query, which specifies the user's needs, and a document collection from which documents that match the query are retrieved. The retrieved documents are delivered to the user. Following Marchionini's example, we will take a wide view of what these terms cover: e.g. a document collection could be a set of documents in a human's mind, and the query could be a thought that the human has, e.g. `I need to provide some cross-references on topic X; which documents would be best for this?'.
Retrieval can be applied to any nature of material, but here we will assume that everything is textual, since this is still by far the most common sort of retrieval. The principles discussed here should, however, carry over to other media. We assume the end user is a human, who wants some information and hopes to acquire this information by reading one or more of the delivered documents.
We use the word application as a generic term to cover all the retrieval systems we discuss: thus an IR system is one application, and WWW and Microcosm are examples of hypertext applications.
As a final piece of terminology, which we use when talking about the web, a web presentation is a set of integrated web pages, such as the set of pages describing a company's products.
Having defined the terminology, we now present a general model of retrieval. In this model, retrieval is a process of three stages: (1) specification, in which the query and the document collection to be used are specified; (2) retrieval proper, in which the query is issued and the documents matching it are found and delivered; and (3) selection, in which the user selects, from the delivered documents, the ones to read.
Actually the three-stage model is a simplification of what may really happen in practice. In particular there may be iterations involving either the first two stages or all three. One of our introductory examples -- the composite retrieval application that involved a succession of filters -- illustrated iteration around the first two stages.
Iteration over all three stages might occur if a delivered document is not a predefined document but instead is a process for creating a document, like a CGI script on the web. This process itself could involve further retrieval, as would apply if the document delivered on one retrieval was a link to a search engine that then performed a further retrieval.
Iteration over the three stages also occurs if the user is not satisfied with the documents delivered: the user may reformulate the query and/or provide relevance feedback, and as a result get a new set of documents.
After a document has been selected, there can be a further step before the document is presented to the user: the identification of fragments that might be of special interest to the user. These fragments could be occurrences of search terms within the document -- these might be highlighted when the document is displayed; alternatively a fragment could be identified by a suffix attached to a URL (thus causing the browser to jump to a certain point in the document), or, in an XML document, by a specification written in the XPointer language (9) that dynamically searches the document's structure to find the relevant fragment.
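The first of these possibilities -- highlighting occurrences of search terms when the document is displayed -- can be sketched as follows; the highlight markers here are plain-text stand-ins for whatever the display medium provides (e.g. an HTML mark-up element).

```python
import re

def highlight_terms(document, terms, mark=("**", "**")):
    """Return the document text with each whole-word, case-insensitive
    occurrence of a search term wrapped in highlight markers."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b",
                         re.IGNORECASE)
    return pattern.sub(lambda m: mark[0] + m.group(1) + mark[1], document)
```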
ScentTrails, as mentioned above, represents a specific example of an application that highlights relevant fragments: in its case the highlighted fragments are the links that appear to be most relevant.
An interesting step beyond identification of fragments is dynamic generation of the retrieved document using the fragments that are most likely to interest the current user, an instance of Adaptive Hypertext, e.g. (10). Such facilities can be found, for example, in applications such as Intelligent Labelling Explorer (11) and HyperAudio (12), designed to retrieve information for visitors to museums. A further example, in an interestingly different area, is the creation of personalised hypertext fiction using story fragments (13). In all these cases the choice of fragments used to construct a dynamic document can take account of the user's current context (their location, orientation, interests, preferences) and past contexts (e.g. documents previously viewed and the amount of time spent looking at them).
Another example, again entirely different, arises when a document is generated that reports on the retrieval process itself, taking advantage of information gleaned from the structure of the document collection. An example is OpCit (14). OpCit's document collection is a large set of research papers: when one paper cites another, this is treated as a hypertext link. Some users may want to retrieve individual papers, but others might want to know how many papers cite a given one, whether these citations are from leaders in the field or whether these citations just come from the authors' colleagues. OpCit can generate a document that answers such questions.
Fragment identification and dynamic generation of documents can be used in any retrieval process, and indeed offer a promising avenue for further advances. They are, however, separate from retrieval itself, and in the rest of this paper we will discuss retrieval in terms of whole documents retrieved.
Each of the above three stages of the retrieval process can be performed some time in advance of its successor, with the pre-calculated results being continually re-used by the successor stage. In the first stage either the query or the document collection, or both, can be specified in advance. An example is information filtering: here the user specifies the query in advance and this query is continually re-used for retrieval, perhaps over a period of years, until the query is re-specified.
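The information-filtering case can be sketched as a standing query object: the query is specified once, in advance, and then re-used against each newly arriving document. The class and its term-overlap match are illustrative, not a description of any particular IF system.

```python
class InformationFilter:
    """A standing query, specified once by the user and then matched
    against each newly arriving document; only matches are delivered."""

    def __init__(self, query_terms):
        self.terms = {t.lower() for t in query_terms}

    def matches(self, document):
        words = {w.lower().strip(".,;") for w in document.split()}
        return bool(self.terms & words)

    def filter_stream(self, documents):
        """Apply the same pre-specified query to a stream of new
        documents -- the re-use that would be tedious to do by hand."""
        return [d for d in documents if self.matches(d)]
```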
One way of thinking about the way web pages, with their fixed links, are created is as follows: the first two stages of the three-stage process are done in advance (usually by a human author, but perhaps by automatic tools that help create web pages) and this is presented to the user as a set of links embedded in the page. The interactive user is only involved in the selection stage, where they pick the link to follow. Thus, as far as the hypertext user is concerned, it is a `Stage 3 only' process.
If retrieval is an iterative process, say of two iterations, then the first iteration can be done in advance -- typically creating a small cache of retrieved documents extracted from a large document collection -- and the second stage can be done on-the-fly. This is particularly useful for context-aware retrieval on small devices, where the domain of retrieval, i.e. the contexts to be covered, can be forecast in advance.
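This two-iteration arrangement can be sketched as a pair of filters: a coarse one run in advance to build the cache, and a fine one run on-the-fly against the cache. The document representation and the example predicates are assumptions made for illustration.

```python
def precompute_cache(collection, coarse_query):
    """First iteration, done in advance: extract from a large
    collection the documents covering the forecast domain of retrieval
    (e.g. everything about the region a tourist will visit)."""
    return [d for d in collection if coarse_query(d)]

def retrieve(cache, fine_query):
    """Second iteration, done on-the-fly on the small cache, e.g.
    against the user's current context on a small device."""
    return [d for d in cache if fine_query(d)]
```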
The advantages of doing work in advance are many: speed of response is increased, repetitive work is avoided, work may be shared across many users, data transmission charges may be reduced, and (with downloaded caches) problems with lack of connectivity can be surmounted. A prime example of the gains from doing work in advance are web search engines: these can search a billion documents in an impressively short time; two keys to doing this are (a) collecting the documents in advance and (b) pre-processing the document collection into a surrogate form that facilitates fast retrieval. A web search engine that worked entirely on-the-fly would not be a runner. In general, if an application deals with really large amounts of information, and if the application needs to deliver results in real-time, then it is imperative that some work be done in advance.
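The surrogate form mentioned in (b) is classically an inverted index, built in advance so that retrieval intersects small posting sets instead of scanning every document. The following is a minimal sketch of that idea, far simpler than a real search engine's index:

```python
from collections import defaultdict

def build_index(collection):
    """Pre-process the collection (done in advance) into a surrogate
    form: an inverted index mapping each word to the ids of the
    documents containing it."""
    index = defaultdict(set)
    for doc_id, text in collection.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, terms):
    """Fast retrieval: intersect the posting sets of the query terms,
    never touching the documents themselves."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()
```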
There are, however, disadvantages. The danger of doing work in advance is that the world may change and the work done in advance may be invalidated. The best known example of this is the dangling hypertext link: the author has retrieved a document in advance and provided a link to where the document was found, but when the user selects the link the document is no longer there. Doing all the stages on-the-fly greatly reduces this danger.
As our introductory examples indicated, the process of obtaining information has many dimensions, as represented by the following parameters:
Each dimension represents a spectrum of possibilities (though it is a narrow spectrum in the case of delivery if there are just the two possibilities: proactive and interactive). A common factor among all of these dimensions is the choice between being free and being constrained: whether the user has complete control and responsibility, or whether the application takes some of this control and responsibility to reduce the burden on the user.
In subsequent Sections we discuss how existing tools cover parts of this multi-dimensional spectrum, how these tools could be generalised to cover more of the spectrum, and how new tools might be designed to cover areas of the spectrum that are not well covered at present. Before discussing details, however, we will describe a general consideration: the use of context.
In all the application areas that we have described there is a continuing quest to make the delivered documents more relevant to the user. The general approach to tailoring information to a user is to collect information about that user or about other similar users who have the same needs. We call this the user's context. Context can potentially be wide-ranging (15). Sensors can detect the user's physical context: where they are, what direction they are going, what companions/equipment are nearby. A further component of context, the user's computing context, is easy to capture: what document they are reading, what other applications they are running, etc. (In terms of context capture, the document being read is of interest not only for its content, but also for any metadata or semantic information associated with it.) In many situations the document being read is the most important aspect of the user's context, since it shows the user's current focus. Wider aspects of context can be extracted from the web: the current weather and the weather forecast, share prices, traffic information, etc. All of these can have a bearing on what retrieved documents are currently relevant to the user. The user's past behaviour can be analysed, perhaps with some feedback from the user on what retrieved information was most relevant, and this analysis can be extended to cover other, similar, users whose past behaviour may be a guide to the current user (as in peer-to-peer search engines). More generally the context can encompass user models and task models.
Overall the context consists of many different items -- we call them fields; at any one time some fields may be irrelevant to the retrieval process whereas others may be highly relevant. For example a field representing the user's location may currently be important, whereas share prices may not. Thus we have a concept of a weight attached to each field, and these weights can change dynamically according to the user's needs. Weights are typically set by automatic tools, since it would be a burden on the user to continually set and update them.
The main advantage of context is that, if it is collected automatically and then used effectively, it can bring big improvements to the relevance of the documents delivered, all without any extra user effort. There can, however, be further gains if the user does make an extra effort, by telling the application something about their nature and needs, e.g. whether they are a beginner or expert, what topics they are currently interested in, and where they are travelling to. This can be done as an occasional activity, in advance of the retrieval requests.
For all types of context it is useful if the application maintains a history, e.g. a trail of locations visited, or documents previously viewed. This `history' may encompass the future too, as derived, for example, from future diary entries. An example is that the user's diary might say they should be at a certain location in two hours' time; this element of context might have an effect on the user's current information needs. Context can be further enriched by guessing higher level contextual states from the values of low-level sensors. For example Pepys (16) used active badge information to detect whether a meeting was taking place (several people converging to the same place at roughly the same time, and then staying there). Being in a meeting is an important factor in a user's context, and should affect which documents are delivered to them, and, indeed, the user interface for delivery and selection.
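A Pepys-style inference of a higher-level state from low-level sensor values might be sketched as follows; the sighting format, time window and quorum are invented for the example and are much cruder than Pepys' actual analysis.

```python
def in_meeting(badge_sightings, now, window=10, quorum=3):
    """Guess that a meeting is taking place in a room if at least
    `quorum` people have been sighted there within the last `window`
    minutes.  `badge_sightings` is a list of
    (person, room, minutes_timestamp) tuples from active badges."""
    recent = [(p, r) for p, r, t in badge_sightings if now - t <= window]
    rooms = {}
    for person, room in recent:
        rooms.setdefault(room, set()).add(person)
    return {room for room, people in rooms.items() if len(people) >= quorum}
```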
Overall, therefore, the context can represent a rich resource for helping to find what information the user is interested in. In our introductory set of examples, the one labelled `context-aware retrieval' showed the manifest exploitation of this resource, but context could be valuable in many of the other examples too. Exploiting physical context is probably easiest, e.g. the location of a mobile user. Matching of locations is easy (though not trivial, for instance when the user is constrained by streets) and reliable in the sense that all the documents whose associated locations are close should be delivered, and all those that are far away should not. At the other end of the scale, looking at the contextual history of a user's past retrievals, and making this influence the current retrieval operation is hard and potentially unreliable -- occasionally it will lead to irrelevant documents being delivered or relevant documents missed. However most web browsers provide a small step in this direction: they keep a history of links followed, and, each time a link is displayed, display it in a different colour if the destination of the link has already been visited. The web browser does not make any judgement on the relevance of this information: it just tells the user and lets them judge. It would also be relatively easy for a web browser to test if the information at the link destination has changed since the user last viewed it, and alert the user if this is so.
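The last suggestion -- testing whether the information at a link destination has changed since the user's last visit -- could be implemented by storing a digest of the page content at visit time and comparing it on re-display. The functions below are an illustrative sketch, not a description of any actual browser; the choice of SHA-256 is an assumption.

```python
import hashlib

def fingerprint(content):
    """Digest of a page's content, to be stored when the user
    follows the link."""
    return hashlib.sha256(content.encode()).hexdigest()

def has_changed(stored_fingerprint, current_content):
    """True if the destination's content differs from what the user
    saw last time -- the browser could then alert the user when it
    next displays the link."""
    return fingerprint(current_content) != stored_fingerprint
```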
Frequently it is the current context, rather than the historical context, that is most relevant, and this context may be changing quickly. Often the user's current context cannot be known accurately in advance -- and certainly not well in advance: in these cases context is best used in retrieval processes that are on-the-fly.
There is, however, one case that is exceptional: as we have said, in a hypertext system, the author, when embedding links in a page, knows that the user will be reading that page when they select the links -- thus the author knows in advance the document (i.e. web page) being read. This is, of course, part of the user's context. If the page is part of a hierarchy or a sequence of pages, the author may further deduce how the user reached the current page, and what documents they saw on the way; in addition the overall nature of the web presentation to which the page belongs may be a guide to the nature of the user (e.g. the presentation may be `Mathematics for the conceptually challenged'). Such contextual information is routinely used by hypertext authors (`if the user is reading this page, they are assumed already to know about X'). This case is exceptional for a second reason: the same context applies to all users, whereas generally a contextual field is tied to an individual user.
Having discussed the issue of context, we will now look more closely at the components of the three-stage retrieval process. This discussion occupies a substantial part of the rest of the paper.
A fundamental issue is the structure within the documents to be retrieved. In information retrieval the data is typically unstructured or semi-structured. If information is fully structured, in the sense that each document is divided into exactly the same fields, then database technology is typically used. Although not totally structured, IR document collections are, however, often divided into fields, and the query may refer to individual fields, e.g. `Match XXX in the Author field and YYY in the Title field', or, with a context-aware application, `Match XXX in the Location field and YYY in the Time Field'. As we observed when discussing context, different fields can have different weights, e.g. that Location is twice as important as Time. Moreover these weights may change dynamically, e.g. in a context-aware application for tourists, the field whose value has changed most since the last query may get the highest weight.
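A weighted fielded match of the kind just described can be sketched as follows. The field names, the weights and the simple equality test are illustrative assumptions; a real engine would use a per-field similarity measure (e.g. geographical distance for a Location field).

```python
def field_score(query, document, weights):
    """Weighted fielded match: each query field contributes its weight
    when the document's value for that field matches, and the total is
    normalised by the sum of the weights, giving a score in 0..1."""
    total = sum(weights.values())
    matched = sum(weights[f] for f, v in query.items()
                  if document.get(f) == v)
    return matched / total if total else 0.0
```

With Location weighted at twice Time, a document matching only the Location field scores 2/3, reflecting the relative importance of the two fields.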
The design and issuing of a query (together with other aspects such as the specification of weightings and of the document collection to be used) can be done by the user, or can be done wholly or partly by an assistant. The assistant can be a human or a program. In the human case, the assistant's work can be done in advance, as with an expert author who has designed how readers will retrieve information, or can be a person who interacts directly with the user -- indeed they could be sitting side by side. The assistant may design the query and/or may decide when to issue the query.
The user may want the assistant to specify the query because:
Often the specification of the query is done jointly by the user and the assistant: the user first specifies the query, but, behind the scenes, the assistant enhances it to factor in additional considerations. Actually, if we think in implementation terms, we may be over-simplifying how the assistant works: it may be more convenient to implement assistants as components in a pipeline of retrieval operations, rather than as contributing to a single monolithic query.
The document collection may be fixed by an application. Alternatively it can be specified by the user or by an assistant, e.g. by an intelligent resource-discovery agent that finds the most appropriate document collection on-the-fly. An example of such an assistant would be an agent that found a document collection that gave tourist information about locations that the user was close to or was heading towards.
Even if the nature of the document collection is known in advance, its content might not be; this would apply, for example, to a document collection about current traffic problems. Most IR applications, however, depend on knowing the content in advance.
After the query has been specified, it is subsequently issued to the retrieval engine. In any push technology, it is not the user who issues the query; instead the retrieval engine itself does it. Push technology has, of course, existed for a long time, well before electronic pushing became viable: for example a librarian might perform Selective Dissemination of Information (SDI) by sending notes to users when new information becomes available. Lessons learned from this, and indeed from the work of librarians in general, carry over to the present day and may help prevent re-inventing of wheels that are less round than their predecessors (17).
In some applications that use push technology, such as Information Filtering (IF), the query is still designed by the user. Here the reason that control at Stage 2 is taken from the user is that they would not know when to issue the query (e.g. in the IF case they do not generally know when each new document arrives), and, even if they did, it would be tedious to continually issue the same query.
Proactive context-aware retrieval (CAR) systems are similar to IF systems, but the queries are associated with each document; they are prepared in advance by an author, not by the user. For example a document associated with a garden might have the attached query `is the user's location near the garden, and does the time correspond to the garden's opening hours?'. The query attached to a document can be regarded as a form of metadata; indeed the query need not be explicit, but could be automatically derived from metadata attached to the document. For example the document associated with the garden might have some `requirements' metadata that has two fields: a location and a time. In addition the nature of the dynamic elements in CAR differs from that in IF. In proactive CAR the document collection may well be completely static (whereas it is dynamic in IF): the dynamic element is the user's current context, against which the queries attached to each document are matched. Typically CAR is automatically performed whenever there has been some significant change in the user's context (the context typically includes time, so one criterion for a new retrieval can be that time has advanced by a certain amount).
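The garden example can be sketched directly: each document carries its own `requirements' metadata (a location and opening hours), and retrieval matches the current context against every document's requirements. The representation of locations, the closeness test and the field names are all assumptions made for the sketch.

```python
def near(loc_a, loc_b, radius=0.5):
    """Crude closeness test on (x, y) coordinates in kilometres."""
    dist = ((loc_a[0] - loc_b[0]) ** 2 + (loc_a[1] - loc_b[1]) ** 2) ** 0.5
    return dist <= radius

def car_retrieve(documents, context):
    """Proactive CAR: match the user's current context against the
    query implicit in each document's requirements metadata; no
    user-issued query is involved."""
    hits = []
    for doc in documents:
        req = doc["requirements"]
        if (near(req["location"], context["location"])
                and req["opens"] <= context["hour"] < req["closes"]):
            hits.append(doc["title"])
    return hits
```

In a running system `car_retrieve` would be re-invoked whenever the context changed significantly, e.g. when the user moved or time advanced.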
There are CAR systems that are interactive rather than proactive, but their operation is basically similar. Overall the important point about all these cases is that change (a new document, a change in the user's context) causes retrieval to occur, and the assistant may know much more about change than the user, and may thus be able to facilitate the retrieval of documents that are relevant to the changed circumstances.
Table I shows the properties of some existing applications. Clearly, since systems vary, the table can only give an overall impression rather than a definitive statement for every existing system. Within the table we use the suffix `/Adv' to mean `is (or may be) done in advance', and `User' means the end-user. The table row labelled `Generic link' describes the facility first offered by Microcosm (18); a generic link creates at run-time a link from a word or phrase to places where more information about the word or phrase can be found.
Application       | Query      | Document collection        | Push/Pull
------------------|------------|----------------------------|--------------
IR                | User       | User or (Application/Adv)  | Pull
IF                | User/Adv   | (User or Application)/Adv  | Push
CAR               | Assistant  | User or (Application/Adv)  | Push or Pull
WWW link          | Human/Adv  | Human/Adv                  | Normally Pull
Generic link      | User       | Human/Adv                  | Pull
Autonomous agents | User/Adv   | Assistant                  | Push or Pull
The `Document collection' column refers to the nature of the document collection (e.g. Traffic Reports for London) rather than its content. However, for IR applications that deal with huge numbers of documents, the content needs to be known and pre-processed in advance. Thus although the user may be able to choose the document collection, the choice will be confined to those whose content has been suitably pre-processed.
Researchers are always trying to break the mould, and the typical properties embodied in Table I may well be superseded in future. This applies in particular to the hypertext model of fixed links, crafted by an author in advance, embedded in a hypertext document. Instead links can be stored in a linkbase, separate from the document(s) they apply to; the user might be able to choose between different linkbases, and links might be created dynamically and/or adapted according to the user's profile. An interesting example of how far this process can go is provided by hypertext-augmented reality (19). Here the user can cause a 3D object to appear in their augmented reality -- the example quoted is an image of an aeroplane that appears to the user to be the size of a model aeroplane -- and can cause links to be superimposed on this object. For example a label might be superimposed on the engine of the aeroplane, where this label represents a hypertext link to a description of the engine (either generic to aircraft engines or particular to the engine of that individual aircraft -- in general it is a challenging retrieval problem to know whether the generic or the particular is more relevant to the user). The user specifies the types of link that they want. They do this with simulated salt and pepper pots that they can pick up and use to sprinkle links onto the object. For instance the salt could represent technical information, and the more salt that was sprinkled onto the aeroplane, the more technical information would appear. Pepper and other condiments can support other sorts of link, and perhaps different document collections to provide information.
Overall the effect is to move hypertext away from its static, stage 3 only, slot. Instead there is dynamic selection of the document collection, and, more importantly, the link structures to be used, albeit within the constraints of what authors have provided.
There is usually a degree of uncertainty on whether the delivered documents will really be of interest to the human end-user -- if there is no uncertainty the selection stage is irrelevant as the retrieved documents can be delivered direct to the user. The selection stage should give the user as much help as possible in resolving this uncertainty. There are two standard ways of doing this, and both can be used together: presenting the delivered documents as a ranked list, ordered by likely relevance, and attaching to each delivered document a label that describes or explains it.
Overall the ranked list and the labels provide an opportunity for the application to explain to the user why each delivered document may be relevant to them. Obviously this is easiest in a hypertext system, where the documents are known in advance and a human author writes the labels and provides any ranking (`Here are six papers, in order of increasing complexity, that explain more'). However there are also opportunities for automatic systems to provide further information -- generated on-the-fly -- to users (`this garden is very close to your current location, is open, and matches your interest in conifers; you have not apparently visited any gardens on your current trip'); such opportunities are not widely exploited at present, but, we believe, could be an important part of the success of an overall system.
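An on-the-fly explanation of the kind quoted above could be generated from the contextual fields used in the match. The following sketch assumes invented field names (`distance_km', `open', `topics', `interests'); the point is only that the reasons for delivery are articulated rather than left implicit.

```python
def explain(document, context):
    """Generate, on-the-fly, a label telling the user why this
    delivered document may be relevant, built from whichever
    contextual fields contributed to the match."""
    reasons = []
    if document.get("distance_km", 99) < 1:
        reasons.append("very close to your current location")
    if document.get("open"):
        reasons.append("open now")
    shared = set(document.get("topics", [])) & set(context.get("interests", []))
    if shared:
        reasons.append("matches your interest in " + ", ".join(sorted(shared)))
    return "; ".join(reasons) if reasons else "no obvious match"
```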
Some hypertext systems support one-to-many or even many-to-many links, rather than the traditional one-to-one link. In terms of the model presented here these are not fundamentally different from one-to-one links: they just offer a richer user interface for selection.
To recap the two extremes presented at the start of this paper, an IR system offers the ultimate in freedom: the user has control over the complete retrieval process (except perhaps for the document collection to be used). A web page, on the other hand, is in retrieval terms a highly constrained system: the author has done all the retrieval work in advance, and the only choice the user has is to select one of the links provided. If a constrained system suits the user's needs, this is ideal: the user has been saved a lot of work by the author who successfully constrained retrieval to cover just what the user needed.
As we explained earlier, the premise of this paper is that unfortunately no application will meet the needs of all users in all situations. In particular sometimes the user will want less constraint, and sometimes the user will want more help from the application, often in the form of constraining a large number of possibilities into a smaller number, better tailored to the user. Thus many applications provide extra mechanisms, either to remove constraints or to impose them, thus improving the application's versatility.
One approach to removing constraint is as follows: the user is viewing a web page, and wants other information, which is not covered by the links provided. The user then selects, from within the web page, one or more words (or passages) that are of especial interest, and hits a button called `Retrieve' or the like. Documents relevant to the selected words (and to the document the user is currently reading plus other context) are then retrieved and delivered. In some applications the document collection from which retrieval takes place may be different from the original one, i.e. it is a subsidiary document collection. The subsidiary document collection is typically a limited one, especially tailored to the material in the web presentation currently being accessed. For example the subsidiary document collection may take the form of a dictionary: if the user selects a word in the current web page, and if that word is in the dictionary, the dictionary entry is displayed. The dictionary need not be a comprehensive one: it could just be a glossary of special terms used in the web presentation, or of topics for which there is further information (like generic links in Microcosm). Alternatively instead of offering a dictionary -- a highly constrained and focussed artifact -- the application might offer an opposite extreme and search the whole web for the words the user has highlighted.
The above process, based on finding documents related (a) to the document the user is currently reading and (b) to an individual user's context, can be automated. There are many systems, some of them commercial products, that do this. One class of these is the Just-in-time Information Retrieval agents of Rhodes and Maes (22). These proactively create a retrieval query based on what the user is currently reading and/or writing, and on the user's context (which includes past history). This query leads to the retrieval of some documents, hopefully highly relevant to the user's current activity, and these documents are presented discreetly (and discretely!) to the user so that their sudden arrival does not unduly interrupt the user's current task. Xlibris (23) also lies in this class: Xlibris has a pen-based interface, and one of its capabilities is to perform a retrieval search based on "ink" marks drawn with the pen by the reader in order to annotate the document they are reading. The aim, which is similar to the hybrid applications we discussed earlier, is to add a broader view to a constrained process: ideally this can lead to chance discovery of some relevant documents outside the user's normal fields of perusal.
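The automatic query construction performed by such agents can be sketched very roughly as follows. This is not Rhodes and Maes's algorithm, merely a toy illustration of the principle: terms from the document currently being read are weighted, and terms that also appear in the user's context (here reduced to a set of history terms) receive an assumed boost.

```python
from collections import Counter

def build_query(current_text, history_terms, k=3):
    """Proactively derive a query from what the user is reading,
    boosting terms that also occur in the user's past history.
    (Illustrative only; real agents use far richer context.)"""
    counts = Counter(w.lower() for w in current_text.split())
    for term in history_terms:
        if term in counts:
            counts[term] += 2   # assumed boost for contextual terms
    return [t for t, _ in counts.most_common(k)]

q = build_query("retrieval agents rank retrieval results", {"agents"})
```

The retrieved documents would then be presented unobtrusively, as described above; the sketch covers only the query-building step.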
The spectrum between freedom and constraint also has an impact on how relevance feedback is implemented: it is relatively easy to provide user facilities of the form `this is too general; I only want the sort of material that the following documents cover'; it is in general harder to allow the user to ask for a release of constraints, not least because (a) they may not understand what the current constraints are, and (b) they may not know what the world is like outside these constraints.
We now briefly look at freedom and constraint from a more theoretical standpoint. Direct hypertext links as found in WWW represent only one of a large number of possible types of link. Of course, WWW offers other types of link, such as links to CGI scripts, but these still represent a subset of the possibilities. A much wider classification of links is provided by DeRose (24) (also see (25) for a more formal analysis of links). In DeRose's classification there are two overall types of link: (a) an extensional link, where the link is essentially an ad hoc connection to one or more possible documents, and (b) an intensional link, where the link is derived from executing a function. One example of such a function is a CGI-script, which in effect creates a new document on-the-fly and links to it. Another is what DeRose calls a retrieval link. A retrieval link creates, on-the-fly, a link to some existing document (which may come from some restricted collection of documents or the whole `docuverse'). DeRose's classification represents a theoretical description, rather than a taxonomy of existing systems. In principle, however, the function that drives an intensional link can do anything, and can cover all the possible functionality we have described here. Hence it could provide any degree of freedom up to full IR.
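DeRose's two overall types of link can be captured in a few lines of code. In this hypothetical sketch an extensional link simply stores its authored target(s), whereas an intensional link executes a function over the current context to compute them; a retrieval link is then just one kind of intensional link.

```python
# DeRose's distinction, sketched: an extensional link stores its
# targets; an intensional link derives them by executing a function.

def extensional(targets):
    return lambda _context: targets          # fixed, authored in advance

def intensional(fn):
    return lambda context: fn(context)       # computed on-the-fly

# A toy "docuverse" and a retrieval link over it (names illustrative).
docs = {"q1": ["doc-a"], "q2": ["doc-b", "doc-c"]}
fixed_link = extensional(["doc-a"])
retrieval_link = intensional(lambda ctx: docs.get(ctx["query"], []))
```

Since the function behind an intensional link is arbitrary, it can in principle reproduce anything from a fixed link up to full IR, which is exactly the theoretical point made above.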
Up to now we have spoken in terms of a single context for the user, but this is simplistic. Most of us wear several hats; we may be a researcher in X and Y, a teacher in X and Z, an administrator, a hobbyist, a traveller, ... . Thus we have several possible contexts, and continually switch between them. Contexts typically consist of an aggregate of several components, which we have called `fields' -- to parallel the fields within documents. Different contexts may share some contextual fields. For example the current time will be the same for nearly all of them: however an exception to this would occur if the traveller set their time to a pretended value, representing a future time at which they plan to travel. To meet the need for multiplicity, an application can maintain not one context, but several.
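The idea of contexts as aggregates of fields, some of them shared, can be made concrete with a small sketch (the field names are invented for illustration):

```python
# Contexts as aggregates of named fields; different contexts (hats)
# share some fields, such as the current time. Names are illustrative.

shared = {"time": "09:00"}   # common to nearly all contexts

researcher = {**shared, "role": "researcher", "topics": ["X", "Y"]}
teacher    = {**shared, "role": "teacher",    "topics": ["X", "Z"]}

# Switching hats is simply selecting which context is current.
current = teacher
```

An application maintaining several such contexts can then switch the current one as the user changes activity, as the applications below do.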
In this final part of the paper, we postulate one piece of novel infrastructure plus some applications that cover wide parts of one or more dimensions of the spectrum that we have explored above. Our postulated applications represent combinations of existing applications, e.g. C + D + a dash of E and F. In such cases we will somewhat arbitrarily take one component as a starting point, C say, and build from there. A key to these applications is a piece of infrastructure that we will now describe; its purpose is to give the user greater freedom to manipulate documents they are reading, such as web pages, and, as a beneficial side-effect, thereby to allow a computer program, by looking at the user's manipulations, to understand better what the user is interested in. This is the read/write interface.
We have said that an important part of the context is the document the user is currently reading or writing. For a read/write interface we assume that the tools for writing, such as text editors, also provide reading: obviously the user can read what they have written, but in addition almost any editor allows the user to import other documents -- thus the editor can be used as a crude file-browsing system. More unusually we assume all reading tools, such as web browsers, allow annotating (i.e. writing) too; for example the user can change words in the current document they are reading, add new words, delete words, etc. (These annotations may be ephemeral -- being lost when the user moves to a new document -- or they may be preserved in some way. We are not assuming a read/write interface offers all the power -- and complexity -- of a full authoring system.) Overall we have a concept of the current document of interest, which the user may read and write. One extreme is where the document being read is null, and thus the user is writing a new document; the other extreme is where a passive reader reads an existing document without annotating it. The most interesting cases come in between. In addition we assume that the user has a facility for feeding back their level of interest in each component of the current document: at its simplest this can be a facility for the user to highlight the words or sections that most interest them -- here we may have a simple Boolean division, where highlighting means "very interested" and lack of highlighting means "not very interested". Further we assume that the current document can contain hypertext links. Finally we assume that details of the current document, such as what parts the user wrote and what parts they have highlighted, are available to outside applications. 
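To make the read/write interface concrete, here is a minimal sketch of a current document of interest. The class and method names are hypothetical; the essential points are that reading and writing go through the same object, that highlighting is a simple Boolean mark of interest, and that the marks are exposed to outside applications.

```python
# Sketch of a read/write document: the user may annotate (write into)
# and highlight what they are reading, and outside applications can
# query those marks. All names are hypothetical.

class ReadWriteDoc:
    def __init__(self, text=""):
        self.words = text.split()        # empty text: writing a new document
        self.highlighted = set()         # word indices the user has marked

    def annotate(self, index, word):     # writing into what is being read
        self.words.insert(index, word)

    def highlight(self, index):          # Boolean "very interested" mark
        self.highlighted.add(index)

    def interests(self):                 # exposed to outside applications
        return [self.words[i] for i in sorted(self.highlighted)]

doc = ReadWriteDoc("context aware retrieval")
doc.annotate(0, "mobile")                # the reader adds a word
doc.highlight(3)                         # ... and marks "retrieval"
```

The two extremes described above fall out naturally: a null initial text means the user is purely writing, while a document that is never annotated or highlighted is being passively read.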
Overall we feel that annotation is a natural aid that any user can employ to help their reading of the current document; if this annotation is known to other tools, which can as a result retrieve documents that are more relevant to the user, it is an almost free bonus. We can extend the concept of the document of interest to allow several simultaneous current documents of interest if we need to.
We believe that the read/write interface is a simple and powerful aid to allow the user to influence the retrieval process. We now postulate the applications. They exploit a read/write interface, and they also cater for multiple contexts, as introduced in the previous Section.
Our first example starts from IF (Information Filtering), and our postulated application is called SUPERIF. As with normal IF the query is supplied in advance. Instead of working from a pre-defined document collection, SUPERIF uses a resource-discovery agent to find documents that meet the user's needs, as given by their current query. When a user first employs SUPERIF, they may optionally choose to set an initial query; in any case whatever query the user supplies is automatically supplemented by SUPERIF. This supplementing is done by a process of deduction from looking at each individual user's retrieval behaviour, and evolving their query continually (e.g. daily or weekly). Even in our make-believe world, however, it would be unrealistic to expect this process of query deduction and evolution to work well all the time. Thus the user will sometimes want to intervene, and to modify the query that has been automatically constructed for them; hence the application must be able to present the query to the user in a comprehensible and easily changeable form.
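Since SUPERIF is hypothetical, any sketch of its query evolution is necessarily an invented one; the update rule below (terms from documents the user actually opened gain weight, old interests decay) is an assumption chosen purely to illustrate the two requirements above: continual evolution, and a query that remains presentable to the user in a comprehensible form.

```python
# Sketch of SUPERIF-style query evolution. The decay/gain rule is an
# assumption; the point is that the evolved query stays inspectable.

def evolve(query, opened_docs, gain=1.0, decay=0.9):
    """Fade old term weights, then reward terms from documents the
    user chose to open."""
    query = {t: w * decay for t, w in query.items()}
    for doc in opened_docs:
        for term in doc.split():
            query[term] = query.get(term, 0.0) + gain
    return query

def present(query, k=2):
    """The comprehensible, user-changeable form: top-weighted terms."""
    return sorted(query, key=query.get, reverse=True)[:k]

q = evolve({"hypertext": 2.0}, ["retrieval agents", "retrieval context"])
```

Presenting the query as an editable list of weighted terms is one plausible answer to the requirement that the user be able to intervene and correct the automatic process.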
In line with our earlier comments about multiple contexts, SUPERIF creates several queries for each user, one for each hat they wear. SUPERIF has proactive delivery of information, but is radical in the way it delivers retrieved documents. It only delivers documents when the user is detected as being in a context where `they have the right hat on'. Thus if a query related to research papers, the documents it retrieved would be delivered when the user next read or wrote a document relating to that activity. (There might be some special process for urgent documents: e.g. delivery with any hat on.) SUPERIF presents the set of delivered documents in a read/write interface, so that the user can make annotations or changes to aid selection or subsequent retrieval.
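The hat-gated delivery rule is simple enough to state in code. In this sketch (all field names invented) retrieved documents wait in a pending pool until the user's current hat matches the hat of the query that produced them, with urgent documents bypassing the gate:

```python
# Sketch of hat-gated delivery: a pending document is released only
# when the user has the right hat on, unless it is marked urgent.
# All names are illustrative.

def deliverable(pending, current_hat):
    return [d for d in pending
            if d["hat"] == current_hat or d.get("urgent")]

pending = [{"doc": "new paper on X", "hat": "researcher"},
           {"doc": "room change",    "hat": "teacher", "urgent": True},
           {"doc": "agenda",         "hat": "administrator"}]
out = deliverable(pending, "researcher")
```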
SUPERIF should cater for both short-term and long-term retrieval needs. Movement on the spectrum between short and long term can be achieved by adjusting weightings in the use of context. Thus for short-term needs the current context has a higher weighting than history. For long-term needs the current context is less important, and history more important.
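This movement along the short-term/long-term spectrum amounts to a single weighting knob. As a hedged illustration (the blending rule and parameter are assumptions, not a claim about how SUPERIF must work), a document's overall score could blend its relevance to the current context against its relevance to the user's history:

```python
# Short-term versus long-term needs as one weighting parameter:
# alpha near 1.0 means the current context dominates (short-term);
# alpha near 0.0 means history dominates (long-term). Illustrative.

def blend(current_score, history_score, alpha):
    return alpha * current_score + (1 - alpha) * history_score

short_term = blend(0.9, 0.2, alpha=0.8)   # current context dominates
long_term  = blend(0.9, 0.2, alpha=0.2)   # history dominates
```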
Our second application, SUPERIR, starts from IR as a base. SUPERIR only does context-aware retrieval. Since part of the context is the current document of interest, the user can simulate a current search engine by creating a new null document and just typing some search terms into it. (At an extreme they could ask for all other aspects of their context to be shut out, thus relying solely on the search terms they have just typed.) SUPERIR can be used as a web browser, and indeed a typical pattern of usage may be as follows: the user loads a document, and, if this is a hypertext document, perhaps follows a series of one or more links to further documents: when the user feels they want a broader perspective, they mark the parts of the current document that are especially relevant to their needs, and also perhaps add some annotations: they then hit a "Retrieve" button to cause SUPERIR to find some more relevant documents. SUPERIR, like SUPERIF, caters for multiple contexts. SUPERIR is geared to the user's short-term needs, and to cases where the query is not supplied in advance but interactively on-the-fly.
In designing our two hypothetical applications, we could have used various criteria for distinguishing them, e.g.: (a) one for short-term needs and one for long-term, or (b) one for proactive delivery and one for interactive. We do not, however, believe that either of these is fundamental. Instead the criterion we have used is whether queries are specified in advance, or whether they are supplied interactively on-the-fly. In a sense this criterion is not fundamental either; however we believe it is fundamental to having any chance of an efficient and hence usable application that caters for a large number of documents. Thus we believe that an implementation-based criterion must, at least in the foreseeable future, take precedence over others. The criterion is part of the difference between IR and IF, and the whole key to each of these has been to provide optimisations based on the parts (the queries or the document collection) that are known in advance. To be practical we think therefore that SUPERIR, which has dynamic queries, would have to have some advance knowledge of the document collections that might be used, and their content. Obviously, however, there will be specialised applications outside the scope of SUPERIR -- e.g. a traffic information system in which all documents are indexed by location and content is changing continually -- where high performance needs to be achieved in a totally dynamic world.
Obviously many variants of SUPERIF and SUPERIR are possible, but we believe that more important than the tools themselves is the underlying read/write interface. Perhaps the key to a whole range of advances is to escape from the legacy that reading documents and writing documents are separate activities.
At the start of this paper we quoted the view -- a view widely held -- that, in many real situations, following hyperlinks is too restrictive whereas using a general IR search is too permissive. The user wants something in between the extremes, but there is an added, sometimes implicit, requirement that this something must require no more user effort than the extremes, and ideally should require less effort. A key to achieving this is automatic processes that select or enhance the query, choose the document collection, and perhaps proactively deliver documents. The most natural way to accomplish this is to collect and exploit information about the user's context, and to use this in the automatic process. A further aid, affecting both convenience and performance, is to perform some stages of the retrieval process in advance.
Automatic processes, intended to work on the user's behalf, are, however, a two-edged sword. If the automatic processes deviate from the user's real needs, then, since the average user is unaware even of the existence of the automatic processes, they will have great trouble in correcting the problem. More generally, we have all had problems trying to tame too-clever-by-half software. A key to improving this is for the software to supply the why as well as the what. In terms of our retrieval model this is especially relevant in the construction of queries (see our above suggestion that SUPERIF explained why its queries had been constructed) and at stage 3: the stage where the user selects from the documents that have been retrieved. As well as presenting what has been retrieved, an application that uses a lot of automatic processes needs to explain, as an option, why each document has been retrieved, e.g. `this document describes the XXX museum; it relates to your [deduced] interest in YYY, and represents a suggested afternoon activity given the forecast for rain'.
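Supplying the why alongside the what requires only that each retrieved document carry a rationale built from the deductions that produced it. A minimal sketch, with invented field names, reusing the museum example above:

```python
# Sketch: a retrieved document carries an optional rationale -- the
# "why" alongside the "what". Fields and wording are illustrative.

def explain(doc, matched_interest, context_reason):
    return (doc + ": relates to your [deduced] interest in "
            + matched_interest + "; " + context_reason)

msg = explain("this document describes the XXX museum", "YYY",
              "a suggested afternoon activity given the forecast for rain")
```

The essential design point is that the rationale is assembled at retrieval time, when the deductions are still available, rather than reconstructed afterwards.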
Currently we have a plethora of retrieval tools, each representing a point on a spectrum. The future surely lies in (1) making these tools work together in a seamless way, and (2) making each tool cover a wider part of the spectrum. A ubiquitous adoption of a read/write interface is, we believe, an aid to (1). As regards (2), we have proposed two tools, SUPERIR and SUPERIF, to this end; each knows some information in advance, and this offers a chance for a practical and efficient implementation that caters for document collections of a realistic size.
Finally, as well as looking at tools -- which has been a focus of this paper -- we need to look at fundamentals. We need to understand a human's searching strategies, for example by following and expanding Marchionini's models. Moreover we need to think about document models: reading and writing need to be treated as complementary, and our suggested read/write interface is aimed as a start to achieving this. Our current document models draw far too many unnecessary divisions: between different types of application, between reading and writing, between paper and electronic form. Radical new models, combined with good engineering of applications, are a key to a newer generation of software tools that are much less constrained than the current generation.
A lot of new insights relating to this paper were provided by Wendy Hall and Les Carr, and their colleagues at Southampton University. Douglas Tudhope helped hugely both with initial drafts and the final version, and gave pointers to relevant areas of research that were new to me. I am also grateful to two anonymous referees.