From search engine to hypertext:

a spectrum of possibilities

Peter Brown

Computer Science

Univ. of Exeter, UK

Background to work

Links with Southampton University.
These are early ideas.

Extremes of retrieval

Unconstrained: search engine
Constrained: a hypertext page including, say, 5 links to other pages that supply further information.
We often want retrieval between these extremes, especially when using a small screen.

Points on the spectrum

1: Full IR, e.g. search engine.
2: Information Filtering, which is like IR but:
- user's query is supplied in advance.
- delivery of documents is proactive.
3: Site search on the web; more generally searching a sub-hierarchy.
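
As a rough illustration of how Information Filtering (point 2) differs from full IR, the sketch below registers queries in advance and pushes each newly arrived document to the users whose standing query it matches. The class `FilteringService`, its methods and the all-terms matching rule are hypothetical, chosen only to make the idea concrete.

```python
from collections import defaultdict

def tokens(text):
    """Lowercase word set; a deliberately crude stand-in for real indexing."""
    return set(text.lower().split())

class FilteringService:
    """Hypothetical filter: queries are registered first, documents arrive later."""

    def __init__(self):
        self.profiles = {}              # user -> set of query terms
        self.inbox = defaultdict(list)  # user -> documents delivered so far

    def register(self, user, query):
        """The user's query is supplied in advance."""
        self.profiles[user] = tokens(query)

    def publish(self, document):
        """Delivery is proactive: no user asks, the service pushes."""
        words = tokens(document)
        for user, terms in self.profiles.items():
            if terms <= words:          # assumption: all query terms must appear
                self.inbox[user].append(document)

service = FilteringService()
service.register("ann", "exeter hypertext")
service.publish("Hypertext research at Exeter")
service.publish("Search engine ranking")
```

In a search engine the roles are reversed: the documents are held in advance and the query arrives later.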

Points on the spectrum (cont.)

4: Context-aware retrieval:
- may be proactive, e.g. delivery of information as tourist nears a site.
- may be interactive, but query is wholly or partly supplied automatically.
5: User-tailored retrieval:
- similar in principle to previous, but with tailoring of information to a user (e.g. preferences, past history, bookmarks).
6: Dynamic hypertext links:
- source and/or destination of links are not fixed, but are calculated by a function.
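
A dynamic link (point 6) can be pictured as a link whose destination is computed when it is followed. The sketch below combines this with the tourist example of point 4: the destination depends on the user's position. The function `nearest_site_link`, the site records and the coordinates are all invented for illustration.

```python
def nearest_site_link(context, sites):
    """Compute the link destination: the page of the site closest to the user."""
    x, y = context["position"]

    def squared_distance(site):
        sx, sy = site["position"]
        return (sx - x) ** 2 + (sy - y) ** 2

    return min(sites, key=squared_distance)["page"]

# Hypothetical tourist sites with made-up grid positions.
sites = [
    {"page": "cathedral.html", "position": (0, 0)},
    {"page": "quay.html", "position": (5, 5)},
]

# The link stores a function rather than a fixed destination.
link = {"anchor": "Nearest attraction", "resolve": nearest_site_link}

destination = link["resolve"]({"position": (4, 6)}, sites)
```

The same anchor text leads to a different page as the user's context changes.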

Points on the spectrum (cont.)

7: Composite retrieval, e.g.:

(1) do an ordinary retrieval.
(2) apply one or more 'filters' to the results of (1).
Perhaps carry this further and have several autonomous retrieval agents + a consolidator.
8: Simple web links:
- the retrieval has been done in advance by the author, who has provided links to the most relevant documents.
- the author has one piece of context: the web page the reader is looking at.
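
The composite retrieval of point 7 might be sketched as follows: an ordinary retrieval, a chain of filters applied to its results, and a consolidator that merges the output of several retrievals. The function names, document fields and filters are assumptions made for this example, and the term-overlap matching is only a placeholder for a real retrieval engine.

```python
def retrieve(query, collection):
    """Step (1): ordinary retrieval; keep documents sharing any query term."""
    terms = set(query.lower().split())
    return [d for d in collection if terms & set(d["text"].lower().split())]

def composite(query, collection, filters):
    """Step (2): apply each filter in turn to the results of step (1)."""
    results = retrieve(query, collection)
    for keep in filters:
        results = [d for d in results if keep(d)]
    return results

def consolidate(result_lists):
    """Merge results from several agents, dropping duplicates by id."""
    seen, merged = set(), []
    for results in result_lists:
        for d in results:
            if d["id"] not in seen:
                seen.add(d["id"])
                merged.append(d)
    return merged

# Hypothetical collection; the fields are invented for the example.
collection = [
    {"id": 1, "text": "Hypertext links", "lang": "en", "words": 900},
    {"id": 2, "text": "Hypertext en francais", "lang": "fr", "words": 300},
    {"id": 3, "text": "Dynamic hypertext", "lang": "en", "words": 200},
]

def english(d):
    return d["lang"] == "en"

def short(d):
    return d["words"] < 500  # e.g. suitable for a small screen

hits = composite("hypertext", collection, [english, short])
merged = consolidate([retrieve("links", collection),
                      retrieve("hypertext", collection)])
```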

Underlying mechanism: a three-stage retrieval process

(1): specification of (1a) document collection and (1b) query.
(2): issue of query to retrieval engine; delivery of relevant documents.
(3): selection by user of a retrieved document.
... plus a possible further step of marking the relevant document fragments and explaining why each is relevant.

There may be iterations round these stages, e.g. building and using a cache involves iterating over (1) and (2) above.
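
The three stages can be sketched as three small functions, with a cache built by iterating over stages (1) and (2) for queries anticipated in advance. This is a minimal sketch: the names `specify`, `issue` and `select`, the scoring by shared terms and the sample collection are all assumptions.

```python
def specify(collection, query):
    """Stage (1): fix (1a) the document collection and (1b) the query."""
    return {"collection": collection, "terms": set(query.lower().split())}

def issue(spec):
    """Stage (2): rank documents by how many query terms they contain."""
    scored = []
    for title, text in spec["collection"].items():
        score = len(spec["terms"] & set(text.lower().split()))
        if score:
            scored.append((score, title))
    scored.sort(key=lambda pair: -pair[0])  # highest score first
    return [title for _, title in scored]

def select(ranked, choice):
    """Stage (3): the user picks one of the retrieved documents."""
    return ranked[choice]

collection = {
    "guide": "tourist guide to exeter",
    "map": "exeter street map",
    "menu": "cafe menu",
}

ranked = issue(specify(collection, "exeter map"))
chosen = select(ranked, 0)

# Building a cache: iterate over stages (1) and (2) for anticipated queries.
cache = {q: issue(specify(collection, q)) for q in ["exeter map", "cafe"]}
```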

Timing

For large amounts of information, some elements of (1) to (3) above must be done in advance to get acceptable performance, but subsequent changes then cause problems, since work done in advance can go out of date.
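
One way to see the timing problem is with an index prepared in advance. In the sketch below (an illustrative assumption, not a description of any particular system) the index answers queries quickly, but a document added afterwards is invisible until the index is rebuilt.

```python
def build_index(docs):
    """Done in advance: map each word to the ids of documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(doc_id)
    return index

docs = {"a": "context aware retrieval", "b": "hypertext links"}
index = build_index(docs)

# Subsequent change: a new document arrives after the index was built.
docs["c"] = "dynamic hypertext"

stale = index.get("hypertext", set())              # answer from the old index
fresh = build_index(docs).get("hypertext", set())  # answer after rebuilding
```

Here `stale` misses document "c" while `fresh` includes it, at the cost of repeating the advance work.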

Dimensions of the spectrum

Queries: permitted forms, who specifies, who issues.
Document collection: who specifies.
Iterations.
Timing: which stages done in advance.
Delivery: proactive/interactive.

Most dimensions involve a choice between freedom and constraint.
Current applications typically pick certain points on the spectra, but provide some mechanisms to ease or impose constraints.

Example: easing of constraints within hypertext system

Use of associated glossary or dictionary.
'Find me a web page like this one'.
Just-in-time retrieval agents: annotating the current document with automatically-created links to relevant material; also Autonomy products.
More generally, using intensional rather than extensional links.
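
The difference between extensional and intensional links can be sketched like this: an extensional link lists its destinations explicitly, while an intensional link states a rule that is evaluated when the link is followed. The page names and the rule (pages mentioning "retrieval") are invented for the example.

```python
# Hypothetical collection of pages: name -> text.
pages = {
    "ir.html": "Information retrieval basics",
    "links.html": "Hypertext links",
}

# Extensional link: the author enumerates the destinations.
extensional_link = ["ir.html"]

def intensional_link(pages):
    """Intensional link: destinations are whatever currently satisfies the rule."""
    return sorted(name for name, text in pages.items()
                  if "retrieval" in text.lower().split())

before = intensional_link(pages)

# A new page appears; only the intensional link picks it up.
pages["car.html"] = "Context-aware retrieval"
after = intensional_link(pages)
```

The extensional list stays as the author wrote it, whereas the intensional link follows changes to the collection without further authoring.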

Proposal: the read/write interface

Do not distinguish between browsers (for reading) and editors/word-processors (for writing). Thus you have a browser that allows annotation, deletion, highlighting, etc., by the user.
The current read/write document is called the document of interest, and is one part of the user's context.
The document of interest can be used as a basis for retrieving further information the user wants.
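
A minimal sketch of using the document of interest as a basis for retrieval: the most frequent non-trivial words become the query, and words the user has highlighted are counted again. The stopword list, the extra weight for highlights and the function names are assumptions made for illustration.

```python
from collections import Counter

# A tiny, hypothetical stopword list.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for"}

def words(text):
    """Lowercase words with surrounding punctuation and stopwords removed."""
    cleaned = (w.strip(".,;:()'\"").lower() for w in text.split())
    return [w for w in cleaned if w and w not in STOPWORDS]

def query_from_document(text, highlights=(), size=3):
    """Derive query terms from the document of interest.

    Assumption: each highlighted passage adds its words to the counts
    once more, so the user's marks pull the query towards them.
    """
    counts = Counter(words(text))
    for passage in highlights:
        counts.update(words(passage))
    return [w for w, _ in counts.most_common(size)]

document = ("The tourist guide covers the cathedral and the quay. "
            "The cathedral is old.")

plain_query = query_from_document(document, size=1)
marked_query = query_from_document(document, highlights=["the quay", "quay"], size=2)
```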

Proposed future application: SUPERIF

Information Filtering, but documents are found by a resource-discovery agent, which knows about user interests.
Uses read/write interface.
Queries are adapted automatically according to how the user reacts to the documents delivered; thus queries evolve slowly over time.
Caters for different user 'hats'.
Aims to provide proactive delivery of all the new information the user needs.
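
How a query might evolve slowly with the user's reactions can be sketched with weighted terms and a small update rate, keeping one query per 'hat'. This is not a description of SUPERIF itself, which is only proposed here; the update rule, the rate of 0.1 and the hat names are assumptions.

```python
def adapt(query, doc_terms, liked, rate=0.1):
    """Nudge term weights up for a liked document, down for a disliked one.

    A small rate means the query changes slowly over time.
    Terms whose weight falls to zero or below are dropped.
    """
    sign = 1.0 if liked else -1.0
    updated = dict(query)
    for term in doc_terms:
        updated[term] = updated.get(term, 0.0) + sign * rate
    return {t: w for t, w in updated.items() if w > 0}

# One weighted query per user 'hat' (hypothetical hats).
profiles = {
    "lecturer": {"hypertext": 1.0},
    "tourist": {"cathedral": 1.0},
}

# The user, wearing the lecturer hat, reacts well to a delivered document.
profiles["lecturer"] = adapt(profiles["lecturer"], ["hypertext", "links"], liked=True)
```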

Proposed future application: SUPERIR

SUPERIR is an IR system that only does context-aware retrieval (the document of interest is part of this context).
Uses read/write interface.
Has a set of predefined document collections prepared in advance (like the set of web pages used by search engines).

Difference between SUPERIF and SUPERIR

SUPERIF: the query is known in advance.
SUPERIR: the document collection is known in advance.

Conclusions

We have tried to find automatic processes that help find relevant documents without user intervention.
One key to this is the use of context.
Another key is the read/write interface.
Some operations must be done in advance.
The more automation there is, the more a user needs a facility to find out why a particular document has been delivered.
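
A facility for finding out why a document was delivered could be as simple as returning, with each document, the query terms that caused the match. The sketch below assumes plain term matching; the function name and sample documents are hypothetical.

```python
def retrieve_with_reasons(query_terms, docs):
    """Return (document id, matched terms) pairs so delivery can be explained."""
    wanted = set(query_terms)
    results = []
    for doc_id, text in docs.items():
        matched = sorted(wanted & set(text.lower().split()))
        if matched:
            results.append((doc_id, matched))
    return results

docs = {
    "guide": "tourist guide to exeter",
    "menu": "cafe menu",
}

explained = retrieve_with_reasons(["exeter", "tourist"], docs)
for doc_id, matched in explained:
    print(f"{doc_id}: delivered because it mentions {', '.join(matched)}")
```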