Grand challenge: integrating document reading and document writing

P.J. Brown
Department of Computer Science, University of Exeter, Exeter EX4 4QF, UK
e-mail: P.J.Brown@exeter.ac.uk

The challenge

When we read a document we often want to write, for example to annotate the document. When we write a document we often want to read other documents; for example, we might want to be prompted about existing documents that are related to the one we are composing. Reading and writing should therefore be treated as complementary activities. Current computer science does not reflect this: we think in terms of reading tools, such as browsers, and entirely separate writing tools, such as editors or word processors. In terms of implementation, reading tools offer at best crude facilities for writing, and writing tools offer at best crude facilities for reading.

Moreover there are further dichotomies: we regard paper documents as intrinsically different from electronic ones; even within electronic documents there are separate classes such as e-mails, web documents and word-processor documents, each with separate underlying philosophies and supporting tools.

The grand challenge is to rethink our approach to documents so that (a) reading and writing are integrated activities, and (b) all forms of document are covered.

If we meet this challenge, and implement corresponding software tools, we will have made an advance that benefits virtually every computer user.

Document model

A requirement for realising the challenge is to create new document models. The challenge implies that all components of such a model must be first-class citizens. Thus anything can be written; anything can be realised electronically or on paper; anything (including an annotation) can be annotated; anything can contain hypertext links; "mark-up" can be changed or annotated by the reader in the same way as content; and everything has provenance, including a record of who created it and when. The model must cover not only individual documents but also collections of documents with links between them; the linked structure (which in implementation might be partly created automatically, e.g. by interlinking a collection of e-mails) should apply to all classes of document. Like any good model, it must capture abstractions rather than realisations. This general model must nevertheless be realisable in real systems that real people can use.
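As a rough illustration of the "first-class citizen" requirement, the following Python sketch treats content, mark-up, annotations and links as the same kind of object, each carrying provenance and each able to be annotated in turn. The names `Node`, `annotate` and `chain` are hypothetical and chosen only for this example; the sketch is one possible reading of the requirement, not a proposed model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from itertools import count
from typing import List, Optional

_ids = count(1)  # simple source of unique identifiers for this sketch


@dataclass
class Node:
    """Any component of a document: content, mark-up, annotation or link."""
    kind: str      # e.g. "content", "markup", "annotation", "link"
    body: str
    creator: str   # provenance: who created it
    created: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))  # provenance: when
    target: Optional["Node"] = None  # the node this one annotates, if any
    node_id: int = field(default_factory=lambda: next(_ids))

    def annotate(self, body: str, creator: str) -> "Node":
        """Annotate this node; the result is itself an annotatable Node."""
        return Node("annotation", body, creator, target=self)


def chain(node: Optional[Node]) -> List[str]:
    """Follow `target` references back to the root, collecting node kinds."""
    kinds = []
    while node is not None:
        kinds.append(node.kind)
        node = node.target
    return kinds


paragraph = Node("content", "Reading and writing are complementary.", "author")
note = paragraph.annotate("Agreed.", "reader-1")
reply = note.annotate("Why?", "reader-2")  # an annotation on an annotation
print(chain(reply))
```

Because an annotation is just another `Node`, nothing special is needed to annotate an annotation or a piece of mark-up; that uniformity is the point of the sketch.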

Hardware

The challenge requires advances in hardware as well as software. Relevant hardware includes (a) cameras, (b) projection equipment and (c) paper-like displays. At present good progress has been made with (a) and (b), but (c) remains a dream. Further progress has been made in exploiting the available hardware, e.g. by Xerox and by Cambridge University.

Change

Two important aspects of documents are change and management of who can make changes. When a document changes and the reader decides (or is constrained) to use the new version, then previous annotations, links, etc., will not always work perfectly. An aspect of the grand challenge is that their behaviour should, however, degrade gracefully as the document changes.
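One way to picture graceful degradation is an annotation that remembers the text it was attached to and tries progressively weaker ways of finding that text in the new version. The sketch below assumes this quote-based anchoring; the function name `reattach`, the three status labels and the 0.6 threshold are all illustrative choices, not part of the challenge itself.

```python
from difflib import SequenceMatcher
from typing import Tuple


def reattach(quote: str, new_text: str,
             threshold: float = 0.6) -> Tuple[str, int, int]:
    """Locate the text an annotation was attached to in a changed document.

    Returns (status, start, end), where status is:
      "exact"       - the quoted text still appears verbatim
      "approximate" - a sufficiently similar span was found
      "orphaned"    - nothing similar enough; keep the annotation unattached
    """
    pos = new_text.find(quote)
    if pos != -1:
        return ("exact", pos, pos + len(quote))

    # Slide a window of the quote's length over the new text and keep the
    # most similar span (quadratic, but adequate for a sketch).
    n = len(quote)
    best_ratio, best_pos = 0.0, -1
    for i in range(max(1, len(new_text) - n + 1)):
        ratio = SequenceMatcher(None, quote, new_text[i:i + n]).ratio()
        if ratio > best_ratio:
            best_ratio, best_pos = ratio, i

    if best_ratio >= threshold:
        return ("approximate", best_pos, min(best_pos + n, len(new_text)))
    return ("orphaned", -1, -1)


new_version = "Reading and writing must be treated as complementary activities."
print(reattach("complementary activities", new_version)[0])
print(reattach("should be treated as complementary", new_version)[0])
print(reattach("zzzzqqqqxxxx", new_version)[0])
```

The annotation is never silently discarded: in the worst case it is reported as orphaned, so the reader can decide what to do with it.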

Many original documents will be read-only as far as a particular reader is concerned. This typically applies, for example, to library books (we trust) and to electronic documents written by others. Nevertheless the reader should be allowed to change and annotate such documents, without, of course, destroying the original. Such changes may be personal to one reader, or shared among a collaborative group of readers. For paper documents, the changes might be projected onto the original. Thus every time you turn a page of the book (provided it is under the projection system) your previous annotations -- made, say, with an electronic pen -- are projected onto the newly revealed page.

The nature of annotation

The purpose of annotation covers three aspects: (a) indicating the user's level of interest in each part of the material -- this may be displayed using levels of highlighting, or, for non-interest, deletion; (b) adding commentary; (c) adding the user's personal links to other material (e.g. `These results [a hypertext link] seem to contradict this'). It is important that the document model can capture the concept of annotation, like other document concepts, in its abstract form rather than as some way of representing it. Thus (a) above must be captured as `this sentence is of the highest possible interest' rather than `display this sentence in 18-point italics'. Many current documents do have a welcome emphasis on abstractions, but there remains a need for fundamental rethinking of models.
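The distinction between an abstract annotation and its representation can be sketched as follows. The annotation records only a level of interest; separate, replaceable style tables decide how each level appears on a given medium. The names `Interest`, `InterestMark`, `SCREEN_STYLE` and `PAPER_STYLE`, and the particular levels and styles, are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict


class Interest(IntEnum):
    """Abstract level of interest; says nothing about appearance."""
    NONE = 0      # non-interest: a renderer may show this as deletion
    NORMAL = 1
    HIGH = 2
    HIGHEST = 3


@dataclass(frozen=True)
class InterestMark:
    """An annotation of kind (a): a span of text and the user's interest."""
    start: int
    end: int
    level: Interest


# Realisations are kept apart from the model and can differ per medium.
SCREEN_STYLE: Dict[Interest, str] = {
    Interest.NONE: "hidden",
    Interest.NORMAL: "plain",
    Interest.HIGH: "yellow highlight",
    Interest.HIGHEST: "red highlight",
}
PAPER_STYLE: Dict[Interest, str] = {
    Interest.NONE: "struck through",
    Interest.NORMAL: "unmarked",
    Interest.HIGH: "underlined",
    Interest.HIGHEST: "double underlined",
}


def render(mark: InterestMark, style: Dict[Interest, str]) -> str:
    """Map the abstract level to a concrete presentation for one medium."""
    return style[mark.level]


mark = InterestMark(start=0, end=42, level=Interest.HIGHEST)
print(render(mark, SCREEN_STYLE))
print(render(mark, PAPER_STYLE))
```

The same stored mark yields different presentations on screen and on paper, and changing a style table never touches the annotation itself.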

Reading

We have probably made more progress with tools for reading documents than tools for writing them. Nevertheless many parts of the grand challenge need work:

Allowing the user to capture all or part of a document for subsequent use: this is currently hard for paper documents, though there has been progress in OCR and digital cameras, and hard to do properly for extracts from electronic documents because of the possible presence of global declarations.
Allowing the user to create a link to all or part of the document being read, or perhaps to an annotated version of it.
Providing the user pre-emptively with suggested documents to read: thus if a user is reading a document and has indicated a high level of interest in some parts of it and has changed other parts, the system is well placed to understand the user's needs and to provide documents that meet these needs. This also applies when the user is composing a new document: the user can be shown documents that are relevant. These may relate to the content of the document or to elements used within it. An example of the latter occurs when a person's name is used, for example, as the intended recipient of an e-mail or as the author of a paper to be cited. The system can provide previous e-mails received from this person, or their documents that have been saved or linked to by the current user. (Obviously such pre-emptive behaviour must not be too intrusive, and requires a good interface -- such as is provided by the work of Rhodes and Maes.)
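The person-name example in the last item above can be sketched with a small index from names to documents. This is a deliberately naive illustration: the class `PersonIndex`, its methods and the document identifiers are hypothetical, names are matched only by case-insensitive equality, and the `limit` parameter stands in for the requirement that suggestions should not be intrusive.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


class PersonIndex:
    """Maps a person's name to documents received from or written by them."""

    def __init__(self) -> None:
        self._by_person: Dict[str, List[str]] = defaultdict(list)

    def add(self, doc_id: str, people: Iterable[str]) -> None:
        """Record a document under each person associated with it."""
        for person in people:
            self._by_person[person.lower()].append(doc_id)

    def suggest(self, people_in_draft: Iterable[str],
                limit: int = 3) -> List[str]:
        """Suggest documents for the names used in the draft.

        For each name, the most recently added documents come first;
        duplicates are dropped and the list is capped at `limit`.
        """
        seen, suggestions = set(), []
        for person in people_in_draft:
            for doc_id in reversed(self._by_person.get(person.lower(), [])):
                if doc_id not in seen:
                    seen.add(doc_id)
                    suggestions.append(doc_id)
        return suggestions[:limit]


index = PersonIndex()
index.add("mail-001", ["A. Reader"])
index.add("paper-17", ["A. Reader", "B. Writer"])
index.add("mail-002", ["B. Writer"])

# The user starts an e-mail addressed to "A. Reader".
print(index.suggest(["a. reader"]))
```

A real system would also weigh the interest levels and changes the user has recorded, as the item describes; the sketch covers only the lookup by name.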

Impact

I believe that this proposal meets the majority of the criteria for a grand challenge (see the companion document for details). Its especial strengths are that it relates to everyone, it represents a radical paradigm shift, well away from directions of current commercial development, it gives scope for ambition both in underlying models and in engineered solutions, it brings together many disciplines ranging from hardware to user interfaces, and finally it brings together the disparate worlds of paper and electronic documents.