LAP: discussion and practicalities

P.J. Brown
Department of Computer Science, University of Exeter, Exeter EX4 4QF, UK
e-mail: P.J.Brown@exeter.ac.uk

ABSTRACT

Todo

Introduction

We have built a prototype implementation to explore some of the ideas behind lifelong annotation. The prototype is called LAP (Lifelong Annotation Prototype). This document is in three parts, which discuss, respectively, (1) some of the design decisions behind LAP; (2) the practicalities of using it; (3) the features oriented to use over the lifetime of the user.

PART 1: some design decisions

Simplifications

In order to keep this project within bounds we made some simplifying assumptions:

the annotated files are in HTML or its derivatives.
neither the name (URL) nor the content of the annotated files changes.
the annotations are for personal use.

Nevertheless, we are conscious that there must be a route beyond each of these simplifications. Taking them in turn:

our approach involves annotating the textual source form of the underlying document. This must be easily human-readable (even HTML presents problems here, and notations like PostScript are very hard for humans to read). Although our approach is oriented to HTML, it could easily be used with any other human-readable textual source form that possesses a rendering tool (e.g. a browser) providing some way for the annotations to be presented -- in the case of HTML we use special colours as markers to make the annotations stand out.
there are some promising approaches to making annotations robust over change -- see below.
annotations can easily be shared, even if there are no facilities for collaborative group annotations -- again see below.

Software used

Ideally annotation should be performed within the user's favourite browser. This is not readily achievable now, and LAP is a separate, independent tool. It is implemented using the Guide hypertext system, which has the advantage of having a mechanism -- albeit primitive -- to give data types to annotations. Guide also supports a multi-level mechanism where higher levels represent additions to the lower levels -- ideal for annotations. LAP works on the HTML source of a web page; this has a disadvantage in readability, but an advantage that all elements of the mark-up can potentially be annotated (e.g. the link represented by "HREF=..." in HTML). At the moment the advantage is largely a potential one, as LAP concentrates on textual content within the body of an HTML document. LAP runs on Solaris, and uses Netscape as its browser. LAP uses Netscape's remote control facility to take from Netscape an HTML file that it has loaded, and to send to Netscape a file containing annotations.

A typical sequence to annotate an original web page is:

load the web page into LAP; it will be displayed in HTML form.
add the annotations.
send the annotated file back to the browser to be displayed. When this happens, the annotations are translated behind the scenes into HTML, and integrated into the HTML of the original file. The resultant HTML file, the consolidated file, is displayed by the browser.

Annotation files

We use the term underlying to describe an original web document that needs to be annotated. A set of annotations is stored in a file with the suffix .ann; this file is separate from the underlying web document (which may be anywhere in the world), but contains a reference to it. Thus, given a .ann file, software has enough information to display the annotated web page. The different types of annotation have names (e.g. Memorable-quote or For-my-thesis), and each type has properties. Users can define their own annotation types. In Guide annotations are represented by a facility called contexts, and the set of definitions of contexts will be created in an internal Guide file called contexts.guide. All .ann files are stored in an internal Guide format; this need not concern the average user (any more than the internal format of a Word .doc file).

Robustness over change

Robustness over change is not an aim of the LAP prototype: for example the position of annotation anchors is given simply by character offsets. In general, change may occur in content, or in the URL itself. In some application areas, particularly commercial web pages, changing content is a key issue -- the average page may change once a month (Fetterly et al., 2004). Our chosen area is annotating technical papers, and here change of content is much less frequent -- indeed if a copy of a published paper is put on the web it could be argued that this should never change, but should remain exactly as published. Change of URL is an endemic problem on the web, and we have made no specific attempts to tackle it. There is, however, hope for the future in the guise of lexical signatures (Phelps & Wilensky, 2000). Change of content is an issue tackled by several past annotation projects (see, e.g., Todo). Though complete and utter change will inevitably knock out any approach, existing approaches have achieved robustness over limited change.

Sharing with other users

Our project is not specifically about collaborative annotation, and all the groupware controls needed for this. However we certainly want to support users sharing their annotated documents with others. We will assume here that recipients have access to the necessary software.

There are several cases of sharing, depending on the nature of the recipient. The simplest case is where the recipient is at the same site as the sender, and has read access to the sender's files. The recipient can obviously then view the annotated files (by accessing the .ann file), and, if he wants to add his own extra annotations, make a copy of the original .ann file, and then add to this.

A second simple case is where contact with the recipient is via e-mail: the annotation file can be attached to the e-mail (plus supporting files that define the annotations, such as contexts.guide). If the underlying file is not an unchanged web page, this can be attached too.

In addition to these cases, there is an obvious need for the sender to be able to mount his annotated page on the web, so that the world can receive it. This is simple to do: just save the HTML source of the annotated file (called the consolidated file above) and make it web-accessible (or alternatively provide access via a CGI script that creates the page on-the-fly from the latest annotation file -- see further discussion below). Recipients can certainly then load this HTML file, and add their own annotations if they wish. Problems do, however, arise if recipients want to change the previous annotations, or integrate them with their own annotations into one annotation file. The problem is that the consolidated file contains translated annotations, i.e. the original annotations have been translated into HTML constructs, such as SPAN, to effect them (though the recipient can, of course, edit previous annotations in their translated form). To tackle this problem we have made the translations reversible, and have provided a tool (TODO its name) to take an annotated document in its HTML form and get back to a source annotation file together with an underlying file which is functionally equivalent (ideally absolutely equivalent) to the original web page. The recipient then has a uniform set of annotations, independent of the original author's original annotations. (This is the opposite of groupware, where the sender and recipient would work in tandem, combining their annotations.)
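The idea of a reversible translation can be illustrated with a minimal sketch. It is not the LAP tool itself (whose name and workings are not given here); it assumes, purely for illustration, that each annotation was translated into a SPAN carrying the annotation type as its CLASS, immediately followed by an IMG whose ALT holds the body, that annotations are not nested, and that the original page contains no such construct of its own. The names `Annotation` and `untranslate` are hypothetical.

```python
import re
from html import unescape
from typing import List, NamedTuple, Tuple


class Annotation(NamedTuple):
    start: int       # offset of the anchor in the recovered underlying text
    length: int      # length of the anchor in characters
    type_name: str   # annotation type, taken from the CLASS of the SPAN
    body: str        # annotation body, taken from the ALT of the IMG


# Assumed shape of one translated annotation (an illustration, not LAP's format).
TRANSLATED = re.compile(
    r'<SPAN CLASS="([^"]+)">(.*?)</SPAN><IMG SRC="[^"]*" ALT="([^"]*)">',
    re.DOTALL,
)


def untranslate(consolidated: str) -> Tuple[str, List[Annotation]]:
    """Split a consolidated file into underlying HTML plus its annotations."""
    pieces: List[str] = []
    annotations: List[Annotation] = []
    position = 0     # how far we have read in the consolidated file
    out_length = 0   # how much underlying text has been emitted so far
    for match in TRANSLATED.finditer(consolidated):
        before = consolidated[position:match.start()]
        type_name, anchor, alt = match.groups()
        pieces.append(before)
        out_length += len(before)
        # Record the annotation against offsets in the underlying text.
        annotations.append(
            Annotation(out_length, len(anchor), type_name, unescape(alt))
        )
        # Keep the anchor text itself, dropping the SPAN and IMG around it.
        pieces.append(anchor)
        out_length += len(anchor)
        position = match.end()
    pieces.append(consolidated[position:])
    return "".join(pieces), annotations
```

Given the recovered list, a recipient could edit, delete or add entries and then re-translate the whole set in one uniform pass.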

The semantics of annotations

When annotations are shared, the recipient likes to know the meaning of each type of annotation, e.g. what was the original author's purpose in defining annotations of type EPSRC-project? (The original author of these annotations may even have the same question, especially when looking at annotations created long in the past.) The Guide method for attaching properties to annotation types is concerned with presentation, not semantics. (When this information is translated into HTML, CSS style sheets are used and the same applies.) Generally, capturing semantics is a challenging problem in any branch of computer science. In this project we avoid such challenges by using an informal approach. The convention in Guide when defining new "contexts" -- these are used to represent each type of annotation -- is to create a file, called contexts.guide; this file not only defines fonts, colours, etc., for each context but also provides examples of each context defined, with a textual description of its purpose. (Copy this to the translated file?)
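As a sketch of how presentation properties and an informal description might travel together into the HTML form, the fragment below emits one CSS rule per annotation type and carries the description as a CSS comment. This is an assumption made for illustration only: the record `AnnotationType`, its fields and the function `to_css` are hypothetical, and nothing here reflects the actual content of contexts.guide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnnotationType:
    name: str          # e.g. "Memorable-quote"; used as the CLASS of the SPAN
    colour: str        # a presentation property
    description: str   # informal statement of what the type is for


def to_css(types: List[AnnotationType]) -> str:
    """Emit one CSS rule per annotation type, with its purpose as a comment."""
    rules = []
    for t in types:
        # A CSS comment cannot contain "*/", so neutralise it in the description.
        purpose = t.description.replace("*/", "* /")
        rules.append(
            "/* %s: %s */\nSPAN.%s { color: %s; }"
            % (t.name, purpose, t.name, t.colour)
        )
    return "\n".join(rules)


print(to_css([
    AnnotationType("EPSRC-project", "red",
                   "Material relevant to the EPSRC proposal"),
]))
```

The comment plays the same informal role as the textual description in contexts.guide: it is for human readers and is ignored by the browser.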

PART 2: the prototype implementation

If one looks at annotation systems currently in existence, they provide a wide variety of different user interfaces. In fact no one interface is suitable for all types of annotation, and any interface is a compromise. The compromise currently adopted by LAP is that, in the HTML form, the presence of an annotation is shown by a small icon; passing the mouse over this icon reveals the body of the annotation (in fact in HTML it is the ALT string on the IMG that shows the icon). This compromise works well with short annotations, but not with long ones or multimedia ones. The anchor of the annotation (if not null) is shown using the SPAN construct in HTML. Properties such as colour on the CLASS (i.e. data type) of the SPAN are used to make the anchor stand out.
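The HTML form described above can be sketched as follows. This is an illustration under stated assumptions, not LAP's actual translation: the icon URL is a placeholder, and the function name `render_annotation` is hypothetical.

```python
from html import escape

# Placeholder icon location; the real LAP marker image is not named in the text.
ICON_URL = "http://example.org/lap/note.gif"


def render_annotation(anchor_html: str, body: str, type_name: str) -> str:
    """Return HTML for one annotation: highlighted anchor followed by an icon."""
    # The body becomes an ALT string, so it is plain text and must be escaped.
    icon = '<IMG SRC="%s" ALT="%s">' % (
        escape(ICON_URL, quote=True),
        escape(body, quote=True),
    )
    if not anchor_html:
        # A null anchor produces the icon only.
        return icon
    # The anchor may contain properly nested mark-up, so it passes through as is.
    return '<SPAN CLASS="%s">%s</SPAN>%s' % (
        escape(type_name, quote=True),
        anchor_html,
        icon,
    )


print(render_annotation("lifelong <I>annotation</I>", 'see "Part 3"',
                        "Memorable-quote"))
```

Because the body ends up inside an attribute value, the sketch also makes plain why long or multimedia annotation bodies fit this interface poorly.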

An HTML file can in fact have several independent annotation files (e.g. produced by different users); currently these really must be independent -- there is no way of showing two separate sets of annotation together. LAP automatically saves all underlying HTML files in a directory called tmp_saves, a directory immediately under the user's home directory. The user is, however, free to delete any or all of these underlying HTML files. If LAP finds an underlying file is missing it automatically re-loads it from the web.
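The save-and-reload behaviour can be sketched as a simple cache. The way a URL is mapped to a file name is an assumption (LAP's own naming scheme is not described), and the `fetch` argument stands for whatever mechanism retrieves a page from the web; `cache_path` and `load_underlying` are hypothetical names.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cache_path(url: str, cache_dir: Path) -> Path:
    """Map a URL to a file inside the cache directory (naming scheme assumed)."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return cache_dir / (digest + ".html")


def load_underlying(url: str, cache_dir: Path,
                    fetch: Callable[[str], str]) -> str:
    """Return the saved HTML for url, re-fetching it if the copy is missing."""
    path = cache_path(url, cache_dir)
    if path.exists():
        return path.read_text(encoding="utf-8")
    # The saved copy was never made, or the user deleted it: reload and save.
    html = fetch(url)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(html, encoding="utf-8")
    return html
```

A caller would pass `Path.home() / "tmp_saves"` as `cache_dir`. Note that a re-fetched page may differ from the one originally annotated, which matters for the positioning mechanism.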

Annotations are positioned by means of character counts (e.g. an annotation comes after the 123rd character of the HTML file). A defect of this crude mechanism is, of course, that if a re-loaded file is different from the originally annotated one, then annotations get out of place or worse (e.g. they unintentionally corrupt the HTML mark-up).
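A minimal sketch of positioning by character count, and of how it fails when the file changes, is given below. The function name and the `[*]` marker are illustrative only; in LAP the inserted text would be the translated HTML of the annotation.

```python
from typing import List, Tuple

# (offset, text): insert the text after the first `offset` characters.
Insertion = Tuple[int, str]


def apply_insertions(source: str, insertions: List[Insertion]) -> str:
    """Insert each string at its character offset in the original source."""
    result = source
    # Work from the highest offset down, so that an insertion never shifts
    # the positions of the insertions still to be applied.
    for offset, text in sorted(insertions, key=lambda item: item[0],
                               reverse=True):
        if not 0 <= offset <= len(source):
            raise ValueError("offset %d is outside the file" % offset)
        result = result[:offset] + text + result[offset:]
    return result


original = "<P>Annotation is a lifelong activity.</P>"
marker = "[*]"
offset = original.index("lifelong")   # position recorded when annotating

print(apply_insertions(original, [(offset, marker)]))

# The page later gains an attribute; the stored offset now lands elsewhere.
changed = '<P CLASS="intro">Annotation is a lifelong activity.</P>'
print(apply_insertions(changed, [(offset, marker)]))
```

With the original file the marker appears immediately before "lifelong". With the changed file the same stored offset falls inside the word "Annotation"; had the change been larger, it could equally have fallen inside a tag.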

Annotation anchors may contain mark-up, which must be properly nested. The body of an annotation cannot, however, currently use mark-up (because it is represented as an ALT of an IMG). Annotations can be nested within one another (though sometimes this gives problems?); in this case the highlighting (which is the colour red) will currently show only the outermost annotation.

Installing and running LAP

At Exeter you can simply use the scripts and binaries under /home/pjbrown/annotation_experiments/guide_layers. This directory contains the lap script. At this early stage we just run at Exeter, but away from Exeter you would need to get all the LAP scripts, the web-accessible image(s) used to mark annotations, and both Unix Guide and ML/I (a macro processor used for certain conversion tasks).

Before you start you need to create a directory called tmp_saves under your home directory to contain the HTML versions of the underlying web pages that have been annotated.

To start a session:

start Netscape in the background (ideally you should have just one Netscape running, but things seem to work quite well if you have two, e.g. Netscape Navigator and Netscape Composer).
type:
lap FILENAME

The FILENAME is normally a .ann file containing some annotations. (If you omit the filename, LAP will use a default example file.) The LAP interface is an instantiation of Guide. Hopefully it should be fairly straightforward, but a basic familiarity with Guide may help. Guide is 15 years old, and the interface falls well short of today's standards.

To annotate a web page, select From-browser from the LAP menu. This will ask you to type a URL. Often the page you want to annotate will already be displayed by Netscape; if so, all you have to do is to cut-and-paste the Location from Netscape into LAP (it would be good if Netscape allowed LAP to say "give me the HTML of the currently displayed page", but unfortunately Netscape does not currently support this).

Todo: rules for making the annotation.

To send an annotated page back to the browser, you first make LAP display the annotated file. (It will be there already if you have just created the annotation, but otherwise you need to load the .ann file into LAP -- either by using the New command or by starting a new LAP session.) You then select To-browser from the LAP menu, and the file should soon appear in Netscape. The mechanism behind this is that LAP creates a temporary HTML file (the consolidated file); it then sends this consolidated file to Netscape. (LAP adds a BASE to the header of this file to reflect the original location of the file; this is necessary if the HTML contains relative addresses -- it usually works well but there are occasional glitches.)
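The role of the BASE element can be shown with a small sketch. It is an assumption about how such a step might be done, not LAP's own code: the function `add_base` is hypothetical, and it ignores cases (for instance a page that already has its own BASE) that are among the likely sources of glitches.

```python
import re


def add_base(html: str, original_url: str) -> str:
    """Insert a BASE element right after the opening HEAD tag, if there is one."""
    # A double quote in the URL would end the attribute early, so encode it.
    base = '<BASE HREF="%s">' % original_url.replace('"', "%22")
    head = re.search(r"<head\b[^>]*>", html, flags=re.IGNORECASE)
    if head is None:
        # No HEAD element: crudely put the BASE at the very start of the file.
        return base + html
    return html[:head.end()] + base + html[head.end():]


page = ('<HTML><HEAD><TITLE>t</TITLE></HEAD>'
        '<BODY><IMG SRC="fig1.gif"></BODY></HTML>')
print(add_base(page, "http://example.org/papers/p.html"))
```

With the BASE in place, the browser resolves the relative address fig1.gif against the original location of the page rather than against the location of the temporary file.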

To send a .ann file direct to Netscape, without using LAP, use the command:

html_save -m FILENAME

Finally, please note that LAP is a research prototype; "fancy" web pages that are typically found at commercial sites often upset it (mainly because such pages are not designed to be saved at a different site and then used from there). When used to annotate research papers it is much more reliable.

Using LAP remotely

Currently there is a limited facility to allow a user anywhere on the web to give the name of a .ann file (which must currently be at Exeter) and view the corresponding annotated web page. There is a link to this facility from my annotation home page.

PART 3: lifelong usage

Todo: repository of all annotations, data types, use by researchers.

References

Fetterly, D., Manasse, M., Najork, M. and Wiener, J.L. (2004) `A large-scale study of the evolution of Web pages', Software: Practice and Experience, 34, 2, pp. 213-237.
Phelps, T.A. and Wilensky, R. (2000) `Robust hyperlinks: cheap, everywhere, now', Proceedings DDEP00, Munich, Germany.