Engineering of hypermedia links: some lessons from engineering of software

Heather Brown, Peter Brown, Les Carr, David De Roure, Wendy Hall, Luc Moreau

Department of Computer Science, University of Exeter, Exeter EX4 4PT, UK
and
Department of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, UK

{H.Brown,P.J.Brown}@exeter.ac.uk
and
{L.A.Carr,D.C.DeRoure,W.Hall,L.A.V.Moreau}@ecs.soton.ac.uk

ABSTRACT

The costs and failures of large software projects gave rise, thirty years ago, to the discipline of software engineering. The building of hypermedia materials today shows the same pattern: initial creation done with flair, followed by expensive and often neglected maintenance. In this paper we examine how the lessons of software engineering apply to the engineering of links within hypermedia: the problem of control, change as the enemy, the classification of links, the elimination of dangerous constructs (in particular the gradual replacement of extensional links by intensional ones), and the taming of complexity.


Introduction

When people first tried to build software -- in the period 1950-1970, say -- the emphasis was on the initial creation of that software, and this creation was done with an accent on flair. Exactly the same is true of the building of hypermedia materials today. With software it soon became apparent that the major cost lay in continual maintenance, not in the original creation: indeed the initial creation is often estimated at about 20% of the overall lifetime cost of software. As a result of these realisations the discipline of software engineering was born, around 1968, with the aim of solving the then `software crisis' of unmanageable software projects. Of course, software crises have not been eliminated by the discipline of software engineering, but nevertheless enough strides have been made to make possible the huge amounts of successful software that most of us depend on throughout our daily lives -- often without realising that the software is there.

It is widely accepted that the same issues that software engineering has tackled over the past thirty years apply to hypermedia today. In particular they apply to links within hypermedia: every user of the web is familiar with the `dangling link', the link to a non-existent destination. Indeed a study (W3C, 1998) found that between 5% and 8% of all links on the web are dangling. This normally reflects a failure of management and maintenance.
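
Detection of dangling links, at least, can be mechanised even where their repair cannot. As a minimal sketch (our own, in Python, using only the standard library; it is illustrative rather than a production tool):

    # Flag link destinations that no longer resolve, i.e. dangling links.
    import urllib.error
    import urllib.request

    def dangling(urls):
        """Return the subset of urls whose destinations cannot be fetched."""
        bad = []
        for url in urls:
            try:
                # A HEAD request suffices: we need existence, not content.
                request = urllib.request.Request(url, method="HEAD")
                urllib.request.urlopen(request, timeout=10)
            except (urllib.error.URLError, ValueError):
                bad.append(url)  # unreachable, non-existent or malformed
        return bad

Such a checker can report the dangling proportion for a given site, but it cannot repair the links it finds: that remains a task of management and maintenance.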

In this paper we discuss link engineering: how an engineering discipline can be applied to the creation of links within hypermedia. Obviously the engineering of links is only part of the overall task of engineering a hyperdocument; we believe, however, that it is a crucial part. Other aspects of hypermedia engineering, in addition to links, are discussed by Lowe and Hall [10]. At a lower level, the fundamental nature of links, and their relationship to program constructs, is analysed in our previous paper [2].

Of course hyperdocuments increasingly involve pieces of program, both at the server end, such as cgi-scripts in the WWW, and at the client end, such as applets. Software engineering applies to these programs just as to any other. Thus people building hyperdocuments need knowledge of both software engineering and link engineering, and it helps if the two disciplines use similar methodologies.

Control

A problem for any project is its interfaces with outside objects over which management has little or no control. In a software engineering project this lack of control generally relates to the infrastructure: the operating system and the software tools, especially compilers, libraries, object platforms, etc. On the other hand, where there is scope for management control, such as in interfaces between different software modules within a project, the advances in software engineering have generally provided the tools needed for successful project management, even in projects that involve many different contractors.

In a link engineering project, lack of control over the infrastructure can still be a problem (e.g. "improved" versions of browsers), but there is also a much more fundamental lack of control: links can, and often do, lead to outside documents over which there is no control, and such uncontrolled outside documents can themselves link to the current document. We call such links external links. We believe that this fundamental lack of control over external links will remain a problem for the foreseeable future, though various solutions have been proposed, involving universal notification of changes or (a partial solution) universal naming of documents and other resources. This lack of control often makes approaches to link engineering different from the corresponding approaches in software engineering.

Change: the enemy

Maintenance is the biggest cost in a software project and, we believe, in a hypermedia one too. Lowe and Hall [10] propose three core categories of maintenance: corrective, adaptive and perfective. In some projects maintenance is ruinously expensive because the initial creation and the initial testing were poorly done: "the whole thing never really worked, and after a while correcting one error seemed to generate two more". If we eliminate these projects and consider only properly managed projects, correction of errors will remain as a maintenance issue, but the main reason why maintenance is expensive is change:

  1. the infrastructure changes;
  2. some real-world artifact changes (e.g. the tax law changes, and this affects both software that implements the tax law, such as a payroll program, and hyperdocuments that explain the tax law);
  3. the user interface needs to be improved;
  4. the whole project is augmented;
  5. the hierarchical organisation presented to the user needs to be replaced or augmented; and so on.

The above list applies both to software and to hyperdocuments, but an aspect of change that applies particularly to links is that the destination of a link may change. Assuming this destination is a document, change can be at either or both of two levels: the document name may change and/or the document content may change. (In WWW terms, what we mean by `name' is a URL.)

Thus the essential aims of link engineering, as they are for software engineering, are to make change easier and errors less likely. Change and errors are of course closely related: introducing change throws up new errors, especially if the person doing the changes does not totally understand the artifact they are changing.

Classification of links

One can try to classify links in many ways. One way is to look at the author's motivation for the link. Research by Kim [7] -- which was based on scholarly publications -- suggests that there is no simple motivation for links. Kim classified links into 19 possible types, but even with this large number of types found that 72.8% of links had multiple motivations, i.e. combined more than one type. Kim's conclusion is that `hyperlinking is a multidimensional behaviour, using different levels of motivation'. Our experience supports this conclusion, and we believe it carries over to all types of hypermedia. We believe it is better, instead, to classify links according to the mechanism that effects them; this approach is taken by DeRose [6] and we discuss it in detail below.

Elimination of dangerous constructs

One approach in software engineering has been to eliminate building practices that are expensive, either because they are dangerous in their propensity for generating errors, or because, even if simple for the author, they lead to great expense in maintenance. Thus in programming languages, constructs found to be error-prone, such as gotos and global variables, have either been:

  1. completely eliminated and replaced by safer constructs.
  2. supplemented by safer alternative constructs that cover a majority of their uses.

As an example of the two approaches, global variables can either be (1) eliminated, as in functional programming, or (2) supplemented by constructs such as local variables or variables with limited visibility.
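
The contrast can be sketched in code (Python here; the names are our own, chosen only for illustration):

    # A global variable: any code anywhere may read or write it, so every
    # part of the program is potentially coupled to every other.
    visit_count = 0

    def record_visit():
        global visit_count
        visit_count += 1

    # The safer supplement: the same state confined behind an interface
    # with limited visibility, so only these methods can touch it.
    class VisitCounter:
        def __init__(self):
            self._count = 0      # internal by convention in Python

        def record(self):
            self._count += 1

        def total(self):
            return self._count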

If we carry this principle over to hypermedia links, it would be interesting and radical to try approach (1), though some would say that "hypermedia without links" is a contradiction in terms. It is more realistic, however, to try approach (2), and in particular to try to eliminate the most dangerous and costly types of links. To see how this might be done, it is useful to look at the taxonomy of links created by DeRose [6]. At the highest level, DeRose identifies two types of link: extensional and intensional. Extensional links are the ones that we are most familiar with in hypermedia: links between one place and another, explicitly created by the author. (Here the author of the link may or may not be the same as the author of the underlying document; in particular, in those hypermedia systems that allow users to create their own links, the user can become a link author.) Extensional links have known, pre-determined ends.

Intensional links, on the other hand, involve a function which, given the source of a link, calculates the destination(s). There are various types of intensional link, one parameter being the nature of the function and the amount of tailoring of it done by the author. The most extreme form of intensional link, in terms of low cost of authorship, is simply to use a search engine as the function. One user interface to provide this would allow the user to select any part of the hyperdocument (a word, several words, a picture, ...) and ask for links to be supplied; the search engine would then find some possible links, and present, say, the best five to the user. In less extreme forms of intensional link the author has more control over the function: if the function is a searching function, the author might provide the database to be searched -- it might be a glossary created by the author -- or the author might say that links only apply to pre-defined words in the hyperdocument -- for example those words that relate to a glossary. An example of a more author-controlled intensional link is the generic link found in Microcosm [4].
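
The distinction can be made concrete with a sketch (ours, in Python; the class and function names are illustrative and are not taken from DeRose or from Microcosm):

    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class ExtensionalLink:
        source: str         # anchor in the current document
        destination: str    # fixed, pre-determined end (e.g. a URL)

    @dataclass
    class IntensionalLink:
        source: str
        resolve: Callable[[str], List[str]]  # selection -> destinations

    # A small author-provided database -- here a glossary -- standing in
    # for a full search engine.
    GLOSSARY = {
        "dangling link": "glossary.html#dangling-link",
        "intensional link": "glossary.html#intensional-link",
    }

    def glossary_resolver(selection: str) -> List[str]:
        """Compute destinations, at selection time, by searching the glossary."""
        key = selection.lower().strip()
        return [url for term, url in GLOSSARY.items() if key in term][:5]

    link = IntensionalLink(source="dangling", resolve=glossary_resolver)
    print(link.resolve("dangling"))  # ['glossary.html#dangling-link']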

In general terms, intensional links are cheap and extensional links are expensive. This applies both to the creation of links and, more importantly, to their maintenance. Thus one possible strategy for link engineering is to try to find types of intensional link that largely remove the need for extensional links. In terms of what the user sees, the arguments for and against this can be summarised as follows:

Against:
  1. links in a hyperdocument represent a prime intellectual contribution of the authors, and the quality of links is a way of separating good authors from bad. A mechanical function that calculates links will never come close to a good author.
  2. although extensional links may be expensive in authorship effort, they are cheap and fast at run-time; intensional links that involve, e.g., invoking a search engine, may be slow, and may be impractical on small mobile devices.
For:
  1. hyperdocuments are normally used by a large number of different readers (indeed this will be necessary to justify the cost of the hyperdocument). Readers are likely to have diverse requirements and diverse previous knowledge, and no set of pre-defined extensional links will cater for all of them; this reduces the value of extensional links.
  2. intensional links cater much better for a changing world. It is usually possible for the function used by an intensional link to be calculated on-the-fly, when the user selects the link. Ideally the function can work both on the current (or at least recent) names of possible destination documents and their current content. Thus if, say, the content of a destination document has changed, then the result of the function's calculations may be a new destination point. If, on the other hand, the destination of an extensional link changes, then this requires a human to realise the problem and to make the necessary change: in practice this rarely happens, especially with external links.
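
Point 2 above can be seen directly in the glossary sketch given earlier: because the function runs each time the link is followed, a change to the destination is picked up automatically, whereas the stored end of an extensional link silently dangles.

    # Continuing the earlier sketch: the glossary entry moves to a new page.
    GLOSSARY["dangling link"] = "glossary-v2.html#dangling-link"

    print(link.resolve("dangling"))
    # -> ['glossary-v2.html#dangling-link']: recomputed when followed

    old = ExtensionalLink(source="dangling",
                          destination="glossary.html#dangling-link")
    print(old.destination)
    # -> 'glossary.html#dangling-link': the old, now wrong, destination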

We believe that the balance between advantages and disadvantages is such that a key future research topic for link engineering must be improving intensional links so that they can gradually supplant extensional ones.

A second possible strategy for eliminating some extensional links is to generate them mechanically using a controlled and easily managed process, rather than to require the author to create them individually. This can certainly be applied to navigational links: the author should be able to specify the home page and the orderings that apply to other pages, and then a mechanical process can generate, for each page, navigation links such as Home, Next and Previous -- indeed there already exist many tools that can do this. What these authoring tools do is to raise the author's level of abstraction from that of low-level navigational links to a higher-level document structure. It would be valuable if a similar approach could be applied to other types of link, in addition to navigational links.
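
As a sketch of the kind of mechanical generation we mean (ours, in Python, not modelled on any particular tool), the author supplies only an ordering of pages and the navigational links for every page follow from it:

    def navigation_links(pages):
        """pages is an ordered list of page names; pages[0] is the home page.
        Returns, for each page, its generated Home/Previous/Next links."""
        nav = {}
        for i, page in enumerate(pages):
            nav[page] = {
                "Home": pages[0],
                "Previous": pages[i - 1] if i > 0 else None,
                "Next": pages[i + 1] if i + 1 < len(pages) else None,
            }
        return nav

    print(navigation_links(["index.html", "intro.html", "links.html"]))

If the author later inserts or reorders pages, the navigation links are simply regenerated; nothing needs to be edited by hand.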

Taming complexity

The reason why change is difficult to implement, both in software projects and in hypermedia projects, is complexity in the underlying system, and the limited capacity of the human mind to master such complexity. Thus many -- indeed most -- techniques used in software engineering have the underlying aim of taming complexity. Booch [1] identifies four major components of taming complexity:

  1. abstraction.
  2. encapsulation.
  3. modularity.
  4. hierarchy.

We believe that all of these are relevant to hypermedia authorship tools and environments. We have already mentioned the need to have abstraction mechanisms for navigational links, and, since a `strong hierarchical backbone' is a facet of most good hypermedia documents, there is an obvious need for authoring mechanisms that capture hierarchy. The needs for encapsulation and modularity are also clear. We believe that there is a pressing need in authorship languages to cover all four of Booch's components.
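
As one speculative illustration of encapsulation and modularity applied to links (a sketch of our own, not an existing system), a module of pages might expose only designated entry points, so that external links cannot come to depend on its internal structure:

    # A "module" of pages whose internal organisation is hidden: outside
    # documents may link only to the declared entry points, so internal
    # reorganisation cannot break links from outside.
    class PageModule:
        def __init__(self, name, pages, entry_points):
            self.name = name
            self._pages = set(pages)                # internal, hidden
            self.entry_points = set(entry_points)   # the public interface

        def may_link_to(self, page):
            """An external link is valid only if aimed at an entry point."""
            return page in self.entry_points

    tax = PageModule("tax-law",
                     pages={"overview.html", "rates.html", "appeals.html"},
                     entry_points={"overview.html"})
    print(tax.may_link_to("overview.html"))  # True
    print(tax.may_link_to("rates.html"))     # False: an internal page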

Further topics: authoring tools, etc.

Testing and validation

Here there are clear lessons to learn from software engineering's experience of testing methods and of formal approaches.

References


  1. Booch, G. Object-oriented analysis and design, Addison-Wesley, Menlo Park, Ca., second edition, 1994.
  2. Brown, H., Brown, P.J., Carr, L., Hall, W., Milne, W. and Moreau, L. `A link-oriented comparison of hyperdocuments and programs', DDEP'00 Proceedings, Munich, Springer, 2000.
  3. Brown, P.J. `Do we need maps to navigate round hypertext systems?', EP-odd, 2, 2, pp. 91-100, 1989.
  4. Davis, H.C., Hall, W., Heath, I., Hill, G.J. and Wilkins, R.J. `Towards an integrated environment with open hypermedia systems', Proceedings of the ACM Conference on Hypertext: ECHT92, ACM Press, pp. 181-190, 1992.
  5. De Young, L. `Linking considered harmful', Hypertext: Concepts, Systems and Applications, Proceedings of the European Conference on Hypertext, INRIA, France, Cambridge University Press, pp. 238-249, 1990.
  6. DeRose, S.J. `Expanding the notion of links', Hypertext'89 Proceedings, ACM Press, pp. 249-257, 1989.
  7. Kim, H.J. `Motivations for hyperlinking in scholarly electronic articles: a qualitative study', Journal of the American Society for Information Science, 51, 10, pp. 887-899, 2000.
  8. Kappe, F., Maurer, H., and Sherbakov, N. `Hyper-G: a universal hypermedia system', Journal of Educational Multimedia and Hypermedia, 2, 1, pp. 39-66, 1993.
  9. Landow, G.P., `The rhetoric of hypertext: some rules for authors', Journal of Computing in Higher Education, 1, 1, pp. 39-64, 1989.
  10. Lowe, D. and Hall, W. Hypermedia & the web: an engineering approach, John Wiley, Chichester, 1999.
  11. Moreau, L. and Hall, W. `On the expressiveness of links in hypertext systems', Computer Journal, 41, 7, pp. 459-473, 1998.
  12. Newcomb, S.R., Kipp, N.A. and Newcomb, V.T., `The HyTime hypermedia/time-based document structuring language', Comm. ACM, 34(11), pp. 67-83, 1991.
  13. Nielsen, J. Multimedia and hypermedia: the internet and beyond, Academic Press, San Diego, Ca., 1995.
  14. Trigg, R.H. A network-based approach to text handling for the online scientific community, PhD thesis, Dept of Computer Science, Univ. of Maryland, (University Microfilms #84290934), 1983.
  15. W3C, `Web characterization activity answers to the W3C HTTP-NGs protocol design group's questions', http://www.w3c.org/WCA/Reports/1998-01-PDG-answers.htm, 1998.
  16. Yankelovich, N., Meyrowitz, N. and van Dam, A., `Reading and writing the electronic book', IEEE Computer, 18, 10, pp. 15-30, 1985.
  17. Zellweger, P.T. `Active paths through multi-media documents', in van Vliet (Ed.), Document manipulation and typography, Cambridge University Press, pp. 1-18, 1988.