Research issues in context-aware retrieval: privacy

Peter Brown and Gareth Jones
Department of Computer Science, University of Exeter, Exeter EX4 4QF, UK
P.J.Brown@ex.ac.uk, G.J.F.Jones@ex.ac.uk

Introduction

Note this draft has now been superseeded by a magazine article see magazine article.

The more a context-aware application knows about a user, the more it can tailor its behaviour to meet the user's exact needs, and thus the better it can serve the user. On the other side of the same coin, the more an application knows about a user, the bigger the danger to the user's privacy.

Unfortunately, many past discussions of these privacy issues seem to have brought out the worst in the participants: two parties with extreme positions shout at each other. In this paper, we try to look at the technical issues. We concentrate on privacy for human beings, though similar issues -- under the heading "security" rather than "privacy" -- might apply to tagged physical objects, e.g. would management want everyone to have access to the current location of a piece of expensive, yet portable, equipment?

The paper has been very much influenced by the cited papaers from Ackerman et al[1], Busboom [3] and Langheinrich [5]. For a discussion from a more sociological viewpoint see Harper[4].

Privacy principles

A widely-accepted set of four privacy principles that all applications should follow is as follows:

Notice: telling the user in advance what data is to be collected, and naming those who will have access to it.
Choice: at the very least, allowing the user to opt out. Ideally, if he opts out, the user should be able to delete all previous data stored about him.
Personal access: allowing the user to access all the information stored about them, to delete items that they do not like, and to correct errors. A possible extension of the last of these is to allow users to deliberately introduce errors.
Security: protecting the data from access from third parties other than the named ones. (A get-around that greatly reduces security is the commonly-seen notice that the data may be divulged to XXX and to any other companies who may be added from time to time.) Sometimes the third parties allowed to access a user's data may be those who fill certain roles, e.g. all the managers in an organization. Overall the user may have no control over which individuals can look at their data.

Characteristics of context-aware applications

Some characterists of context-aware applications that may compromise privacy are as follows:

use of a rich personal context about the user, e.g. their user's movements, physical well-being, moods, etc., the documents that they are reading and writing. The values of contextual fields may be changing continuously, and monitoring them, even over a short time, may reveal a lot of information about the user's habits, e.g. their movements, their reading patterns.
setting of the value of a contextual field by an application that is not controlled by the user; for example a user's location field may be set using cameras distributed round a building.
keeping a history of past values of these contextual fields, and using this past behaviour as a guide to the future.
looking at scheduled future events, as might be found, for example, in the user's diary. For instance it might be useful to an application to know that the user was scheduled to be at a meeting in location X in two hour's time with colleagues Y and Z.
trying to give almost immediate response to a change in context. Coupled with the previous two points this means that a context-aware application may have personal data relating to the past, the present and the future.
using multiple sources for contextual values; thus for location active badges may normally be used, but if a user's active badge is not working, a alternative source, such as movements past electronic doorlocks, may be automatically used instead. Alternatively, at least over short periods, a value may be extrapolated from historical data (`Over the past hour he has be travelling SW at 50 miles per hour; since the sensor is not working we will assume this pattern continues').
explaining to the user what contextual fields were used to cause a certain piece of behaviour (`this information [about a certain cafe] was retrieved because the cafe serves the food you usually choose, and because your friend Bill often goes there'). Such information is almost vital if the user is to have the chance of correcting errors in the setting of their context, e.g. to request the change `Do not use Bill as a peer; indeed use him as an anti-peer, i.e. I want my behaviour to be the opposite of his!'.
possibly being implemented as a distributed set of software components, with no central control.
possibly requiring a threshold level of acceptance to be viable; for example if an application kept a record of current locations of employees and also their personal diaries, and used this information to arrange ad hoc meetings of groups, then to be viable, the application would need to cover at least 90% (say) of the employees. (This may lead to pressure on those that opt out.)

There are further two issues that create problems when protecting privacy:

synthesis, from values of low-level sensors, of more high-level events. An example is that an application might deduce, from the movements of people in a building that a meeting took place. The synthesized values may also be stored as history. Often the synthesis involves a degree of guesswork; thus synthesized values are even more likely to be wrong than values obtained directly from sensors.
the use of other people as peers for the current user. Assuming the user wants to follow their peers, this might mean, for example, that the user is presented with the same documents that the peer has recently read. There may be a single peer or several peers, and the identity of the peers may or may not be unknown to the user. Moreover a person may be used as a peer without their knowing (as in "users who have bought book X have also often bought book Y"). Thus historical elements of the context of one user are used to determine what is delivered to another user.

We call these two derived context or behaviour. Their importance is that the application makes deductions from contextual data. An end-user may be able to perform the reverse process: they see the deduction, and they may know some parts of the contextual data from which the deduction was made; from this they may be able to deduce other contextual data, that was intended to be private. Thus a user may be presented with a pornographic paper; they may deduce that the paper was presented because one of their peers read it. They may be able to deduce which peer it was. The peer's privacy is compromised; moreover they may not even know they were being used as a peer.

Multiple contextual sources

A user may be able to access contextual information through multiple sources, e.g.

contextual information provided by a certain application.
contextual information provided by another application.
contextual information provided by their own eyes, e.g. people they see and where they are going.
contextual information from public sources such as signing-in books.

By combining these, a snooper can sometimes break the personal privacy provided by an application, e.g. to reveal who an "anonymous" person is.

Control over applications

Many components of context-aware applications are owned and controlled by third parties, not the user. Examples are:

a component that tracks active badge sightings throughout a building.
a mobile phone service that uses the user's current location to tell him of nearby hotels.
a security camera. This is not really a context-aware application per se, but a context capturing application. It is even more outside the user's control than the other cases: the user does not choose to run the application; it is just there in the environment. Several components of pervasive computing can have similar properties: the whole aim of the pervasive technology is to make the computer invisible.

These applications may or may not provide controls to allow a user control over how their context is used.

We now discuss more details of context-aware applications. It is convenient to divide such applications into two classes, depending on how much context is used.

Case 1: single-user-context applications

The easiest case for protecting personal privacy is in single-user-context applications. Such applications may have multiple users but each user is independent of the others (except, of course, that performance may be worse if there are lots of simultaneous users). Most mobile phone services, even if location-aware, come into this category. In single-user-context applications a user supplies an application with their personal context, and the application, perhaps using other non-personal contextual information, adjusts its behaviour to meet the user's needs. This case is not fundamentally different from any other case of an application dealing with personal information. Protection can be achieved though file access permissions, through encryption, through authentification and through the use of secure data transmission protocols.

In addition the user needs to trust the application: e.g. in spite of its apparent privacy controls, does the application secretely transmit information to a third party, or, more likely, in the case of software error might it dump a lot of personal information in a non-secure storage medium? Overall a sine qua non for privacy protection is a set of reliable software and hardware components.

Case 2: multi-user-context applications

A multi-user-context application uses context from other (human) users to influence how it behaves to one user. The privacy problems arise when the context of one user is visible to another, and/or influences the way information is presented to another. An example is an application that tells people their colleagues' whereabouts, or the most popular document read by their colleagues today. In the first example data about your location is accessible to your colleagues, and one of them might even use this to keep a historical record of all your movements (though some applications discourage this by refusing to answer continually-repeated questions). Moreover colleagues may innocently store information about you, in an application log for example, and this storage may not be secure; less innocently colleagues may deliberately send your personal information to untrusted third parties.

Applications do not always allow each user full access to every other user's context. For example:

they may only allow certain questions, such as `who is my nearest colleague', or `where is the nearest support engineer'.
they may anonymize the data: you can ask how many colleagues have visited a certain cafe, but not who they are. Alternatively people may choose to have nicknames, known only to themselves. Thus you may be told that `Bilbo' has visited a certain cafe, but you would not know who Bilbo is. This letter technique is, however, dangerous to privacy. If you have just seen person X in a cafe, and are then told that Bilbo was recently visited it, you can probably guess X is Bilbo -- an example of the use of alternative sources of context to compromise the privacy offered by one source.
the application may, each time any part of the user's context is about to be revealed to a third party, ask the user to sanction this. The user can keep a log of such requests, and can periodically analyse the log for suspicious behaviour, e.g. an electronic stalker. Unfortunately in any real application this often leads to such a flood of requests that the facility is impracticable. (The same applies to another - comparatively minor -- intrusion of privacy: the setting of a cookie. Many of us initially demand that each web page ask our explicit permission before setting a cookie. However we soon get fed up with all the dialogue involved, and revert to giving web pages a free rein to set cookies whenever they wish.) Nevertheless a log of requests is still valuable, even if the answer to each request is an automatic "yes".
following on from the last case, the user may designate a program that will act as his agent to sanction requests for personal information from outsiders. Typically this will be a general program for which each user can set their own parameters, e.g. one parameter might be the set of people that can access the user's data. Ideally this agent program should be called whenever current, past or future data is accessed; moreover it should be called for derived data or cached data as well as original data. Thus in the case of derived data, if the data is that a meeting took place between X, Y and Z, then this can only be revealed to a third party if each of X, Y and Z agrees. When the agent program is called, its arguments would be (a) the enquirer and (b) the age and (c) nature of the data to be accessed (e.g. `P wants to look at your location two months ago'). Clearly the user should be able to change the agent program at any time, for example to always say no, or to forbid access to data more than a week old. Normally this will be done by resetting program parameters rather than by re-programming. The program might be realised partly in hardware (`the privacy chip'), which could give users more confidence.

The issue of trust of the application applies even more strongly to multi-user-context applications. Such applications may use complex algorithms, and such algorithms can easily contain bugs that reveal information that a user asked to be private.

Preventative measures

One of the above privacy principles was that any application should allow a user to opt out at any time. Ideally this will involve denying others access to their past, present and future contextual data (see discussing of agent programs in the previous Section), or perhaps deleting such data altogether. To give confidence in this process, it is probably best if the application supplies all the contextual data for person X in a discrete set of files that person X has control of, and can thus delete or change access permissions (see, e.g., the system of Smailagic et al [6] for protecting location information, or the much more elaborate RBAC model for setting inter-object security constraints [2]). This may be done directly by the user or via agent programs. The user may also wish to encrypt some or all of the data, and control the way keys are issued to third parties.

For all this to work, the user would need to be satisfied that this was the only copy of the information, and that the information was not also stored in other places, such as caches and logs. This can be desperately hard to achieve in practice. The same applies to derived contextual data.

An alternative approach is for a user to switch off their personal contextual sensors -- in cases where they can: you cannot switch off a security camera. Switching off has great advantages if the user does not trust the application, but there are two dangers:

the application may still have a historical record of past values from this sensor, and this data may still be accessible to others -- something the user may not want.
the application may automatically use alternative sensors to get the same information.

A third approach is to allow user's to change (or forge) their own past or present contextual data -- a strong version of the third privacy principle. This can be used to correct sensor errors or deductions from sensor values (derived context), but more importantly provides a solace to users worried about Big Brother. Thus if Big Brother is a manager who is looking at how much time X spends in the office, then X can forge his data to show he worked each evening. If an application uses this historical data, the forgery may, however, have unfortunate side-effects: if there is a really big attraction one evening, the application may not bother to inform the user of it, since he always works in the evening! There is also the possibility that Big Brother may have been at the office one evening, and knew that X was not there -- a further example of the use of multiple contextual sources in compromising privacy.

A fourth approach, who at least gives the user a chance of recourse if things go wrong, is to embed traceability into data. For example an image can have an electronic watermark that shows its source; if the image is then found in the hands of an unauthorized user, then it can hopefully be proved unequivocally where it came from.

Infrastructure

Users need to be able to express their privacy requests in an application-independent way, instead of using a different notation for each application. An important initiative to address this is W3C's Platform for Privacy Preferences (P3P) project http://www.w3.org/P3P. It is aimed to provide a simple automated way for users to gain control of personal information on web sites they visit. In the future this may become a model for privacy policy negotiation for all context-aware applications.

Understanding and correcting synthesized values

Assume that a user wishes to correct a synthesized value. There are two possible problems:

rather than correct the value the user might wish to correct the process by which the data was synthesized, so that the same error does not continually occur. For example the data might say that the user was in a meeting all day, whereas in reality he was not in a meeting at all. In order to set about correcting the process, the user must understand it -- not always an easy task. For example the synthesis about the meeting may have come from active badge data, and the error might have occurred because people passing the user's office were wrongly recorded as being inside, or even because there was a number of unused -- but operating -- active badges in a corner of the user's office. To find such errors the user would need to know how the application derived its synthesized data, and what other people were apparently involved.
synthesize data might concern two people, e.g. X has met Y many times. X might want to correct this to "X has never met Y" whereas Y might want to leave it as it is.

Some guiding principles for multi-user-context applications

It is valuable if multi-user-context applications embody some guiding principles on privacy, corresponding to generally accepted ethics. These should supplement the four principles quoted earlier. Some possibilities are:

Reciprocity: user A can only look at user B's contextual data if user B can look at user A's. A weaker version of this is that user A must at least reveal their identity to B before looking at B's data; this identity should point to one person rather than a generic identity such as `administration' or `security division'. Even in this weak case, user A could argue that it is a breach of his privacy if his request is revealed to B! This may be specious, but users should at least be warned in advance that each request for information will be relayed to the person about whom they are seeking information, i.e. if you are trying to act as a spy the quarry will know it.
Physical presence: if an application involves retrieving information about past events, the user can only retrieve information about events they were present at. More generally, users can only retrieve historical information they would already know, assuming they had a perfect memory. As it stands this rule is too strict for most applications, and there needs to be a distinction between what is generally held to be public and what is not. Thus person Y should always be able to retrieve the author's slides for a past public seminar, but they should not be able to ask whether person X attended the seminar unless they themselves were there. (There is a slight blurring here: what if there were 2,000 people at the seminar? In reality person Y could have attended and not seen person X even if person X was present; thus the retrieval system is giving extra information that the user did not already know.)

Are the issues new ones?

Most of the privacy issues with context-aware applications are not new ones, but bring new worries:

context-aware applications are novel and not well understood; this may generate fear and uncertainty, and make them an easy target for scaremongers.
the scale of the collection of personal information goes well beyond most other applications in quantity, quality, completeness, persistency and potential value. Moreover the user may have less awareness of where and how data is being collected.

As an example of an existing application, credit card companies have considerable data about the location and habits of their user's, particularly habitual users. In the past this has sometimes been abused (there were once stories in the British press, for example, about wine buying habits of a cabinet minister), but it is not a matter of great public concern. Similarly mobile phone companies have records of their user's locations (albeit only to within a hundred metres or so) and probably the content of their calls, too. Again this is not a current issue of great public concern.

Finally electronic doorlocks and cameras in offices provide a lot of information about users, as do outside cameras on roads or in public places. There is some opposition to these, but the balance of opinion appears to be strongly in favour of them, because of their success in crime detection and prevention.

Public perceptions

Public perception of what is a breach of personal privacy varies between countries; furthermore in any one country it changes over time. In western countries the trend has been for more and more concern over personal privacy, though after September 11 the pendulum has swung the other way. Overall therefore we are not dealing with absolutes, but more with varying public attitudes.

Even if an individual is not concerned with privacy, and is quite happy for his fellow citizens to know all about him, there is still a potential pitfall. Spam e-mailers use all the information they can glean from the internet, and the more they know about an individual the more they can send them targetted e-mail. This can be a deterrent to openness.

If data is stored on the computer, its use in most countries is controlled by the law. Typically the law involves the set of four principles described earlier. Of course a law or a code-of-conduct is useless unless there is some way of enforcing it. Data protection is especially hard to enforce in distributed multi-national applications. Moreover most of such laws were formulated without context-aware applications in mind; indeed when an official from a data protection authority recently visited a research laboratory concerned with context-aware applications, he was dumbfounded by the privacy implications of the technology, and the potential difficulties of legislation.

Privacy-threatening applications without software

To reinforce the point that privacy issues with context-aware applications are not new, consider what humans can do with their eyes. I can quite legally keep records, for every person I meet, of the time and place and the name of the person; indeed some people do this in their diaries, albeit in a less than complete way. I may possibly run into the law if I store these records on the computer. Moreover I may augment my eyes by wearing a continuously running video-camera, and could use this to record my life, and in particular my interactions with other people. Here the data would best be analysed using software, e.g. face-recognition software to identify people (from a known group) from the camera images. (This is an instance of synthesizing higher order data from low order data.) Although all this is far away from what people think of as context-aware applications, you could argue that it potentially impinges on the personal privacy of others just as much.

Conclusions

The privacy problems with context-aware applications arise mainly with multiple-user-context applications. This is an especially difficult case because the whole purpose of such an application is to share person information between users, i.e. it is by nature a counter-privacy application. Because of all the difficulties, we believe that users who release their personal data to such applications must treat such data as public. Although the application may be confined to a trusted community, each user can intentionally or indirectly (e.g. via an automatic log) save other people's personal data that they have accessed, and the chances of this trusted community each having perfect security to prevent information being passed on are small. Moreover derived context and derived behaviour may further compromise any privacy controls that have been set.

Users can and should be able to opt out of such applications whenever they wish. It is, however, unrealistic to expect retrospective opting out (i.e. requiring all past data to be destroyed) to work in multiple-user-context applications. Again the problem comes because there may be multiple copies of this past data, held by various users, and there may be data derived from it.

We have outlined various methods for protecting privacy. These may not be foolproof, but they at least prevent it from being too easy for an outsider to compromise privacy. We still believe that overall the negative conclusions of the previous two paragraphs hold. Thus an application should warn its users on what they are getting into, and applications will only attract users with an open-minded approach to privacy. It may even be best for an application designer to accept from the start that the application is only suitable for a certain class of user (e.g. for those people Alan Westin classifies as "privacy pragmatics" or "privacy unawares", but not for those people classified as "privacy fundamentalists" -- apparently 11% of the population); hence elaborate -- and often ultimately fruitless -- attempts to protect privacy may not be worth pursuing. Following on from this, application designers should think twice about producing applications that depend, for their viability, on, say, 90% acceptance level among the community.

Any context-aware application will, however, have some privacy protection. In implementation terms, protection cannot be added as an afterthought: instead it needs to be incorporated from the start into the design of the application.

References

Ackerman, A., Darrel, T., & Weitzner, D.J. `Privacy in context', HCI, 16, 2, pp. 167-179, 2001.
Barkley, J., Beznosov, K. and Uppal, J. `Supporting relationships in access control using role based access control', Fourth ACM Workshop on Role-Based Access Control, Fairfax, Va., pp. 55-65, 1999.
Busboom, A. `Delivery context and privacy', Position paper for W3C Workshop on Delivery Context, Sophia-Antipolis, 2002.
Harper, R.J. `Why do and don't people wear active badges': a case study', CSCW 4, 4, pp. 263-280, 1995.
Langheinrich, M. `Privacy by design: principles of privacy-aware ubiquitous system', Tutorial notes from Ubicomp 2001, Atlanta, 2001; `Personal privacy in pervasive computing', Tutorial notes from Pervasive 2002, Zurich, 2002.
Smailagic, A., Siewiorek, D.P., Anhalt, J., Kogan, D. and Yang Wang, `Location sensing in a context aware computing environment', Pervasive Computing, 2001.
W3C, Platform for privacy prefences (P3P) project, http://www.w3.org/P3P, first released 1998.