Research issues in context-aware retrieval: privacy

Peter Brown and Gareth Jones
Department of Computer Science, University of Exeter, Exeter EX4 4QF, UK
P.J.Brown@ex.ac.uk, G.J.F.Jones@ex.ac.uk

Introduction

Note: this draft has now been superseded by a magazine article.

The more a context-aware application knows about a user, the more it can tailor its behaviour to meet the user's exact needs, and thus the better it can serve the user. On the other side of the same coin, the more an application knows about a user, the bigger the danger to the user's privacy.

Unfortunately, many past discussions of these privacy issues seem to have brought out the worst in the participants: two parties with extreme positions shout at each other. In this paper, we try to look at the technical issues. We concentrate on privacy for human beings, though similar issues -- under the heading "security" rather than "privacy" -- might apply to tagged physical objects, e.g. would management want everyone to have access to the current location of a piece of expensive, yet portable, equipment?

The paper has been very much influenced by the cited papers from Ackerman et al. [1], Busboom [3] and Langheinrich [5]. For a discussion from a more sociological viewpoint see Harper [4].

Privacy principles

A widely accepted set of four privacy principles that all applications should observe is:

  1. Notice: telling the user in advance what data is to be collected, and naming those who will have access to it.
  2. Choice: at the very least, allowing the user to opt out. Ideally, if they opt out, the user should also be able to delete all previous data stored about them.
  3. Personal access: allowing the user to access all the information stored about them, to delete items that they do not like, and to correct errors. A possible extension of the last of these is to allow users to deliberately introduce errors.
  4. Security: protecting the data from access by third parties other than the named ones. (A loophole that greatly reduces security is the commonly seen notice that the data may be divulged to XXX and to any other companies who may be added from time to time.) Sometimes the third parties allowed to access a user's data may be those who fill certain roles, e.g. all the managers in an organization; overall the user may have no control over which individuals can look at their data. (A minimal sketch of how a context store might encode these four principles follows this list.)
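As a minimal illustration only, the sketch below shows one way a per-user context store might record the information needed to enforce these four principles mechanically; the class and field names are hypothetical, not part of any existing system.

    from dataclasses import dataclass, field
    from datetime import datetime
    from typing import List

    @dataclass
    class ContextRecord:
        timestamp: datetime
        kind: str            # e.g. "location", "document-read"
        value: str

    @dataclass
    class UserContextStore:
        user_id: str
        notice_text: str                  # 1. Notice: what is collected and who sees it
        allowed_readers: List[str]        # 4. Security: the named third parties
        opted_in: bool = True             # 2. Choice: the user may opt out at any time
        records: List[ContextRecord] = field(default_factory=list)

        def add(self, record: ContextRecord) -> None:
            if self.opted_in:             # nothing is collected after an opt-out
                self.records.append(record)

        def opt_out(self, delete_history: bool = True) -> None:
            self.opted_in = False
            if delete_history:            # the stronger form of principle 2
                self.records.clear()

        def read(self, requester: str) -> List[ContextRecord]:
            # 3. Personal access: the user always sees their own data;
            # 4. Security: anyone else must be on the named-reader list.
            if requester == self.user_id or requester in self.allowed_readers:
                return list(self.records)
            raise PermissionError(f"{requester} may not read {self.user_id}'s context")

        def correct(self, index: int, new_value: str) -> None:
            # 3. Personal access: the user may correct (or deliberately alter) an item.
            self.records[index].value = new_value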

Characteristics of context-aware applications

Some characteristics of context-aware applications that may compromise privacy are as follows:

There are two further issues that create problems when protecting privacy:

We call these derived context and derived behaviour. Their importance is that the application makes deductions from contextual data. An end-user may be able to perform the reverse process: they see the deduction, and they may know some of the contextual data from which the deduction was made; from this they may be able to deduce other contextual data that was intended to be private. Thus a user may be presented with a pornographic paper; they may deduce that the paper was presented because one of their peers read it, and they may even be able to deduce which peer it was. The peer's privacy is compromised; moreover the peer may not even know they were being used as a peer.

Multiple contextual sources

A user may be able to access contextual information through multiple sources, e.g.

By combining these, a snooper can sometimes break the personal privacy provided by an application, e.g. to reveal who an "anonymous" person is.
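As a toy illustration of such linkage, the sketch below joins an "anonymous" location trace published by one application with a separately available door-lock log; the data, field names and time window are all hypothetical, but the overlap in times and places is often enough to re-identify the anonymous user.

    from datetime import datetime

    # Hypothetical example data: an anonymised trace and a named door-lock log.
    anonymous_trace = [
        ("user-7f3a", datetime(2002, 5, 14, 9, 2), "room 101"),
        ("user-7f3a", datetime(2002, 5, 14, 11, 30), "room 214"),
    ]
    doorlock_log = [
        ("alice", datetime(2002, 5, 14, 9, 1), "room 101"),
        ("bob",   datetime(2002, 5, 14, 10, 45), "room 214"),
        ("alice", datetime(2002, 5, 14, 11, 29), "room 214"),
    ]

    def candidates(trace, log, window_seconds=120):
        """Names whose logged entries match every point of the anonymous trace."""
        names = None
        for _, t_time, t_place in trace:
            matches = {name for name, l_time, l_place in log
                       if l_place == t_place
                       and abs((l_time - t_time).total_seconds()) <= window_seconds}
            names = matches if names is None else names & matches
        return names or set()

    print(candidates(anonymous_trace, doorlock_log))  # {'alice'}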

Control over applications

Many components of context-aware applications are owned and controlled by third parties, not the user. Examples are:

These components may or may not give the user control over how their context is used.

We now discuss more details of context-aware applications. It is convenient to divide such applications into two classes, depending on how much context is used.

Case 1: single-user-context applications

The easiest case for protecting personal privacy is in single-user-context applications. Such applications may have multiple users, but each user is independent of the others (except, of course, that performance may be worse if there are many simultaneous users). Most mobile phone services, even if location-aware, come into this category. In single-user-context applications a user supplies an application with their personal context, and the application, perhaps using other non-personal contextual information, adjusts its behaviour to meet the user's needs. This case is not fundamentally different from any other case of an application dealing with personal information. Protection can be achieved through file access permissions, through encryption, through authentication and through the use of secure data transmission protocols.
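A minimal sketch of such protection for stored personal context is given below; it assumes the third-party Python cryptography package and a POSIX file system, and the key handling, path and data shown are purely illustrative.

    import os
    from cryptography.fernet import Fernet  # third-party package: pip install cryptography

    # Illustrative only: encrypt a user's contextual data before it touches disk,
    # and restrict the file so only the owning account can read it.
    key = Fernet.generate_key()          # in practice the key would live in a key store
    cipher = Fernet(key)

    personal_context = b'{"user": "alice", "location": "room 101", "time": "09:02"}'
    token = cipher.encrypt(personal_context)

    path = "alice_context.enc"           # hypothetical path
    with open(path, "wb") as f:
        f.write(token)
    os.chmod(path, 0o600)                # owner read/write only

    # Later, an authenticated component holding the key can recover the data.
    with open(path, "rb") as f:
        assert cipher.decrypt(f.read()) == personal_context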

In addition the user needs to trust the application: e.g. in spite of its apparent privacy controls, does the application secretly transmit information to a third party, or, more likely, in the case of a software error, might it dump a lot of personal information in a non-secure storage medium? Overall, a sine qua non for privacy protection is a set of reliable software and hardware components.

Case 2: multi-user-context applications

A multi-user-context application uses context from other (human) users to influence how it behaves to one user. The privacy problems arise when the context of one user is visible to another, and/or influences the way information is presented to another. An example is an application that tells people their colleagues' whereabouts, or the most popular document read by their colleagues today. In the first example data about your location is accessible to your colleagues, and one of them might even use this to keep a historical record of all your movements (though some applications discourage this by refusing to answer continually-repeated questions). Moreover colleagues may innocently store information about you, in an application log for example, and this storage may not be secure; less innocently colleagues may deliberately send your personal information to untrusted third parties.
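The sketch below shows, with hypothetical class and method names, one way an application might refuse continually repeated questions about the same colleague, so that a casual whereabouts facility cannot easily be turned into a movement tracker.

    import time

    class WhereaboutsService:
        """Illustrative sketch: answers 'where is X?' but refuses rapid repeats
        from the same asker about the same subject."""

        def __init__(self, locations, min_interval_seconds=3600):
            self._locations = locations            # subject -> current location
            self._min_interval = min_interval_seconds
            self._last_asked = {}                  # (asker, subject) -> timestamp

        def where_is(self, asker, subject):
            now = time.time()
            last = self._last_asked.get((asker, subject))
            if last is not None and now - last < self._min_interval:
                raise PermissionError(
                    f"{asker} asked about {subject} too recently; try again later")
            self._last_asked[(asker, subject)] = now
            return self._locations.get(subject, "unknown")

    service = WhereaboutsService({"alice": "room 101"})
    print(service.where_is("bob", "alice"))   # "room 101"
    # A second query moments later would raise, discouraging continuous tracking.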

Applications do not always allow each user full access to every other user's context. For example:

The issue of trust of the application applies even more strongly to multi-user-context applications. Such applications may use complex algorithms, and such algorithms can easily contain bugs that reveal information that a user asked to be private.

Preventative measures

One of the above privacy principles was that any application should allow a user to opt out at any time. Ideally this will involve denying others access to their past, present and future contextual data (see the discussion of agent programs in the previous section), or perhaps deleting such data altogether. To give confidence in this process, it is probably best if the application keeps all the contextual data for person X in a discrete set of files that person X controls, and can thus delete or change access permissions on (see, e.g., the system of Smailagic et al. [6] for protecting location information, or the much more elaborate RBAC model for setting inter-object security constraints [2]). This may be done directly by the user or via agent programs. The user may also wish to encrypt some or all of the data, and control the way keys are issued to third parties.
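One way to give the user this kind of control is to encrypt each person's contextual data under a key the person holds: access is granted by issuing the key to a named party, and opting out amounts to destroying the key, which makes any lingering copies of the ciphertext useless. The sketch below illustrates the idea; the class and method names are hypothetical and it again assumes the third-party Python cryptography package.

    from cryptography.fernet import Fernet

    class PersonalContextVault:
        """Illustrative sketch: user-held key, per-user ciphertext, key destruction as opt-out."""

        def __init__(self):
            self._key = Fernet.generate_key()     # held by (or on behalf of) the user
            self._issued_to = set()               # named parties who have been given the key
            self._ciphertexts = []

        def store(self, context_item: bytes) -> None:
            self._ciphertexts.append(Fernet(self._key).encrypt(context_item))

        def issue_key(self, party: str) -> bytes:
            self._issued_to.add(party)            # the user decides who receives the key
            return self._key

        def opt_out(self) -> None:
            # Destroying the key renders every stored (or copied) ciphertext unreadable,
            # which is as close to deletion as a distributed system usually gets.
            self._key = None
            self._ciphertexts.clear()

    vault = PersonalContextVault()
    vault.store(b"alice was in room 101 at 09:02")
    key_for_manager = vault.issue_key("manager")
    vault.opt_out()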

For all this to work, the user would need to be satisfied that this was the only copy of the information, and that the information was not also stored in other places, such as caches and logs. This can be desperately hard to achieve in practice. The same applies to derived contextual data.

An alternative approach is for a user to switch off their personal contextual sensors -- in cases where they can: you cannot switch off a security camera. Switching off has great advantages if the user does not trust the application, but there are two dangers:

A third approach is to allow users to change (or forge) their own past or present contextual data -- a strong version of the third privacy principle. This can be used to correct sensor errors or deductions from sensor values (derived context), but more importantly it provides solace to users worried about Big Brother. Thus if Big Brother is a manager who is looking at how much time X spends in the office, then X can forge his data to show he worked each evening. If an application uses this historical data, the forgery may, however, have unfortunate side-effects: if there is a really big attraction one evening, the application may not bother to inform the user of it, since he always works in the evening! There is also the possibility that Big Brother was at the office one evening, and knew that X was not there -- a further example of the use of multiple contextual sources in compromising privacy.
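A simple way to support this is an override layer in which the user's own corrections (or forgeries) shadow the raw sensor records whenever the application looks something up. The sketch below is only illustrative and all names are hypothetical.

    class OverridableContext:
        """Illustrative sketch: the subject's own edits take precedence over sensor records."""

        def __init__(self):
            self._sensor_records = {}   # (subject, date) -> sensed value
            self._user_overrides = {}   # (subject, date) -> value supplied by the subject

        def record_from_sensor(self, subject, date, value):
            self._sensor_records[(subject, date)] = value

        def override(self, subject, date, value):
            # The strong form of principle 3: the subject may correct or deliberately alter data.
            self._user_overrides[(subject, date)] = value

        def lookup(self, subject, date):
            return self._user_overrides.get((subject, date),
                                            self._sensor_records.get((subject, date)))

    ctx = OverridableContext()
    ctx.record_from_sensor("X", "2002-05-14", "left office at 17:00")
    ctx.override("X", "2002-05-14", "worked in office until 21:00")
    print(ctx.lookup("X", "2002-05-14"))  # the overridden value is what the application sees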

A fourth approach, which at least gives the user a chance of recourse if things go wrong, is to embed traceability into data. For example an image can carry an electronic watermark that shows its source; if the image is then found in the hands of an unauthorized user, it can, one hopes, be proved unequivocally where it came from.
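A robust watermark is beyond a short sketch, but the principle of traceability can be illustrated with a keyed tag: each copy handed to a recipient carries a recipient-specific HMAC, so a leaked copy can be traced back to whoever received it. The secret, tag format and data below are purely illustrative, and this is not resistant to someone who strips the trailer.

    import hmac, hashlib

    SECRET = b"issuer-only-secret"   # known only to the party distributing the data

    def tag_copy(data: bytes, recipient: str) -> bytes:
        """Append a recipient-specific tag (not a robust watermark, just the principle)."""
        mac = hmac.new(SECRET, data + recipient.encode(), hashlib.sha256).hexdigest()
        return data + b"\n--trace:" + recipient.encode() + b":" + mac.encode()

    def trace_leak(leaked: bytes) -> str:
        """Verify the embedded tag and report which recipient's copy was leaked."""
        body, _, trailer = leaked.rpartition(b"\n--trace:")
        recipient, _, mac = trailer.partition(b":")
        expected = hmac.new(SECRET, body + recipient, hashlib.sha256).hexdigest()
        if hmac.compare_digest(expected.encode(), mac):
            return recipient.decode()
        raise ValueError("tag missing or tampered with")

    copy_for_bob = tag_copy(b"confidential location history of alice", "bob")
    print(trace_leak(copy_for_bob))  # "bob"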

Infrastructure

Users need to be able to express their privacy requests in an application-independent way, instead of using a different notation for each application. An important initiative to address this is W3C's Platform for Privacy Preferences (P3P) project [7] (http://www.w3.org/P3P). It aims to provide a simple, automated way for users to gain control of personal information on web sites they visit. In the future this may become a model for privacy policy negotiation for all context-aware applications.
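The sketch below is not P3P syntax; it only illustrates, with hypothetical field names and values, the kind of once-stated user preference that could be matched automatically against each application's declared policy.

    # Hypothetical, simplified preference matching -- not actual P3P/APPEL syntax.
    user_preference = {
        "location":  {"share_with": {"colleagues"}, "retention_days": 1},
        "documents": {"share_with": set(),          "retention_days": 0},
    }

    application_policy = {
        "location": {"share_with": {"colleagues", "facilities-management"},
                     "retention_days": 30},
    }

    def acceptable(preference, policy):
        """True only if the policy asks for no more than the preference allows."""
        for item, wants in policy.items():
            allows = preference.get(item)
            if allows is None:
                return False
            if not wants["share_with"] <= allows["share_with"]:
                return False
            if wants["retention_days"] > allows["retention_days"]:
                return False
        return True

    print(acceptable(user_preference, application_policy))  # False: the policy over-reaches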

Understanding and correcting synthesized values

Assume that a user wishes to correct a synthesized value. There are two possible problems:

Some guiding principles for multi-user-context applications

It is valuable if multi-user-context applications embody some guiding principles on privacy, corresponding to generally accepted ethics. These should supplement the four principles quoted earlier. Some possibilities are:

Are the issues new ones?

Most of the privacy issues with context-aware applications are not new ones, but bring new worries:

As an example of an existing application, credit card companies have considerable data about the location and habits of their users, particularly habitual users. In the past this has sometimes been abused (there were once stories in the British press, for example, about the wine-buying habits of a cabinet minister), but it is not a matter of great public concern. Similarly mobile phone companies have records of their users' locations (albeit only to within a hundred metres or so) and probably the content of their calls, too. Again this is not a current issue of great public concern.

Finally, electronic door locks and cameras in offices provide a lot of information about users, as do outside cameras on roads or in public places. There is some opposition to these, but the balance of opinion appears to be strongly in favour of them, because of their success in crime detection and prevention.

Public perceptions

Public perception of what is a breach of personal privacy varies between countries; furthermore in any one country it changes over time. In western countries the trend has been for more and more concern over personal privacy, though after September 11 the pendulum has swung the other way. Overall therefore we are not dealing with absolutes, but more with varying public attitudes.

Even if an individual is not concerned with privacy, and is quite happy for their fellow citizens to know all about them, there is still a potential pitfall. Spam e-mailers use all the information they can glean from the internet, and the more they know about an individual the more targeted the e-mail they can send. This can be a deterrent to openness.

If data is stored on a computer, its use in most countries is controlled by law. Typically the law embodies the set of four principles described earlier. Of course a law or a code of conduct is useless unless there is some way of enforcing it. Data protection is especially hard to enforce in distributed multi-national applications. Moreover most such laws were formulated without context-aware applications in mind; indeed when an official from a data protection authority recently visited a research laboratory concerned with context-aware applications, he was dumbfounded by the privacy implications of the technology, and the potential difficulties of legislation.

Privacy-threatening applications without software

To reinforce the point that privacy issues with context-aware applications are not new, consider what humans can do with their eyes. I can quite legally keep records, for every person I meet, of the time and place and the name of the person; indeed some people do this in their diaries, albeit in a less than complete way. I may possibly run into the law if I store these records on the computer. Moreover I may augment my eyes by wearing a continuously running video-camera, and could use this to record my life, and in particular my interactions with other people. Here the data would best be analysed using software, e.g. face-recognition software to identify people (from a known group) from the camera images. (This is an instance of synthesizing higher order data from low order data.) Although all this is far away from what people think of as context-aware applications, you could argue that it potentially impinges on the personal privacy of others just as much.

Conclusions

The privacy problems with context-aware applications arise mainly with multiple-user-context applications. This is an especially difficult case because the whole purpose of such an application is to share personal information between users, i.e. it is by nature a counter-privacy application. Because of all the difficulties, we believe that users who release their personal data to such applications must treat that data as public. Although the application may be confined to a trusted community, each user can intentionally or indirectly (e.g. via an automatic log) save other people's personal data that they have accessed, and the chances of every member of this trusted community having perfect security to prevent information being passed on are small. Moreover derived context and derived behaviour may further compromise any privacy controls that have been set.

Users can and should be able to opt out of such applications whenever they wish. It is, however, unrealistic to expect retrospective opting out (i.e. requiring all past data to be destroyed) to work in multiple-user-context applications. Again the problem comes because there may be multiple copies of this past data, held by various users, and there may be data derived from it.

We have outlined various methods for protecting privacy. These may not be foolproof, but they at least make it harder for an outsider to compromise privacy. We still believe that overall the negative conclusions of the previous two paragraphs hold. Thus an application should warn its users about what they are getting into, and applications will only attract users with an open-minded approach to privacy. It may even be best for an application designer to accept from the start that the application is only suitable for a certain class of user (e.g. for those people Alan Westin classifies as "privacy pragmatists" or "privacy unawares", but not for those classified as "privacy fundamentalists" -- apparently 11% of the population); hence elaborate -- and often ultimately fruitless -- attempts to protect privacy may not be worth pursuing. Following on from this, application designers should think twice about producing applications that depend, for their viability, on, say, a 90% acceptance level among the community.

Any context-aware application will, however, need some privacy protection. In implementation terms, protection cannot be added as an afterthought: instead it needs to be incorporated from the start into the design of the application.

References

  1. Ackerman, M., Darrell, T. and Weitzner, D.J. `Privacy in context', HCI, 16, 2, pp. 167-179, 2001.
  2. Barkley, J., Beznosov, K. and Uppal, J. `Supporting relationships in access control using role based access control', Fourth ACM Workshop on Role-Based Access Control, Fairfax, Va., pp. 55-65, 1999.
  3. Busboom, A. `Delivery context and privacy', Position paper for W3C Workshop on Delivery Context, Sophia-Antipolis, 2002.
  4. Harper, R.J. `Why people do and don't wear active badges: a case study', CSCW, 4, 4, pp. 263-280, 1995.
  5. Langheinrich, M. `Privacy by design: principles of privacy-aware ubiquitous systems', Tutorial notes from Ubicomp 2001, Atlanta, 2001; `Personal privacy in pervasive computing', Tutorial notes from Pervasive 2002, Zurich, 2002.
  6. Smailagic, A., Siewiorek, D.P., Anhalt, J., Kogan, D. and Yang Wang, `Location sensing in a context aware computing environment', Pervasive Computing, 2001.
  7. W3C, Platform for Privacy Preferences (P3P) project, http://www.w3.org/P3P, first released 1998.