== Results for cache of size 39 and look ahead of 3 km. ==

Research issues in context-aware retrieval: evaluation of straying outside a context-aware cache

Peter Brown and Gareth Jones
Department of Computer Science, University of Exeter, Exeter EX4 4QF, UK
P.J.Brown@ex.ac.uk, G.J.F.Jones@ex.ac.uk

Summary

This experiment measures recall degradation when the user's path of locations strays outside the area covered by a cache.

Fixed parameters of the experiment

Assumed purpose of cache:: to cover disconnected operation.
Contextual fields used:: just location.
Document collection:: items relating to tourist attractions in the South West; consists of 626 items; reasonably evenly distributed over the geographical area, but with some concentration on the towns and cities. Each attraction has a point as its location. All attractions are on land, and the border of the area covered consists of coastline, plus county boundaries between those counties deemed to be in the South West and those not.
How the cache was built:: the cache for each test was built using a context consisting of a square, 20 km by 20 km, centred on the starting point for the test. We call this the cache-building square. This square set of locations was used as the current context, and matched against the location of each document in the document collection: the documents that matched, according to some threshold, formed the cache. The chosen threshold scores for building caches depended on the matching algorithm used; the aim was to get an average cache size for each test of about one fifteenth of the document collection. (In fact in each set of tests the lowest threshold used for retrieving from the cache was the same as the threshold used for building the cache.) The cache normally included a lot of tourist attractions outside the cache-building square, provided their locations were close to the square and thus got a good matching score. (With the thresholds used for building the cache, all the sites associated with locations within the cache-building square qualified, as did some outside; hence it made no difference how big the cache-building square was: a size of 0 would give the same results, i.e. the same cache content. However for high thresholds the size would matter.) It was assumed that the user's device would be big enough to hold any of the caches generated (not an unrealistic assumption given the small size of the document collection, and hence the even smaller size of the caches).
Matching and scoring algorithms:: the basis for one set of tests is the default algorithms built into the Context Matcher. Scores for matching two locations decay roughly linearly as the locations move apart (actual algorithm: score = 2.0 * ((target - abs(target - query)) / target) ). A second set of tests was performed using the "research library" algorithms: these decay as N squared according to distance apart. (The "research library" is a Context Matcher facility for supplying new algorithms that override the default ones.) These algorithms generated much lower scores than the default ones (todo make scores higher and therefore a better basis for comparison?) and thus, to make the tests comparable, lower thresholds were set both for building the cache and for retrieving from it.
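
The linear formula quoted above can be written out as a short Python sketch. It transcribes only the expression given in the text; the function name, the treatment of target and query as single numeric values, and the guard against a zero target are assumptions made for illustration, and the N-squared variant is not shown because its exact form is not given here.

```python
def linear_score(target: float, query: float) -> float:
    """Transcription of the quoted formula:
    score = 2.0 * ((target - abs(target - query)) / target).

    The name and the scalar arguments are assumptions for illustration.
    """
    if target == 0:
        # Assumed guard: the quoted formula is undefined for a zero target.
        raise ValueError("target must be non-zero")
    return 2.0 * ((target - abs(target - query)) / target)


# The score is highest when query equals target and falls off linearly
# and symmetrically as the two values move apart.
sample_scores = {query: linear_score(100.0, query) for query in (100.0, 90.0, 110.0, 50.0)}
print(sample_scores)
```

With this formula the score depends only on the absolute difference between the two values, relative to the target.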

Experimental approach

The experiment consisted of a number of individual tests. Each test had a starting point, and the cache for the test was built using this starting point, i.e. it was the centre of the cache-building square. The starting points were chosen by hand, some being in the middle of popular towns and cities and some being "in the middle of nowhere". The choice was based on a hunch about where a tourist aid would be most used, not on any deep analysis.

In each test the user was assumed to proceed steadily in a straight-line path, performing a retrieval at regular retrieval points, a fixed distance apart. There were 11 retrieval points: the first was at the starting point of the test; the sixth was on the very edge of the square used to build the cache, and the remainder were increasingly further outside this area. (Actually the paths were diagonal ones, that went from the starting point to the NE corner of the square, and then an equal distance beyond.) The straight line paths were chosen so that the user remained in the overall area covered by the document collection, i.e. the South West of England (and did not stray into the sea!). The following picture shows how the user's path proceeds.
Picture of cache and user's progress
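
The geometry of the path can be sketched in Python as follows. The step of (2000, 2000) metres is the one reported in the appendix tables; the local origin at (0, 0) and the function name are assumptions for illustration.

```python
import math


def retrieval_points(start, step=(2000, 2000), count=11):
    """Return `count` equally spaced points on a straight NE diagonal from `start` (metres)."""
    east, north = start
    return [(east + i * step[0], north + i * step[1]) for i in range(count)]


HALF_SIDE = 10_000  # half the side of the 20 km cache-building square, in metres

points = retrieval_points((0, 0))  # hypothetical local origin at the starting point
corner = (HALF_SIDE, HALF_SIDE)    # NE corner of the cache-building square

# The sixth point (index 5) lands on the corner; the last point lies an
# equal distance beyond it along the diagonal.
beyond_corner_km = math.dist(points[-1], corner) / 1000
print(points[5], round(beyond_corner_km, 2))
```

Under these assumptions the last retrieval point is roughly 14 kilometres beyond the corner of the square.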

The retrieval at each of the 11 retrieval points used the cache; the results of this retrieval were compared with what would have been retrieved if the whole document collection had been used rather than the cache. Each retrieval retrieved all the documents whose score was greater than a given threshold (a document's score would not, of course, be affected by whether it came from the cache or from the original document collection). A count was made of the total number of documents "lost" over the 11 retrieval points (i.e. documents retrieved from the original document collection but not in the cache). We call these lost retrievals. A count was also made of the total number of documents retrieved from the cache.
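
The counting procedure can be sketched as below. This is a minimal illustration, not the Context Matcher itself: `toy_score`, its `reach` parameter and the three sample documents are hypothetical stand-ins, chosen only so that the counting logic can be run.

```python
import math


def toy_score(doc_location, query_location, reach=30_000.0):
    """Hypothetical scorer: 1.0 at zero distance, falling linearly to 0.0 at `reach` metres."""
    return max(0.0, 1.0 - math.dist(doc_location, query_location) / reach)


def retrieve(documents, query_location, threshold):
    """Ids of all documents whose score is greater than the threshold."""
    return {doc_id for doc_id, location in documents.items()
            if toy_score(location, query_location) > threshold}


def count_lost(collection, cache, points, threshold):
    """Total (benchmark retrievals, lost retrievals) over all retrieval points."""
    benchmark_total = 0
    lost_total = 0
    for point in points:
        benchmark = retrieve(collection, point, threshold)  # from the full collection
        from_cache = retrieve(cache, point, threshold)      # from the cache only
        benchmark_total += len(benchmark)
        lost_total += len(benchmark - from_cache)           # retrieved, but not in the cache
    return benchmark_total, lost_total


# Hypothetical data: three documents, of which only two are in the cache.
collection = {"a": (0, 0), "b": (12_000, 12_000), "c": (22_000, 22_000)}
cache = {doc_id: collection[doc_id] for doc_id in ("a", "b")}
points = [(2000 * i, 2000 * i) for i in range(11)]

print(count_lost(collection, cache, points, threshold=0.8))
```

Because a document's score does not depend on where it is stored, the lost retrievals at each point are simply the set difference between the two result sets.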

Each test was repeated -- we call this a sequence of sub-tests -- with different thresholds (the same threshold was used for all 11 retrieval points).

Commentary on results

The first of the 11 retrieval points is at the starting point and, assuming this retrieval does not have a lower threshold than that used to build the cache, it will have no failures -- i.e. everything that would have been retrieved from the original document collection is in the cache. The last of the 11 retrieval points is well outside the cache-building square, and this point is likely to have the most failures. Of the remaining points, the nearer they are to the start the fewer failures are likely. In the tests the best results are likely to be where there are plenty of attractions inside the cache-building square and many fewer just outside. Thus tests that started at the centre of cities had good results. Not surprisingly the worst results were from the test on Dartmoor, which started in a wild area with few attractions; its cache was therefore small, and its rate of failure turned out to be even worse than one would expect from the proportional size of the cache.

Tourists do not flock to the Somerset Levels, yet surprisingly the test centred here (at the small town of Somerton) had the worst results, i.e. the most sites missed by the cache. It was proportionally worse than the other cases especially for high retrieval thresholds (e.g. at a 99% threshold, Plymouth had 26 hits and no losses, whereas Somerton had 6 hits and 2 losses). The direction of progress from the starting point took the user ever closer to more important tourist areas such as Wells, Bath and Glastonbury.

The number of retrievals of course increased as successively lower thresholds were used, but the proportion of lost retrievals increased quite sharply. This was surprising, though some increase would be expected. (To take an illustrative example: if you are at the last retrieval point, about 14 kilometres outside the cache-building square, you might, with a low threshold, get sites 20 kilometres away, and these might be 34 kilometres outside the cache-building square; such far-away sites are unlikely to be in the cache, even though the cache is built with a low threshold.)

Todo: more refined conclusions.

Possible text for paper

We have performed some preliminary tests of context-aware caching. In order to reduce the number of variables we concentrated on one contextual field, location. The other contextual fields were kept constant during the experiments, and were not active in matching. Our document collection consisted of information about tourist sites, each of which had an associated location. We chose a set of different starting points, all well within the area covered by the document collection, and each representing a possible place a tourist might start wanting information. For each starting point, we built a context-aware cache that encompassed sites whose location matched a square centred on the starting point. We call this square the cache-building square. The cache-building square had sides 20 kilometres long. We set a fairly generous threshold of 50% for inclusion in the cache, and as a result about one fifteenth of the documents in our collection went into the cache. (Many of these were outside the cache-building square, since locations outside, but close to, the square would still get a good score.) We assumed the cache was being used during disconnected operation, and thus there was no way of updating it.
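
A minimal sketch of this cache-building step is given below. The 20 km square and the 50% threshold come from the text; how a point is actually scored against a square is not stated here, so the scoring rule (full score inside the square, linear decay to zero at a hypothetical `reach` of 10 km outside it), the function names and the sample sites are all assumptions.

```python
import math


def distance_outside_square(point, centre, half_side):
    """Distance in metres from `point` to the square; 0.0 if the point is inside it."""
    dx = max(0.0, abs(point[0] - centre[0]) - half_side)
    dy = max(0.0, abs(point[1] - centre[1]) - half_side)
    return math.hypot(dx, dy)


def build_cache(collection, centre, half_side=10_000.0, threshold=0.5, reach=10_000.0):
    """Keep every document whose (assumed) score against the square reaches the threshold."""
    cache = {}
    for doc_id, location in collection.items():
        # Assumed scoring rule: 1.0 inside the square, decaying linearly outside it.
        score = max(0.0, 1.0 - distance_outside_square(location, centre, half_side) / reach)
        if score >= threshold:
            cache[doc_id] = location
    return cache


# Hypothetical sites: one inside the square, one 4 km outside, one 8 km outside.
sites = {"inside": (3_000, -4_000), "near": (14_000, 0), "far": (18_000, 0)}
print(sorted(build_cache(sites, centre=(0, 0))))
```

Under this assumed rule every site inside the square qualifies, together with sites close enough outside it, which is the behaviour described above.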

Our tests of the cache were quite demanding: we assumed that the user went in a straight-line path, starting at the centre of the cache-building square, proceeding to the edge of the square, and continuing until he was an equal distance outside. We assumed he made retrievals at 11 points equally spaced along the way. (Thus the first five points were inside the square, the next one was on the edge, and the remaining five were increasingly far outside.)

We counted the total number of documents retrieved at the 11 points. We then repeated each experiment using the original document collection rather than the cache, and again counted the number of documents retrieved. The difference between the two numbers represented the number of potential retrievals lost because of the use of the cache, i.e. the lost retrievals. We accumulated these numbers for all our starting points.

We repeated these experiments using different threshold scores (e.g. 99%, 98%, 95%, ...) for retrieval. The number of lost retrievals increased dramatically as the threshold decreased. We expected some increase (a lax threshold would allow the retrieval of documents associated with locations a long way from the cache-building square) but were surprised at its magnitude.

As a final step we tried different algorithms for matching two locations: one algorithm decayed linearly according to the distance apart of the locations, and the other decayed as N squared.

Some preliminary conclusions are:

The proportion of lost retrievals rises dramatically as the threshold for retrieval is lowered (and thus sites further away from the cache area become candidates for retrieval). Obviously a rise is to be expected, but the dramatic scale of it was unexpected (the individual tables, particularly those in Appendix B, illustrate this well). The rise is most dramatic with the N squared algorithm; this is because it is more focussed, and the caches are tightly focussed on the area covered (? how does this square with max. distance away, which is higher for the N squared algorithm -- perhaps because the cache size is slightly bigger?).
The tests were conducted so that the total number of lost retrievals was approximately the same for each algorithm. The results showed that the default algorithm did better than the N squared algorithm in that it had about 16% more successful retrievals for the same number of lost ones. Again this is probably because of the greater focus of the N squared algorithm.
The best results, in terms of a low number of lost retrievals, came from starting points in the middle of towns and cities. The document collection was such that there were more tourist sites in towns and cities than in rural areas, though rural areas were certainly not uncovered. As a user strayed outside their starting town or city there would be fewer potential tourist sites. Thus the early points of the user's path, which are well covered by the cache, would have a good number of hits, and the later points, poorly covered by the cache, would have a low number of potential hits. The worst results, not surprisingly, were from a starting point in a remote rural area, where the user's path went towards more populated areas.
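
The rise in the proportion of lost retrievals can be checked by pooling the four default-algorithm tables of Appendix A by threshold. The sketch below copies the (benchmark retrievals, retrievals lost) pairs from those tables; the names are illustrative, and taking the proportion as lost divided by benchmark retrievals is an assumption about how the two columns relate.

```python
# (benchmark retrievals, retrievals lost) per threshold, copied from Appendix A.
DEFAULT_RESULTS = {
    "Dartmoor": {99: (5, 2), 98: (16, 4), 96: (51, 16), 93: (258, 93)},
    "Exeter": {99: (28, 1), 98: (49, 7), 96: (159, 20), 93: (465, 148)},
    "Plymouth": {99: (25, 0), 98: (49, 0), 96: (152, 13), 93: (376, 107)},
    "Somerton": {99: (6, 0), 98: (25, 3), 96: (164, 20), 93: (483, 183)},
}


def lost_proportion_by_threshold(results):
    """Pool all starting points and return lost / benchmark for each threshold."""
    totals = {}
    for per_threshold in results.values():
        for threshold, (benchmark, lost) in per_threshold.items():
            pooled_benchmark, pooled_lost = totals.get(threshold, (0, 0))
            totals[threshold] = (pooled_benchmark + benchmark, pooled_lost + lost)
    return {threshold: lost / benchmark
            for threshold, (benchmark, lost) in sorted(totals.items(), reverse=True)}


proportions = lost_proportion_by_threshold(DEFAULT_RESULTS)
for threshold, proportion in proportions.items():
    print(f"{threshold}%: {proportion:.1%}")
```

On these figures the pooled proportion grows from under 5% at the 99% threshold to about a third at the 93% threshold.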

APPENDIX A: results of individual tests using default algorithm and cache of size 39

Test with starting point at Dartmoor using default (linear) matching algorithm and cache size 39

Cache is built using a square with sides 20 km. User's path has steps of (2000, 2000) metres, with a total of 11 retrievals. Amount of look-ahead (both E and N) is 3km. Cache size is 39. (Size is set as a fixed value, derived from original size of 63).

Point furthest from centre of cache (25300000, 07300000) is 2330563, 0847571; distance away is: 23.1709 kilometres

*Results for Dartmoor (OS 250,070) using default (linear) algorithm and 3 look-ahead*
Threshold	Total benchmark retrievals	Retrievals lost
99%	5	2
98%	16	4
96%	51	16
93%	258	93

Default matching algorithm used was: (version pjb May 29 2002) .

Commentary on this starting point: the cache is centred on a wild moorland area.

Test with starting point at Exeter using default (linear) matching algorithm and cache size 39

Point furthest from centre of cache (29500000, 09500000) is 3096091, 0852581; distance away is: 17.5841 kilometres

*Results for Exeter (OS 292,092) using default (linear) algorithm and 3 look-ahead*
Threshold	Total benchmark retrievals	Retrievals lost
99%	28	1
98%	49	7
96%	159	20
93%	465	148

Default matching algorithm used was: (version pjb May 29 2002) .

Commentary on this starting point: this may be a favourable case as the cache is centred on an area with lots of attractions, and the area outside the cache has fewer attractions.

Test with starting point at Plymouth using default (linear) matching algorithm and cache size 39

Point furthest from centre of cache (25000000, 05600000) is 2377095, 0801320; distance away is: 27.0573 kilometres

*Results for Plymouth (OS 247,053) using default (linear) algorithm and 3 look-ahead*
Threshold	Total benchmark retrievals	Retrievals lost
99%	25	0
98%	49	0
96%	152	13
93%	376	107

Default matching algorithm used was: (version pjb May 29 2002) .

Commentary on this starting point: this may be a favourable case as the cache is centred on an area with lots of attractions, and the area outside the cache has fewer attractions; the cache may, however, be smaller as Plymouth is on the sea, so part of the cache area covers the sea -- which has no tourist attractions.

Test with starting point at Somerton using default (linear) matching algorithm and cache size 39

Point furthest from centre of cache (35300000, 13300000) is 3341510, 1329660; distance away is: 18.9003 kilometres

*Results for Somerton (OS 350,130) using default (linear) algorithm and 3 look-ahead*
Threshold	Total benchmark retrievals	Retrievals lost
99%	6	0
98%	25	3
96%	164	20
93%	483	183

Default matching algorithm used was: (version pjb May 29 2002) .

Commentary on this starting point: the cache is centred on a non-prime tourist area, but there are prime areas nearby.

APPENDIX B: results of individual tests using research library algorithm and cache of size 39

Test with starting point at Dartmoor using research library (N-squared) matching algorithm and cache size 39

Point furthest from centre of cache (25300000, 07300000) is 2309563, 0594948; distance away is: 25.9494 kilometres

*Results for Dartmoor (OS 250,070) using research library (N-squared) algorithm and 3 look-ahead*
Threshold	Total benchmark retrievals	Retrievals lost
50%	0	0
20%	1	1
10%	13	2
5%	340	139

Non-default algorithm used was: Context Matcher with N-squared location-matching algorithm: version of Sept 4.1 2002.

Commentary on this starting point: the cache is centred on a wild moorland area.

Test with starting point at Exeter using research library (N-squared) matching algorithm and cache size 39

Point furthest from centre of cache (29500000, 09500000) is 2955010, 1129010; distance away is: 17.907 kilometres

*Results for Exeter (OS 292,092) using research library (N-squared) algorithm and 3 look-ahead*
Threshold	Total benchmark retrievals	Retrievals lost
50%	1	0
20%	10	0
10%	42	4
5%	534	192

Non-default algorithm used was: Context Matcher with N-squared location-matching algorithm: version of Sept 4.1 2002.

Commentary on this starting point: this may be a favourable case as the cache is centred on an area with lots of attractions, and the area outside the cache has fewer attractions.

Test with starting point at Plymouth using research library (N-squared) matching algorithm and cache size 39

Point furthest from centre of cache (25000000, 05600000) is 2733038, 0700543; distance away is: 27.1825 kilometres

*Results for Plymouth (OS 247,053) using research library (N-squared) algorithm and 3 look-ahead*
Threshold	Total benchmark retrievals	Retrievals lost
50%	0	0
20%	6	0
10%	41	0
5%	426	130

Non-default algorithm used was: Context Matcher with N-squared location-matching algorithm: version of Sept 4.1 2002.

Test with starting point at Somerton using research library (N-squared) matching algorithm and cache size 39

Point furthest from centre of cache (35300000, 13300000) is 3396450, 1186420; distance away is: 19.6703 kilometres

*Results for Somerton (OS 350,130) using research library (N-squared) algorithm and 3 look-ahead*
Threshold	Total benchmark retrievals	Retrievals lost
50%	0	0
20%	1	0
10%	13	2
5%	561	251

Non-default algorithm used was: Context Matcher with N-squared location-matching algorithm: version of Sept 4.1 2002.

Commentary on this starting point: the cache is centred on a non-prime tourist area, but there are prime areas nearby.

APPENDIX C: summary for cache built with 20km square and 3km look-ahead and cache of size 39

RUNS USING DEFAULT (LINEAR) ALGORITHMS

Total number of tables is 4. Total size of the 4 caches is 156. Total number of retrievals (over the 11 retrieval points) from the cache is 2311. Total number of retrievals lost is 617.

RUNS USING N-SQUARED ALGORITHMS

Total number of tables is 4. Total size of the 4 caches is 156. Total number of retrievals (over the 11 retrieval points) from the cache is 1989. Total number of retrievals lost is 721.
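
The two summaries can be compared with a few lines of arithmetic. The sketch below uses only the four totals quoted above; the dictionary layout and names are illustrative, and dividing lost retrievals by the retrieval total is an assumed way of expressing the loss rate.

```python
# Totals copied from the two summaries above.
SUMMARY = {
    "default": {"retrievals": 2311, "lost": 617},
    "n_squared": {"retrievals": 1989, "lost": 721},
}

# Ratio of the retrieval totals for the two algorithms.
retrieval_ratio = SUMMARY["default"]["retrievals"] / SUMMARY["n_squared"]["retrievals"]

# Lost retrievals as a fraction of the retrieval total, per algorithm.
lost_fraction = {name: row["lost"] / row["retrievals"] for name, row in SUMMARY.items()}

print(f"ratio of retrieval totals: {retrieval_ratio:.3f}")
for name, fraction in lost_fraction.items():
    print(f"{name}: {fraction:.1%} lost")
```

The ratio of the two retrieval totals comes to roughly 1.16, i.e. the default runs have about 16% more retrievals than the N-squared runs.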

APPENDIX D: summary of results for different amounts of look-ahead and cache of size 39

Total size of the 48 caches is 1872. Average cache size is 39. Maximum cache size is 39. Minimum cache size is 39.

*Results summary*
Algorithm	Look-ahead	Benchmark retrievals	Retrievals lost
default	0	2311	799
default	3	2311	617
default	5	2311	517
default	10	2311	415
default	15	2311	627
default	20	2311	1298
research	0	1989	832
research	3	1989	721
research	5	1989	648
research	10	1989	586
research	15	1989	686
research	20	1989	1095
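
The table above can be scanned for the look-ahead that loses the fewest retrievals for each algorithm. The rows below are copied from the table; the function and variable names are illustrative only.

```python
# (algorithm, look-ahead in km, benchmark retrievals, retrievals lost), copied from the table.
ROWS = [
    ("default", 0, 2311, 799), ("default", 3, 2311, 617), ("default", 5, 2311, 517),
    ("default", 10, 2311, 415), ("default", 15, 2311, 627), ("default", 20, 2311, 1298),
    ("research", 0, 1989, 832), ("research", 3, 1989, 721), ("research", 5, 1989, 648),
    ("research", 10, 1989, 586), ("research", 15, 1989, 686), ("research", 20, 1989, 1095),
]


def best_look_ahead(rows):
    """Return {algorithm: (look-ahead, lost)} for the row with the fewest lost retrievals."""
    best = {}
    for algorithm, look_ahead, _benchmark, lost in rows:
        if algorithm not in best or lost < best[algorithm][1]:
            best[algorithm] = (look_ahead, lost)
    return best


print(best_look_ahead(ROWS))
```

For both algorithms the fewest lost retrievals in this table occur at a look-ahead of 10 km, with losses rising again at 15 km and 20 km.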