Peter Brown and Gareth Jones
Department of Computer Science, University of Exeter, Exeter EX4 4PT,
UK
P.J.Brown@ex.ac.uk, G.J.F.Jones@ex.ac.uk
This experiment measures recall degradation when the user's path of locations strays outside the area covered by a cache.
The experiment consisted of a number of individual tests. Each test had a starting point, and the cache for the test was built using this starting point. The starting points were chosen by hand, some being in the middle of popular towns and cities and some being 'in the middle of nowhere'. The choice was based on a hunch about where a tourist aid would be most used, not on any deep analysis.
In each test the user was assumed to proceed steadily along a straight-line path, performing a retrieval at regular retrieval points a fixed distance apart. There were 11 retrieval points: the first was at the starting point of the test, the sixth was on the very edge of the area covered by the cache, and the remaining five were progressively further outside this area. The straight-line paths were chosen so that the user remained in the overall area covered by the document collection, i.e. the South West of England (and did not stray into the sea!). Todo: add picture of cache area and retrieval points.
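The geometry of a test path can be sketched as follows. This is only an illustration under stated assumptions: the text does not give the shape or size of the cache area, so the sketch assumes a circular area of radius `cache_radius` centred on the starting point, planar (easting, northing) coordinates, and a bearing measured clockwise from north. The function name and parameters are hypothetical.

```python
import math


def retrieval_points(start, bearing_deg, cache_radius, n_points=11, edge_index=5):
    """Return equally spaced points along a straight-line path.

    Assumption: the cache area is a circle of radius `cache_radius`
    centred on `start`. The point at `edge_index` (the sixth point,
    counting from zero) then lies exactly on the edge of that circle.
    """
    step = cache_radius / edge_index          # fixed distance between retrievals
    theta = math.radians(bearing_deg)         # bearing clockwise from north
    dx, dy = math.sin(theta), math.cos(theta)
    return [
        (start[0] + i * step * dx, start[1] + i * step * dy)
        for i in range(n_points)
    ]


points = retrieval_points(start=(0.0, 0.0), bearing_deg=90.0, cache_radius=10.0)
for i, (x, y) in enumerate(points, start=1):
    distance = math.hypot(x, y)
    where = "inside" if distance < 10.0 - 1e-9 else "edge or outside"
    print(f"point {i}: {distance:.1f} from start ({where})")
```

With these assumptions the last point ends up twice the cache radius from the starting point, so half of the path lies outside the cached area.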
Each retrieval used the cache; the results of this retrieval were compared with what would have been retrieved if the whole document collection had been used rather than the cache. Each retrieval returned all the documents whose score was greater than a given threshold (a document's score would not, of course, be affected by whether it came from the cache or from the original document collection). A count was made of the total number of documents "lost" over the 11 retrieval points (i.e. documents retrieved from the original document collection but not from the cache), and of the total number of documents retrieved from the cache.
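The comparison described above can be sketched in a few lines. This is a minimal illustration, not the code used in the experiment: the scoring function is not shown in the text, so scores are simply supplied as a dictionary per retrieval point, and all function and variable names are hypothetical. Because a document's score does not depend on where it is stored, the cache result is taken to be the benchmark result restricted to the cached documents.

```python
def retrieve(scores, threshold):
    """Return the ids of all documents whose score exceeds the threshold."""
    return {doc_id for doc_id, score in scores.items() if score > threshold}


def compare_at_point(collection_scores, cached_ids, threshold):
    """Compare cache retrieval with whole-collection retrieval at one point.

    Returns (benchmark count, count retrieved from cache, count lost).
    """
    benchmark = retrieve(collection_scores, threshold)
    from_cache = benchmark & cached_ids   # same scores, so same documents if cached
    lost = benchmark - cached_ids         # in the collection result but not in the cache
    return len(benchmark), len(from_cache), len(lost)


def totals_over_path(scores_per_point, cached_ids, threshold):
    """Sum the three counts over all retrieval points of one sub-test."""
    totals = [0, 0, 0]
    for scores in scores_per_point:
        counts = compare_at_point(scores, cached_ids, threshold)
        for i, n in enumerate(counts):
            totals[i] += n
    return tuple(totals)


# Toy data (invented for illustration): two retrieval points, cache holds "a" and "b".
example_scores = [
    {"a": 0.99, "b": 0.97, "c": 0.90},
    {"b": 0.98, "d": 0.99},
]
print(totals_over_path(example_scores, cached_ids={"a", "b"}, threshold=0.95))
```

In the toy data, document "d" scores above the threshold at the second point but is not in the cache, so it is the one document counted as lost.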
Each test was repeated with a range of different thresholds; we call this a sequence of sub-tests. Within a sub-test the same threshold was used for all 11 retrievals.
Threshold | Total benchmark retrievals | Retrievals lost |
---|---|---|
99% | 2 | 2 |
98% | 10 | 6 |
96% | 35 | 20 |
95% | 66 | 43 |
Commentary on this starting point: the cache is centred on a wild moorland area.
Threshold | Total benchmark retrievals | Retrievals lost |
---|---|---|
99% | 25 | 2 |
98% | 40 | 8 |
96% | 148 | 59 |
95% | 213 | 98 |
Commentary on this starting point: this may be a favourable case as the cache is centred on an area with lots of attractions, and the area outside the cache has fewer attractions.
Threshold | Total benchmark retrievals | Retrievals lost |
---|---|---|
99% | 16 | 0 |
98% | 29 | 2 |
96% | 97 | 33 |
95% | 148 | 61 |
Commentary on this starting point: this may be a favourable case as the cache is centred on an area with lots of attractions, and the area outside the cache has fewer attractions.
Threshold | Total benchmark retrievals | Retrievals lost |
---|---|---|
99% | 11 | 4 |
98% | 63 | 24 |
96% | 243 | 112 |
95% | 342 | 176 |
Commentary on this starting point: the cache is centred on a non-prime tourist area.
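To compare the four starting points it can help to express the lost retrievals as a fraction of the benchmark retrievals. The sketch below does this arithmetic on the figures from the tables above. It assumes that "Total benchmark retrievals" counts the documents retrieved from the whole collection; the labels "table 1" to "table 4" are just the order in which the tables appear, since the starting points are not named.

```python
# (benchmark retrievals, retrievals lost) per threshold, copied from the tables above.
tables = {
    "table 1": {99: (2, 2), 98: (10, 6), 96: (35, 20), 95: (66, 43)},
    "table 2": {99: (25, 2), 98: (40, 8), 96: (148, 59), 95: (213, 98)},
    "table 3": {99: (16, 0), 98: (29, 2), 96: (97, 33), 95: (148, 61)},
    "table 4": {99: (11, 4), 98: (63, 24), 96: (243, 112), 95: (342, 176)},
}


def lost_fraction(benchmark, lost):
    """Fraction of benchmark retrievals that the cache failed to supply."""
    return lost / benchmark


for name, rows in tables.items():
    for threshold, (benchmark, lost) in rows.items():
        fraction = lost_fraction(benchmark, lost)
        print(f"{name}, threshold {threshold}%: {lost}/{benchmark} lost ({fraction:.0%})")
```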