Peter Brown
Department of Computer Science, University of Exeter, Exeter EX4 4QF, UK
P.J.Brown@ex.ac.uk
ABSTRACT
The world of databases and the world of retrieval are unfortunately almost disjoint. When implementing a CAR system we need to choose which world to live in. If the underlying data is structured, a database is certainly a possibility, and many CAR systems use one. There may, however, be important areas where databases do not meet CAR needs.
We list here some issues that might prevent us using a database:
One way of adapting databases to meet some of our needs is to have a fuzzy matching front-end. A few commercial products support fuzzy matching as a front-end to any standard SQL database. My guess is that they work as follows. Assume the requirement is to match Location L and Temperature T. The fuzzy-matcher first tries to find exact matches and, if it finds any, gives them the top matching-scores. It then tries "near-miss" matches, e.g. a temperature between T-1 and T+1, then a temperature between T-3 and T+3. Each near-miss is given a matching-score according to how near the miss is. The scores of the individual fields are accumulated (and weighted, if desired) to give an overall matching-score for a database record.
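The guessed behaviour above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any commercial product: it scores records held in memory rather than issuing successive SQL queries, and the band widths, the scores attached to them, the field names and all function names are hypothetical.

```python
# Hypothetical tolerance bands for a numeric field: (half-width, score).
# Half-width 0 is an exact match; wider bands score lower.
# The widths 1 and 3 follow the T-1..T+1 and T-3..T+3 example in the text;
# the scores themselves are illustrative assumptions.
BANDS = [(0, 1.0), (1, 0.6), (3, 0.3)]


def numeric_score(value, target, bands=BANDS):
    """Score of the narrowest band that contains value, or 0.0 if none does."""
    for half_width, score in bands:
        if abs(value - target) <= half_width:
            return score
    return 0.0


def exact_score(value, target):
    """All-or-nothing score for fields with no notion of a near miss."""
    return 1.0 if value == target else 0.0


def record_score(record, query, weights):
    """Accumulate the weighted per-field scores for one record."""
    total = 0.0
    for field, (target, scorer) in query.items():
        total += weights.get(field, 1.0) * scorer(record[field], target)
    return total


def fuzzy_match(records, query, weights):
    """Return (score, record) pairs with a non-zero score, best first."""
    scored = [(record_score(r, query, weights), r) for r in records]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


if __name__ == "__main__":
    records = [
        {"location": "Exeter", "temperature": 23},
        {"location": "Exeter", "temperature": 20},
        {"location": "Bath", "temperature": 30},
        {"location": "Exeter", "temperature": 21},
    ]
    # Match Location L = "Exeter" and Temperature T = 20.
    query = {
        "location": ("Exeter", exact_score),
        "temperature": (20, numeric_score),
    }
    weights = {"location": 1.0, "temperature": 1.0}
    for score, record in fuzzy_match(records, query, weights):
        print(round(score, 2), record)
```

In this sketch a record that matches on neither field is dropped, and the weights let one field count for more than another. A different scale for the band scores could be substituted by changing BANDS alone.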
This approach has been used by Rhodes, using the Fuzzy-Matcher product marketed by Sonalysts Inc. (it is unclear whether they still support it). Rhodes used a logarithmic scale. Fuzzy queries are also discussed in a more general paper by Teska.
We are researchers, and want to investigate many possibilities, unconstrained as far as possible by the tools we use. For this reason, I think our current approach of implementing our own matching engine, and living outside the world of databases, is the best one.