Todo: <note> <A comment> (the idea of putting a special comment on the present context, which is carried over to the output.)
Peter Brown
February 21, 2001
ABSTRACT
This document explains the components of the current implementation of the Context Matcher and gives full details of the syntax used.
The purpose of the Context Matcher is to be a general engine that can act as the core of a wide range of context-aware applications. The Context Matcher uses stick-e notes, which are the electronic equivalent of Post-it notes. A stick-e note consists of some information coupled with an electronic context. The Context Matcher (we will just call it the "Matcher" henceforth) has two key sets of data: the stick-e notes that make up the document collection, and the current context.
Given such data the Matcher can perform such operations as:
The original thinking behind the Matcher is described in the paper A General Mechanism for Context-aware Matching and Conversion. See (Link to paper) for more details. In particular the Matcher logically consists of a retrieval activity that finds out which notes best match.
This document explains how to use the current implementation of the Matcher. It has been implemented in Java. We start by explaining the comparison activity. Those readers, however, who learn best by example, might want to skip over the explanations and look ahead to the examples at the end of this document.
A unifying concept in the Matcher is that almost all its data is represented in the form of a context. Each stick-e note in the document collection is represented by a context, and so is the current context. (Although the current context has the same syntactic form as a stick-e note in the document collection, we use the term "stick-e note" exclusively to refer to a document in the collection.) We therefore start by describing how contexts are represented. A context is a set of fields, where each field consists of a name and a value-tuple, which consists of one or more individual values. Often the value-tuple is just a single value, in which case we just use the term "value". For input/output purposes a context is represented in a simple SGML form called a scontext, and has the name of each field enclosed between '<' and '>'. An example of a scontext (taken from the paper cited above) is:
<temperature> 23 <facing> 80..100 <with> printer_LP7 <location> 888, 999
The above scontext contains a <temperature>
field, with the value 23,
and a <facing>
field, showing the compass orientation where
the user is currently facing.
The notation for the value of <facing>
shows it to be a range: 80 to 100.
Thus a range consists of two numbers with ..
separating them.
(Currently ranges of strings or other non-numeric values are not supported.)
Ranges can be written in either order, e.g. 3..5 is the same range as 5..3.
After the <facing>
field the scontext contains a <with>
field saying that the user
is near a certain printer, and a two-dimensional <location>
field.
A multi-dimensional value consists of a sequence of single values
(which might be ranges), separated by commas.
Thus an example of a value-tuple representing a three dimensional field is:
<readings> 34..56, true, -3..3
Ranges can be infinite, e.g.:
10..
means any number greater than or equal to 10.
.. 6.8
means any number less than or equal to 6.8 (including negative numbers).
..
means any number.
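The range notation above can be sketched in Java as follows. This is only an illustration under our own assumptions, not the Matcher's actual code: the class name RangeSketch and its methods are hypothetical, and an open end is represented by an infinite double.

```java
final class RangeSketch {
    final double lo;
    final double hi;

    RangeSketch(double lo, double hi) {
        // Ranges may be written in either order, so normalise here.
        this.lo = Math.min(lo, hi);
        this.hi = Math.max(lo, hi);
    }

    /** Parses "23", "80..100", "10..", ".. 6.8" or "..". */
    static RangeSketch parse(String text) {
        String s = text.trim();
        int dots = s.indexOf("..");
        if (dots < 0) {
            // A single number n is the range n..n.
            double v = Double.parseDouble(s);
            return new RangeSketch(v, v);
        }
        String left = s.substring(0, dots).trim();
        String right = s.substring(dots + 2).trim();
        // A missing bound means the range is open (infinite) on that side.
        double lo = left.isEmpty() ? Double.NEGATIVE_INFINITY : Double.parseDouble(left);
        double hi = right.isEmpty() ? Double.POSITIVE_INFINITY : Double.parseDouble(right);
        return new RangeSketch(lo, hi);
    }

    /** True when v lies inside the range, bounds included. */
    boolean contains(double v) {
        return lo <= v && v <= hi;
    }
}
```

With this sketch, RangeSketch.parse("5..3") and RangeSketch.parse("3..5") describe the same range, and RangeSketch.parse("..") contains every number.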
The name of a field must be a sequence of letters and/or digits, started by a letter.
The characters `_' and `-' are also allowed, apart from as the initial character.
Following the tradition of SGML, names are not case-sensitive, though all
other components of the Matcher are.
There are a few reserved field names -- see later; otherwise the Matcher
just accepts whatever field names it finds in its data -- there is no
concept of declaring them.
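As an illustration of the naming rules, a check of this kind could be written in Java as below. The class FieldNameSketch is hypothetical, and we assume that "letters" means the ASCII letters A to Z; reserved names are not checked.

```java
import java.util.Locale;
import java.util.regex.Pattern;

final class FieldNameSketch {
    // A letter, then any mix of letters, digits, '_' and '-'.
    private static final Pattern NAME = Pattern.compile("[A-Za-z][A-Za-z0-9_-]*");

    static boolean isValid(String name) {
        return NAME.matcher(name).matches();
    }

    /** Names are not case-sensitive, so compare them in one canonical case. */
    static String canonical(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    static boolean sameName(String a, String b) {
        return canonical(a).equals(canonical(b));
    }
}
```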
To allow for XML notation a field name can be written <name />, though this is not, in fact, correct SGML.
Names can be followed, in the normal SGML way, by attributes: values of attributes
should be enclosed within double-quotes, e.g. <N ATT="val">.
Currently there are just two basic data types: numeric and string.
The Matcher can be configured either to use integers or to use real numbers.
(In earlier implementations, each name was associated with a unique data type,
i.e. if one <x>
field was a number, they all had to be.
However this turned out to be too restrictive and now a name can be
associated with any value.)
The data type of a field is determined by the appearance of the value: if the first character is a digit
(possibly preceded by a minus sign and/or a decimal point) then the value is
treated as numeric. Currently everything that is not a number is treated as a string.
(Todo: this is rather clumsy, but can we live with it? The
alternatives may be declarations of data types, etc.)
If it is required that a value be considered as a string it can be placed in quotes, e.g.
<CompanyName> "600 group"
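The rule for deciding the data type can be sketched as follows. This is a literal reading of the rule for a single value, written by way of illustration; the class ValueTypeSketch and the enum Kind are hypothetical names, and open-ended ranges such as ".. 6.8" would need separate handling that is not shown.

```java
final class ValueTypeSketch {
    enum Kind { NUMERIC, STRING }

    static Kind classify(String rawValue) {
        String s = rawValue.trim();
        // A value placed in quotes is always a string.
        if (s.startsWith("\"")) {
            return Kind.STRING;
        }
        int i = 0;
        if (i < s.length() && s.charAt(i) == '-') i++;   // optional minus sign
        if (i < s.length() && s.charAt(i) == '.') i++;   // optional decimal point
        // The value is numeric only if a digit comes next.
        boolean digit = i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9';
        return digit ? Kind.NUMERIC : Kind.STRING;
    }
}
```

Note that the unquoted value 600 group would be classified as numeric by this rule, which is why the company name above needs its quotes.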
SGML comments are allowed between or within fields, e.g.
<facing> <!--west--> 270
(As a safety measure for unmatched openers for comments, the Matcher
gives an error if the body of a comment includes --
characters.)
A comment is taken as part of the value of the field that precedes it, and
it is a good convention to write an explanatory comment after rather than
before the field it relates to.
The advantage is that the comment then appears when the value appears,
e.g. in error messages or in the final output; an example of a comment is
a place-name such as <!--Blean-->
placed after some co-ordinates representing the location of that place.
As with all SGML-based notations, whitespace is ignored.
Values must not normally include the reserved character `<'.
However there may often be a need to provide values that are in HTML, and
to cater for this, the Matcher provides the <markup>
tag.
An example of its use is:
<body> XXX <markup> <tag1> .. <tag2> ..</tag1> </markup> YYY
Here everything will be treated as the value of the <body>
field,
since the tags within markup
are not recognised by the Matcher.
<markup>
tags may not be nested within one another.
(PJB comment: markup tag is a low priority.)
Now deleted.
There are facilities for metavalues, which match any value.
There are two currently implemented metavalues: ANY
matches any value-tuple, whereas any
matches any single value; thus, for example, any,29,any
matches any
three-dimensional value-tuple that has 29
as its middle value.
ANY
must occur on its own as the sole value of the field; otherwise
an occurrence of ANY
is treated as an ordinary string.
Metavalues can also be used as tag names: <ANY>
matches any other tag
name.
There is no difference in meaning between <ANY> and <any>.
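The behaviour of the two value metavalues can be sketched as below. The class MetavalueSketch is hypothetical, and plain string equality stands in for the real comparison of individual values; only the treatment of ANY and any is the point of the example.

```java
final class MetavalueSketch {
    /** pattern and value are value-tuples written with commas, e.g. "any,29,any". */
    static boolean matches(String pattern, String value) {
        String p = pattern.trim();
        // ANY on its own matches any value-tuple, whatever its dimensionality.
        if (p.equals("ANY")) {
            return true;
        }
        String[] wanted = p.split(",");
        String[] actual = value.trim().split(",");
        if (wanted.length != actual.length) {
            return false;   // dimensionality must agree
        }
        for (int i = 0; i < wanted.length; i++) {
            String w = wanted[i].trim();
            if (w.equals("any")) {
                continue;   // any matches one single value
            }
            // Anything else, including ANY inside a tuple, is an ordinary value.
            if (!w.equals(actual[i].trim())) {
                return false;
            }
        }
        return true;
    }
}
```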
Being active to a stick-e note means taking part in the matching process.
The concept applies at two levels.
At the higher level a whole stick-e note may be active or not.
At the lower level individual field names may be specified as being active:
e.g. the name <facing>
may be specified
as an active field name; this means that if a note includes a field of that name then
the field will be part of the matching process.
(Often only a subset of the fields of a note are active in the matching process.)
An individual field is active if its name is active and it lies within an
active note.
Normally a note is active if and only if at least one field within it is active.
The purpose behind the idea of active fields is as follows. There may be a host of sensors that are contributing fields towards the current context, and at any one time only a subset of these are likely to be relevant to the user/application. Similarly the user/application may wish to focus triggering on certain active fields of notes ("I am not currently interested in information triggered by the temperature"). Removing a field from the list of active ones can achieve these effects. We explain later how the names of the active fields are specified.
A core operation of the Matcher is matching individual fields, and deriving an
overall score for a match.
A field of a stick-e note is only matched against the current context if it is active (or vice-versa if matching is interactive rather than proactive -- in the
rest of the discussion below we assume the proactive case).
Assuming it is,
a further prerequisite for two fields to match is that they have the same name (though the name can be lower-case
on one side and upper-case on the other) and that the two value-tuples have the same
data type.
Finally the values are compared to derive a matching-score.
All numbers are treated internally as ranges: thus the number 10 is
treated as the range 10..10.
With this convention, two numeric values are scored according to how well
their ranges overlap.
String matching simply involves testing if one string is included in the other,
though later this might be made more sophisticated.
Two multi-dimensional value-tuples cannot match unless they have the same dimensionality.
Overall if a compulsory field does not match or gets a matching-score of 0, then the complete
match fails.
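The comparison rules just described can be sketched in Java as follows. This is a yes/no simplification: it tests whether values match at all and does not compute a graded matching-score. The class FieldMatchSketch is hypothetical, and we assume that a multi-dimensional match requires every dimension to overlap.

```java
final class FieldMatchSketch {
    /** True when the closed ranges aLo..aHi and bLo..bHi share at least one point. */
    static boolean rangesOverlap(double aLo, double aHi, double bLo, double bHi) {
        return Math.max(aLo, bLo) <= Math.min(aHi, bHi);
    }

    /** String matching: one string is included in the other. */
    static boolean stringsMatch(String a, String b) {
        return a.contains(b) || b.contains(a);
    }

    /** Each tuple is an array of {lo, hi} pairs; a single number n is the pair {n, n}. */
    static boolean tuplesMatch(double[][] note, double[][] context) {
        if (note.length != context.length) {
            return false;   // same dimensionality required
        }
        for (int i = 0; i < note.length; i++) {
            if (!rangesOverlap(note[i][0], note[i][1], context[i][0], context[i][1])) {
                return false;
            }
        }
        return true;
    }
}
```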
(PJB comment: labels are low priority.) An extra concept that affects matching is labels on fields. To understand the purpose of labels, consider the following situation.
There are three temperature sensors in a room, each forming part of
the current context.
One way of representing them is to use a different tag for each, e.g.
<temperature1>, <temperature2> and <temperature3>.
A more flexible approach, which better captures the similarity between sensors, is to use the labelling facility. A label is prefixed to a field value, e.g.
<temperature> sensor1= 63 <temperature> sensor2= 64 <temperature> sensor3= 61
Almost any string of characters can be used as a label, though obviously a label cannot include an equals sign. The label must occur on the same line as the field name. (If there is a real equals sign in a field value, the value should be made to start on a new line, thus avoiding it being taken as part of a label delimiter.)
The above labelled temperatures give more flexibility in the matching process: essentially a stick-e note can be chosen to apply to any temperature sensor or to one particular sensor. In detail the matching process between two fields is as follows (we assume the fields match apart from possible labels):
<temperature> 63 would match any labelled temperature setting.
<temperature> sensor1= 63 does not match <temperature> sensor2= 63.
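The label rules illustrated above can be sketched as follows. The class LabelSketch is hypothetical. Here fieldText is the text that follows the field name on the same line, and we assume that a labelled note field does not match an unlabelled field in the current context.

```java
final class LabelSketch {
    /** Returns the label before '=', or null when the field value has no label. */
    static String labelOf(String fieldText) {
        int eq = fieldText.indexOf('=');
        return eq < 0 ? null : fieldText.substring(0, eq).trim();
    }

    /** Returns the field value with any leading label removed. */
    static String valueOf(String fieldText) {
        int eq = fieldText.indexOf('=');
        return (eq < 0 ? fieldText : fieldText.substring(eq + 1)).trim();
    }

    static boolean labelsCompatible(String noteLabel, String contextLabel) {
        if (noteLabel == null) {
            return true;   // an unlabelled note field applies to any label
        }
        // Otherwise the labels must be identical (assumption for the unlabelled context case).
        return noteLabel.equals(contextLabel);
    }
}
```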
Clearly this facility is a first step towards structuring of context fields.
An earlier implementation of the Matcher had a more advanced system of hierarchical
labels (e.g. the label person/engineer/Sue=
), but this has now been dropped.
In the future we may have available general context servers which offer
such structuring, and many other extra goodies besides.
Because of labels, the matching between a note and the current context is asymmetric, i.e. note A may match current context B, but if B is treated as a note it would not match A as the current context. Asymmetry can also arise in the matching of field values (though not in the current implementation). It arises, for example, if two values have a high matching-score only if the range of the first completely encompasses the range of the second.
So far we have described the fields within scontexts. Scontexts can also contain additional tags, which are called structural tags. Stick-e notes come in several types -- we discuss these later -- for example:
<note>
tag.
<activeTags>
tag.
A structural tag at the start of a note, e.g. a <note>
tag.
If scontexts are stored end-to-end within a file, rather than being within a database,
then these structural tags also serve to show where one note ends and the next begins.
Whenever an <activeTags>
scontext is
encountered it resets the properties of the matching process;
it does not form part of the stick-e notes or the current context.
PJB decision: previous idea of subcontexts is abandoned; it covered several facilities (conversion rules, re-setting fields, match reports), but all need to be rethought when we go from Boolean to best-match.
Structural tags have reserved names and cannot be used for field names.
They cannot have associated data values or labels.
The full list of structural tags is currently: <note>
,
<activeTags>
and <MatcherMessage>
(an error or warning note from
the Matcher: these are represented as scontexts and can be embedded in
other output data -- see later).
There is also <notes>
(at least for historical reasons -- covers
a document collection); PJB thought: omit this for the time being; we may need
to think about multiple document collections, each beginning with a <notes>
tag and perhaps some ID for each collection, but not yet.
Finally we need a way of resetting the switches on the call to the Matcher: if
there is a sequence of current contexts supplied, then switches (e.g. field weights) can be reset before each context in the sequence; one approach
is to have a <CM-switches>
tag, the other is to have a SWITCHES attribute
on any <notes>
tag that begins a new current context.
I suggest the latter.
Since it is messy to keep adding reserved names, I suggest we reserve <CM_...>
for future reserved tags.
The matching process compares each of the stick-e notes with the current context. This is called a matching pass. Each comparison of the current context with a stick-e note is done independently; the output is derived from the best matches, and is sorted so that the highest scoring match comes first. Matching passes are repeated periodically (e.g. every second, every hour, whenever anything changes) depending on the nature of the application.
All parameters of the matching process are global: i.e. during a matching pass each stick-e note is matched in the same way. Thus the activeTags specification is the same for all matches in a matching pass.
To be precise,
a match occurs if
all active fields in the note match corresponding fields in the current context
(remember that we are assuming the proactive case throughout this discussion).
The asymmetry we mentioned earlier in matching individual field values carries
over into the whole process.
As an example if <x>
and <y>
are active fields, and we have
the scontext (A) which is:
<x> 3 <y> 4
and a further scontext (B) which is:
<x> 3
then if (A) is the note and (B) the current context there is no
match, since the active <y>
field of the note is not matched.
If, however, the roles are reversed there is a match, since the only active
field in the note is <x>
, and this is matched.
One way of looking at this is to regard each stick-e note as a query, whose active fields
interrogate the current context.
In the interactive case the roles are reversed. (PJB comment: currently controlled by a parameter of the Matcher; you can have a pipeline of Matchers with some proactive and some interactive.)
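The asymmetry in the (A)/(B) example can be reproduced with a small Java sketch of the proactive rule. The class NoteMatchSketch is hypothetical, scontexts are reduced to maps from field name to value, and plain equality stands in for the real field-matching rules.

```java
import java.util.Map;
import java.util.Set;

final class NoteMatchSketch {
    /**
     * Proactive case: every active field of the note must be matched by
     * the field of the same name in the current context.
     */
    static boolean matches(Map<String, String> note,
                           Map<String, String> context,
                           Set<String> activeNames) {
        boolean anyActive = false;
        for (Map.Entry<String, String> field : note.entrySet()) {
            if (!activeNames.contains(field.getKey())) {
                continue;   // inactive fields take no part in matching
            }
            anyActive = true;
            String other = context.get(field.getKey());
            if (other == null || !other.equals(field.getValue())) {
                return false;   // an active field of the note is not matched
            }
        }
        // A note with no active field is not active, so it cannot match.
        return anyActive;
    }
}
```

With <x> and <y> active, matching (A) as the note against (B) returns false, while matching (B) as the note against (A) returns true. In the interactive case the two map arguments would simply be swapped.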
The currently active tags are controlled by the most recent occurrence
of an <activeTags>
scontext.
If no such setting has occurred, defaults apply.
A sample <activeTags>
scontext is
<activeTags> <location> ANY <temperature> 0..100 <facing> Jones= ANY
This is matched against each active stick-e note as a first stage, and the fields that match (with a matching-score
above some threshold) are
the active ones;
these are therefore the ones matched, as a second stage, against the current context to see if an overall match occurs.
As a detail, in the first stage, the <activeTags> specification is treated
as a current context that is matched interactively, not proactively, against
each of the stick-e notes, though this interactiveness only matters in
asymmetric cases, e.g. when labels are present.
The default settings of the active tags are:
<pressure>
field, all <pressure>
fields in notes are active.
(Although, as we have said, the stick-e note drives the matching process in
the proactive case, the
current context has in this case an influence on how the match is performed.)
For each active field the normal matching rules apply: thus the field will match if the labels (if any) and values match too; if they do not the overall match will fail.
If an <activeTags>
scontext is null, then it causes the above defaults to apply.
Often an <activeTags>
scontext is placed on the front of a set of notes as a suitable
default ("this city tour is based on location and time"), but these
defaults can be overridden by subsequent <activeTags>
scontexts generated by
the user/application (e.g. the user asks for an active tag to be switched off,
and as a result the application would pass a new setting of activeTags
to the Matcher).
The active tags come into force when <activeTags>
is parsed, which is prior
to when matching takes place.
Each setting overrides the previous one.
Closing tags (e.g. </note>) may optionally be included, but are ignored (unless they are wrong and give rise to an error).
Nothing, apart from a possible comment, should come between a closing tag
and the next tag.
The following is an example:
<note> <y> 4 </y> </note>
At the end of an input file it is assumed that all tags are closed, i.e. nothing carries over to the next input file.
A null stick-e note or a null current context will never match.
Error messages from the Matcher are, by default, placed in the normal
output stream.
Each such message begins with <MatcherMessage> and ends with
</MatcherMessage>.
The messages usually come at the start of the output, since they arise during
preliminary parsing.
Often the Matcher is used in a pipeline, where the output from one usage
of the Matcher feeds into a second usage of the Matcher, and so on.
If the first produces an error message, the second will encounter the message
in its input.
There is a simple (crude?) mechanism to deal with this:
if the Matcher finds <MatcherMessage>
in its input, it copies the
entire message to its output, and otherwise ignores the message.
Thus all error messages that arise in a pipeline make it to the final
output.
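The pass-through mechanism can be sketched in Java as below. The class MessagePassThroughSketch is hypothetical, and we assume that messages are not nested and that the tag name may appear in any case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MessagePassThroughSketch {
    // Shortest span from an opening <MatcherMessage> to its closing tag.
    private static final Pattern MESSAGE = Pattern.compile(
            "<MatcherMessage>.*?</MatcherMessage>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    /** Collects every embedded message so that it can be copied to the output unchanged. */
    static List<String> extractMessages(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = MESSAGE.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    /** Returns the input with the messages removed, ready for normal parsing. */
    static String withoutMessages(String input) {
        return MESSAGE.matcher(input).replaceAll("");
    }
}
```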
(PJB comment: previously we had matchReports and recipes, which allowed the Matcher to deliver a description of the results of the match, and allowed the application to pick out fields of the matchReport in order to display them. The new scheme described here is hopefully (a) simpler and (b) better geared to best-match retrieval.) The Matcher delivers to the application the stick-e notes that best match the current context; this applies in both proactive and interactive cases. Each stick-e note will have a numerical matching-score, which describes how well it matched, and the Matcher will sort the delivered notes in descending order of score. To save bandwidth the application may set a threshold matching-score, thus stopping the Matcher from delivering any stick-e note with a lower score.
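The delivery step, threshold followed by sorting, can be sketched as follows. The classes DeliverySketch and ScoredNote are hypothetical, and a note is reduced to a title and a score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class DeliverySketch {
    static final class ScoredNote {
        final String title;
        final double score;

        ScoredNote(String title, double score) {
            this.title = title;
            this.score = score;
        }
    }

    /** Drops notes scoring below the threshold and sorts the rest, best first. */
    static List<ScoredNote> deliver(List<ScoredNote> matched, double threshold) {
        List<ScoredNote> out = new ArrayList<>();
        for (ScoredNote n : matched) {
            if (n.score >= threshold) {
                out.add(n);
            }
        }
        out.sort(Comparator.comparingDouble((ScoredNote n) -> n.score).reversed());
        return out;
    }
}
```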
In addition the Matcher needs to give information about the matching process. This information can be used by the application in deciding how to present information to the user, and it may also be useful for postprocessing and for debugging. It may also help the application to provide the user with facilities such as "Tell me the closest" (i.e. find, among the delivered stick-e notes, the ones that best match on location -- even though they might not have the best scores for overall match).
I suggest that matching information is conveyed by attributes, added by the Matcher
to the <note>
tag, and also to the tags for each of the fields that is
involved in the matching.
There are two such attributes: SCORE and AGAINST.
The AGAINST attribute is used for fields and gives the value of the
corresponding field of the current context against which the stick-e note
field was matched.
(In many cases the AGAINST value could be deduced by the application, but it is
useful: (a) when there has been pre- or post-processing; (b) when there
have been time delays from when the original request was issued; (c) for
debugging.)
An example of a matched stick-e note to which SCORE and AGAINST attributes
have been added is:
<note SCORE="1.4"> <author> ... <location score="1.6" AGAINST="23,45"> 23, 44..55 <temperature score=".8" AGAINST="16"> 20 <body> ... <openingHours> ...
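An application that receives such output might pick out the attributes of a tag along the following lines. This is a sketch only: the class TagAttributeSketch is hypothetical, it handles one tag at a time, and it assumes that attribute values are always enclosed in double-quotes.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagAttributeSketch {
    // NAME="value", with optional whitespace around the equals sign.
    private static final Pattern ATTR =
            Pattern.compile("([A-Za-z][A-Za-z0-9_-]*)\\s*=\\s*\"([^\"]*)\"");

    /** Extracts the attributes of one tag, e.g. <at SCORE="1.8" AGAINST="1, 10">. */
    static Map<String, String> attributes(String tag) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = ATTR.matcher(tag);
        while (m.find()) {
            // Names are upper-cased so that score= and SCORE= are treated alike.
            result.put(m.group(1).toUpperCase(Locale.ROOT), m.group(2));
        }
        return result;
    }
}
```

For the <location> tag in the example above, the returned map holds SCORE with the value 1.6 and AGAINST with the value 23,45.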
(PJB comment: this issue has now come to the fore; the cited document represents old ideas; we are now working on new ones.) For information on how to detect change, e.g. changes in stick-e notes or change between one triggering and the next, see the separate document entitled "change". (The "change" document)
In particular this explains the <uniqueID>
field that is automatically
added to many contexts.
We now switch to more practical matters. We give a brief explanation of how to run the Matcher as an ordinary programme, using the Unix command syntax. It is executed using the command:
matcher
switches arguments
There must be at least two arguments: the last is treated as the present context, and all the preceding ones as stick-e notes. (PJB comment: this is crude and a later implementation tried something better, using "stages"; it was more pipeline oriented.) Each argument may be a literal or a filename or a URL; switches can be used to specify which of these applies, but if there are no switches, the Matcher makes an intelligent guess. Each argument must represent a single scontext, or a sequence of scontexts written end to end. All or part of an argument may be devoted to setting the active tags.
There may be switches at the start or between arguments. The allowable switches are:
The Matcher can be accessed as a web server as:
http://triton.ex.ac.uk/cgi-bin/cgiwrap/pjbrown/matcher
This server has the notes as the first argument and the current context as the second; it returns its output as plain text.
Some authors of SGML-based notations use attributes extensively, some very little.
Thus in our case we could have provided a host of possible attributes for the
<note>
tag, e.g. AUTHOR, TITLE, DATE, etc., or we could have taken
the view that if this information is needed it should be expressed as extra fields
within the <note>
, e.g. an <author>
field.
Overall we currently favour the latter: a consequence of
the approach that the Matcher has no prior knowledge
of what field names are used is that the user/author can
add new fields painlessly.
In a previous version of the Matcher attributes were used on fields to indicate the
type of the value, e.g. whether a location was GPS co-ordinates, OS grid
co-ordinates, etc.
We may need to add such attributes in the future, but will do without them at present.
We will assume therefore that the data type of all values is manifest from the value itself.
Below we describe the few attributes that we do use.
Following the normal SGML rules, a tag can have a set of possible attributes associated with it. All the attributes used here are optional (i.e. "implied" in SGML parlance): they can occur 0 or 1 time on each tag. If a tag has two separate attributes, e.g. a SCORE attribute and an AGAINST attribute, these can occur in either order.
The following is a list of attributes that are used:
<notes>
tag, where
it gives the document score, N.
N must be a real number within the range allowed for scores.
This attribute can also occur on any field tag used in the matching process.
The attribute is inserted by the Matcher and added to the relevant tags in each
document it retrieves and delivers to the user.
The attribute can also occur within documents from the document collection that are
input to the Matcher (after all, the output from one instantiation of the
Matcher can be the input to a subsequent one, e.g. when a proactive retrieval
is followed by an interactive one);
in this case the value N is used as the initial value of the score.
(We still need to decide how initial values are used by the scoring algorithms;
maybe the default algorithms will multiply the newly calculated score by
the initial score; algorithms plugged in by the user can use the initial score as
they wish -- it therefore should be accessible.)
The SCORE attribute should not be used within the current context
(there is an error message if this rule is broken).
<note>
tag.
V gives the value of the field in the current context that was
matched against the current field to yield the SCORE.
The AGAINST attribute is ignored on input; it should not occur at all in the
current context.
<note>
tag that
begins a current context.
S follows exactly the same syntax as the switches on the Unix command to launch
the Matcher (see later); e.g.
<note SWITCHES="-w at:4 -r"> ...
There may be some restrictions on what switches can be reset dynamically.
(PJB comment: these date from Boolean matching and need revising.) Many people learn better from examples than from reading lots of text, so we finish by giving a series of examples. We do not here go into the exact scoring mechanisms used for matches, and just assume the matched stick-e notes are output in the order they occur.
The first set of examples assumes that the stick-e notes are:
<note> <title> note 1 <at> 1..6, 7..12 <body> Cathedral <note> <title> note 2 <at> 6..9, 12..14 <body> Chapter House <note> <title> note 3 <at> 1..100, 0..200 <temperature> 70.. <!-- Fahrenheit used --> <body> Swim in the river
(Obviously this is an abbreviated form of what real notes are likely to be.)
It is a useful convention, followed above, to put <title>
fields
in each stick-e note, which, though not designed to take part in
the matching process, help keep tabs on what is going on.
We first assume the current context:
<!-- example n1 --> <at> 1, 10
If we assume the default activeTags are in place, then the <at>
field
is the only active one; this applies to all three notes, since, in each case, it
is the only tag in both the note and the current context.
Given this, then two of the above three notes are matched.
(The location in the current context comes within the range specified on two
of the stick-e notes, but is outside it in the third case; we assume the
former are treated as matches.)
A common form of post-processor is one that extracts from the stick-e notes
the fields that were not involved in the matching: these fields
are likely to be information fields that are of interest to the user.
If we assume that there is such a post-processor, then the output produced is:
<title> note 1 <body> Cathedral <title> note 3 <temperature> 70.. <!-- Fahrenheit used --> <body> Swim in the river
Actually it is best practice to set explicitly the activeTags. We will do this, and try a different current context:
<!-- example n2 --> <activeTags> <at> ANY </activeTags> <at> 6, 12
In this example, the location is within the range specified on each of the three stick-e notes, and we therefore assume that all count as matches. The output will be:
<title> note 1 <body> Cathedral <title> note 2 <body> Chapter House <title> note 3 <temperature> 70.. <!-- Fahrenheit used --> <body> Swim in the river
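The two location-only examples above (n1 and n2) can be checked mechanically with a short Java sketch. The class ExampleN1Sketch is hypothetical; it hard-codes the <at> fields of the three sample notes and treats a match as simple containment of the current location in the note's area, with no scoring.

```java
import java.util.ArrayList;
import java.util.List;

final class ExampleN1Sketch {
    /** True when each coordinate lies inside the corresponding {lo, hi} range. */
    static boolean within(double[][] area, double[] point) {
        if (area.length != point.length) {
            return false;
        }
        for (int i = 0; i < area.length; i++) {
            if (point[i] < area[i][0] || point[i] > area[i][1]) {
                return false;
            }
        }
        return true;
    }

    /** Returns the titles of the sample notes whose <at> field covers the point. */
    static List<String> matchedTitles(double[] at) {
        String[] titles = { "note 1", "note 2", "note 3" };
        double[][][] areas = {
            { { 1, 6 }, { 7, 12 } },      // note 1: <at> 1..6, 7..12
            { { 6, 9 }, { 12, 14 } },     // note 2: <at> 6..9, 12..14
            { { 1, 100 }, { 0, 200 } },   // note 3: <at> 1..100, 0..200
        };
        List<String> matched = new ArrayList<>();
        for (int i = 0; i < titles.length; i++) {
            if (within(areas[i], at)) {
                matched.add(titles[i]);
            }
        }
        return matched;
    }
}
```

Calling matchedTitles with the location 1, 10 of example n1 yields notes 1 and 3, and with the location 6, 12 of example n2 it yields all three notes.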
We will now introduce a current context with two fields, and make both active:
<!-- example n3 --> <activeTags> <at> ANY <temperature> ANY </activeTags> <at> ANY <temperature> 32
This will produce the output
<title> note 1 <body>Cathedral <title> note 2 <body>Chapter House
The note that requires a temperature of 70 or above is, we assume, not retrieved,
as it will get a low matching-score with the current temperature at 32.
If, however, we made <temperature>
the only active tag by means of:
<!-- example n4 --> <activeTags> <temperature> ANY </activeTags> <at> ANY <temperature> 32
then, since the note with a <temperature> field
gets a very low matching-score, and the other two notes are not active because
they do not contain an active field, nothing will be retrieved (unless,
of course, the application had an extremely low threshold score for retrieval).
Finally if we revisit example n1 above and assume all the fields are output we might see:
<note SCORE="1.8"> <title> note 1 <at SCORE="1.8" AGAINST="1, 10"> 1..6, 7..12 <body> Cathedral <note SCORE="1.7"> <title> note 3 <at SCORE="1.7" AGAINST="1, 10"> 1..100, 0..200 <temperature> 70.. <!-- Fahrenheit used --> <body> Swim in the river
(We assume the first note gets a higher matching-score because it covers a smaller area, and because one of the user's location co-ordinates is central to this area.)
In the current implementation, the Matcher is driven by a command line whose arguments specify all the data to be used (stick-e notes, current context, switches, etc.), though you could have a test harness which, e.g., invoked the Matcher once a minute. In the current implementation, if you restart the Matcher you have to re-load all the stick-e notes again; this is very limiting, especially in the test harness case. I think the new implementation should allow users to create a process consisting of the Matcher with a given set of stick-e notes loaded, and then call this process, supplying a different current context with each call. Alternatively we could have a web server, which had a set of stick-e notes pre-loaded; each user would supply their current context, and retrieve the notes that were relevant to them.
Following on from this, if you have pre-loaded stick-e notes, it would be desirable to allow updates on-the-fly, without a complete re-load. For example we might want to add new stick-e notes (relating, say, to a sudden traffic problem) and/or to delete existing ones. Likewise we might want to change the switches, e.g. to be interactive rather than proactive.
In the short/medium term we need to have some default scoring algorithms when matching two field values, which we here call the query field value and the target field value. Some suggestions are below. We assume numeric values are ranges; we also refer to the centre of the range; if the value is a single point the range and the centre of the range will be one and the same. The assumption that values are ranges means we are assuming 2D areas are rectangles. The following suggestions cover numeric values:
ANY
would get a low matching-score.)
The following suggestions cover string matching (testing if the target value contains the query value):
The following suggestions cover value-tuples:
23,YES,10
) should be the mean (arithmetic or geometric?) of
the scores of the individual values.
The algorithm for deriving an overall score for a note will almost certainly be a weighted mean of the field scores, with the convention that a compulsory field that does not match or gets a score of 0 signals a complete non-match and therefore sets the overall matching-score to 0 (this happens anyway if you use geometric means). It seems obvious that by default all the weights of individual fields should be equal: only the application (and/or pre- and post-processors) will have the knowledge and experience to change the defaults (e.g. by noting rarely matched fields and weighting them more highly). It also seems that N+1 fields matching, each with an average matching-score, is better than N fields matching, each with this same average matching-score. Perhaps the best approach is a weighted arithmetic mean where, instead of dividing by N at the end, you divide by (say) N + 1/2: with equal scores s the result is then s * N / (N + 1/2), which rises as N grows. (Dividing by N - 1/2 would have the opposite effect.)
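The basic scheme, a weighted arithmetic mean with a knock-out for compulsory fields, can be sketched as below. The class NoteScoreSketch is hypothetical, the three arrays are assumed to be of equal length, and the adjustment of the divisor discussed above is deliberately left out.

```java
final class NoteScoreSketch {
    /**
     * Weighted arithmetic mean of the field scores. A compulsory field
     * that scores 0 knocks out the whole match.
     */
    static double overall(double[] scores, double[] weights, boolean[] compulsory) {
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (int i = 0; i < scores.length; i++) {
            if (compulsory[i] && scores[i] == 0.0) {
                return 0.0;   // complete non-match
            }
            weightedSum += weights[i] * scores[i];
            weightTotal += weights[i];
        }
        // With no fields, or all weights zero, there is nothing to score.
        return weightTotal == 0.0 ? 0.0 : weightedSum / weightTotal;
    }
}
```

An optional field that scores 0 merely lowers the mean, whereas a compulsory one sets the overall matching-score to 0.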
In various implementations I have continually wavered between (a) having
both compulsory and optional fields, and (b) just having compulsory fields.
A key property of an optional field is that if it fails to match (e.g. does not
exist in the target or gets a matching-score of 0) it does not knock out the whole match.
If you only have (b) you cannot completely simulate (a) using scoring algorithms,
and default values for missing fields (e.g. if you have no location sensor you
set your location to ANY
and give location a low field-weight), but
you can go some way.
With this in mind do we adopt approach (b) for the short term?
Parameters of the matching process can either be set globally by
switches on the command that launches the Matcher, or they can be set
within the data.
The only example of the latter is currently the <activeTags>
facility.
We need ways of setting the scoring algorithms and the field-weights.
I think we are moving towards the algorithms being pieces of Java code
that are incorporated before the Matcher starts; however the user
may supply several alternative algorithms and may wish to place a
switch-value within the data to select dynamically the algorithm to
be used for each set of test data.
Field-weights can more easily be supplied as dynamic values, but it may also
be desirable to supply them on the command-line.
Can I suggest the following overall scheme:
<parameters> XXX
within the data (e.g. before a new setting of the current context) and the syntax of XXX exactly follows the syntax of the command line. (I do not think it matters too much if the notation follows a typically terse and somewhat unfriendly UNIX style.)
An example of a parameter setting might be
<parameters> -w at:2;temperature:.6 -s at:4 -r
This might (a) set the field-weight of <at>
fields to 2 and
<temperature>
fields to .6; (b) set the scoring algorithm for
<at>
fields to algorithm number 4 of the ones supplied;
(c) set the -r parameter so that subsequent matches were done
in reverse (i.e. interactively rather than proactively).
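As an illustration, the argument of the -w switch in this proposed notation could be parsed as follows. The class FieldWeightSketch is hypothetical, and since the notation itself is only a suggestion the sketch should be read in the same spirit.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class FieldWeightSketch {
    /** Parses the argument of -w, e.g. "at:2;temperature:.6". */
    static Map<String, Double> parseWeights(String spec) {
        Map<String, Double> weights = new LinkedHashMap<>();
        for (String item : spec.split(";")) {
            String[] parts = item.trim().split(":", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException(
                        "Expected name:weight but found '" + item + "'");
            }
            // Field names are not case-sensitive, so store them in lower case.
            String name = parts[0].trim().toLowerCase(Locale.ROOT);
            weights.put(name, Double.parseDouble(parts[1].trim()));
        }
        return weights;
    }
}
```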