CAR User Manual

Background information on CAR can be found in Context-aware retrieval (Brown, Jones). The CAR User Manual assumes the reader has a basic understanding of the principles of CAR, context matching, and the grammar of contexts and stick-e notes. An introduction to the latter two can be found in STICK-E NOTES: the Context Matcher User Manual (Brown).

Basic Terminology

A collection consists of >= 0 documents.

A document is either a stick-e note or a context. When either is in a collection of >1 documents then each document has a compulsory initial <note> field. This gives rise to a document sometimes being referred to as a note. A document consists of >=1 fields.

A field consists of a tag, an attributeTuple, and a valueTuple. A tag is unique within a document e.g. temperature.

An attributeTuple consists of zero or more name="value" pairs e.g. <temperature scale="centigrade">

A valueTuple consists of zero or more comma-separated values e.g. <location> 0291, 0284

A value is a literal expression of various forms, for example a string or numeric value e.g. 27.5 (a comprehensive specification is provided in STICK-E NOTES: the Context Matcher User Manual (Brown)).
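The terminology above can be pictured as a small data model. The sketch below is only an illustration: the class names SketchField, SketchDocument and TerminologySketch are hypothetical and are not the CAR classes, values are held as plain strings, and a collection would simply be a list of such documents.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a field: a tag, an attributeTuple and a valueTuple.
class SketchField {
    final String tag;                                               // e.g. "temperature"
    final Map<String, String> attributes = new LinkedHashMap<>();   // name="value" pairs
    final List<String> values = new ArrayList<>();                  // comma-separated values

    SketchField(String tag) { this.tag = tag; }
}

// Hypothetical model of a document: an ordered list of one or more fields.
class SketchDocument {
    final List<SketchField> fields = new ArrayList<>();

    // Appends a new, empty field with the given tag and returns it.
    SketchField add(String tag) {
        SketchField f = new SketchField(tag);
        fields.add(f);
        return f;
    }

    // Returns the first field with the given tag, or null if there is none.
    SketchField get(String tag) {
        for (SketchField f : fields) {
            if (f.tag.equals(tag)) return f;
        }
        return null;
    }
}

class TerminologySketch {
    // Builds: <note> <temperature scale="centigrade"> 27.5 <location> 0291, 0284
    static SketchDocument example() {
        SketchDocument doc = new SketchDocument();
        doc.add("note");
        SketchField temperature = doc.add("temperature");
        temperature.attributes.put("scale", "centigrade");
        temperature.values.add("27.5");
        SketchField location = doc.add("location");
        location.values.add("0291");
        location.values.add("0284");
        return doc;
    }
}
```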

A Session

What happens within a session is controlled by arbitrarily long input via standard input (stdin). For convenience of running sessions the input is provided in, and referred to hereafter as, the session file. The session file contains a single document collection, the grammar of which is identical to that of a context or note with the exception that (i) semicolons are used to terminate labeled values, and (ii) boolean literals, true and false, are allowed as values. The session file is validated as far as possible before it initiates the session proper.

There are various ways of initiating a session; one is to use the 'make' utility. An example Makefile is shown below.

Example Makefile

######################## CAR Makefile1 ####################

# #

# this Makefile: #

# - reads and runs a session file from stdIn #

# - it uses a JAR file of all CAR classes #

# - and a JAR file of Bali (for dynamic runs) #

# - the classpath is set at run time #

# #

# author: Lindsey Ford #

# #

###########################################################

PROJDIR = c:\Car

JAR = car.jar

JDKDIR = c:\jdk1.3

JAVA = $(JDKDIR)\bin\java

RUNCLASSPATH = $(PROJDIR)\System\$(JAR);$(PROJDIR)\System\bali.jar;$(JDKDIR)\jre\lib\rt.jar

go:

cd $(PROJDIR)

$(JAVA) -classpath $(RUNCLASSPATH) MainCar < Xamples\sessionTest1.txt

Thus the session is initiated by executing Java for the class file 'MainCar'. In the case above the Makefile is held in a directory/folder on the same level as 'Xamples' (from which the session file 'sessionTest1.txt' is read).

A session file consists of specifications of:

- Processes (which specify inputs and outputs),

- Parameters (which parameterize processes),

- Actions (which relate parameters to processes, and which cause processes to be activated).

Several example session files follow. The first, a simple one, is shown below:

Example 1

<process id=p1 type=move>

input= external;

inputName= "c:\Car\Xamples\eContext1.txt";

output= internal;

outputName= eContext1

</process>

<action id=a1> processId=p1

The session file collection contains 1 document of, in this case, 2 fields (process and action). An arbitrary number of comments can be included in a document before, between, and after fields. The process field contains an attributeTuple of 2 pairs, and a valueTuple of 4 labeled values (a label is immediately followed by '=' and labeled values are separated by ';'). The action field contains a 1-tuple attributeTuple and, in this case, a single labeled value. Note that a field is optionally terminated by an end tag e.g. </process>.

The attributeTuple of a process field provides its identity (id=p1 or, more properly, id="p1") and its type, in this case a simple 'move' type. A process field specifies one or more inputs and one or more outputs.

The labels 'input' and 'output' are for comma-separated lists of values that provide for 3 types of specification: 'external' for data held in a file, 'internal' for data held in the program executing the session (i.e. in memory), and 'serial' for data held in a file in Java object format. By 'data' we mean a collection.

The labels 'inputName' and 'outputName' are for comma-separated lists of values that provide paths for filed collections and identifiers for internal collections.
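The 'serial' type is described above only as a file in Java object format. Assuming this means standard Java object serialization (the manual does not say so explicitly), a round trip looks roughly like the sketch below. The class SerialSketch is hypothetical, and a byte array stands in for the file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerialSketch {
    // Writes the data in Java object format and reads it straight back.
    // A FileOutputStream/FileInputStream pair would be used for a real file.
    static Object roundTrip(Serializable data) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(data);                       // 'serial' output
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();                      // 'serial' input
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serial round trip failed", e);
        }
    }
}
```

Unlike an 'external' text file, data stored this way does not need to be parsed again when it is read back.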

The execution of a CAR session involves the execution of the action fields. These identify processes and link any parameter fields to them. In the above example there are no parameters, so the session simply executes process p1. This is a 'move' process, which in this case causes a collection held in a file to be read in (i.e. parsed) and held internally under the name 'eContext1' (per se, not a very useful session). The different types of process are discussed later.

Example 2

<process id=p1 type=move>

input= external;

inputName= "c:\Car\Xamples\eContext1.txt";

output= internal;

outputName= "temp"

</process>

<process id=p2 type=move>

input= internal;

inputName= "temp";

output= external;

outputName= "c:\Car\Xamples\eContext1a.txt"

</process>

<action id=a2> processId=p2

<action id=a1> processId=p1

This example session:

- shows that a session can have multiple processes and actions

- reads an external collection and creates an internal collection of it

- moves the internal collection to an external file

- shows that the sequence of <process> and <action> elements is not important: the system works out any constraints (and in this case ensures that process p1 is completed before process p2 is started).
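How CAR works out these constraints is not described here. One plausible approach, sketched below under that assumption, is to run a process only once every internal collection it reads has been written by an earlier process. The names OrderingSketch and Proc are hypothetical and not part of CAR.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class OrderingSketch {
    // Hypothetical stand-in for a process: the internal names it reads and writes.
    static class Proc {
        final String id;
        final Set<String> reads = new HashSet<>();
        final Set<String> writes = new HashSet<>();
        Proc(String id) { this.id = id; }
    }

    // Returns process ids in an order in which every internal input has already
    // been written. Throws if an input is never written or the dependencies are circular.
    static List<String> order(List<Proc> procs) {
        List<Proc> pending = new ArrayList<>(procs);
        Set<String> available = new HashSet<>();
        List<String> result = new ArrayList<>();
        while (!pending.isEmpty()) {
            Proc next = null;
            for (Proc p : pending) {
                if (available.containsAll(p.reads)) { next = p; break; }
            }
            if (next == null) {
                throw new IllegalStateException("unsatisfiable or circular dependency");
            }
            pending.remove(next);
            available.addAll(next.writes);
            result.add(next.id);
        }
        return result;
    }

    // Example 2: the action for p2 is listed first, but p2 reads "temp", which p1 writes.
    static List<String> example2() {
        Proc p1 = new Proc("p1");
        p1.writes.add("temp");
        Proc p2 = new Proc("p2");
        p2.reads.add("temp");
        List<Proc> listed = new ArrayList<>();
        listed.add(p2);
        listed.add(p1);
        return order(listed);
    }
}
```

Here example2() yields p1 followed by p2, even though p2 was listed first.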

Example 3

input= external;

inputName= "c:\Car\Xamples\eNotes1c.txt";

output= external;

outputName= "c:\Car\Xamples\eNotes1cUpdate.txt"

</process>

fieldName= any;

attributeName= score;

attributeValue= 0;

overwrite= true

</parameter>

processId= p1;

parameterId= param1

</action>

This example has a process type that requires a parameter. The parameter field has id and type attributes, and in this case the latter indicates that the parameter will cause attributes in each input document to be set appropriately (the details of 'setAttribute' are provided later).

It is the action field which links the parameter to the process.

Processes

A process is a specification of inputs and outputs to some action.

Valid input (and output) values are 'internal', 'serial', and 'external'. Valid inputName and outputName values are identifiers for internal data, and paths for serial and external text files. In each case by 'data' we mean the representation of a collection. Serial and internal representations are discussed later. The format of a text file is described in STICK-E NOTES: the Context Matcher User Manual (Brown).

There are five types of process:

1. move

2. documentUpdate

3. collectionUpdate

4. generalUpdate

5. match.

move

A 'move' merely moves inputs to outputs, and requires no parameters. There can be any number of inputs (the same number of outputs is required). When the input is 'external' the data file is parsed – warnings on standard output are given for any anomalies (see the note on duplication).

documentUpdate

In a documentUpdate process there is one input and one output. The input is a collection. The output is the same collection updated by one or more parameters. Each parameter is applied in turn to a single document. When all parameters have been processed for a document the updated document is output and the next input document is presented for update.

The parameter types for documentUpdate are:

- setField

- deleteField

- setAttribute

- deleteAttribute

- library

- dynamic

Example 4 (documentUpdate: setField)

input=external;

inputName="c:\Car\Xamples\docs6.txt";

output=external;

outputName="c:\Car\Xamples\docs6Updated.txt"

</process>

fieldName=temperature;

fieldValue=27..29;

overwrite=true;

allDocuments=true

</parameter>

processId=p1;

parameterId=param1

</action>

The above example shows a setField parameter, which causes all documents in the collection with a 'temperature' field to be given the range value '27..29'. The label 'overwrite' indicates whether to overwrite an existing temperature value, and 'allDocuments' indicates whether to insert the field in a document that doesn't have that field ('true') or not ('false'). It's possible to set any number of field values by including a parameter for each.
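The interplay of 'overwrite' and 'allDocuments' can be spelt out as a small decision rule. The sketch below is one reading of the description above, not CAR's implementation: SetFieldSketch is a hypothetical name and a document is simplified to a map from field name to value text.

```java
import java.util.Map;

class SetFieldSketch {
    // Applies a setField parameter to one document (modelled as field name -> value text).
    static void setField(Map<String, String> doc, String fieldName, String fieldValue,
                         boolean overwrite, boolean allDocuments) {
        if (doc.containsKey(fieldName)) {
            // The field exists: change it only if overwriting is allowed.
            if (overwrite) doc.put(fieldName, fieldValue);
        } else if (allDocuments) {
            // The field is missing: insert it only if allDocuments is true.
            doc.put(fieldName, fieldValue);
        }
    }
}
```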

To delete a field the type is 'deleteField' and there is one labeled value 'fieldName'. Any document containing a field with the given name has that field deleted.

Example 5 (documentUpdate: setAttribute)

input= external;

inputName= "c:\Car\Xamples\eNotes1c.txt";

output= external;

outputName= "c:\Car\Xamples\eNotes1cUpdate.txt"

</process>

fieldName= ANY;

attributeName= score;

attributeValue= 0;

overwrite= true

</parameter>

processId= p1;

parameterId= param1

</action>

The example above shows a 'setAttribute'. All fields with a matching name will be given an attribute of 'score="0"' – provided the label 'overwrite' is 'true', otherwise a field already containing a 'score' attribute will not have its attribute overwritten. In this particular example the special fieldName 'ANY' matches all fieldnames … so for all documents each field will be given the new attribute.
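The matching rule for setAttribute can be sketched as follows. This is an illustration rather than CAR's code: SetAttributeSketch is a hypothetical name, a document is simplified to a map from field name to its attributes, and 'ANY' is compared without regard to case only because both 'any' and 'ANY' appear in the examples.

```java
import java.util.Map;

class SetAttributeSketch {
    // Applies a setAttribute parameter to one document
    // (modelled as field name -> attribute name -> attribute value).
    static void setAttribute(Map<String, Map<String, String>> doc, String fieldName,
                             String attributeName, String attributeValue, boolean overwrite) {
        for (Map.Entry<String, Map<String, String>> field : doc.entrySet()) {
            // 'ANY' matches every field; case-insensitivity here is an assumption.
            boolean nameMatches = fieldName.equalsIgnoreCase("ANY")
                    || fieldName.equals(field.getKey());
            if (!nameMatches) continue;
            Map<String, String> attributes = field.getValue();
            // An existing attribute is replaced only when overwrite is true.
            if (overwrite || !attributes.containsKey(attributeName)) {
                attributes.put(attributeName, attributeValue);
            }
        }
    }
}
```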

To delete an attribute the type is 'deleteAttribute' and there are two labeled values 'fieldName' and 'attributeName'. Any document containing a matching field will have its matching attribute deleted (if one exists for the field).

Example 6 (documentUpdate: library)

input= external;

inputName= "c:\Car\Xamples\docs9.txt";

output= external;

outputName= "c:\Car\Xamples\docs9Updated.txt"

</process>

class= CarLibraryA;

method= setScores,0;

method= setWeights,temperature,2;

method= celsiusToFahrenheit

</parameter>

fieldName= note;

attributeName= score

</parameter>

processId= p1;

parameterId= param1;

parameterId= param2

</action>

In the above example there are 2 documentUpdate parameters. The first is for a library. There are four libraries in CAR (CarLibraryA for documentUpdate, CarLibraryB for collectionUpdate, CarLibraryC for generalUpdate, and CarLibraryD for match). A library contains various parameterisable methods which can be invoked 'by name' at run time. Execution of such methods is slower than if they were implemented in CAR like the parameters seen earlier.

So why have a library? As can be seen in the example above a library parameter provides a className and a method (consisting of a method name followed by any arguments required for the method). To implement another method is relatively simple, involving programming the method in the particular class and informing researchers of its availability (the mechanism for invoking such methods is general and requires no change). To provide within CAR for a parameter such as 'deleteAttribute' shown above involves considerably greater implementation effort, however. So during the research-intensive phase of CAR it's convenient to have libraries. When there is greater stability of requirement in CAR useful library methods will be 'hard-coded'.

In the library parameter above the first method 'setScores' will cause the attribute 'score="0"' to be inserted in each field (overwriting any existing attribute of the same name). It is the second value of the method label that denotes the score to be inserted.

The second method 'setWeights' will cause the attribute 'weight="2"' to be inserted in each field named temperature (overwriting any existing attribute of the same name).

For details of other CarLibraryA methods see the javadoc for CarLibraryA.

In the example above, the three library methods and then the deleteAttribute parameter are applied in sequence to each document.
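The manual does not say how library methods are invoked 'by name'. In Java this is commonly done with reflection, and the sketch below assumes that approach. DemoLibrary is a hypothetical stand-in (it is not CarLibraryA), and passing every argument as a String is also an assumption.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

// Hypothetical stand-in for a library class; the real CarLibraryA methods differ.
class DemoLibrary {
    public static String setScores(String score) {
        return "score=" + score + " on every field";
    }
    public static String setWeights(String fieldName, String weight) {
        return "weight=" + weight + " on " + fieldName;
    }
}

class LibraryCallSketch {
    // Splits a value such as "setWeights,temperature,2" into a method name and
    // arguments, then looks the method up by name and calls it.
    static Object invoke(Class<?> library, String methodSpec) {
        String[] parts = methodSpec.trim().split("\\s*,\\s*");
        String[] args = Arrays.copyOfRange(parts, 1, parts.length);
        Class<?>[] types = new Class<?>[args.length];
        Arrays.fill(types, String.class);            // every argument is passed as a String
        try {
            Method m = library.getMethod(parts[0], types);
            return m.invoke(null, (Object[]) args);  // null receiver: the method is static
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot call " + methodSpec, e);
        }
    }
}
```

With this arrangement a new library method becomes available as soon as it is added to the class, with no change to the calling code. The lookup on each call is one reason such calls tend to be slower than built-in parameters.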

Example 7 (documentUpdate: dynamic)

input= external;

inputName= "c:\Car\Xamples\docs9.txt";

output= external;

outputName= "c:\Car\Xamples\docs9Updated.txt"

</process>

program=

<markup>

// this program, because it is a parameter to a documentUpdate process,

// is executed for each document;

// it gets a document, updates it as necessary, and terminates;

// CAR handles everything else

class DoScores {

static void main(String[] args) {

Document doc = Dynamic.getDocument(); // get the current document to be processed

// loop that processes each field ...

// sets a score attribute value, overwriting or inserting the attribute...

for(int numFields=doc.numberOfFields(), i=0; i<numFields; i++) {

Field field = doc.getField(i);

field.setAttribute("score","0.5");

} // end for each field

} // end main()

} // end class definition

</markup>

</parameter>

class= CarLibraryA;

method= setWeights,temperature,1.7;

</parameter>

processId= p1;

parameterId= param1;

parameterId= param2;

</action>

The above example has two parameters, so each parameter is applied in turn to each document in a collection.

The first parameter is of type 'dynamic' and contains a single 'program' label whose value is the source of a program. When the session file is read in, the program is compiled (compilation errors are reported at this stage) and held in memory. When the dynamic parameter is applied to a document the program is executed in its entirety. The CAR Java class Dynamic provides an interface between the program and documents. See CAR Dynamic Interface for an explanation of examples and further details of the interface and the language that programs use.

collectionUpdate

In a collectionUpdate process there is one input and one output. The input is a collection. The output is the same collection updated by one or more parameters. But unlike a documentUpdate process this process is applied uniformly across all documents in the collection for each parameter. When one parameter has been applied its collection output becomes the input for the next parameter, and so on.

There are 2 parameter types for collectionUpdate:

1. library

2. dynamic

Example 8 (collectionUpdate: library)

input=external;

inputName="c:\Car\Xamples\eNotes1c.txt";

output=external;

outputName="c:\Car\Xamples\eNotes1dUpdate.txt"

</process>

class= CarLibraryB;

method= identify,LF;

method= delete,LF1

</parameter>

<action id=a3> processId=p1;

parameterId=param1

</action>

In the above example there is 1 collectionUpdate parameter and it is for a library, CarLibraryB.

In the library parameter above the first method 'identify' will cause the attribute 'id="In"' to be inserted in each note field of the input collection (overwriting any existing attribute of the same name), where 'I' is a prefix identifier (in this case 'LF') and 'n' is an ascending serial number starting at '0' for the first note. (The 'identify' method enables the user to cause each note to be uniquely identified).

The second method 'delete' will cause the document with note attribute 'id="LF1"' to be deleted.

In the example above, the two methods are applied in sequence to the collection: the output of 'identify' becomes the input of 'delete'.

For details of other CarLibraryB methods see the javadoc for CarLibraryB.
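The difference between the documentUpdate and collectionUpdate application orders can be made concrete with a short sketch. It is an illustration only: UpdateOrderSketch is a hypothetical name, documents are reduced to strings, and the two lambdas merely imitate 'identify,LF' and 'delete,LF1' from Example 8.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

class UpdateOrderSketch {
    // documentUpdate order: every parameter is applied to one document before the next.
    static List<String> perDocument(List<String> docs, List<UnaryOperator<String>> params) {
        List<String> out = new ArrayList<>();
        for (String doc : docs) {
            for (UnaryOperator<String> p : params) doc = p.apply(doc);
            out.add(doc);
        }
        return out;
    }

    // collectionUpdate order: each parameter processes the whole collection, and its
    // output collection becomes the input of the next parameter.
    static List<String> perCollection(List<String> docs,
                                      List<UnaryOperator<List<String>>> params) {
        List<String> current = docs;
        for (UnaryOperator<List<String>> p : params) current = p.apply(current);
        return current;
    }

    // Imitates Example 8 on three one-letter "documents".
    static List<String> example8() {
        List<UnaryOperator<List<String>>> params = new ArrayList<>();
        params.add(docs -> {                         // like 'identify,LF'
            List<String> out = new ArrayList<>();
            for (int n = 0; n < docs.size(); n++) out.add("LF" + n + ":" + docs.get(n));
            return out;
        });
        params.add(docs -> {                         // like 'delete,LF1'
            List<String> out = new ArrayList<>(docs);
            out.removeIf(d -> d.startsWith("LF1:"));
            return out;
        });
        List<String> input = new ArrayList<>();
        input.add("a");
        input.add("b");
        input.add("c");
        return perCollection(input, params);
    }
}
```

The 'delete' step relies on every document having been identified first, which is why the whole collection passes through 'identify' before 'delete' starts.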

Example 9 (collectionUpdate: dynamic)

input= external;

inputName= "c:\Car\Xamples\docs9.txt";

output= external;

outputName= "c:\Car\Xamples\docs9Updated.txt"

</process>

program=

<markup>

// this program puts an attribute "id=PJBn" into each note field,

// incrementing 'n' by 1 in the value "PJBn" and n starting from a given suffix

class Identify {

static void main(String[] args) {

String prefix = "PJB"; // prefix

int suffix=7; // initial suffix

Collection inDocs = Dynamic.getCollection(); // get the current collection to be processed

Collection outDocs = new Collection();

int numDocs = inDocs.numberOfDocuments();

Document doc;

for (int i=0; i<numDocs; i++) {

doc = inDocs.getDocumentCopy(i); // get the next input document

Field field = doc.getField("note");

if(field!=null) field.setAttribute("id",prefix+(suffix++));

outDocs.addDocument(doc); // add the updated document to the output collection

} // end for each document

Dynamic.putCollection(outDocs); // pass the new collection to Dynamic

} // end main()

} // end class definition

</markup>

</parameter>

processId= p1;

parameterId= param1;

</action>

There is just one parameter, of type 'dynamic', which contains a single 'program' label whose value is the source of a program. When the dynamic parameter is applied to a collection the program is executed in entirety just once for the collection (the program has the responsibility of getting each document it wishes to process). This example is similar to example 8 – it 'identifies' each note in a collection – so in a sense it is redundant, but we provide it here anyway to show how dynamic parameters work. See CAR Dynamic Interface for an explanation of examples and further details of the interface and the language that programs use.

generalUpdate

In a generalUpdate process there is an arbitrary number of inputs and an arbitrary number of outputs (and not necessarily the same number of inputs as outputs). Each input is a collection and each output is a collection.

There are 2 parameter types for generalUpdate:

1. library

2. dynamic.

The library parameter type requires CarLibraryC. (An example is not provided here although one for concatenate is provided in the CAR Execution Modification document for the GeneralUpdateResearchLibrary - an equivalent of CarLibraryC.)

Example 10 (generalUpdate: dynamic)

<process id=p1 type=generalUpdate>
      input=            external,external;
      inputName=        "Xamples\context4.txt","Xamples\eNotes4.txt";
      output=           external;
      outputName=       "Xamples\eNotes4a.txt";
    </process>

    <parameter id=prog1 type=dynamic>
      program=
        <markup>
   // this program, because it is a parameter to a generalUpdate process, is executed just once;
   // it gets an array of input collections, processes them, outputs collections as necessary, and terminates;
   // CAR handles everything else

   class DoSomeUpdates {
     static void main(String[] args) {

     Collection[] inCollections = Dynamic.getCollections();    // get the collections to be processed
     Collection[] outCollections;

     System.out.println("do something...");

     Collection coll = new Collection();

     outCollections = new Collection[1];
     outCollections[0]=coll;

     coll.addElement(inCollections[0].getDocument(0));
     coll.addElement(inCollections[1].getDocument(3));

     System.out.println("DONE something");

     Dynamic.putCollections(outCollections);
     }
   }
</markup>
    </parameter>

    <action id=a1>
processId=        p1;
parameterId=      prog1;
    </action>

In this example there is just one parameter for the generalUpdate, of type dynamic. The program processes two input collections and outputs a single collection. The example is contrived and serves no useful purpose other than to show that the input and output collections are held in arrays (which allows CAR to be flexible about the number of inputs and outputs). See CAR Dynamic Interface for an explanation of examples and further details of the interface and the language that programs use.

match

In a match process there are two input collections and a single output collection. Matching is the process whereby a comparison is made between one or more contexts and a document collection, and those comparisons that are deemed important form the basis of output.

For a discussion of matching see STICK-E NOTES: the Context Matcher User Manual (Brown); see also Active fields and the rules for document matching, which provides the rules for matching. A variety of other papers discuss particular sub-topics of matching in greater detail; these are contained in Peter Brown's collection of discussion and specification papers. Implementation details of matching can be found in the javadoc for Matcher.

There are 3 parameter types for match:

1. matchSpecA

2. library

3. dynamic.

matchSpecA is compulsory – it specifies the tags to be matched, the type of match (interactive or proactive), a scoring threshold, and what is to be output.

The library parameter type requires CarLibraryD. (An example is not provided here although one for match is provided in the CAR Execution Modification document for the MatchResearchLibrary - an equivalent of CarLibraryD.)

Example 11 (match: matchSpecA)

input= external,external;

inputName= "Xamples\context4.txt","Xamples\eNotes4.txt";

output= external;

outputName= "Xamples\match4.txt";

</process>

activeTags= text,location;

threshold= 1.0;

scores= true;

currentContext= ACTIVE;

document= ANY;

direction= proactive;

</parameter>

processId= p1;

parameterId= m1

</action>

Note that in the process field the context collection input is specified before the document collection input. The parameter field specifies the active tags, a score threshold (comparisons yielding a score >= the threshold are deemed important), a 'whether to output score attributes in the output' specification (true or false), then two specifications for output (one for the current context, the other for the document collection), followed by a specification of the direction of matching (proactive or interactive). Full details of the specification can be found in The format of the retrieved document.

Example 12 (match: dynamic)

<!-- this session reads an external context and an external eNote document collection,

does a proactive match on them and puts the output in an external doc collection

-->

input= external,external;

inputName= "Xamples\context4.txt","Xamples\eNotes4.txt";

output= external;

outputName= "Xamples\match4.txt";

</process>

activeTags= text,location;

threshold= 1.0;

scores= true;

currentContext= ACTIVE;

document= ANY;

direction= interactive;

</parameter>

program=

<markup>

// this program, because it is a parameter to a match process, is executed for each match;

// it gets a target document, a query document, an output document, updates it as necessary, and terminates;

// CAR handles everything else.

class MatchScorerA {

static boolean TRACE=false; // set to false if no trace messages required

//---------------------------------

static void t(String s) {if(TRACE) System.out.println("MatchScorerA- "+s);} // simple debugger

//-------------------------------

private static String stringScore(double d) {

// convert score to string, truncating scores ending .0

if(d==0.0) return "0";

String s = ""+d;

int i=s.indexOf(".");

if( (i + 3) < s.length() ) s = s.substring(0,i+3);

while(s.endsWith("0")) s = s.substring(0,s.length()-1);

if(s.endsWith(".")) s = s.substring(0,s.length()-1);

return s;

}

//---------------------------------

static void main(String[] args) {

Document target = Dynamic.getTargetDocument(); // get the current target document

Document query = Dynamic.getQueryDocument(); // get the current query document

Document doc = Dynamic.getOutputDocument(); // get the current document from the collection

double threshold = Dynamic.getThreshold(); // get input 'threshold' (default 0.0)

boolean scores = Dynamic.getScores(); // get input 'scores' (default true)

String[] activeFields = Dynamic.getActiveFields(); // get activeField names

t("\n\n\n\n\n================ new document ====================================");

t("target="+target);

t("query="+query);

t("outDoc="+doc);

t("scores="+scores);

Field tField,qField; // target and query fields

double fScore,noteScore=1.0; // field and note score

// remove all score attributes in output doc ... (the ones for activeFields are put in later)

doc.removeAttribute("score");

int numScores=0;

// match the two Notes

for(int j=0;j<target.size();j++) { // for each target field

tField = target.getField(j);

String tName=tField.getFieldName().toLowerCase(); // name of target field

t("tName="+tName);

if(tName.equals("note")) continue; // ignore note field for scoring purposes

if(!Utils.member(tName,activeFields)) continue; // ignore inactive fields in target

// got an active target field...

// process each matching queryField

// remembering there can be 'duplicates' i.e. same name different attributes

for (int qX=0, numQFields = query.numberOfFields(); qX < numQFields; qX++) {

qField=query.getField(qX);

if(!qField.getFieldName().equals(tName)) continue; // not the right query field

qField=query.getField(tName); // get the matching query field

fScore = tField.score(qField); // score the field

fScore = (double)Math.round(fScore * 100.0) / 100.0; // round to nearest 2nd dec place

noteScore *= fScore; // accumulate the note score

if(scores) // scores are required in output

// put score in output doc...

doc.getNamedFieldWithoutAttribute(tName,"score").setAttribute("score",stringScore(fScore));

numScores++;

t("numScores="+numScores);

t("noteScore="+noteScore);

} // end for each query field

} // end for each target field

// all target fields have been processed, calculate note score, check against threshold...

// noteScore is the geometric mean of scores

noteScore = Math.pow(noteScore,1.0/(double)numScores);

noteScore = (double)Math.round(noteScore * 100.0)/100.0; // round to nearest 2nd dec place

if(noteScore < threshold) Dynamic.deleteOutputDocument(); // not going to output the doc

else { // set the <note> score attribute

Field field = doc.getField("note");

if(scores) field.setAttribute("score",stringScore(noteScore));

}

} // end main()

} // end class definition

</markup>

</parameter>

processId= p1;

parameterId= m1;

parameterId= prog1;

</action>

This example shows two parameters, one for matchSpecA, the other for dynamic.

The dynamic parameter contains a program that, at the time of writing, is an implementation of the current matcher (the one that is invoked when there is no dynamic parameter e.g. as for Example 11). It thus serves to provide a basis for your own dynamic matching program. See CAR Dynamic Interface for a detailed explanation of this example and further details of the dynamic interface and the language that programs use.
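The scoring arithmetic in the program above (field scores rounded to two decimal places, a note score formed as their geometric mean, then a comparison with the threshold) can be isolated as plain Java. This is a simplified sketch, not the matcher itself: NoteScoreSketch is a hypothetical name, the field scores are supplied directly rather than computed from fields, and returning 0.0 when no field was scored is an assumption.

```java
class NoteScoreSketch {
    // Rounds to two decimal places, as the example program does.
    static double round2(double d) {
        return Math.round(d * 100.0) / 100.0;
    }

    // Geometric mean of the rounded field scores, itself rounded to two places.
    static double noteScore(double[] fieldScores) {
        if (fieldScores.length == 0) return 0.0;     // assumption: nothing scored means 0
        double product = 1.0;
        for (double s : fieldScores) product *= round2(s);
        return round2(Math.pow(product, 1.0 / fieldScores.length));
    }

    // A document is output only if its note score reaches the threshold.
    static boolean keep(double[] fieldScores, double threshold) {
        return noteScore(fieldScores) >= threshold;
    }
}
```

For example, field scores of 1.0 and 0.25 give a note score of 0.5, so with a threshold of 1.0 the document would not be output.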

Warnings and Errors

CAR parses the session file before proceeding and reports on standard output any warnings or errors. If the session file is parsed successfully warnings and errors may still occur during execution of Actions.

A warning allows processing to continue; an error causes the session to terminate. Errors are of two types: user and system. The former are generally caused by bad session file data, the latter by a CAR system fault … which should be reported to your supplier.

Sunday 25 November 2001
