gndvis -- Visualising many-objective individuals

pics/RADAR-200sol-obj-ddmds.png

Installation

gndvis is most easily installed by downloading the tarball, unpacking it, and running the installer with Python from the unpacked directory:

$ tar zxf gndvis-1.0.tar.gz
$ python setup.py install

On UNIX systems, use sudo for the latter command if you need to install the scripts to a directory that requires root privileges:

$ sudo python setup.py install

This will install the gndvis script in $prefix/bin (or Python2x\Scripts on Windows) and will install an egg and the ndvis package in your site-packages directory.

You will need Python (of course), NumPy, SciPy and Matplotlib, all of which are available from a good repo near you. The Enthought Python Distribution (EPD) gets them all in one go.

Papers and theory

For an introduction to some of the ideas and examples, see the following, drafts of which are part of the distribution:

Using gndvis

The main use of the tool is to explore a dataset comprising several individuals (members of an evolutionary population, universities in a league table, etc.), each of which has the goal of minimising its score on a number of objectives. The data are visualised by projecting them onto the plane, either with a radial visualisation or by defining distances between the individuals based on their dominance structure and then using metric multidimensional scaling (MDS). There are also quite a few command-line options that describe how the data are read.

You can get a summary of the command-line options with:

$ gndvis --help

At its simplest, the data are stored as rows in a text file, one individual per row (see for example, RADAR-200sol-obj.dat, oct3.dat or oct5.dat in the data directory). In this case, start the tool with:

$ gndvis [--ddmds] oct5.dat

which produces the radial visualisation of the data in oct5.dat (or the dominated distance MDS visualisation if the optional --ddmds flag is given). The data in oct5.dat are 5-objective points lying on a shell of radius 1 in the positive orthant. Some details of the radial and MDS visualisations follow.
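If you want a test file of the same kind as oct5.dat, the following is a minimal sketch of how such data could be generated. It is not the script used to produce the distributed files. The function names are hypothetical, and it assumes that gndvis accepts whitespace-separated numbers, which is not stated above.

```python
import numpy as np

def make_shell_data(n_points=200, n_objectives=5, seed=0):
    """Return n_points rows lying on the unit shell in the positive orthant."""
    rng = np.random.default_rng(seed)
    # Absolute values of Gaussian samples point into the positive orthant;
    # dividing each row by its length places it at radius 1.
    x = np.abs(rng.standard_normal((n_points, n_objectives)))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def write_shell_data(path, **kwargs):
    """Write the data as text, one individual per row.

    Assumption: whitespace-separated columns are a format gndvis can read.
    """
    data = make_shell_data(**kwargs)
    np.savetxt(path, data, fmt="%.6f")
    return data
```

Calling write_shell_data("shell5.dat") writes a 200-row, 5-column file that can then be passed to gndvis in place of oct5.dat.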

Radial visualisation

In this case the display looks like

pics/oct5rviz.png

The individual points are coloured according to one of the colouring schemes that can be selected in the right-hand panel.

  • Rho: Distance to the simplex
  • Perp dist: Perpendicular distance to the simplex (proportional to rho)
  • Column: One of the columns in the data file
  • Average rank: the average rank of each individual. Minimum rank (best) is zero.
  • Power index: a more sophisticated version of the average rank that gives extra credit for the individuals that a point dominates.
  • Preference score: as defined by F. di Pierro, S. T. Khu, and D. Savic, "An investigation on preference ordering ranking scheme in multiobjective evolutionary optimization", IEEE Trans. on Evolutionary Comp., vol. 11, no. 1, pp. 17–45, 2007. The "Combs of" button controls which combinations of dimensions are used to calculate the score.
  • Dominates: the number of individuals a point dominates (also affected by the "Combs of" button).
  • Dominated: the number of individuals the point is dominated by (also affected by the "Combs of" button).
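To make the "Dominates", "Dominated" and "Average rank" quantities concrete, here is a minimal sketch of how they can be computed for a minimisation problem. It illustrates the standard definitions only and is not the gndvis implementation: the function names are hypothetical, the "Combs of" option is not reproduced, and breaking ties in the ranks by row order is an assumption.

```python
import numpy as np

def dominance_counts(X):
    """Return (dominates, dominated) counts for each row of X.

    Row i dominates row j if it is no worse in every objective and
    strictly better in at least one (all objectives are minimised).
    """
    X = np.asarray(X, dtype=float)
    no_worse = (X[:, None, :] <= X[None, :, :]).all(axis=2)
    better = (X[:, None, :] < X[None, :, :]).any(axis=2)
    dom = no_worse & better            # dom[i, j] is True when i dominates j
    return dom.sum(axis=1), dom.sum(axis=0)

def average_rank(X):
    """Mean over objectives of each row's rank; the best rank is 0.

    Assumption: ties are broken by row order.
    """
    X = np.asarray(X, dtype=float)
    # argsort of argsort turns each column into 0-based ranks.
    ranks = X.argsort(axis=0, kind="stable").argsort(axis=0, kind="stable")
    return ranks.mean(axis=1)
```

For the four individuals (1,1), (2,2), (1,3) and (3,1), the first dominates the other three and is dominated by none, so it receives an average rank of 0.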

Mousing over a point shows its index in the data file (or its name, if a file of names is supplied) together with the actual values of its objectives and the value of the quantity being used to colour the points.

The usual Matplotlib controls are available for zooming in, saving the image, etc.

The order in which the objectives are mapped onto the vertices of the projection polygon is controlled by the "Permutations" panel. You can step through the many permutations one by one with the "Next" button or, more usefully, press the "Seriate" button to place similar objectives together.

The "Lambda" panel controls the placement of the simplex onto which the data are projected. It is highly recommended to use "rank coordinates" (the default) and to set all the lambdas equal.

The overall size of the projection within the polygon can be controlled with the "Scale" panel. If the "Nbrs" button is selected, lines are drawn between points that are nearest neighbours in the original space.
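As an illustration of what "nearest neighbours in the original space" means, the sketch below finds each individual's nearest neighbour in the full objective space, before any projection. The function name is hypothetical, and the use of Euclidean distance is an assumption: the metric gndvis uses is not stated here.

```python
import numpy as np

def nearest_neighbours(X):
    """Return, for each row of X, the index of its nearest other row.

    Assumption: distances are Euclidean in the original objective space.
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    d2 = (diff ** 2).sum(axis=2)       # squared pairwise distances
    np.fill_diagonal(d2, np.inf)       # a point is not its own neighbour
    return d2.argmin(axis=1)
```

A line from point i to point nearest_neighbours(X)[i] in the projected plot then shows which points were close before projection, even if they land far apart on the plane.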

Ticking the "Extrema" box indicates the best and worst points for each objective in red (worst) and blue (best).

It may be helpful to label the vertices with more meaningful names. This can be done with --labels=label1,label2 etc. If --labels=Headers is specified, labels are taken from the first row of a spreadsheet or CSV file. If explicit labels are not given, the vertices can be labelled from 1 instead of the default 0 with --label-from-1.

Dominated distance MDS

The dominated distance MDS visualisation is selected with the --ddmds option, for example:

$ gndvis -D8 Times2009+GUGrank.dat --names=Times2009-abbrev.txt --ddmds

This produces a visualisation of the first 8 columns (-D8) of the data in Times2009+GUGrank.dat:

pics/Times2009-ddmds.png

The data are Times Good University Guide scores for 2008 (published in 2009), converted so that a low value corresponds to a good score. Missing data are signalled by "nan" in the data file, and a value is imputed in their place. The 9th column in the data file is available for colouring the points via the "Column" selection in the "Colour" panel.

The --names=file argument specifies a file of names for each individual (universities here) which are displayed when mousing over a point.

Colouring by the methods described for the radial visualisation, scaling, nearest neighbours and extrema labelling are all available, but permuting the objectives does not make sense in this context.
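For readers unfamiliar with metric MDS, the projection step can be sketched as follows. This is classical (Torgerson) scaling, one common form of metric MDS. Whether gndvis uses exactly this variant is an assumption, and the dominance-based distance itself is not defined in this document, so the distance matrix D is simply supplied by the caller. The function name is hypothetical.

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Embed points in n_components dimensions from a symmetric
    matrix D of pairwise distances (classical/Torgerson scaling)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centred Gram matrix
    evals, evecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    scale = np.sqrt(np.clip(evals[top], 0.0, None))
    return evecs[:, top] * scale
```

When D holds Euclidean distances between points that already lie in a plane, the embedding reproduces those distances exactly, up to rotation and reflection; for other distance matrices it gives a least-squares approximation in the classical MDS sense.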

Tick boxes are also available to control which Pareto shells are displayed, which are used to calculate the colours, and which are used in the MDS calculation. Deselecting a Pareto shell from the display hides the individuals in that shell. Deselecting a shell from the colour calculation stops it being used in the calculation of the colours (in which case it can't be displayed). If individuals in a shell are deselected from the MDS calculation but still displayed, they are projected into the MDS space but are not used to determine that space.

The display and colouring selection is also available using the radial visualisation, but it is probably best to visualise a single Pareto shell at a time with that method.
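The Pareto shells referred to above can be found by repeatedly peeling off the non-dominated individuals. The sketch below shows that standard procedure for a minimisation problem. It is an illustration rather than the gndvis code; the function name and the numbering of shells from 0 are assumptions.

```python
import numpy as np

def pareto_shells(X):
    """Return the shell index of each row of X (0 = non-dominated)."""
    X = np.asarray(X, dtype=float)
    shell = np.full(len(X), -1)
    remaining = np.arange(len(X))
    k = 0
    while remaining.size:
        Y = X[remaining]
        no_worse = (Y[:, None, :] <= Y[None, :, :]).all(axis=2)
        better = (Y[:, None, :] < Y[None, :, :]).any(axis=2)
        # dominated[j] is True if some remaining row dominates row j.
        dominated = (no_worse & better).any(axis=0)
        shell[remaining[~dominated]] = k       # current non-dominated set
        remaining = remaining[dominated]       # peel it off and repeat
        k += 1
    return shell
```

Selecting the rows where pareto_shells(X) == 0 gives the first shell, which is the natural subset to pass to the radial visualisation when viewing one shell at a time.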

Ticking "minimum" in the "Axes" box displays the projection of the axes of the axis-aligned bounding box of the data that meet at the "minimum" (globally best) corner. Likewise, ticking "maximum" displays the projection of the axes that meet at the nadir point of the bounding box. Ticking "Diagonal" displays the projection of the diagonal of the bounding box.
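The line segments involved can be written down directly in objective space, before they are projected onto the plane. The following is a minimal sketch with a hypothetical function name; it only constructs the segments and does not reproduce how gndvis projects them.

```python
import numpy as np

def bounding_box_axes(X):
    """Return the box edges at the best corner, the box edges at the
    nadir corner, and the diagonal, each as (start, end) point pairs."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)      # best and nadir corners
    extent = np.diag(hi - lo)                  # one row per objective axis
    min_axes = [(lo, lo + e) for e in extent]  # edges leaving the best corner
    max_axes = [(hi, hi - e) for e in extent]  # edges leaving the nadir corner
    diagonal = (lo, hi)
    return min_axes, max_axes, diagonal
```

For M objectives there are M edges at each corner, so ticking both "minimum" and "maximum" would correspond to 2M projected segments plus, optionally, the diagonal.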

Command-line arguments

Other command-line arguments mostly deal with reading data in different formats. Specifying an additional file of names (--names=Times2009-abbrev.txt) is cumbersome and error-prone; likewise specifying the label names with --labels=label1,label2. If the data are stored in an (Excel) spreadsheet or CSV file (detected from the data file extension), the objective labels can be read from the first row by specifying --labels=Headers. The rows and columns to read can be specified with the --rows=ROWS and --columns=COLUMNS arguments; by default all rows and columns are read. ROWS and COLUMNS are interpreted as Python slices, with indices starting at 0. When reading a spreadsheet or CSV file, the argument to --names is interpreted as the column that contains the name for each row. A couple of examples:

$ gndvis --columns=1:9 --names=0 --labels=Headers --max=0,1,3,4,5,6,7 data/tlt_2011.csv

which produces

pics/tlt_2011-rviz.png

Here the 8-objective data are found in columns 1 to 8 (the Python slice 1:9) of the CSV file containing university league table data for 2011; university names are in column 0 and objective names are in row 0. The --max=0,1,3,4,5,6,7 argument indicates that each of the listed objectives (i.e. all except column 2) is better for larger values: these objectives are converted to rank order and the order reversed so that they can be visualised as a minimisation problem. Note that there are some missing items in these data, for which values are imputed.
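The slice interpretation and the --max conversion described above can be sketched as follows. Both functions are hypothetical illustrations rather than the gndvis code. In particular, treating a bare index as a single column, breaking ties by row order, and numbering the --max columns relative to the array passed in are all assumptions, and missing values are not handled.

```python
import numpy as np

def parse_slice(text):
    """Turn a string such as '1:9' or '3' into a Python slice."""
    parts = [int(p) if p else None for p in text.split(":")]
    if len(parts) == 1:
        i = parts[0]
        # A bare index selects one item; 'or None' keeps '-1' working.
        return slice(i, i + 1 or None)
    return slice(*parts)

def maximise_to_ranks(X, max_columns):
    """Replace each listed column by reversed ranks, so that the
    largest value gets rank 0 and the problem becomes a minimisation."""
    X = np.array(X, dtype=float)               # copy; input is untouched
    for j in max_columns:
        order = (-X[:, j]).argsort(kind="stable")   # largest value first
        ranks = np.empty(len(X))
        ranks[order] = np.arange(len(X))
        X[:, j] = ranks
    return X
```

With this reading, parse_slice("1:9") selects columns 1 through 8, which matches the eight objectives in the example above.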

Here are the same data read from an Excel spreadsheet and displayed using dominated distance MDS:

$ gndvis --columns=1:9 --names=0 data/tlt_2011.xls --max=0,1,3,4,5,6,7 --labels=Headers --ddmds

which gives

pics/tlt_2011-ddmds.png

Bugs

This is research-quality code, but if you find a bug (and especially if you have a fix for it) please email the author at <R.M.Everson@exeter.ac.uk>.

Licence

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.