The development of computer technology since the 1950s
has led to the creation of many very useful statistical software
packages for analysing data.
Off-the-shelf statistical software now makes it
possible to perform analyses on a personal computer that would have been
impossible in the pre-computer era.
For this reason, computational statistics is now a large
and rapidly advancing branch of modern statistics.
Many diverse statistical software packages are currently
available that offer a wide variety of capabilities.
They can be broadly classified into four main categories:
- Powerful language-based packages
For example, S-PLUS, R, and SAS, which allow users to write their own
statistical macros and functions in addition to using the comprehensive
range of built-in statistical routines. These packages are used by many
practising statisticians. They are not particularly user-friendly, but once
mastered they are extremely powerful tools. R is freely available for many
different computer platforms from www.r-project.org.
- Interactive packages
For example, MINITAB and SPSS, which allow the user
to perform many standard statistical operations at the click of a mouse.
These are quick and easy to use and are useful for applying standard
methods, but they are not ideally suited to developing new functions. A big danger with such packages
is that users can easily perform operations that they do not understand.
This can create a "black box" view of statistical methods that
often leads to poor interpretations.
- Packages with statistical libraries
For example, MATLAB and PV-Wave/IDL, which are primarily
data analysis and visualization
programs/languages that also include libraries of statistical functions.
These packages can be useful in climate analysis since they cope easily with large
gridded data sets and can also be used to
quickly visualise spatial data.
A problem with these packages is that the libraries often
contain only a subset
of standard statistical functions, and do not benefit from input from
professional statisticians. This is particularly the case with certain spreadsheet
packages such as EXCEL that contain rather idiosyncratic and poorly developed
statistical libraries.
- Home-made subroutines
Many climate researchers have a bad habit of doing statistics
using Fortran subroutines that they have either written
themselves, obtained from a friend, or copied from Numerical Recipes.
This Do-It-Yourself cookbook approach has several disadvantages:
time is wasted reinventing the wheel by programming
routines rather than being spent thinking about the appropriate
choice of method, and there is no input from or contact with
professional statisticians.
The lack of statistical input can lead to ignorance about the range
of possible methods available, and the problems associated with
the different methods.
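As one illustration of the kind of problem a home-made routine can hide, the
following minimal sketch compares a hand-written variance routine with a
library routine. It is written in Python purely for illustration (the text
above refers to Fortran); the function name naive_variance and the data values
are hypothetical and not taken from any real analysis. The hand-written version
uses the one-pass "mean of squares minus square of the mean" formula, which is
known to lose precision when the values are large compared with their spread.

```python
import statistics


def naive_variance(values):
    """Hypothetical home-made routine: one-pass population variance,
    computed as mean(x^2) - mean(x)^2."""
    n = len(values)
    total = 0.0
    total_sq = 0.0
    for v in values:
        total += v
        total_sq += v * v
    mean = total / n
    return total_sq / n - mean * mean


# Made-up data: small fluctuations sitting on a very large offset.
data = [1.0e9 + d for d in (4.0, 7.0, 13.0, 16.0)]

# The deviations from the mean are -6, -3, 3, 6, so the population
# variance is (36 + 9 + 9 + 36) / 4 = 22.5.
home_made = naive_variance(data)        # subject to rounding error
library = statistics.pvariance(data)    # standard-library routine

print("home-made:", home_made)
print("library:  ", library)
```

With data like these the squared values are so large that double-precision
rounding swamps the true variance in the home-made routine, while the library
routine returns 22.5. The sketch is not a criticism of any particular code; it
only shows that a formula that looks correct on paper can still be a poor
numerical method.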
Just as good surgeons make use of
the best professional instruments for their work rather
than using just a few home-made tools, so one should expect
scientists to use the best data analysis software at their
disposal rather than something they just hacked together.
Good analysis requires the expert use of good tools.
David Stephenson
2005-09-30