
Statistical software

The development of computer technology since the 1950s has led to the creation of many very useful statistical software packages for analysing data. Off-the-shelf statistical software now makes it possible to perform analyses on a personal computer that would have been impossible in the pre-computer era. For this reason, computational statistics is now a large and rapidly advancing branch of modern statistics. Many diverse statistical software packages are currently available, offering a wide variety of capabilities. Statistical software can be broadly classified into four main categories:

  1. Powerful language-based packages
    Examples include S-PLUS, R, and SAS. In addition to a comprehensive range of built-in statistical routines, these packages allow users to develop their own statistical macros and functions. They are used by many practising statisticians. They are not particularly user-friendly, but once mastered they can be extremely powerful tools. R is freely available for many different computer platforms from www.r-project.org.

  2. Interactive packages
    Examples include MINITAB and SPSS, which allow the user to perform many standard statistical operations at the click of a mouse. They are quick and easy to use and are useful for applying standard methods, but they are not ideally suited to developing new functions. A big danger with such packages is that users can easily perform operations that they do not understand. This can create a "black box" view of statistical methods that often leads to poor interpretations.

  3. Packages with statistical libraries
    Examples include MATLAB and PV-Wave/IDL, which are primarily data analysis and visualisation programs/languages that also include libraries of statistical functions. These packages can be useful in climate analysis since they cope quite easily with large gridded data sets and can be used to visualise spatial data quickly. A problem with these packages is that the libraries often contain only a subset of standard statistical functions and do not benefit from input from professional statisticians. This is particularly the case with certain spreadsheet packages such as EXCEL, which contain rather idiosyncratic and poorly developed statistical libraries.

  4. Home made subroutines
    Many climate researchers have a bad habit of doing statistics using Fortran subroutines that they have written themselves, obtained from a friend, or copied from Numerical Recipes. This do-it-yourself cookbook approach has several disadvantages: time is wasted reinventing the wheel by programming routines rather than being spent thinking about the appropriate choice of method, and there is no input from or contact with professional statisticians. The lack of statistical input can lead to ignorance about the range of possible methods available and about the problems associated with the different methods. Just as good surgeons make use of the best professional instruments for their work rather than just a few home made tools, so one should expect scientists to use the best data analysis software at their disposal rather than something they have hacked together. Good analysis requires the expert use of good tools.
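
As a small illustration of how a home made routine can go wrong, the sketch below compares a hand-written variance function with a library routine. It is only an illustrative example, not taken from these notes: Python is used for convenience, the function name naive_variance and the data values are hypothetical, and the one-pass "sum of squares" formula stands in for the kind of textbook shortcut that is often coded by hand. The data are small fluctuations (4, 7, 13, 16) about a large offset, so the exact sample variance is 30.

```python
import statistics


def naive_variance(xs):
    """Hand-rolled one-pass formula: (sum(x^2) - (sum x)^2 / n) / (n - 1).

    Algebraically correct, but it subtracts two very large, nearly equal
    numbers, so rounding error can swamp the answer.
    """
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (total_sq - total * total / n) / (n - 1)


# Hypothetical data: small fluctuations about a large offset.
data = [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]

naive = naive_variance(data)            # corrupted by floating-point cancellation
library = statistics.variance(data)     # standard-library sample variance

print("home made routine:", naive)
print("library routine:  ", library)
```

With these values the library routine recovers the exact variance, whereas the home made formula does not, because the squared values are too large for double precision to retain the small differences between them. Subtracting the mean before squaring would avoid the problem, which is the sort of detail a well-tested package already takes care of.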


David Stephenson 2005-09-30