Next: Linear regression Up: Basic Linear Regression Previous: Basic Linear Regression   Contents

A few words on modelling strategy ...

Models provide compact ways of summarising observed relationships and are essential for making predictions and inferences. Models can be considered to be maps (representations) of reality and, like any map, cannot possibly describe everything in reality (nor do they have to!). This is nicely summarised in the famous quotation:

"All models are wrong, but some are useful" - G.E.P. Box

Good modellers are aware of their models' strengths and weaknesses and use the models appropriately. The process of choosing the most appropriate models is very complex and involves the following stages:

  1. Identification
    By analysing the data critically using descriptive techniques and thinking about underlying processes, a class of potentially suitable models can be identified for further investigation. It is wise at this stage to consider first the simplest models that may be appropriate (e.g. linear with few parameters) before progressing to more complex models (e.g. neural networks).

  2. Estimation
    By fitting the model to the sample data, the model parameters and their confidence intervals are estimated, usually by either least-squares or maximum likelihood methods. Estimation is generally easier and more precise for parsimonious models, i.e. those that have the fewest parameters.

  3. Evaluation
    The model fit is critically assessed by carefully analysing the residuals (errors) of the fit and other diagnostics. This is sometimes referred to as validation and/or verification by the atmospheric science community.

  4. Prediction
    The model is used to make predictions in new situations (i.e. on data independent of those used in making the fit). The model predictions are then verified to test whether the model has any real skill. Predictive skill is the ultimate test of any model. There is no guarantee that a model which provides a good fit will also produce good predictions. For example, non-parsimonious models with many parameters that provide excellent fits to the original data often fail to give good predictions when applied to new data (over-fitting).

By iterating at any stage in this process, it is possible with much skill and patience to find the most appropriate models.
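
The estimation and prediction stages, and the danger of over-fitting, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than part of the course material: the synthetic data (a straight line plus Gaussian noise), the sample sizes, the choice of polynomial degrees 1 and 9, and the helper names make_sample and rmse. It fits a parsimonious model and a non-parsimonious model by least squares, and then compares the error on the data used for the fit with the error on independent data.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible


def make_sample(n):
    """Synthetic data (assumed): a straight line plus Gaussian noise."""
    x = rng.uniform(-1.0, 1.0, n)
    y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)
    return x, y


def rmse(coeffs, x, y):
    """Root-mean-square error of a fitted polynomial on the data (x, y)."""
    return float(np.sqrt(np.mean((y - np.polyval(coeffs, x)) ** 2)))


x_fit, y_fit = make_sample(15)    # sample used for estimation
x_new, y_new = make_sample(200)   # independent data used for prediction

results = {}
for degree in (1, 9):             # parsimonious vs. many-parameter model
    coeffs = np.polyfit(x_fit, y_fit, degree)  # least-squares estimation
    results[degree] = {
        "fit_rmse": rmse(coeffs, x_fit, y_fit),  # quality of the fit
        "new_rmse": rmse(coeffs, x_new, y_new),  # predictive skill
    }
    print(f"degree {degree}: {results[degree]}")
```

Because the degree-9 polynomial contains the straight line as a special case, its error on the fitted sample can never be larger than that of the linear model. Its error on the independent data, however, would usually be expected to be worse, which is the over-fitting behaviour described under Prediction.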


David Stephenson 2005-09-30