Some fundamental concepts

Statistical data analysis can be subdivided into descriptive statistics and inferential statistics. Descriptive statistics is concerned with exploring and describing a sample of data, whereas inferential statistics uses statistics from a sample of data to make general statements about the whole population. Note that the word "data" is plural and a single element of data is called a "datum", so avoid saying things like "the data has been ...".

Descriptive statistics is concerned with exploring, visualising, and summarising data sampled from a population, but without fitting any probability models to the data. This kind of Exploratory Data Analysis (EDA) is used to explore sample data in the initial stages of data analysis. Since no probability models are involved, it cannot be used to test hypotheses or to make testable out-of-sample predictions about the whole population. Nevertheless, it is a very important preliminary part of analysis that can reveal many interesting features in the sample data.
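As a small illustration of the descriptive stage, the following Python sketch summarises a sample without fitting any probability model. The sample values and the helper name describe are made up for illustration and are not taken from any real data set.

```python
import statistics

# Hypothetical sample: ten made-up monthly mean temperatures (degrees C).
sample = [14.2, 15.1, 13.8, 16.4, 15.0, 14.7, 17.9, 15.3, 14.9, 15.6]


def describe(values):
    """Return simple descriptive summaries of a sample (no probability model)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # sample quartiles
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # uses the n - 1 divisor
        "min": min(values),
        "max": max(values),
        "iqr": q3 - q1,  # interquartile range
    }


summary = describe(sample)
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

These numbers describe this particular sample only. On their own they make no testable statement about the population from which the sample was drawn.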

Inferential statistics is the next stage in data analysis and involves the identification of a suitable probability model. The model is then fitted to the data to obtain optimal estimates of the model's parameters. The fitted model is then evaluated by testing either its predictions or hypotheses about it. In this way, a model based on a single sample of data can be used to infer general features of the whole population.
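The three steps above (choose a model, estimate its parameters, test a hypothesis) can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the sample values are made up, the data are assumed to be independent draws from a normal distribution, and the null value mu0 is hypothetical. The critical value 2.262 is the two-sided 5% point of Student's t distribution with 9 degrees of freedom.

```python
import math
import statistics

# Hypothetical sample (made-up values), ASSUMED to be independent draws
# from a normal distribution with unknown mean and variance.
sample = [14.2, 15.1, 13.8, 16.4, 15.0, 14.7, 17.9, 15.3, 14.9, 15.6]

mu0 = 14.0      # hypothetical null-hypothesis value of the population mean
T_CRIT = 2.262  # two-sided 5% critical value of Student's t, 9 d.f.

# Step 1: the probability model is the normal distribution (an assumption).
# Step 2: estimate the model parameters from the sample.
n = len(sample)
mean_hat = statistics.mean(sample)  # estimate of the population mean
sd_hat = statistics.stdev(sample)   # estimate of the standard deviation
std_err = sd_hat / math.sqrt(n)     # standard error of the mean

# Step 3: test the hypothesis that the population mean equals mu0.
t_stat = (mean_hat - mu0) / std_err
reject = abs(t_stat) > T_CRIT

# 95% confidence interval for the population mean under the same model.
ci = (mean_hat - T_CRIT * std_err, mean_hat + T_CRIT * std_err)

print(f"estimated mean = {mean_hat:.2f}, standard error = {std_err:.3f}")
print(f"t statistic = {t_stat:.2f}, reject at 5% level: {reject}")
print(f"95% interval for the mean: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Unlike a purely descriptive summary, the result here is a statement about the population mean, and it is only as trustworthy as the assumed model.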

Much of climate analysis is still at the descriptive stage, and this often misleads climate researchers into thinking that statistical results are not as testable or as useful as physical ideas. This is not the case: statistical thinking and model-based inference can be exploited to much greater benefit to make sense of the complex climate system.

David Stephenson 2005-09-30