Parametric and Non-parametric Methods
Before looking at some statistics, we should take note of an important distinction in statistical testing. It becomes crucial when we discuss inference below, but I introduce it here because it bears directly on descriptive statistics.
The terms parametric and non-parametric refer to statistical methods. Parametric methods make assumptions about your data set, specifically about how the values are distributed. Non-parametric methods make relatively few assumptions about the data. As a consequence, when their assumptions hold, parametric methods have more information to draw on in reasoning about the data than non-parametric methods. Where parametric methods are applicable they are therefore more powerful; non-parametric methods (often referred to as conservative) are generally less powerful, but they remain usable when the parametric assumptions are not met.
The assumptions are about the parameters of the distribution of the dataset (hence the name). These parameters cover the location of the values, the dispersion of the values across the metric, and the shape of the frequency distribution of the values; that is to say, the central tendency, the range and variance, and the skewness and kurtosis.
It is common to use a Gaussian or normal distribution as a reference point for these parameters and to describe other distributions where they deviate from this.
Before you can analyse your data, you will need to determine whether the variables of interest have normally distributed scores, or at least scores that are close to normally distributed, and thus whether to use parametric or non-parametric methods.
If necessary, you can sometimes transform a variable so that its values are normally distributed, but I will not cover this here: this kind of transformation is outside the scope of an emergency guide.
Checking data for normality
You can check whether data are normally distributed using a Q-Q plot.
A Q-Q plot plots the quantiles of one data set against those of another, usually a known distribution. For present purposes, then, you plot the quantiles of your data against those of a normal distribution. If your data are normally distributed, the points should fall close to a straight line; this is the line x=y when your data have been standardised, or when the reference distribution has the same mean and standard deviation as your data.
You can also check for normality using a Kolmogorov-Smirnov Test. This is a non-parametric test where the null hypothesis is that your data represent a normally distributed random variable. If the result of this test is not significant, you have no evidence against normality and can proceed as though your data are normally distributed; strictly speaking, a non-significant result does not prove that they are.
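To make the two checks concrete, here is a minimal sketch using only the Python standard library. The helper names qq_points and ks_statistic, the simulated samples and the plotting positions (i - 0.5)/n are assumptions made for the illustration, not a prescribed method. The sketch fits the normal distribution to the sample itself; when the mean and standard deviation are estimated this way, the usual large-sample 5% critical value of about 1.36/sqrt(n) is known to be conservative, so treat the comparison as a rough guide.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def qq_points(values):
    """Pair each standard normal quantile with the matching sorted observation."""
    n = len(values)
    ordered = sorted(values)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n are one common convention (an assumption here).
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def ks_statistic(values):
    """Largest gap between the empirical CDF and a normal CDF fitted to the data."""
    n = len(values)
    ordered = sorted(values)
    fitted = NormalDist(fmean(values), stdev(values))
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x; check both sides.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


random.seed(1)  # fixed seed so the illustration is repeatable
normal_sample = [random.gauss(50, 10) for _ in range(200)]
skewed_sample = [random.expovariate(1.0) for _ in range(200)]

d_normal = ks_statistic(normal_sample)
d_skewed = ks_statistic(skewed_sample)
critical = 1.36 / math.sqrt(200)  # approximate 5% critical value for large n

print(f"normal sample: D = {d_normal:.3f}")
print(f"skewed sample: D = {d_skewed:.3f}")
print(f"approximate 5% critical value: {critical:.3f}")

# Points for a Q-Q plot: x is the theoretical quantile, y the observed value.
points = qq_points(normal_sample)
```

Plotting the pairs in points with any charting tool gives the Q-Q plot: for the normal sample they should lie near a straight line, while the same plot for the skewed sample should bend away from it. A value of D above the critical value counts as a significant result, that is, as evidence against normality.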