A-level Geography/AS OCR Geography/Investigation Paper
A statistical test is used to find the statistical significance of the relationship between two variables. First, a null hypothesis needs to be stated.
Mann-Whitney U Test
The Mann-Whitney U test is a test for a difference between 2 data sets. Using a critical values table, the level of confidence in the difference can be established, and the null hypothesis can be accepted or rejected. There are both advantages and disadvantages to this statistical test. One advantage is that it can compare two data sets of different sizes. This makes the test much more versatile, as it can be applied to a range of different data sets. Secondly, the test is based on the rank order of the values rather than on the observed values themselves. This means that no assumptions are made about the distribution of the data. This is particularly useful in geography because much of the data we collect will be either positively or negatively skewed. Finally, because Mann-Whitney U is a non-parametric test, it can also be used on ordinal (ranked) data, and extreme values have little influence on the result.
It does however carry some disadvantages, one being that it cannot be applied to more than 2 data sets at one time. The test works by ranking two samples against each other, so comparing three or more data sets requires a different test. Because it uses ranks rather than the actual values, it is also less powerful than a parametric test such as Student's t test when the data are normally distributed. Also, Mann-Whitney becomes increasingly impractical to calculate by hand as the data sets get larger: ranking the values becomes long-winded and takes a very long time to complete, and the chance of making an arithmetic mistake increases.
In conclusion, despite its flaws, the test is a very effective way of looking at the differences between 2 data sets.
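The ranking procedure behind the test can be sketched in Python. This is a minimal illustration, not a full implementation: the function names and the two sets of site measurements are hypothetical, and the resulting U value would still need to be compared with a critical values table (the null hypothesis is rejected when U is equal to or smaller than the critical value).

```python
def rank_values(values):
    """Rank values from smallest (rank 1) upwards; tied values share their mean rank."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Positions i..j-1 hold the same value, so they occupy ranks i+1..j.
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return the smaller of the two U values for two samples (sizes may differ)."""
    ranks = rank_values(list(sample_a) + list(sample_b))
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[v] for v in sample_a)
    rank_sum_b = sum(ranks[v] for v in sample_b)
    u_a = n_a * n_b + n_a * (n_a + 1) / 2 - rank_sum_a
    u_b = n_a * n_b + n_b * (n_b + 1) / 2 - rank_sum_b
    return min(u_a, u_b)


# Hypothetical measurements from two sites (note the different sample sizes).
site_a = [12, 15, 9, 20, 14]
site_b = [22, 25, 19, 30, 27, 24]
u_value = mann_whitney_u(site_a, site_b)
print(u_value)  # 1.0
```

A small U value like this one means the two sets of ranks barely overlap, which is what a real difference between the sites would look like.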
Five Sections of Investigation
Pragmatic = Essentially safety and accessibility. For example, if a student is carrying out a river investigation they might use pragmatic sampling, meaning that only areas which are easily accessible and do not pose a risk are studied. It is practical, although the sites chosen may not be representative of the whole study area.
Random = Not, as one might assume, randomly selecting a site or throwing a quadrat. In order to use random sampling, a calculator, grid or computer is needed to generate random numbers that have not been influenced by human decision.
Systematic = This type of technique would be used if progressive change over distance or time was being studied. A transect would be measured and data recorded at regular intervals along it, so that change over time or distance could be observed. For example, if a student wished to study psammosere succession, a transect might be measured from the sea to the climax environment (woodland) and measurements taken every 25 m or so.
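As an illustration of computer-generated random sampling, the sketch below picks grid coordinates with Python's standard random module. The function name, the grid size and the idea of treating sample sites as (x, y) grid squares are assumptions made for the example.

```python
import random


def random_grid_points(n_points, grid_width, grid_height, seed=None):
    """Return n_points random (x, y) grid coordinates, chosen without human input."""
    rng = random.Random(seed)  # a seed makes the same sites reproducible later
    return [
        (rng.randint(0, grid_width), rng.randint(0, grid_height))
        for _ in range(n_points)
    ]


# Hypothetical study area laid out as a 100 x 50 grid, with 10 sample sites.
sites = random_grid_points(10, 100, 50, seed=1)
```

Because the coordinates come from a random number generator rather than from a person choosing where to stand, every grid square has an equal chance of being sampled.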
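The regular spacing of a systematic sample can be worked out in a line of Python. The 200 m transect length here is a hypothetical figure; the 25 m interval is the one used in the example above.

```python
def transect_points(transect_length_m, interval_m):
    """Return the distances (in metres) along a transect at which to take measurements."""
    return list(range(0, transect_length_m + 1, interval_m))


# Hypothetical 200 m transect sampled every 25 m.
points = transect_points(200, 25)
print(points)  # [0, 25, 50, 75, 100, 125, 150, 175, 200]
```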
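Stratified = The study area or population is divided into subgroups (strata), and samples are taken from each subgroup in proportion to its size. For example, if 60% of a study area is woodland and 40% is grassland, 60% of the sample sites would be placed in woodland and 40% in grassland.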
Mode (most) = The most frequent sample value. This is the figure which occurs the most times, e.g. 7, 8, 9, 5, 4, 3, 5, 6, 7, 5, 5, 5, 5, 3, 5: 5 = mode. The name comes from the French "la mode", meaning fashion.
Median = The middle sample when all the samples are placed in numerical order, e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9: 5 = median.
Range = The spread of the samples: the difference between the largest and smallest values.
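The mode, median and range can be checked with Python's standard statistics module. This short sketch reuses the two example lists given above; the variable names are illustrative only.

```python
import statistics

mode_samples = [7, 8, 9, 5, 4, 3, 5, 6, 7, 5, 5, 5, 5, 3, 5]
median_samples = [1, 2, 3, 4, 5, 6, 7, 8, 9]

sample_mode = statistics.mode(mode_samples)        # most frequent value
sample_median = statistics.median(median_samples)  # middle value once sorted
sample_range = max(median_samples) - min(median_samples)  # largest minus smallest

print(sample_mode, sample_median, sample_range)  # 5 5 8
```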
Statistics: Standard Deviation
To describe infiltration rate data statistically I would use the mean as a measure of central tendency and the standard deviation as a measure of dispersion. These two are normally used together.
The mean (x bar) is simply the sum of all values of x (infiltration rate) divided by n (the total number of samples).
The standard deviation is more complicated and requires the use of the formula $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$, where x is an individual infiltration rate and x bar ($\bar{x}$) is the mean of the x values (as above).
This is simply done by listing the values of x in a table and subtracting x bar from each. In the next column, square each result.
Then add up all the values of (x - x bar)² to give Σ(x - x bar)². Divide this by n (100 in this case), and then take the square root of the answer.
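The table method described above can be followed step by step in Python. This is a sketch only: the eight infiltration rates are hypothetical values chosen to keep the arithmetic simple (the data set discussed here has n = 100), and the function name is illustrative.

```python
import math


def standard_deviation(values):
    """Standard deviation, dividing by n as in the steps described above."""
    n = len(values)
    mean = sum(values) / n                                   # x bar
    squared_deviations = [(x - mean) ** 2 for x in values]   # (x - x bar) squared
    return math.sqrt(sum(squared_deviations) / n)            # divide by n, then square root


# Hypothetical infiltration rates; their mean is 5.
infiltration_rates = [2, 4, 4, 4, 5, 5, 7, 9]
sd = standard_deviation(infiltration_rates)
print(sd)  # 2.0
```

Here the squared deviations are 9, 1, 1, 1, 0, 0, 4 and 16, which add up to 32; dividing by n = 8 gives 4, and the square root of 4 is 2.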
These methods are the most powerful and sensitive, because they use every value in the data set and do not exclude any data. However, the data must be (nearly) normally distributed to use them.
Mode and range merely state the most numerous value of x and the difference between the smallest and largest values, which is not very useful as it ignores most of the data. Interquartile deviation and median rely on the rank order of the data; they do not engage with the size of the values of x, and they ignore the extremes of the data set.