Statistics/Displaying Data/Histograms

From Wikibooks, open books for an open world

Histograms

It is often useful to look at the distribution of the data, that is, the frequency with which values fall into pre-set bins of specified sizes. The selection of these bins is up to you, but they should be chosen to illuminate your data, not obfuscate it.

A histogram is similar to a bar chart. However, histograms are used for continuous (as opposed to discrete or qualitative) data. The defining property of a histogram is:

The area of each bar is proportional to the frequency.

If each bin has an equal width, this is easily done by plotting frequency on the vertical axis. However, histograms can also be drawn with unequal bin sizes, in which case one plots frequency density instead.
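As a short worked equation, the relationship between bar height, bar width, and frequency can be written out as follows. The symbols f_i (frequency of bin i) and w_i (width of bin i) are notation introduced here for illustration and do not appear in the original text.

```latex
\text{height}_i = \frac{f_i}{w_i},
\qquad
\text{area}_i = w_i \times \text{height}_i = w_i \times \frac{f_i}{w_i} = f_i
```

When every w_i is the same, dividing by it rescales all bars equally, which is why plotting the raw frequency gives the same shape in the equal-width case.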

To produce a histogram with equal bin sizes:

  • Select a minimum, a maximum, and a bin size. All three of these are up to you. In the example histogram data, the minimum is 1, the maximum is 110, and the bin size is 10.
  • Calculate your bins and how many values fall into each of them. For the histogram data the bins are:
    • 1 ≤ x < 10, 16 values.
    • 10 ≤ x < 20, 4 values.
    • 20 ≤ x < 30, 4 values.
    • 30 ≤ x < 40, 2 values.
    • 40 ≤ x < 50, 2 values.
    • 50 ≤ x < 60, 1 value.
    • 60 ≤ x < 70, 0 values.
    • 70 ≤ x < 80, 0 values.
    • 80 ≤ x < 90, 0 values.
    • 90 ≤ x < 100, 0 values.
    • 100 ≤ x < 110, 0 values.
    • 110 ≤ x < 120, 1 value.
  • Plot the counts you found above using a standard bar plot.
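The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `values` list is hypothetical (it is not the data set behind the counts listed above), and the first bin is assumed to start at 0 so that every bin has the same width of 10.

```python
# Hypothetical data, chosen only for illustration.
values = [3, 7, 12, 18, 25, 25, 31, 44, 58, 110]

bin_width = 10
lowest = 0       # left edge of the first bin (an assumption)
highest = 120    # right edge of the last bin

# One counter per bin: [0, 10), [10, 20), ..., [110, 120).
n_bins = (highest - lowest) // bin_width
counts = [0] * n_bins

for x in values:
    index = (x - lowest) // bin_width   # index of the bin that x falls into
    counts[index] += 1

# A rough text version of the bar plot: one '#' per value in the bin.
for i, count in enumerate(counts):
    left = lowest + i * bin_width
    right = left + bin_width
    print(f"{left:>3} <= x < {right:<3} {'#' * count} ({count})")
```

Each bin includes its left edge and excludes its right edge, matching the "≤ x <" convention used in the list above.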


Worked Problem

Let's say you are an avid roleplayer who loves to play Mechwarrior, a game based on the d6 (a 6-sided die). You have just purchased a new 6-sided die and would like to see whether it is biased (in combination with the way you roll it).

What We Expect

Before we look at what we get from rolling the die, let's look at what we would expect. First, if a die is unbiased, the odds of rolling a 6 are exactly the same as the odds of rolling a 1: there is no favoritism towards certain values. Using the standard equation for the arithmetic mean, we find that μ = 3.5. We would also expect the histogram to be roughly even all the way across, though it will almost never be perfectly even, simply because we are dealing with an element of random chance.
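Written out, and assuming each of the six faces is equally likely, the expected mean is just the average of the face values:

```latex
\mu = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = \frac{21}{6} = 3.5
```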

What We Get

Here are the numbers that you collect:

1 5 6 4 1 3 5 5 6 4 1 5 6 6 4 5 1 4 3 6
1 3 6 4 2 4 1 6 4 2 2 4 3 4 1 1 6 3 5 5
4 3 5 3 4 2 2 5 6 5 4 3 5 3 3 1 5 4 4 5
1 2 5 1 6 5 4 3 2 4 2 1 3 3 3 4 6 1 1 3
6 6 1 4 6 6 6 5 3 1 5 6 3 4 5 5 5 2 4 4

Analysis

The mean of these 100 rolls is 3.71, which is fairly close to the 3.5 we would expect for an unbiased die. So let's create a histogram to see if there is any significant difference in the distribution.

The only logical way to divide up dice rolls into bins is by what's showing on the die face:

Face:    1   2   3   4   5   6
Count:  16   9  17  21  20  17

If we are good at visualizing information, we can simply use a table, such as the one above, to see what might be happening. Often, however, it is useful to have a visual representation. As the amount or variety of data we want to display increases, so does the need for graphs instead of a simple table.
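As a minimal sketch, the tally and a rough text histogram can be produced in Python from the rolls listed under "What We Get". The variable names are illustrative only, and `collections.Counter` is one convenient way to count, not the only one.

```python
from collections import Counter

# The 100 rolls from the text, copied as-is.
rolls_text = """
1 5 6 4 1 3 5 5 6 4 1 5 6 6 4 5 1 4 3 6
1 3 6 4 2 4 1 6 4 2 2 4 3 4 1 1 6 3 5 5
4 3 5 3 4 2 2 5 6 5 4 3 5 3 3 1 5 4 4 5
1 2 5 1 6 5 4 3 2 4 2 1 3 3 3 4 6 1 1 3
6 6 1 4 6 6 6 5 3 1 5 6 3 4 5 5 5 2 4 4
"""
rolls = [int(token) for token in rolls_text.split()]

# Count how many times each face came up.
counts = Counter(rolls)

# Print one bar per face, one '#' per roll.
for face in range(1, 7):
    print(f"{face} | {'#' * counts[face]} {counts[face]}")
```

Because the die has only six possible outcomes, each face serves as its own bin and no bin width needs to be chosen.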

Comparing these counts with the roughly 16.7 rolls per side that a fair die would give on average over 100 rolls, we see that sides 1, 3, and 6 are almost exactly what we would expect by chance. Sides 4 and 5 came up slightly more often, but not by much, and side 2 came up far less often. This could be the result of chance, or it could represent an actual anomaly in the data; either way, it is something to take note of and keep in mind. We'll address this issue again in later chapters.
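To put numbers on that comparison, here is a small sketch that subtracts the expected count from each observed count. It uses the tallies from the table above. The variable names are illustrative, and this is only a descriptive comparison, not a formal test of bias.

```python
# Observed counts per face, taken from the table in the text.
observed = {1: 16, 2: 9, 3: 17, 4: 21, 5: 20, 6: 17}

total = sum(observed.values())       # number of rolls
expected = total / len(observed)     # fair-die expectation per face

# Positive difference: the face came up more often than expected.
differences = {face: count - expected for face, count in observed.items()}

for face, count in observed.items():
    print(f"face {face}: observed {count:2d}, "
          f"expected {expected:.1f}, difference {differences[face]:+.1f}")
```

Side 2 shows the largest shortfall and side 4 the largest excess, in line with the description above.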


Frequency Density

Another way of drawing a histogram is to work out the Frequency Density.

Frequency Density
The Frequency Density is the frequency divided by the class width.

The advantage of using frequency density in a histogram is that it doesn't matter if there isn't an obvious standard width to use. For each group, you work out the frequency divided by the class width and plot that value as the height of the bar.
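A minimal sketch of that calculation for classes of unequal width is shown below. The class boundaries and frequencies are hypothetical, made up purely to illustrate the arithmetic.

```python
# Hypothetical grouped data: (lower edge, upper edge, frequency).
classes = [(0, 10, 8), (10, 20, 12), (20, 50, 15), (50, 100, 5)]

densities = []
for lower, upper, frequency in classes:
    width = upper - lower
    density = frequency / width   # frequency density = frequency / class width
    densities.append(density)
    print(f"{lower:>3} to {upper:<3}  width {width:>2}  "
          f"frequency {frequency:>2}  density {density:.2f}")

# Bar area (width * density) recovers the frequency of each class.
areas = [(upper - lower) * d for (lower, upper, _), d in zip(classes, densities)]
```

Note how the class from 20 to 50 has the highest frequency but not the tallest bar, because its 15 values are spread over a width of 30.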
