From Wikibooks, open books for an open world

Where the Bernoulli distribution asks the question "Will this single event succeed?", the binomial distribution is associated with the question "Out of a given number of trials, how many will succeed?" Some example questions that are modeled with a binomial distribution are:

  • Out of ten tosses, how many times will this coin land heads?
  • From the children born in a given hospital on a given day, how many of them will be girls?
  • How many students in a given classroom will have green eyes?
  • How many mosquitoes, out of a swarm, will die when sprayed with insecticide?

The relation between the Bernoulli and binomial distributions is intuitive: The binomial distribution is composed of multiple Bernoulli trials. We conduct n repeated experiments where the probability of success is given by the parameter p and add up the number of successes. This number of successes is represented by the random variable X. The value of X is then between 0 and n.

When a random variable X has a binomial distribution with parameters p and n, we write it as X ~ Bin(n,p) or X ~ B(n,p), and the probability mass function is given by the equation:

 P\left[X = k\right] = \begin{cases} {n \choose k} p^k \left(1-p\right)^{n-k}\ & 0 \le k \le n \\ 0 & \mbox{otherwise} \end{cases} \quad 0 \leq p \leq 1, \quad n \in \mathbb{N}

where {n \choose k}={n! \over k!(n-k)!}

For a refresher on factorials (n!), go back to the Refresher Course earlier in this wiki book.
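The binomial coefficient can also be computed directly rather than by hand. A minimal sketch in Python, using the standard library's `math.comb` (available since Python 3.8) and checking it against the factorial formula above:

```python
from math import comb, factorial

# C(n, k) counts the ways to choose k successes out of n trials.
n, k = 5, 2
print(comb(n, k))                                         # binomial coefficient
print(factorial(n) // (factorial(k) * factorial(n - k)))  # same value via n! / (k! (n-k)!)
```

Both lines print 10, matching the worked calculation later in this section.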

An example

Let's walk through a simple example of the binomial distribution. We're going to use some pretty small numbers because factorials can be hard to compute. We are going to ask five random people if they believe there is life on other planets. We are going to assume in this example that we know 30% of people believe this to be true. We want to ask the question: "How many people will say they believe in extraterrestrial life?" Actually, we want to be more specific than that: "What is the probability that exactly 2 people will say they believe in extraterrestrial life?"

We know all the values that we need to plug into the equation. The number of people asked, n=5. The probability of any given person answering "yes", p=0.3. (Remember, I said that 30% of people believe in life on other planets!) Finally, we're asking for the probability that exactly 2 people answer "yes" so k=2. This yields the equation:

 P \left[X = 2 \right] = {5 \choose 2} \cdot 0.3^2 \cdot \left( 1 - 0.3 \right)^{3} = 10 \cdot 0.3^2 \cdot \left( 1-0.3 \right)^{3} = 0.3087 since {5 \choose 2}={5! \over 2! \cdot 3!}={5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 \over (2 \cdot 1) \cdot (3 \cdot 2 \cdot 1)}={120 \over 12}=10

Here are the probabilities for all the possible values of X. You can get these values by replacing the k=2 in the above equation with all values from 0 to 5.

Value for k Probability f(k)
0 0.16807
1 0.36015
2 0.30870
3 0.13230
4 0.02835
5 0.00243
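The table above can be reproduced by evaluating the probability mass function for every k from 0 to 5. A short sketch in Python (the function name `binomial_pmf` is ours, not from any library):

```python
from math import comb

def binomial_pmf(k, n, p):
    """P[X = k] for X ~ Bin(n, p)."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 5, 0.3
for k in range(n + 1):
    print(k, round(binomial_pmf(k, n, p), 5))
```

The printed values match the table, and they sum to 1, as any probability mass function must.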

What can we learn from these results? Well, first of all we'll see that it's just a little more likely that exactly one person will confess to believing in life on other planets than exactly two. There's a distinct chance (about 17%) that nobody will believe it, and there's only a 0.24% chance (a little over 2 in 1000) that all five people will be believers.

Explanation of the equation

Take the above example. Let's consider each of the five people one by one.

The probability that any one person believes in extraterrestrial life is 30%, or 0.3. So the probability that any two people both believe in extraterrestrial life is 0.3 squared. Similarly, the probability that any one person does not believe in extraterrestrial life is 70%, or 0.7, so the probability that any three people do not believe in extraterrestrial life is 0.7 cubed.

Now, for two out of five people to believe in extraterrestrial life, two conditions must be satisfied: two people believe in extraterrestrial life, and three do not. The probability of two out of five people believing in extraterrestrial life would thus appear to be 0.3 squared (two believers) times 0.7 cubed (three non-believers), or 0.03087.

However, in doing this, we are only considering the case whereby the first two selected people are believers. How do we consider cases such as that in which the third and fifth people are believers, which would also mean a total of two believers out of five?

The answer lies in combinatorics. Bearing in mind that the probability that the first two out of five people believe in extraterrestrial life is 0.03087, we note that there are C(5,2), or 10, ways of selecting a set of two people from out of a set of five, i.e. there are ten ways of considering two people out of the five to be the "first two". This is why we multiply by C(n,k). The probability of having any two of the five people be believers is ten times 0.03087, or 0.3087.
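This counting argument can be checked by brute force: enumerate all 2⁵ yes/no patterns for the five people, keep those with exactly two believers, and sum their probabilities. A sketch using the standard library's `itertools.product`:

```python
from itertools import product

p = 0.3
count = 0
total = 0.0
for outcome in product([True, False], repeat=5):  # every yes/no pattern for 5 people
    if sum(outcome) == 2:                         # exactly two believers
        count += 1
        total += p**2 * (1 - p)**3                # each such pattern has the same probability
print(count)            # 10 patterns, i.e. C(5, 2)
print(round(total, 4))  # 0.3087
```

The enumeration finds the same 10 arrangements that C(5,2) counts, and the same total probability of 0.3087.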


The mean can be derived as follows.

\operatorname{E}[X] = \sum_i f(x_i) \cdot x_i = \sum_{x=0}^n {n \choose x} p^x \left(1-p\right)^{n-x} \cdot x
\operatorname{E}[X] = \sum_{x=0}^n {n! \over x!(n-x)!} p^x \left(1-p\right)^{n-x} x
\operatorname{E}[X] = {n! \over 0!(n-0)!} p^0 \left(1-p\right)^{n-0} \cdot 0 + \sum_{x=1}^n {n! \over x!(n-x)!} p^x \left(1-p\right)^{n-x} x
\operatorname{E}[X] = 0 + \sum_{x=1}^n {n(n-1)! \over x(x-1)!(n-x)!} p \cdot p^{x-1} \left(1-p\right)^{n-x} x
\operatorname{E}[X] = np\sum_{x=1}^n {(n-1)! \over (x-1)!(n-x)!} p^{x-1} \left(1-p\right)^{n-x}

Now let w=x-1 and m=n-1. We see that m-w=n-x. We can now rewrite the summation as

\operatorname{E}[X] = np \left[\sum_{w=0}^m {m! \over w!(m-w)!} p^{w} \left(1-p\right)^{m-w}\right]

We now see that the summation is the sum over the complete pmf of a binomial random variable distributed Bin(m, p). This is equal to 1 (and can be easily verified using the Binomial theorem). Therefore, we have

\operatorname{E}[X] = np \left[1\right]=np
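The identity E[X] = np can be confirmed numerically by summing k · P[X = k] over all k. For the running example (n = 5, p = 0.3):

```python
from math import comb

n, p = 5, 0.3
# E[X] = sum over k of k * P[X = k]
mean = sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))
print(round(mean, 10))  # 1.5, which equals n * p
```

The result, 1.5, agrees with n·p = 5 × 0.3.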


We derive the variance using the following formula:

\operatorname{Var}[X] = \operatorname{E}[X^2] - (\operatorname{E}[X])^2.

We have already calculated E[X] above, so now we will calculate E[X^2] and then return to this variance formula:

\operatorname{E}[X^2] = \sum_i f(x_i) \cdot x_i^2 = \sum_{x=0}^n x^2 \cdot {n\choose x}p^x(1-p)^{n-x}.

We can use our experience gained above in deriving the mean. We use the same definitions of m and w.

\operatorname{E}[X^2] = \sum_{x=0}^n {n! \over x!(n-x)!} p^x \left(1-p\right)^{n-x} x^2
\operatorname{E}[X^2] = 0 + \sum_{x=1}^n {n! \over x!(n-x)!} p^x \left(1-p\right)^{n-x} x^2
\operatorname{E}[X^2] = np\sum_{x=1}^n {(n-1)! \over (x-1)!(n-x)!} p^{x-1} \left(1-p\right)^{n-x}x
\operatorname{E}[X^2] = np \sum_{w=0}^m {m\choose w} p^{w} \left(1-p\right)^{m-w}(w+1)
\operatorname{E}[X^2] = np \left[\sum_{w=0}^m {m\choose w} p^{w} \left(1-p\right)^{m-w}w+\sum_{w=0}^m {m\choose w} p^{w} \left(1-p\right)^{m-w} \right]

The first sum is identical in form to the one we calculated in the Mean (above). It sums to mp. The second sum is 1.

\operatorname{E}[X^2] = np \cdot ( mp + 1) = np((n-1)p + 1) = np(np - p + 1).

Using this result in the expression for the variance, along with the Mean (E(X) = np), we get

\operatorname{Var}(X) = \operatorname{E}[X^2] - (\operatorname{E}[X])^2 = np(np - p + 1) - (np)^2 = np(1-p).
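Again for n = 5 and p = 0.3, a quick numeric check of Var[X] = np(1 − p):

```python
from math import comb

n, p = 5, 0.3
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
ex  = sum(k * q for k, q in zip(range(n + 1), pmf))      # E[X]
ex2 = sum(k * k * q for k, q in zip(range(n + 1), pmf))  # E[X^2]
print(round(ex2 - ex**2, 10))  # 1.05, which equals n * p * (1 - p)
```

Here E[X²] = np(np − p + 1) = 1.5 × 2.2 = 3.3, so the variance is 3.3 − 1.5² = 1.05 = np(1 − p).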
