Statistics/Probability

From Wikibooks, open books for an open world

When throwing two dice, what is the probability that their sum equals seven?

Introduction to probability

Please note that this page is just a stub; more will be added later.

Why have probability in a statistics textbook?

Very little in mathematics is truly self-contained. Many branches of mathematics touch and interact with one another, and the fields of probability and statistics are no different. A basic understanding of probability is vital to grasping basic statistics, and probability remains largely abstract without statistics to determine the "real world" probabilities.

This section is not meant to be a comprehensive lecture on probability. It touches on the basics needed for this class and covers the basics of Bayesian analysis for students who are looking for something a little more interesting. This knowledge will be invaluable in understanding the mathematics of the various distributions that come later.

Set notion

A set is a collection of objects. We usually use capital letters to denote sets; e.g., A is the set of females in this room.

• The members of a set A are called the elements of A. E.g., Patricia is an element of A (Patricia ∈ A); Patrick is not an element of A (Patrick ∉ A).

• The universal set, U, is the set of all objects under consideration. E.g., U is the set of all people in this room.

• The null set or empty set, ∅, has no elements. E.g., the set of males above 2.8 m tall in this room is an empty set.

• The complement Aᶜ of a set A is the set of elements in U outside A, i.e., x ∈ Aᶜ iff x ∉ A.

• Let A and B be two sets. A is a subset of B if each element of A is also an element of B; we write A ⊂ B. E.g., the set of females wearing metal-frame glasses in this room ⊂ the set of females wearing glasses in this room ⊂ the set of females in this room.

• The intersection A ∩ B of two sets A and B is the set of their common elements, i.e., x ∈ A ∩ B iff x ∈ A and x ∈ B.

• The union A ∪ B of two sets A and B is the set of all elements that belong to A or to B (or both), i.e., x ∈ A ∪ B iff x ∈ A or x ∈ B.
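
The set operations listed above can be tried out with Python's built-in set type. The following is only a small sketch: the room, the names other than Patricia and Patrick, and the "wearing glasses" set B are made up for illustration.

```python
# Hypothetical room; Maria and John are invented names for illustration.
U = {"Patricia", "Maria", "Patrick", "John"}  # universal set: everyone in the room
A = {"Patricia", "Maria"}                     # the females in the room
B = {"Maria", "John"}                         # assumed: the people wearing glasses

is_member = "Patricia" in A       # Patricia ∈ A
is_not_member = "Patrick" not in A  # Patrick ∉ A

complement_A = U - A              # complement of A: elements of U outside A
intersection = A & B              # A ∩ B: elements common to A and B
union = A | B                     # A ∪ B: elements in A or in B (or both)
is_subset = intersection <= A     # A ∩ B is always a subset of A
empty = set()                     # the empty set ∅ has no elements

print(complement_A, intersection, union, is_subset, len(empty))
```

Note that Python's `<=` tests "subset or equal", which is how A ⊂ B is used in this section.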

Venn diagrams and notation

A Venn diagram visually models defined events. Each event is drawn as a circle. Events that have outcomes in common overlap, and the overlapping region is known as the intersection of the events.

A Venn diagram.


Probability

Probability is concerned with unpredictability. We know which outcomes may occur, but not exactly which one will. The set of possible outcomes plays a basic role. We call it the sample space and denote it by S. Elements of S are called outcomes. In rolling a die, the sample space is S = {1,2,3,4,5,6}.

We speak not only of outcomes but also of events, which are sets of outcomes. E.g., in rolling a die we can ask whether the outcome was an even number, which means asking about the event "even" = E = {2,4,6}.

In simple situations with a finite number of outcomes, we assign to each outcome s (∈ S) its probability (of occurrence) p(s) (written with a small p), a number between 0 and 1. This quite simple function is called the probability function; its only further property is that the probabilities of all outcomes sum to 1. We also speak of the probability P(A) of an event A (written with a capital P), which is simply the sum of the probabilities of the outcomes in A. For a fair die, p(s) = 1/6 for each outcome s, and P("even") = P(E) = 1/6+1/6+1/6 = 1/2.
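
As a minimal sketch of these definitions, the probability function p can be stored as a dictionary over the sample space, and P(A) computed by summing p(s) over the outcomes in A. The helper name `prob` is an illustrative choice, and the two-dice part assumes two fair dice in which each ordered pair is equally likely.

```python
from fractions import Fraction

# Sample space and probability function for one fair die.
S = [1, 2, 3, 4, 5, 6]
p = {s: Fraction(1, 6) for s in S}

def prob(event, p):
    """P(A): the sum of p(s) over the outcomes s in the event A."""
    return sum((p[s] for s in event), Fraction(0))

total = sum(p.values())   # all probabilities together must equal 1
E = {2, 4, 6}             # the event "even"
p_even = prob(E, p)
print(total, p_even)      # 1 1/2

# Two dice (assumption: fair dice, so each ordered pair has probability 1/36).
S2 = [(a, b) for a in S for b in S]
p2 = {s: Fraction(1, 36) for s in S2}
seven = {s for s in S2 if s[0] + s[1] == 7}
p_seven = prob(seven, p2)
print(p_seven)            # 1/6
```

Under these assumptions, the opening question about two dice has the answer 6/36 = 1/6, since six of the 36 ordered pairs sum to seven.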

The general concept of probability for non-finite sample spaces is a little more complex, although it rests on the same ideas.

Negation

Negation is a way of saying "not A", that is, saying that the complement of A has occurred. Note: the complement of an event A can be written as A' or Aᶜ.
For example: "What is the probability that a six-sided die will not land on a one?" (five out of six, or p = 5/6 ≈ 0.833)

Complement of an event: P(A') = 1 - P(A)

Or, more colloquially, "the probability of 'not A' together with the probability of 'A' equals one, or 100%."
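
The complement rule can be checked on the die example with a few lines of Python. This is only a sketch, assuming a fair six-sided die; the variable names are illustrative.

```python
from fractions import Fraction

# Probability function of a fair six-sided die (assumed fair).
p = {s: Fraction(1, 6) for s in range(1, 7)}

one = {1}                  # the event "the die lands on a one"
not_one = set(p) - one     # its complement: every other outcome

p_one = sum(p[s] for s in one)
p_not_one = sum(p[s] for s in not_one)

# Summing over the complement agrees with 1 - P(one).
print(p_not_one, 1 - p_one)          # 5/6 5/6
print(round(float(p_not_one), 3))    # 0.833
```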

Calculating Probability

Relative frequency is the number of successes divided by the total number of trials. For example, if a coin is flipped 50 times and 29 of the flips are heads, then the relative frequency of heads is

\frac{29}{50} = 0.58
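
The relative frequency from the coin example can be computed directly, and a simulated coin gives a feel for how it behaves over many flips. This is a hedged sketch: the function names are hypothetical, and the simulation assumes a fair coin driven by Python's pseudo-random generator with a fixed seed.

```python
import random
from fractions import Fraction

def relative_frequency(successes, trials):
    """Number of successes divided by the total number of trials."""
    return Fraction(successes, trials)

observed = relative_frequency(29, 50)
print(observed, float(observed))   # 29/50 0.58

def simulate_heads(n_flips, seed=0):
    """Relative frequency of heads in n_flips simulated fair coin flips."""
    rng = random.Random(seed)      # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return relative_frequency(heads, n_flips)

# With more flips, the relative frequency tends to settle near 0.5.
for n in (50, 5_000, 100_000):
    print(n, float(simulate_heads(n)))
```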


The union of two events is what you want when you ask about Event A OR Event B.
This is different from "and": "and" is the intersection, while "or" is the union of the events (both events put together).

[Figure: Union example]
In the above example of events you will notice that:

Event A is a STAR and a DIAMOND.

Event B is a TRIANGLE, a PENTAGON, and a STAR.
(A ∩ B) = (A and B) = A intersect B is only the STAR.
But (A ∪ B) = (A or B) = A union B is everything: the TRIANGLE, PENTAGON, STAR, and DIAMOND.
Notice that Event A and Event B have the STAR in common. However, when you list the union of the events, you list the STAR only once.
Event A = STAR, DIAMOND; Event B = TRIANGLE, PENTAGON, STAR
When you combine them, you get (STAR + DIAMOND) + (TRIANGLE + PENTAGON + STAR). But the STAR is now listed twice, so you need to subtract the extra STAR from the list.
It is the intersection that is listed twice, so you have to subtract the duplicate intersection.

Formula for the Union of Events: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

Example:
Let P(A) = 0.3 and P(B) = 0.2 and P(A ∩ B) = 0.15. Find P(A ∪ B).
P(A ∪ B) = (0.3) + (0.2) - (0.15) = 0.35

Example:
Let P(A) = 0.3 and P(B) = 0.2 and P(A ∩ B) = 0. Find P(A ∪ B).
Note: Here P(A ∩ B) = 0. When the intersection of the events is the null set, the events are called disjoint or mutually exclusive.
P(A ∪ B) = (0.3) + (0.2) - (0) = 0.5
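
The union formula and the two worked examples can be checked with a short Python sketch. The helper name `p_union` is an illustrative choice, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

# The shapes example: counting elements follows the same pattern.
A = {"STAR", "DIAMOND"}
B = {"TRIANGLE", "PENTAGON", "STAR"}
# The STAR is counted twice in len(A) + len(B), so subtract the intersection.
count_union = len(A) + len(B) - len(A & B)
print(count_union, len(A | B))   # 4 4

def p_union(p_a, p_b, p_a_and_b):
    """P(A ∪ B) = P(A) + P(B) - P(A ∩ B)."""
    return p_a + p_b - p_a_and_b

# First example: overlapping events.
first = p_union(Fraction("0.3"), Fraction("0.2"), Fraction("0.15"))
# Second example: disjoint events, so nothing is subtracted.
second = p_union(Fraction("0.3"), Fraction("0.2"), Fraction(0))
print(float(first), float(second))   # 0.35 0.5
```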

Conjunction

Disjunction

Law of total probability

Generalized case

Conclusion: putting it all together

Examples