Statistics/Introduction/Need To Know

From Wikibooks, open books for an open world



What Do I Need to Know to Learn Statistics?

Statistics is a diverse subject, so the mathematics required depends on the kind of statistics being studied. A strong background in linear algebra is needed for most multivariate statistics, but is not necessary for introductory statistics. A background in calculus is useful no matter what branch of statistics is being studied, but is not required for most introductory statistics classes.

At a bare minimum, the student should have a grasp of the basic concepts taught in algebra and be comfortable with "moving things around" and solving for an unknown. Most of the statistics here derive from a few basic tools that the reader should become acquainted with.

Absolute Value

|x| \equiv \begin{cases}
			x, & x \ge 0 \\
			-x, & x < 0
		\end{cases}


If the number is zero or positive, then its absolute value is simply the number itself. If the number is negative, then drop the negative sign to get the absolute value.

Examples

  • |42| = 42
  • |-5| = 5
  • |2.21| = 2.21
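
The piecewise definition above translates almost word for word into code. The following is a minimal Python sketch; the function name absolute_value is made up for illustration, and Python's built-in abs() computes the same thing.

```python
def absolute_value(x):
    """Return |x| following the piecewise definition.

    The name `absolute_value` is illustrative only; Python's built-in
    abs() computes the same thing.
    """
    if x >= 0:
        return x   # zero or positive: the value is unchanged
    return -x      # negative: negating removes the minus sign


print(absolute_value(42))    # 42
print(absolute_value(-5))    # 5
print(absolute_value(2.21))  # 2.21
```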

Factorials

A factorial is a calculation that gets used a lot in probability. It is defined only for non-negative integers (integers greater than or equal to zero) as:


n!  \equiv \begin{cases}
			n \cdot (n-1)!, & n \ge 1 \\
			1,  & n = 0
		\end{cases}

Examples

In short, this means that:

0! = 1
1! = 1 · 1 = 1
2! = 2 · 1 = 2
3! = 3 · 2 · 1 = 6
4! = 4 · 3 · 2 · 1 = 24
5! = 5 · 4 · 3 · 2 · 1 = 120
6! = 6 · 5 · 4 · 3 · 2 · 1 = 720
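
The recursive definition can be checked with a short program. This is a sketch under the assumption that Python is available; the function name factorial is our own, and the standard library's math.factorial is used only as a cross-check.

```python
import math


def factorial(n):
    """Compute n! from the recursive definition: n! = n * (n-1)!, with 0! = 1."""
    if not isinstance(n, int) or n < 0:
        raise ValueError("factorial is defined only for integers >= 0")
    if n == 0:
        return 1                 # base case: 0! = 1
    return n * factorial(n - 1)  # recursive case: n! = n * (n-1)!


for n in range(7):
    # Prints 0! through 6!, matching the list above.
    print(f"{n}! = {factorial(n)}")

# Cross-check against the standard library implementation.
assert all(factorial(n) == math.factorial(n) for n in range(7))
```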

Summation

The summation (also known as a series) is used more than almost any other technique in statistics. It is a way of writing the addition of many values without writing + after +. We represent summation with an uppercase sigma: ∑.

Examples

Very often in statistics we will sum a list of related variables:


	\sum_{i=0}^n x_i = x_0 + x_1 + x_2 + \cdots + x_n

Here we are adding all the x variables (which will hopefully all have values by the time we calculate this). The expression below the ∑ (i=0, in this case) gives the index variable and its starting value (i, starting at 0), while the expression above the ∑ gives the final value of the index (here n). The index steps by 1, so i = 0, 1, 2, and so on up to n. Another example:


\sum_{i=1}^4 2i = 2(1) + 2(2) + 2(3) + 2(4) = 2 + 4 + 6 + 8 = 20

Notice that we would get the same value by moving the 2 outside of the summation (perform the summation and then multiply by 2, rather than multiplying each component of the summation by 2).
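
Both summations can be reproduced in a few lines of Python. The list x below holds made-up values chosen only for illustration; the second part checks that multiplying inside or outside the summation gives the same total.

```python
# Hypothetical data values x_0, ..., x_4 (illustrative only).
x = [3, 1, 4, 1, 5]
n = len(x) - 1

# Sum of x_i for i = 0, ..., n.
total = sum(x[i] for i in range(0, n + 1))
print(total)  # 14

# Sum of 2i for i = 1, ..., 4, with the factor 2 inside the summation.
inside = sum(2 * i for i in range(1, 5))

# The same sum with the constant 2 moved outside the summation.
outside = 2 * sum(i for i in range(1, 5))

print(inside, outside)  # 20 20
```

Note that range(1, 5) stops before 5, so it covers exactly i = 1, 2, 3, 4.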

Infinite series

There is no reason, of course, that a series has to stop at any predetermined, or even finite, value; it can keep going without end. Such series are called "infinite series", and sometimes they converge to a finite value: the running total gets arbitrarily close to that value as the number of terms approaches infinity (∞).

Examples

\sum_{k=0}^\infty r^k = \frac{1}{1-r}, \quad \left| r \right| < 1

This example is the famous geometric series. Note both that the series goes to ∞ (infinity, meaning it does not stop) and that the formula is only valid for certain values of the variable r. If r is strictly between -1 and 1 (-1 < r < 1), then the summation gets closer to (i.e., converges on) 1/(1 - r) the further you take the series out.
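
Convergence can be observed numerically by adding up only the first few terms. The sketch below uses a helper of our own naming, geometric_partial_sum, and the arbitrary choice r = 0.5, for which the limit 1/(1 - r) equals 2.

```python
def geometric_partial_sum(r, n_terms):
    """Sum of r**k for k = 0, ..., n_terms - 1 (a finite partial sum)."""
    return sum(r ** k for k in range(n_terms))


r = 0.5              # any value with |r| < 1 would do; 0.5 is arbitrary
limit = 1 / (1 - r)  # value the infinite series converges to

for n_terms in (1, 5, 10, 20, 50):
    partial = geometric_partial_sum(r, n_terms)
    # The gap between the partial sum and the limit shrinks as terms are added.
    print(n_terms, partial, abs(limit - partial))
```

Trying a value such as r = 2 shows the partial sums growing without bound instead, which is why the condition |r| < 1 matters.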

Linear Approximation

Critical values of the Student-t distribution for various upper-tail probabilities α (columns) and degrees of freedom v (rows).
v / α      0.20     0.10     0.05    0.025     0.01    0.005
 40     0.85070  1.30308  1.68385  2.02108  2.42326  2.70446
 50     0.84887  1.29871  1.67591  2.00856  2.40327  2.67779
 60     0.84765  1.29582  1.67065  2.00030  2.39012  2.66028
 70     0.84679  1.29376  1.66691  1.99444  2.38081  2.64790
 80     0.84614  1.29222  1.66412  1.99006  2.37387  2.63869
 90     0.84563  1.29103  1.66196  1.98667  2.36850  2.63157
100     0.84523  1.29007  1.66023  1.98397  2.36422  2.62589


Let us say that you are looking at a table of values, such as the one above. You want to approximate (get a good estimate of) the values at v = 63, but the table has no row for 63. A good solution here is to use a linear approximation, which gives a value that is probably close to the one you really want without the trouble of computing an extra row of the table.


f\left(x_i\right) \approx \frac{f\left(x_{\lceil i \rceil}\right) - f\left(x_{\lfloor i \rfloor}\right)}{x_{\lceil i \rceil} - x_{\lfloor i \rfloor}} \cdot \left(x_i - x_{\lfloor i \rfloor}\right) + f\left(x_{\lfloor i \rfloor}\right)

This is just the equation for a line applied to the table of data. x_i represents the data point you want to know about, x_{\lfloor i \rfloor} is the nearest known data point below it, and x_{\lceil i \rceil} is the nearest known data point above it.

Examples

Find the value at 63 for the 0.05 column, using the values on the table above.

First we confirm on the above table that we need to approximate the value. If the table gave it exactly, there would be no need to approximate it. Here there is no row for 63; it falls between the rows for 60 and 70. Everything else we can get from the table:

 


f(63) \approx  \frac{f(70) - f(60)}{70 - 60} \cdot (63 - 60) + f(60) = \frac{1.66691  - 1.67065}{10} \cdot 3 + 1.67065 = 1.669528

 

Using software, we calculate the actual value of f(63) to be 1.669402, so the approximation is off by around 0.00013. That is close enough for our purposes.
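
The same calculation is easy to automate. The following Python sketch implements the line equation above; the function name linear_approximation and its parameter names are our own choices, and the numbers are taken from the 0.05 column of the table.

```python
def linear_approximation(x, x_lo, f_lo, x_hi, f_hi):
    """Estimate f(x) from two known points (x_lo, f_lo) and (x_hi, f_hi).

    This is the line equation from the text: slope times the distance
    from the lower known point, plus the value at the lower known point.
    """
    slope = (f_hi - f_lo) / (x_hi - x_lo)
    return slope * (x - x_lo) + f_lo


# Values from the 0.05 column of the table: f(60) and f(70).
estimate = linear_approximation(63, 60, 1.67065, 70, 1.66691)
print(round(estimate, 6))  # 1.669528

# Compare with the software-computed value quoted in the text.
exact = 1.669402
print(round(abs(estimate - exact), 6))  # 0.000126
```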