Statistics/Introduction/What is Statistics

From Wikibooks, open books for an open world
Jump to navigation Jump to search


  1. Introduction
    1. What Is Statistics?
    2. Subjects in Modern Statistics
    3. Why Should I Learn Statistics? 0% developed
    4. What Do I Need to Know to Learn Statistics?
  2. Different Types of Data
    1. Primary and Secondary Data
    2. Quantitative and Qualitative Data
  3. Methods of Data Collection
    1. Experiments
    2. Sample Surveys
    3. Observational Studies
  4. Data Analysis
    1. Data Cleaning
    2. Moving Average
  5. Summary Statistics
    1. Measures of center
      1. Mean, Median, and Mode
      2. Geometric Mean
      3. Harmonic Mean
      4. Relationships among Arithmetic, Geometric, and Harmonic Mean
      5. Geometric Median
    2. Measures of dispersion
      1. Range of the Data
      2. Variance and Standard Deviation
      3. Quartiles and Quartile Range
      4. Quantiles
  6. Displaying Data
    1. Bar Charts
    2. Comparative Bar Charts
    3. Histograms
    4. Scatter Plots
    5. Box Plots
    6. Pie Charts
    7. Comparative Pie Charts
    8. Pictograms
    9. Line Graphs
    10. Frequency Polygon
  7. Probability
    1. Combinatorics
    2. Bernoulli Trials
    3. Introductory Bayesian Analysis
  8. Distributions
    1. Discrete Distributions
      1. Uniform Distribution
      2. Bernoulli Distribution
      3. Binomial Distribution
      4. Poisson Distribution
      5. Geometric Distribution
      6. Negative Binomial Distribution
      7. Hypergeometric Distribution
    2. Continuous Distributions
      1. Uniform Distribution
      2. Exponential Distribution
      3. Gamma Distribution
      4. Normal Distribution
      5. Chi-Square Distribution
      6. Student-t Distribution
      7. F Distribution
      8. Beta Distribution
      9. Weibull Distribution
  9. Testing Statistical Hypothesis
    1. Purpose of Statistical Tests
    2. Formalism Used
    3. Different Types of Tests
    4. z Test for a Single Mean
    5. z Test for Two Means
    6. t Test for a single mean
    7. t Test for Two Means
    8. paired t Test for comparing Means
    9. One-Way ANOVA F Test
    10. z Test for a Single Proportion
    11. z Test for Two Proportions
    12. Testing whether Proportion A Is Greater than Proportion B in Microsoft Excel
    13. Spearman's Rank Coefficient
    14. Pearson's Product Moment Correlation Coefficient
    15. Chi-Squared Tests
      1. Chi-Squared Test for Multiple Proportions
      2. Chi-Squared Test for Contingency
    16. Approximations of distributions
  10. Point Estimates100% developed  as of 12:07, 28 March 2007 (UTC) (12:07, 28 March 2007 (UTC))
    1. Unbiasedness
    2. Measures of goodness
    3. UMVUE
    4. Completeness
    5. Sufficiency and Minimal Sufficiency
    6. Ancillarity
  11. Practice Problems
    1. Summary Statistics Problems
    2. Data-Display Problems
    3. Distributions Problems
    4. Data-Testing Problems
  12. Numerical Methods
    1. Basic Linear Algebra and Gram-Schmidt Orthogonalization
    2. Unconstrained Optimization
    3. Quantile Regression
    4. Numerical Comparison of Statistical Software
    5. Numerics in Excel
    6. Statistics/Numerical_Methods/Random Number Generation
  13. Multivariate Data Analysis
    1. Principal Component Analysis
    2. Factor Analysis for metrical data
    3. Factor Analysis for ordinal data
    4. Canonical Correlation Analysis
    5. Discriminant Analysis
  14. Analysis of Specific Datasets
    1. Analysis of Tuberculosis
  15. Appendix
    1. Authors
    2. Glossary
    3. Index
    4. Links

edit this box

Your company has created a new drug that may cure arthritis. How would you conduct a test to confirm the drug's effectiveness?

The latest sales data have just come in, and your boss wants you to prepare a report for management on places where the company could improve its business. What should you look for? What should you not look for?

You and a friend are at a baseball game, and out of the blue he offers you a bet that neither team will hit a home run in that game. Should you take the bet?

You want to conduct a poll on whether your school should use its funding to build a new athletic complex or a new library. How many people do you have to poll? How do you ensure that your poll is free of bias? How do you interpret your results?

A widget maker in your factory that normally breaks 4 widgets for every 100 it produces has recently started breaking 5 widgets for every 100. When is it time to buy a new widget maker? (And just what is a widget, anyway?)

These are some of the many real-world examples that require the use of statistics. How would you approach the problem statements? There are some stepwise human algorithms, but is there a general problem statement?

  • "Find possible solutions, decide on a solution, plan the solution, implement the solution, learn from the results for future solutions (or re-solution)."
  • "SOAP - subjective - the problem as given, objective - the problem after examination, assessment - the better defined problem, plan - decide if guidelines to management already exist, and blueprint the solution for this case, or generate a risk-minimizing, new solution path".
  • "HAMRC - hypothesis, aim, methodology, results, conclusion" - the concept that there is no real difference is the null hypothesis.

Then there is the joke that compares the different ways of thinking:

"A physicist, a chemist and a statistician were working collaboratively on a problem, when the wastepaper basket spontaneously combusted (they all swore they had stopped smoking). The chemist said, 'quick, we must reduce the concentration of the reactant which is oxygen, by increasing the relative concentration of non-reactive gases, such as carbon dioxide and carbon monoxide. Place a fire blanket over the flames.' The physicist, interjected, 'no, no, we must reduce the heat energy available for activating combustion ; get some water to douse the flame'. Meanwhile, the statistician was running around lighting more fires. The others asked with alarm, 'what are you doing?'. 'Trying to get an adequate sample size'."

General Definition[edit | edit source]

Statistics, in short, is the study of data. It includes descriptive statistics (the study of methods and tools for collecting data, and mathematical models to describe and interpret data) and inferential statistics (the systems and techniques for making probability-based decisions and accurate predictions).

Etymology[edit | edit source]

As its name implies, statistics has its roots in the idea of "the state of things". The word itself comes from the ancient Latin term statisticum collegium, meaning "a lecture on the state of affairs". Eventually, this evolved into the Italian word statista, meaning "statesman", and the German word Statistik, meaning "collection of data involving the State". Gradually, the term came to be used to describe the collection of any sort of data.

Statistics as a subset of mathematics[edit | edit source]

As one would expect, statistics is largely grounded in mathematics, and the study of statistics has lent itself to many major concepts in mathematics: probability, distributions, samples and populations, the bell curve, estimation, and data analysis.

Up ahead[edit | edit source]

Up ahead, we will learn about subjects in modern statistics and some practical applications of statistics. We will also lay out some of the background mathematical concepts required to begin studying statistics.