Probability/Introduction

From Wikibooks, open books for an open world

Overview

Probability theory provides a mathematical model for the study of randomness and uncertainty. Many important decisions, whether in business, government, science, recreation or even one's personal life, must be made with incomplete information or some degree of uncertainty. Hence, the formal study of uncertain or random outcomes occupies an important role in modern society. In situations where any one of a number of possible outcomes may occur, probability theory offers methods for quantifying the likelihoods associated with those outcomes. Probability also provides tools that allow us to move beyond simply describing the information contained in a set of data (descriptive statistics) to inferring further information from that data (inferential statistics). Many of the early attempts to model likelihood arose from games of chance. For a brief history of probability, see the Wikipedia article on the history of probability.

Although probability theory is now a very formal branch of mathematics, the language of probability is often used informally in everyday speech. We express our beliefs about likelihoods of outcomes using intuition guided by our experiences and in some cases statistics. Consider the following examples:

  • Bill says "Don't buy the avocados here; about half the time, they're rotten". Bill is expressing his belief about the probability of an event — that an avocado will be rotten — based on his personal experience.
  • Lisa says "I am 95% certain the capital of Spain is Barcelona". Here, the belief Lisa is expressing is a probability only from her point of view, because she does not know that the capital of Spain is Madrid (from our point of view, the probability that the capital is Madrid is 100%). However, we can still view this as a subjective probability because it expresses a measure of uncertainty. It is as though Lisa is saying "in 95% of cases where I feel as sure as I do about this, I turn out to be right".
  • Susan says "There is a lower chance of being shot in Omaha than in Detroit". Susan is expressing a belief based (presumably) on statistics.
  • Dr. Smith says to Christina, "There is a 75% chance that you will live." Dr. Smith is basing this on his research.
  • Nicolas says "It will probably rain tomorrow." In this case the likelihood that it will rain is expressed in vague terms and is subjective, but it implies that the speaker believes it is greater than 1/2 (or 50%). Subjective probabilities have been extensively studied, especially with regard to gambling and securities markets. While this type of probability is important, it is not the subject of this book. A good reference is "Degrees of Belief" by Steven Vick (2002).

Notice that in the previous examples the likelihood of any particular outcome is expressed as a percentage (between 0% and 100%), as is common in everyday language. However, probabilities in formal probability theory are always expressed as real numbers in the interval [0,1] (e.g. a probability of 0.25 may be expressed as 25%, or a probability of 1/\pi may be expressed as approximately 31.83%). Other differences exist between common expressions of probabilities and formal probability theory. For example, a probability of 0% is typically taken to mean that the event to which that probability is assigned is impossible. However, in probability theory (usually in cases where there are infinitely many possible outcomes) an event assigned a probability of zero may actually occur. In some situations, it is certain that such an event will occur (e.g. in selecting a real number between 0 and 1, the probability of selecting any given number is zero, but it is certain that one such number will be selected).

Another way to express the probability of an outcome is by its odds: the ratio of the probability of "success" (the event occurs) to the probability of "failure" (the event does not occur). In gambling, odds are expressed as the ratio of the stakes risked by each participant in a wager. For instance, a bookmaker offering odds of 3 to 1 "against" a horse will pay a punter three times their stake if the horse wins. In effect, the bookmaker (ignoring factors such as his potential need to "lay off" bets that expose him to an unacceptable overall loss) is announcing that he thinks the horse has a 1/4 chance of winning. If we express odds as "chance of winning" : "chance of not winning", then 3 to 1 against is represented as 1:3 = 1/4:3/4, or 1/3. So an event with a probability of 1/4 (25%) has odds of 1/3 (about 33%). The disparity between probability and odds is even clearer when an event has a probability of 50% (e.g., the odds of a coin showing heads are 50%:50% = 1:1, or 1).
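The conversion between a probability and its odds can be sketched in a few lines of Python. This is only an illustration: the function names probability_to_odds and odds_to_probability are hypothetical helpers introduced here, and exact fractions are used so that the values 1/4 and 1/3 from the horse-racing example appear without rounding.

```python
from fractions import Fraction


def probability_to_odds(p):
    """Return the odds in favour, p / (1 - p), for a probability 0 <= p < 1."""
    p = Fraction(p)
    if not 0 <= p < 1:
        raise ValueError("probability must satisfy 0 <= p < 1")
    return p / (1 - p)


def odds_to_probability(odds):
    """Return the probability odds / (1 + odds) for non-negative odds in favour."""
    odds = Fraction(odds)
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)


# Horse at "3 to 1 against": probability 1/4, odds in favour 1/3.
horse_odds = probability_to_odds(Fraction(1, 4))
# Fair coin: probability 1/2, odds in favour 1 (that is, 1:1).
coin_odds = probability_to_odds(Fraction(1, 2))
# Going back from odds of 1/3 recovers the probability 1/4.
horse_probability = odds_to_probability(Fraction(1, 3))

print(horse_odds, coin_odds, horse_probability)  # prints: 1/3 1 1/4
```

A probability of exactly 1 is rejected by this sketch because the corresponding odds would require division by zero.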

Types of probability

There are basically four types of probabilities, each with its limitations. None of these approaches to probability is wrong, per se, but some are more useful or more general than others.

There are two standard approaches to conceptually interpreting probabilities. The first is known as the long-run (or relative frequency) approach and the second as the subjective belief (or confidence) approach. In the Frequency Theory of Probability, probability is the limit of the relative frequency with which an event occurs in repeated trials (note that the trials must be independent).

Frequentists talk about probabilities only when dealing with experiments that are random and well-defined. The probability of a random event denotes the relative frequency of occurrence of an experiment's outcome, when repeating the experiment. Frequentists consider probability to be the relative frequency "in the long run" of outcomes.

Physical probabilities, which are also called objective or frequency probabilities, are associated with random physical systems such as roulette wheels, rolling dice and radioactive atoms. In such systems, a given type of event (such as a die yielding a six) tends to occur at a persistent rate, or 'relative frequency', in a long run of trials. Physical probabilities either explain, or are invoked to explain, these stable frequencies. Thus talk about physical probability makes sense only when dealing with well-defined random experiments. The two main kinds of theory of physical probability are frequentist accounts (such as Venn's) and propensity accounts.

Relative frequencies are always between 0% (the event essentially never happens) and 100% (the event essentially always happens), so in this theory as well, probabilities are between 0% and 100%. According to the Frequency Theory of Probability, what it means to say that "the probability that A occurs is p%" is that if you repeat the experiment over and over again, independently and under essentially identical conditions, the percentage of the time that A occurs will converge to p. For example, under the Frequency Theory, to say that the chance that a coin lands heads is 50% means that if you toss the coin over and over again, independently, the ratio of the number of times the coin lands heads to the total number of tosses approaches a limiting value of 50% as the number of tosses grows. Because the ratio of heads to tosses is always between 0% and 100%, when the probability exists it must be between 0% and 100%.

In the Subjective Theory of Probability, probability measures the speaker's "degree of belief" that the event will occur, on a scale of 0% (complete disbelief that the event will happen) to 100% (certainty that the event will happen). According to the Subjective Theory, what it means for me to say that "the probability that A occurs is 2/3" is that I believe that A will happen twice as strongly as I believe that A will not happen. The Subjective Theory is particularly useful in assigning meaning to the probability of events that in principle can occur only once. For example, how might one assign meaning to a statement like "there is a 25% chance of an earthquake on the San Andreas fault with magnitude 8 or larger before 2050"? (See Freedman and Stark, 2003, for more discussion of theories of probability and their application to earthquakes.) It is very hard to use either the Theory of Equally Likely Outcomes or the Frequency Theory to make sense of this assertion.

Bayesians, however, assign probabilities to any statement whatsoever, even when no random process is involved. Probability, for a Bayesian, is a way to represent an individual's degree of belief in a statement, given the evidence.

Evidential probability, also called Bayesian probability, can be assigned to any statement whatsoever, even when no random process is involved, as a way to represent its subjective plausibility, or the degree to which the statement is supported by the available evidence. On most accounts, evidential probabilities are considered to be degrees of belief, defined in terms of dispositions to gamble at certain odds. The four main evidential interpretations are the classical interpretation, the subjective interpretation, the epistemic or inductive interpretation, and the logical interpretation.

Classical theory of probability

The classical approach to probability is to count the number of favorable outcomes, the number of total outcomes (outcomes are assumed to be mutually exclusive and equiprobable), and express the probability as a ratio of these two numbers. Here, "favorable" refers not to any subjective value given to the outcomes, but is rather the classical terminology used to indicate that an outcome belongs to a given event of interest. What is meant by this will be made clear by an example, and formalized with the introduction of axiomatic probability theory.

Classical definition of probability
If the number of outcomes belonging to an event E is N_{E}, and the total number of outcomes is N, then the probability of event E is defined as p_{E} = \frac{N_{E}}{N}.

For example, a standard deck of cards (without jokers) has 52 cards. If we randomly draw a card from the deck, we can think of each card as a possible outcome. Therefore, there are 52 total outcomes. We can now look at various events and calculate their probabilities:

  • Out of the 52 cards, there are 13 clubs. Therefore, if the event of interest is drawing a club, there are 13 favorable outcomes, and the probability of this event is \frac{13}{52} = \frac{1}{4}.
  • There are 4 kings (one of each suit). The probability of drawing a king is \frac{4}{52} = \frac{1}{13}.
  • What is the probability of drawing a king OR a club? This example is slightly more complicated. We cannot simply add together the number of outcomes for each event separately (4 + 13 = 17), as this inadvertently counts one of the outcomes twice (the king of clubs). The correct answer is \frac{16}{52} = \frac{4}{13}, obtained from \frac{13}{52}+\frac{4}{52}-\frac{1}{52}, which is p(\textrm{club})+p(\textrm{king})-p(\textrm{king\ of\ clubs}).
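The card examples above can be checked by direct enumeration. The following Python sketch builds the 52 outcomes and counts favorable ones. The names DECK, classical_probability, is_club and is_king are illustrative choices made here, not part of any standard library.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]

# Each (rank, suit) pair is one equally likely outcome: 13 * 4 = 52 in total.
DECK = list(product(RANKS, SUITS))


def classical_probability(event, outcomes=DECK):
    """Return N_E / N, where N_E counts the outcomes for which event(outcome) is true."""
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))


def is_club(card):
    return card[1] == "clubs"


def is_king(card):
    return card[0] == "K"


p_club = classical_probability(is_club)
p_king = classical_probability(is_king)
p_king_and_club = classical_probability(lambda c: is_king(c) and is_club(c))
p_king_or_club = classical_probability(lambda c: is_king(c) or is_club(c))

# Counting the union directly agrees with p(club) + p(king) - p(king of clubs).
print(p_club, p_king, p_king_and_club, p_king_or_club)  # prints: 1/4 1/13 1/52 4/13
```

Counting the union directly avoids the double counting of the king of clubs, and it agrees with the subtraction formula.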

Classical probability suffers from a serious limitation. The definition of probability implicitly defines all outcomes to be equiprobable. While this might be useful for drawing cards, rolling dice, or pulling balls from urns, it offers no method for dealing with outcomes with unequal probabilities.

This limitation can even lead to mistaken statements about probabilities. An often given example goes like this:

I could be hit by a meteor tomorrow. There are two possible outcomes: I will be hit, or I will not be hit. Therefore, the probability I will be hit by a meteor tomorrow is \frac{1}{2} = 50%.

Of course, the problem here is not with the classical theory, merely the attempted application of the theory to a situation to which it is not well adapted.

This limitation does not, however, mean that the classical theory of probability is useless. At many points in the development of the axiomatic approach to probability, classical theory is an important guiding factor.

Empirical or Statistical Probability or Frequency of occurrence

This approach to probability is well-suited to a wide range of scientific disciplines. It is based on the idea that the underlying probability of an event can be measured by repeated trials.

Empirical or Statistical Probability as a measure of frequency
Let n_{A} be the number of times event A occurs after n trials. We define the probability of event A as

p_{A} = \lim_{n\to \infty}\frac{n_{A}}{n}

It is of course impossible to conduct an infinite number of trials. However, it usually suffices to conduct a large number of trials, where the standard of large depends on the probability being measured and how accurate a measurement we need.
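As a rough illustration of estimating a probability from a large but finite number of trials, the sketch below simulates rolls of a fair die and reports the relative frequency of a six for increasing n. It assumes Python's pseudo-random generator is an adequate stand-in for a physical die. The helper names relative_frequency and roll_die are hypothetical, and the seed is fixed only to make runs repeatable.

```python
import random


def relative_frequency(event, trial, n, seed=0):
    """Estimate P(event) as n_A / n over n simulated trials."""
    rng = random.Random(seed)  # private generator so results are repeatable
    n_a = sum(1 for _ in range(n) if event(trial(rng)))
    return n_a / n


def roll_die(rng):
    """One trial: a roll of a fair six-sided die."""
    return rng.randint(1, 6)


def is_six(face):
    return face == 6


# The estimates should drift toward 1/6 (about 0.1667) as n grows,
# although any single finite run only approximates the limit.
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(is_six, roll_die, n))
```

How large n must be depends on the accuracy required: the typical error of such an estimate shrinks roughly in proportion to the inverse square root of n.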

A note on this definition of probability: How do we know the sequence \frac{n_{A}}{n} in the limit will converge to the same result every time, or that it will converge at all? The unfortunate answer is that we don't. To see this, consider an experiment consisting of flipping a coin an infinite number of times. We are interested in the probability of heads coming up. Imagine the result is the following sequence:

HTHHTTHHHHTTTTHHHHHHHHTTTTTTTTHHHHHHHHHHHHHHHHTTTTTTTTTTTTTTTT...

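with each run of k heads and k tails being followed by a run of 2k heads and 2k tails. For this example, the sequence \frac{n_{A}}{n} returns to exactly \frac{1}{2} at the end of every run of tails and climbs to roughly \frac{2}{3} at the end of every run of heads, so it keeps oscillating between these two values and doesn't converge.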
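The behaviour of the running fraction of heads for the sequence printed above can be computed exactly. The sketch below (the function names are illustrative) rebuilds the sequence from its doubling runs and records n_A / n at the end of each run of heads and each run of tails.

```python
from fractions import Fraction


def sequence_prefix(num_blocks):
    """Return the first num_blocks blocks: H T, HH TT, HHHH TTTT, ..."""
    return "".join("H" * 2**m + "T" * 2**m for m in range(num_blocks))


def fractions_at_run_ends(num_blocks):
    """Return the running fraction of heads after each heads run and each tails run."""
    heads = total = 0
    after_heads, after_tails = [], []
    run = 1
    for _ in range(num_blocks):
        heads += run  # a run of `run` heads
        total += run
        after_heads.append(Fraction(heads, total))
        total += run  # followed by a run of `run` tails
        after_tails.append(Fraction(heads, total))
        run *= 2  # the next block is twice as long
    return after_heads, after_tails


print(sequence_prefix(3))  # prints: HTHHTTHHHHTTTT
after_heads, after_tails = fractions_at_run_ends(5)
print([str(f) for f in after_heads])  # prints: ['1', '3/4', '7/10', '15/22', '31/46']
print([str(f) for f in after_tails])  # prints: ['1/2', '1/2', '1/2', '1/2', '1/2']
```

For the sequence exactly as printed, the fraction equals 1/2 at the end of every run of tails and decreases toward 2/3 at the end of every run of heads, so the running fraction never settles on a single limit.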

We might expect such sequences to be unlikely, and we would be right. It will be shown later that the probability of such a sequence is 0, as is the probability of a sequence that converges to anything other than the underlying probability of the event. However, such examples make it clear that the limit in the definition above does not express convergence in the more familiar sense, but rather some kind of convergence in probability. The problem of formulating exactly what this means belongs to axiomatic probability theory.

Axiomatic probability theory

Axiomatic probability theory, although it is often frightening to beginners, is the most general approach to probability, and has been employed in tackling some of the more difficult problems in probability. We start with a set of axioms, which serve to define a probability space. Although these axioms may not be immediately intuitive, be assured that the development is guided by the more familiar classical probability theory.

Let S be the sample space of a random experiment. The probability P is a real-valued function whose domain is the power set of S and whose range is contained in the interval [0,1], satisfying the following axioms:

(i) For any event E, P (E) ≥ 0

(ii) P (S) = 1

(iii) If E and F are mutually exclusive events, then P(E ∪ F) = P(E) + P(F).

It follows from (iii) that P(∅) = 0. To prove this, we take F = ∅ and note that E and ∅ are disjoint events. Therefore, from axiom (iii), we get P(E ∪ ∅) = P(E) + P(∅), or P(E) = P(E) + P(∅), i.e. P(∅) = 0.

Let S be a sample space containing outcomes ω1, ω2, ..., ωn, i.e., S = {ω1, ω2, ..., ωn}.

It follows from the axiomatic definition of probability that:

(i) 0 ≤ P (ωi) ≤ 1 for each ωi ∈ S

(ii) P (ω1) + P (ω2) + ... + P (ωn) = 1

(iii) For any event A, P(A) = Σ P(ωi), where the sum runs over all ωi ∈ A.
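For a finite sample space, these properties can be made concrete with a small Python sketch. It assumes the probability of each individual outcome is given up front. The helper make_probability and the loaded-die weights are invented here purely for illustration. Unlike the classical definition, the outcomes need not be equally likely.

```python
from fractions import Fraction


def make_probability(weights):
    """Build P from a dict mapping each outcome in S to its probability."""
    if any(p < 0 for p in weights.values()):
        raise ValueError("outcome probabilities must be non-negative")
    if sum(weights.values()) != 1:
        raise ValueError("outcome probabilities must sum to 1")

    def P(event):
        """Return P(A) as the sum of P(w) over the outcomes w in the event A."""
        event = set(event)
        if not event.issubset(weights):
            raise ValueError("event must be a subset of the sample space")
        return sum((weights[w] for w in event), Fraction(0))

    return P


# Hypothetical loaded die: a six comes up half the time.
loaded_die = {face: Fraction(1, 10) for face in range(1, 6)}
loaded_die[6] = Fraction(1, 2)
P = make_probability(loaded_die)

sample_space = set(loaded_die)
even, odd = {2, 4, 6}, {1, 3, 5}

print(P(sample_space))  # prints: 1   (axiom ii)
print(P(set()))         # prints: 0   (probability of the empty event)
print(P(even), P(odd))  # prints: 7/10 3/10
# Axiom (iii): even and odd are mutually exclusive, so their probabilities add.
print(P(even | odd) == P(even) + P(odd))  # prints: True
```

Exact fractions are used so that the check that the outcome probabilities sum to 1 is not affected by floating-point rounding.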

About This Book

This book discusses mathematical probability using calculus and abstract algebra. Readers should have a good understanding of both topics before attempting to work through the book in full.