# Probability/Introduction

Probability theory mathematically formulates incomplete knowledge pertaining to the likelihood of an occurrence. For example, a meteorologist might say there is a 60% chance that it will rain tomorrow. One way to read this is that on 6 of every 10 occasions when the world is in its current state, it will rain the next day.

A probability is a real number $p \in [0,1]$. In everyday speech, the number is usually expressed as a percentage (between 0% and 100%) rather than a decimal (e.g., a probability of 0.25 is expressed as 25%). A probability of 100% means that an event is certain. In everyday speech, a probability of 0% is taken to mean that the event is impossible, but (usually where there are infinitely many possible outcomes) an event originally ascribed a probability of 0% may be the one that occurs. In some situations, it is certain that the event which occurs will be one that was originally ascribed zero probability (for example, in selecting a real number uniformly between 0 and 1, the probability of selecting any given number is zero, but it is certain that one such number will be selected).

Another way of referring to the probability of an outcome is by its odds: the ratio of the probability of "success" (the event occurs) to the probability of "failure" (the event does not occur). In the gambling world (where "odds" evolved), odds are expressed as the ratio of the stakes risked by each participant in a wager. For instance, a bookmaker offering odds of 3/1 "against" a horse will pay a punter three times their stake if the horse wins. In effect, the bookmaker (ignoring factors such as a potential need to "lay off" bets that expose them to an unacceptable overall loss) is announcing that they think the horse has a 1/4 chance of winning. Using the mathematical definition of odds, "chance of winning" : "chance of not winning" = 1/4 : 3/4 = 1:3, or 1/3. So an event with a probability of 25% has odds of about 33%. The disparity between the two measures is even clearer where an event has a probability of 50%: the odds of a coin showing heads are 50%:50% = 1:1, or 1.
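The conversion between a probability and its mathematical odds can be sketched in a few lines of Python. The function names `probability_to_odds` and `odds_to_probability` are illustrative choices, not standard library functions; exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction


def probability_to_odds(p):
    """Return the odds in favour, p / (1 - p), for a probability 0 <= p < 1."""
    p = Fraction(p)
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)


def odds_to_probability(odds):
    """Invert the conversion: odds o in favour correspond to p = o / (1 + o)."""
    odds = Fraction(odds)
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)


print(probability_to_odds(Fraction(1, 4)))  # 1/3
print(probability_to_odds(Fraction(1, 2)))  # 1
print(odds_to_probability(Fraction(1, 3)))  # 1/4
```

A probability of 1 is excluded because the ratio would require dividing by zero; a certain event has no finite odds.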

## Types of probability

There are basically four types of probabilities, each with its limitations. None of these approaches to probability is wrong, per se, but some are more useful or more general than others.

In everyday speech, we express our beliefs about likelihoods of events using the same terminology as in probability theory. Often, this has nothing to do with any formal definition of probability; rather, it is an intuitive idea guided by our experience and, in some cases, statistics.

Consider the following examples:

• Bill says "Don't buy the avocados here; about half the time, they're rotten". Bill is expressing his belief about the probability of an event — that an avocado will be rotten — based on his personal experience.
• Lisa says "I am 95% certain the capital of Spain is Barcelona". Here, the belief Lisa is expressing is a probability only from her point of view, because she does not know that the capital of Spain is Madrid (from our point of view, the probability that it is Madrid is 100%). However, we can still view this as a subjective probability because it expresses a measure of uncertainty. It is as though Lisa is saying "in 95% of cases where I feel as sure as I do about this, I turn out to be right".
• Susan says "There is a lower chance of being shot in Omaha than in Detroit". Susan is expressing a belief based (presumably) on statistics.
• Dr. Smith says to Christina, "There is a 75% chance that you will live." Dr. Smith is basing this on his research.

Probability can also be expressed in vague terms. For example, someone might say it will probably rain tomorrow. This is subjective, but implies that the speaker believes the probability is greater than 50%.

Subjective probabilities have been extensively studied, especially with regard to gambling and securities markets. While this type of probability is important, it is not the subject of this book. A good reference is "Degrees of Belief" by Steven Vick (2002).

There are two standard approaches to conceptually interpreting probabilities. The first is known as the long-run (or relative frequency) approach, and the second as the subjective belief (or confidence) approach. In the Frequency Theory of Probability, probability is the limit of the relative frequency with which an event occurs in repeated trials (note that the trials must be independent).

Frequentists talk about probabilities only when dealing with experiments that are random and well-defined. The probability of a random event denotes the relative frequency of occurrence of an experiment's outcome, when repeating the experiment. Frequentists consider probability to be the relative frequency "in the long run" of outcomes.

Physical probabilities, which are also called objective or frequency probabilities, are associated with random physical systems such as roulette wheels, rolling dice and radioactive atoms. In such systems, a given type of event (such as a die yielding a six) tends to occur at a persistent rate, or 'relative frequency', in a long run of trials. Physical probabilities either explain, or are invoked to explain, these stable frequencies. Thus talk about physical probability makes sense only when dealing with well-defined random experiments. The two main kinds of theory of physical probability are frequentist accounts (such as Venn's) and propensity accounts.

Relative frequencies are always between 0% (the event essentially never happens) and 100% (the event essentially always happens), so in this theory as well, probabilities are between 0% and 100%. According to the Frequency Theory of Probability, what it means to say that "the probability that A occurs is p%" is that if you repeat the experiment over and over again, independently and under essentially identical conditions, the percentage of the time that A occurs will converge to p. For example, under the Frequency Theory, to say that the chance that a coin lands heads is 50% means that if you toss the coin over and over again, independently, the ratio of the number of times the coin lands heads to the total number of tosses approaches a limiting value of 50% as the number of tosses grows. Because the ratio of heads to tosses is always between 0% and 100%, when the probability exists it must be between 0% and 100%.
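The frequency interpretation can be illustrated with a small simulation. The sketch below is an assumption-laden illustration rather than a proof: it uses Python's pseudo-random generator as a stand-in for a fair coin, and the function name `running_frequency`, the fixed seed and the checkpoint values are arbitrary choices made for this example.

```python
import random


def running_frequency(n_tosses, p_heads=0.5, seed=0):
    """Simulate independent coin tosses and record the relative
    frequency of heads at a few checkpoints."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    checkpoints = (10, 100, 1_000, 10_000, 100_000)
    heads = 0
    frequencies = {}
    for n in range(1, n_tosses + 1):
        if rng.random() < p_heads:  # counts as heads with probability p_heads
            heads += 1
        if n in checkpoints:
            frequencies[n] = heads / n
    return frequencies


for n, freq in running_frequency(100_000).items():
    print(f"{n:>7} tosses: relative frequency of heads = {freq:.4f}")
```

In a typical run the frequency wanders noticeably for small numbers of tosses and settles near 0.5 as the number of tosses grows, which is the behaviour the Frequency Theory takes as the meaning of "a 50% chance".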

In the Subjective Theory of Probability, probability measures the speaker's "degree of belief" that the event will occur, on a scale of 0% (complete disbelief that the event will happen) to 100% (certainty that the event will happen). According to the Subjective Theory, what it means for me to say that "the probability that A occurs is 2/3" is that I believe that A will happen twice as strongly as I believe that A will not happen. The Subjective Theory is particularly useful in assigning meaning to the probability of events that in principle can occur only once. For example, how might one assign meaning to a statement like "there is a 25% chance of an earthquake on the San Andreas fault with magnitude 8 or larger before 2050?" (See Freedman and Stark, 2003, for more discussion of theories of probability and their application to earthquakes.) It is very hard to use either the Theory of Equally Likely Outcomes or the Frequency Theory to make sense of the assertion.

Bayesians, however, assign probabilities to any statement whatsoever, even when no random process is involved. Probability, for a Bayesian, is a way to represent an individual's degree of belief in a statement, given the evidence.

Evidential probability, also called Bayesian probability, can be assigned to any statement whatsoever, even when no random process is involved, as a way to represent its subjective plausibility, or the degree to which the statement is supported by the available evidence. On most accounts, evidential probabilities are considered to be degrees of belief, defined in terms of dispositions to gamble at certain odds. The four main evidential interpretations are the classical interpretation, the subjective interpretation, the epistemic or inductive interpretation, and the logical interpretation.

### Classical theory of probability

The classical approach to probability is to count the number of favorable outcomes, the number of total outcomes (outcomes are assumed to be mutually exclusive and equiprobable), and express the probability as a ratio of these two numbers. Here, "favorable" refers not to any subjective value given to the outcomes, but is rather the classical terminology used to indicate that an outcome belongs to a given event of interest. What is meant by this will be made clear by an example, and formalized with the introduction of axiomatic probability theory.

Classical definition of probability: If the number of outcomes belonging to an event $E$ is $N_{E}$, and the total number of outcomes is $N$, then the probability of event $E$ is defined as $p_{E} = \frac{N_{E}}{N}$.

For example, a standard deck of cards (without jokers) has 52 cards. If we randomly draw a card from the deck, we can think of each card as a possible outcome. Therefore, there are 52 total outcomes. We can now look at various events and calculate their probabilities:

• Out of the 52 cards, there are 13 clubs. Therefore, if the event of interest is drawing a club, there are 13 favorable outcomes, and the probability of this event is $\frac{13}{52} = \frac{1}{4}$.
• There are 4 kings (one of each suit). The probability of drawing a king is $\frac{4}{52} = \frac{1}{13}$.
• What is the probability of drawing a king or a club? This example is slightly more complicated. We cannot simply add together the number of outcomes for each event separately ($4 + 13 = 17$), as this inadvertently counts one of the outcomes twice (the king of clubs). The correct answer is $\frac{13}{52}+\frac{4}{52}-\frac{1}{52} = \frac{16}{52} = \frac{4}{13}$, which is $p(\textrm{club})+p(\textrm{king})-p(\textrm{king\ of\ clubs})$.
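The three card calculations above can be checked by direct counting. The following is a minimal sketch; the helper name `classical_probability` and the way cards are represented as (rank, suit) pairs are choices made for this illustration only.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely outcomes


def classical_probability(event, outcomes=DECK):
    """Number of favorable outcomes divided by the total number of outcomes."""
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))


def is_club(card):
    return card[1] == "clubs"


def is_king(card):
    return card[0] == "K"


p_club = classical_probability(is_club)
p_king = classical_probability(is_king)
p_king_of_clubs = classical_probability(lambda c: is_king(c) and is_club(c))
p_king_or_club = classical_probability(lambda c: is_king(c) or is_club(c))

print(p_club, p_king, p_king_or_club)  # 1/4 1/13 4/13

# Counting the union directly agrees with adding and removing the overlap.
assert p_king_or_club == p_club + p_king - p_king_of_clubs
```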

Classical probability suffers from a serious limitation. The definition of probability implicitly defines all outcomes to be equiprobable. While this might be useful for drawing cards, rolling dice, or pulling balls from urns, it offers no method for dealing with outcomes with unequal probabilities.

This limitation can even lead to mistaken statements about probabilities. An often given example goes like this:

I could be hit by a meteor tomorrow. There are two possible outcomes: I will be hit, or I will not be hit. Therefore, the probability I will be hit by a meteor tomorrow is $\frac{1}{2} = 50\%$.

Of course, the problem here is not with the classical theory, merely the attempted application of the theory to a situation to which it is not well adapted.

This limitation does not, however, mean that the classical theory of probability is useless. At many points in the development of the axiomatic approach to probability, classical theory is an important guiding factor.

### Empirical or Statistical Probability or Frequency of occurrence

This approach to probability is well-suited to a wide range of scientific disciplines. It is based on the idea that the underlying probability of an event can be measured by repeated trials.

Empirical or Statistical Probability as a measure of frequency: Let $n_{A}$ be the number of times event $A$ occurs after $n$ trials. We define the probability of event $A$ as $p_{A} = \lim_{n\to \infty}\frac{n_{A}}{n}$.

It is of course impossible to conduct an infinite number of trials. However, it usually suffices to conduct a large number of trials, where the standard of large depends on the probability being measured and how accurate a measurement we need.

A note on this definition of probability: How do we know the sequence $\frac{n_{A}}{n}$ in the limit will converge to the same result every time, or that it will converge at all? The unfortunate answer is that we don't. To see this, consider an experiment consisting of flipping a coin an infinite number of times. We are interested in the probability of heads coming up. Imagine the result is the following sequence:

HTHHTTHHHHTTTTHHHHHHHHTTTTTTTTHHHHHHHHHHHHHHHHTTTTTTTTTTTTTTTT...

with each run of $k$ heads and $k$ tails being followed by another run twice as long. For this example, the sequence $\frac{n_{A}}{n}$ does not converge: it returns to exactly $\frac{1}{2}$ at the end of every run of tails and climbs back to roughly $\frac{2}{3}$ at the end of every run of heads.
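The running frequency of heads for this particular sequence can be computed exactly. The sketch below assumes the pattern continues as described (runs of 1, 2, 4, 8, ... heads, each followed by an equally long run of tails); the function name `frequencies_at_run_ends` is made up for this illustration.

```python
from fractions import Fraction


def frequencies_at_run_ends(n_blocks):
    """For the sequence H T HH TT HHHH TTTT ..., return the relative
    frequency of heads at the end of each run of heads and of each
    run of tails."""
    heads = tails = 0
    after_heads, after_tails = [], []
    for m in range(n_blocks):
        run = 2 ** m  # each run is twice as long as the previous one
        heads += run
        after_heads.append(Fraction(heads, heads + tails))
        tails += run
        after_tails.append(Fraction(heads, heads + tails))
    return after_heads, after_tails


after_heads, after_tails = frequencies_at_run_ends(20)
print([float(f) for f in after_heads[-3:]])  # values close to 2/3
print([float(f) for f in after_tails[-3:]])  # exactly 0.5 each time
```

At the end of every run of tails the counts of heads and tails are equal, so the frequency is exactly 1/2, while at the end of every run of heads it approaches 2/3. The frequency therefore keeps swinging between these two values and has no limit.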

We might expect such sequences to be unlikely, and we would be right. It will be shown later that the probability of such a sequence is 0, as is the probability of a sequence whose relative frequency converges to anything other than the underlying probability of the event. However, such examples make it clear that the limit in the definition above does not express convergence in the more familiar sense, but rather some kind of convergence in probability. The problem of formulating exactly what this means belongs to axiomatic probability theory.

### Axiomatic probability theory

Axiomatic probability theory, although it is often frightening to beginners, is the most general approach to probability, and has been employed in tackling some of the more difficult problems in probability. We start with a set of axioms, which serve to define a probability space. Although these axioms may not be immediately intuitive, be assured that the development is guided by the more familiar classical probability theory.

Let $S$ be the sample space of a random experiment. The probability $P$ is a real-valued function whose domain is the power set of $S$ and whose range is the interval $[0,1]$, satisfying the following axioms:

(i) For any event $E$, $P(E) \geq 0$.

(ii) $P(S) = 1$.

(iii) If $E$ and $F$ are mutually exclusive events, then $P(E \cup F) = P(E) + P(F)$.

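It follows from (iii) that $P(\emptyset) = 0$. To prove this, we take $F = \emptyset$ and note that $E$ and $\emptyset$ are disjoint events. Therefore, from axiom (iii), we get $P(E \cup \emptyset) = P(E) + P(\emptyset)$. Since $E \cup \emptyset = E$, this reads $P(E) = P(E) + P(\emptyset)$, i.e. $P(\emptyset) = 0$.

Let $S$ be a sample space containing outcomes $\omega_{1}, \omega_{2}, \ldots, \omega_{n}$, i.e., $S = \{\omega_{1}, \omega_{2}, \ldots, \omega_{n}\}$.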

It follows from the axiomatic definition of probability that:

(i) $0 \leq P(\omega_{i}) \leq 1$ for each $\omega_{i} \in S$.

(ii) $P(\omega_{1}) + P(\omega_{2}) + \cdots + P(\omega_{n}) = 1$.

(iii) For any event $A$, $P(A) = \sum_{\omega_{i} \in A} P(\omega_{i})$.
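These properties can be checked on a small finite sample space. The example below is hypothetical: it models a loaded six-sided die whose weights were invented for illustration, and the function name `prob` is likewise an arbitrary choice. Unlike the classical definition, nothing here requires the outcomes to be equally likely.

```python
from fractions import Fraction

# Hypothetical loaded die: faces 1-5 have probability 1/10 each, face 6 has 1/2.
WEIGHTS = {face: Fraction(1, 10) for face in range(1, 6)}
WEIGHTS[6] = Fraction(1, 2)

S = set(WEIGHTS)  # the sample space


def prob(event, weights=WEIGHTS):
    """P(A) as the sum of P(omega) over the outcomes omega in A."""
    return sum((weights[w] for w in event), Fraction(0))


even = {2, 4, 6}
odd = {1, 3, 5}

print(prob(even))   # 7/10
print(prob(odd))    # 3/10
print(prob(set()))  # 0

# Each outcome has probability in [0, 1] and the probabilities sum to 1.
assert all(0 <= p <= 1 for p in WEIGHTS.values())
assert prob(S) == 1
# Additivity for mutually exclusive events (axiom iii).
assert prob(even | odd) == prob(even) + prob(odd)
```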