Probability/Probability Spaces

Concept

We will now develop a more axiomatic theory of probability, which allows for a simpler mathematical formalism. We proceed by building up the concept of a probability space, which will let us harness many theorems of mathematical analysis.

Recall that an experiment is any action or process with an outcome that is subject to uncertainty or randomness. A probability space or a probability triple is a mathematical construct that models an experiment and its set of possible outcomes.

Probability Space

A probability space is a mathematical triplet (Ω, S, P) consisting of a sample space Ω, a set of events S, and a probability function P, and it presents a model for a particular class of real-world situations. A probability space is arbitrary, in that its author ultimately defines which elements Ω, S, and P will contain.

Sample Space

The sample space, Ω, is the non-empty set whose elements are all the possible outcomes of an experiment. Without an assignment of a sample space, and without knowing its size, no conclusions could be made about the probabilities of outcomes or collections of outcomes. It is common to refer to a sample space by the labels S, Ω, or U (for "universal set"). An outcome may be a state of nature, a possibility, an experimental result and the like. Any instance or execution of a real-world situation modeled by a probability space must produce exactly one outcome. If outcomes of different trials of an experiment differ in any way that matters, they are considered distinct outcomes. Which differences matter depends on the kind of analysis we wish to perform, and this leads to different choices of sample space. A common example consists of a random experiment involving a single coin toss. Here, it seems appropriate to define the sample space as the set Ω = {H, T}, where H denotes "heads" and T denotes "tails".
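As a small illustration, the sample spaces for a coin toss and for a throw of two distinguishable dice can be written out explicitly as finite sets. The Python sketch below is only one possible encoding, and the variable names coin_space and dice_space are arbitrary choices.

```python
from itertools import product

# Sample space for a single coin toss: exactly one of these outcomes occurs per trial.
coin_space = {"H", "T"}

# Sample space for throwing two distinguishable dice: ordered pairs (first die, second die).
dice_space = set(product(range(1, 7), repeat=2))

print(len(coin_space))   # 2 possible outcomes
print(len(dice_space))   # 36 possible outcomes
```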

Events

Since individual outcomes might be of little practical use, more complex events are used to characterize groups of outcomes. An event is any subset of zero or more outcomes contained in a given sample space. A simple event consists of exactly one outcome and a compound event consists of more than one outcome. For example, when tossing a single coin with Ω = {H, T}, the possible events are ∅, {H}, {T}, and {H, T}. The collection of all such events is a σ-algebra, denoted S. Intuitively, the probability of each of these sets is the chance that one of the outcomes in the set will happen; P({H}) is the chance of tossing a head, P({H, T}) is the chance of the coin landing either heads or tails, and P(∅) is the probability of the coin landing neither heads nor tails. An event is said to have happened or occurred during an experiment when the outcome of the experiment is an element of the event. Since the same outcome may be a member of many events, it is possible for many events to have happened given a single outcome. For example, when the trial consists of throwing two dice, the set of all outcomes with a sum of 7 pips may constitute an event, whereas outcomes with an odd number of pips may constitute another event. If the outcome is (2, 5), i.e. two pips on the first die and five on the second, then both of the events, "7 pips" and "odd number of pips", are said to have occurred. Modeling events as sets of outcomes in a sample space allows us to leverage all of the regular set operations:

Given two events A and B in a sample space Ω (a short code sketch follows this list):

  • The null subset ∅ of a sample space is called an impossible event.
  • The union A ∪ B of two events consists of all outcomes that are in A or in B or in both.
  • The intersection A ∩ B consists of all outcomes that are in both A and B.
  • The complement A' of an event A in a sample space Ω consists of all outcomes not in A, but in Ω, i.e. A' = Ω - A.
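For concreteness, here is a minimal Python sketch of these operations on the two-dice experiment; the event names sum_is_7 and odd_sum are illustrative labels for the events discussed above.

```python
from itertools import product

omega = set(product(range(1, 7), repeat=2))          # sample space for two dice

sum_is_7 = {w for w in omega if w[0] + w[1] == 7}    # event "7 pips"
odd_sum  = {w for w in omega if (w[0] + w[1]) % 2}   # event "odd number of pips"

union        = sum_is_7 | odd_sum    # outcomes in either event (or both)
intersection = sum_is_7 & odd_sum    # outcomes in both events
complement   = omega - sum_is_7      # outcomes not in "7 pips"

# The single outcome (2, 5) makes both events happen at once.
print((2, 5) in sum_is_7 and (2, 5) in odd_sum)      # True
```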

Probability Measure Function

Finally, there is a need to specify each event's likelihood of happening. This is done using the probability measure function P, which maps events to probabilities. Recall that probability is expressed as a real number between zero (typically an impossible event, though zero-probability events are not necessarily impossible) and one (an event that happens almost surely). Thus P is a function P : S → [0, 1].

Once the probability space is established, it is assumed that “nature” makes its move and selects a single outcome (also called a sample point), ω, from the sample space Ω. All the events in S that contain the selected outcome ω (recall that each event is a subset of Ω) are said to have "happened" or "occurred". The selection performed by nature is done in such a way that if the experiment were to be repeated an infinite number of times, the relative frequencies of occurrence of each of the events would coincide with the probabilities prescribed by the function P.

Probability Definition

Having properly defined a probability space, we may now provide the following definitions for probability:

Informal Definition of Probability

The probability of an event is a measure of the chance with which we can expect the event to occur. We assign a number between 0 and 1 inclusive to the probability of an event. A probability of 1 means that we are certain the event will occur, and a probability of 0 means that we are certain the event will not occur. The probability of any event A in a sample space is denoted by P(A).

Classical Definition of Probability

If there are n equally likely outcomes for an experiment, of which one must occur, and m of these belong to an event of interest A, then the probability of the event A, or of a "success", is given by P(A) = m/n.

Computing Probability for Classical Approach

  1. When all outcomes are equally likely
    1. Count the number of outcomes n in the sample space Ω.
    2. Count the number of outcomes m in the event of interest, A.
    3. P(A) = m/n.
  2. When all outcomes are not equally likely
    1. Let e1, e2, ..., en be the outcomes of a sample space Ω, and let P(e1) = p1, P(e2) = p2, ..., P(en) = pn. In this case, the probability of each outcome is assumed to be known.
    2. List all the outcomes in A, say, e1, e2, ..., em.
    3. P(A) = p1 + p2 + ... + pm, the sum of the probabilities of the outcomes in A. (Both cases are sketched in the code below.)
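The two procedures above can be carried out mechanically; the following Python sketch uses a fair six-sided die for the equally likely case and a purely hypothetical loaded die for the unequal case.

```python
from fractions import Fraction

# Case 1: all outcomes equally likely (a fair six-sided die).
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}                             # event of interest: an even roll
print(Fraction(len(A), len(omega)))       # P(A) = m/n = 3/6 = 1/2

# Case 2: outcomes not equally likely; each outcome's probability is assumed known.
p = {1: 0.25, 2: 0.15, 3: 0.15, 4: 0.15, 5: 0.15, 6: 0.15}   # hypothetical loaded die
print(sum(p[w] for w in A))               # P(A) = sum of outcome probabilities, about 0.45
```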

The classical concept of probability works well when all the outcomes of an experiment can be regarded as equally likely, like when we roll a die or toss a coin, but it falls short in scenarios where outcomes are not equally likely and their probabilities are not known beforehand, like when we are interested in the probability of a sports team winning a championship game. For these situations, the frequency interpretation of probability (developed as a result of the work by R. Von Mises in 1936) is fitting.

Frequency Definition of Probability

The probability of an event or outcome is the proportion of times the event or outcome would occur in a long run of trials of repeated experiments.

For example, in tossing a coin, if n_H is the number of times heads appears in n trials, then the probability of getting heads is the limiting value of n_H/n as the number of trials n becomes large. This is just a natural extension of the classical approach to probability, but one which works for both a non-biased (equally likely outcomes) and a biased (not equally likely outcomes) coin. This approach is quite useful, but it requires that an experiment can be repeated many times under identical circumstances, which is not always possible. For a more complete picture, we define probabilities axiomatically (from the 1933 studies of A. N. Kolmogorov).
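A quick Monte Carlo sketch of the frequency idea, assuming a fair coin and using only the Python standard library; the relative frequency settles near the true probability as the number of trials grows.

```python
import random

random.seed(0)                 # fixed seed so the run is reproducible
n_trials = 100_000
heads = sum(random.random() < 0.5 for _ in range(n_trials))   # count heads in n trials

print(heads / n_trials)        # relative frequency of heads, close to 0.5
```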

Axiomatic Definition of Probability

Let Ω be the sample space of an experiment. Probability P(·) is a real-valued function that assigns to each event A a number P(A), called the probability of A, with the following conditions satisfied:

  1. It is non-negative: P(A) ≥ 0 for every event A.
  2. It is unity for the certain event. That is, P(Ω) = 1.
  3. It is additive over the union of an infinite number of pairwise disjoint events. That is, if events A1, A2, A3, ... satisfy Ai ∩ Aj = ∅ for i ≠ j, then P(A1 ∪ A2 ∪ A3 ∪ ...) = P(A1) + P(A2) + P(A3) + ...
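For a finite sample space these conditions are easy to check mechanically once every outcome has been assigned a weight. The sketch below assumes a fair die and defines P(A) as the sum of the outcome weights in A; it is only an illustration of the axioms, not a general construction.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
weight = {w: Fraction(1, 6) for w in omega}   # equal outcome weights for a fair die

def P(event):
    """Probability of an event: the sum of the weights of its outcomes."""
    return sum(weight[w] for w in event)

# Axiom 1 (non-negativity) and Axiom 2 (the whole sample space has probability 1).
assert all(P({w}) >= 0 for w in omega)
assert P(omega) == 1

# Axiom 3, finite form: additivity over disjoint events.
A, B = {1, 2}, {5, 6}
assert A.isdisjoint(B) and P(A.union(B)) == P(A) + P(B)
print("axioms hold for this finite example")
```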

Other Definitions and Properties

Basic Properties of Probability

  • P(∅) = 0.
Proof

First consider the infinite collection of events A1 = ∅, A2 = ∅, A3 = ∅, .... Since ∅ ∩ ∅ = ∅, the events in this collection are pairwise disjoint and A1 ∪ A2 ∪ A3 ∪ ... = ∅. The third axiom then gives

P(∅) = P(A1 ∪ A2 ∪ A3 ∪ ...) = P(A1) + P(A2) + P(A3) + ... = P(∅) + P(∅) + P(∅) + ...

This can happen only if P(∅) = 0, since otherwise the right-hand side would not be a finite number.

  • The additivity property in the third axiom of probability is also valid for a finite collection of pairwise disjoint events.
Proof

Suppose that A1, A2, ..., Ak are disjoint events, and append to these the infinite collection of events Ak+1 = ∅, Ak+2 = ∅, Ak+3 = ∅, .... By invoking the third axiom,

P(A1 ∪ A2 ∪ ... ∪ Ak) = P(A1 ∪ A2 ∪ A3 ∪ ...) = P(A1) + P(A2) + P(A3) + ... = P(A1) + P(A2) + ... + P(Ak),

since the appended events each contribute P(∅) = 0 to the sum, as desired.

  • For any event A, P(A) = 1 - P(A'), where A' is the complement of A.
Proof

Since A ∪ A' = Ω by the definition of the complement, and since A and A' are clearly disjoint, then by Axioms 2 and 3, 1 = P(Ω) = P(A) + P(A'), so that P(A) = 1 - P(A').

  • For any event A, P(A) ≤ 1.
Proof

Since P(A') ≥ 0 by Axiom 1 of probability, then P(A) = 1 - P(A') ≤ 1.

  • If A ⊆ B, then P(A) ≤ P(B).
Proof

Let A and B be arbitrary events in a sample space Ω with A ⊆ B.

First note that B = B ∩ Ω = B ∩ (A ∪ A') = (B ∩ A) ∪ (B ∩ A'). But since A ⊆ B, then B ∩ A = A, so that B = A ∪ (B ∩ A').

Since A and B ∩ A' are clearly disjoint, and since P(B ∩ A') ≥ 0 by Axiom 1, then by Axioms 1 and 3 we have that P(B) = P(A) + P(B ∩ A') ≥ P(A), as desired.

  • For any events A and B, P(A ∪ B) = P(A) + P(B) - P(A ∩ B).

Before proceeding to the actual proof, realize that this proposition should be intuitive. If the events were disjoint, then A ∩ B = ∅, and by Axiom 3, P(A ∪ B) = P(A) + P(B). But since these are any two events, they may have a non-empty intersection which, if we simply summed P(A) and P(B), would be counted twice. This is illustrated in the figures below.

Proof

Let A and B be arbitrary events in a sample space Ω.

First note that A ∪ B = A ∪ (B ∩ A'), and because A and B ∩ A' are clearly disjoint, then by Axiom 3 we have that P(A ∪ B) = P(A) + P(B ∩ A').

Likewise, note that B = (A ∩ B) ∪ (B ∩ A'), and since A ∩ B and B ∩ A' are clearly disjoint, then by Axiom 3 we have that P(B) = P(A ∩ B) + P(B ∩ A'), or P(B ∩ A') = P(B) - P(A ∩ B).

Combining these two results we get P(A ∪ B) = P(A) + P(B) - P(A ∩ B), as desired.
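As a quick numerical check of this property (an illustrative single-die example): take A = {2, 4, 6} (an even roll) and B = {4, 5, 6} (a roll of at least four). Then P(A) = 3/6, P(B) = 3/6 and P(A ∩ B) = P({4, 6}) = 2/6, so P(A ∪ B) = 3/6 + 3/6 - 2/6 = 4/6, which agrees with counting A ∪ B = {2, 4, 5, 6} directly.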

[Venn diagrams: Event A, Event B, and the union of A and B]

Mutually Exclusive Events

Two events A and B are said to be mutually exclusive or disjoint if A ∩ B = ∅. This means that mutually exclusive events cannot happen together. A collection of n events is mutually exclusive when the occurrence of any one event implies that the remaining n-1 events will not occur. Any two mutually exclusive events A and B have the following probability property:

  1. P(A ∪ B) = P(A) + P(B). Note that this is just a special case of the general property P(A ∪ B) = P(A) + P(B) - P(A ∩ B), where P(A ∩ B) = P(∅) = 0.

(Collectively) Exhaustive

A set of events is jointly or collectively exhaustive if at least one of the events must occur. For example, when rolling a six-sided die, the outcomes 1, 2, 3, 4, 5, and 6 are collectively exhaustive, because they encompass the entire range of possible outcomes.

Another way to describe collectively exhaustive events is that their union must cover the entire sample space, i.e. A1 ∪ A2 ∪ ... ∪ An = Ω, where Ω is the sample space.

Test of Independence

If for two events A and B, P(A ∩ B) is not equal to P(A)P(B), then A and B are said to be associated or dependent. If P(A ∩ B) > P(A)P(B), so that P(A | B) > P(A) and P(B | A) > P(B), then the events are said to be positively associated. If, however, P(A ∩ B) < P(A)P(B), so that P(A | B) < P(A) and P(B | A) < P(B), one says that the events are negatively associated.
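The test is easy to carry out on a finite example. The sketch below uses two fair dice with illustrative events: A = "first die shows 6" and B = "sum is 7" turn out to be independent, while A and C = "sum is 12" are positively associated.

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))       # two fair dice

def P(event):
    return Fraction(len(event), len(omega))       # classical probability: m/n

A = {w for w in omega if w[0] == 6}               # first die shows 6
B = {w for w in omega if sum(w) == 7}             # sum is 7
C = {w for w in omega if sum(w) == 12}            # sum is 12

print(P(A & B) == P(A) * P(B))    # True: A and B are independent
print(P(A & C) > P(A) * P(C))     # True: A and C are positively associated
```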

Simple Random Sampling

In simple random sampling, one draws a sample from a population at random, without imposing any order or preference on which specific individuals are chosen. In statistics, a simple random sample is a subset of individuals (a sample) chosen from a larger set (a population). Each individual is chosen randomly and entirely by chance, such that each individual has the same probability of being chosen at any stage during the sampling process, and each subset of k individuals has the same probability of being chosen for the sample as any other subset of k individuals. Simple random sampling can be done with or without replacement, though it is typically done without, i.e., one deliberately avoids choosing any member of the population more than once. When the sample is drawn with replacement, the same individual can be chosen more than once; when the sample is drawn without replacement, the same individual can be chosen no more than once in a given sample. Therefore, random sampling of one individual at a time means that every individual in the larger group has an equal probability of being drawn.
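A minimal sketch of both sampling schemes with Python's standard library (the population of 100 labelled individuals is just an example): random.sample draws without replacement, while random.choices draws with replacement.

```python
import random

random.seed(1)
population = list(range(1, 101))      # a hypothetical population of 100 labelled individuals

without_replacement = random.sample(population, k=10)    # no individual can appear twice
with_replacement    = random.choices(population, k=10)   # the same individual may appear again

print(without_replacement)
print(with_replacement)
```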

Independence of random variables

If X is a real-valued random variable and a is a number then the event {X ≤ a} is the set of outcomes that correspond to X being less than or equal to a. Since these are sets of outcomes that have probabilities, it makes sense to refer to events of this sort being independent of other events of this sort.

Why are the p and q Bernoulli trial probabilities multiplied together in the binomial formula? The probability of an event can be expressed as a binomial probability if its outcomes can be broken down into two probabilities p and q, where p and q are complementary (i.e. p + q = 1). Binomial probability typically deals with the probability of several successive decisions, each of which has two possible outcomes. The binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p. Such a success/failure experiment is also called a Bernoulli experiment or Bernoulli trial; in fact, when n = 1, the binomial distribution is a Bernoulli distribution.

A Bernoulli process is a discrete-time stochastic process consisting of a sequence of independent random variables taking values over two symbols. Prosaically, a Bernoulli process is coin flipping several times, possibly with an unfair coin. A variable in such a sequence may be called a Bernoulli variable. In other words, a Bernoulli process is a sequence of independent identically distributed Bernoulli trials. Independence of the Bernoulli trials implies a memorylessness property: past trials do not provide any information regarding future outcomes. From any given time, the future trials also form a Bernoulli process independent of the past (the fresh-start property). A sequence or other collection of random variables is independent and identically distributed (i.i.d.) if each random variable has the same probability distribution as the others and all are mutually independent.

Two events A and B are independent if and only if Pr(A ∩ B) = Pr(A)Pr(B). Here A ∩ B is the intersection of A and B, that is, the event that both A and B occur. This is called the multiplication rule for independent events.
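A short simulation sketch of these ideas, assuming a biased coin with success probability p = 0.3 (the numbers are illustrative): it estimates the probability of two successes in a row and compares it with p·p, and evaluates the binomial formula for k successes in n trials.

```python
import random
from math import comb

random.seed(2)
p = 0.3                                     # success probability of one Bernoulli trial
q = 1 - p

def bernoulli():
    """One Bernoulli trial: True with probability p."""
    return random.random() < p

# Multiplication rule for independent events: P(success on trial 1 AND trial 2) = p * p.
runs = 200_000
both = sum(bernoulli() and bernoulli() for _ in range(runs))
print(both / runs, p * p)                   # the estimate should be close to 0.09

# Binomial formula: P(k successes in n trials) = C(n, k) * p**k * q**(n - k).
n, k = 5, 2
print(comb(n, k) * p**k * q**(n - k))       # about 0.3087
```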

Further Concepts in Set Theory

Union and intersection: We can indicate the union and intersection of two sets with the symbols ∪ and ∩, respectively. If A and B are two sets, then A ∪ B is read “A union B”, the set that consists of all the elements that are either in set A or in set B (or in both). Similarly, A ∩ B is read “A intersect B”, which is the set of all elements that are in both A and B.

Set difference. We can “subtract” one set from another using the symbol “-”. If A and B are sets, then A - B is the set consisting of all the elements that are members of A and are not members of B. This is a little different from the subtraction you may be used to in arithmetic.

Kinds of integers: 1. Whole numbers (non-negative integers) – {0, 1, 2, 3...} 2. Positive integers – {1, 2, 3, 4...} 3. All possible integers – {...-3, -2, -1, 0, 1, 2, 3...}

AND --> 2 or more events all happen. For independent events we MULTIPLY (take the product of) their probabilities -- INTERSECTION. OR --> at least 1 of the 2 events occurs. For mutually exclusive events we ADD (sum) their probabilities -- UNION.

Events that are NOT mutually exclusive can both occur on the same trial (for example, rolling an even number OR rolling a number greater than 3 on one die: these events are NOT mutually exclusive, because both occur when the roll is a 4 or a 6).

For events that are NOT mutually exclusive, we must subtract the double-counted intersection p(A∩B) from p(A) + p(B), so that p(A∪B) = p(A) + p(B) - p(A∩B).

Probabilities for repeated independent events often involve exponents, and probabilities for dependent events (conditional probability, sampling without replacement) often involve factorials.
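A compact numeric sketch of these rules, using two fair dice as the illustrative experiment (the specific events are just examples):

```python
from fractions import Fraction

# AND of independent events: multiply. P(both dice show 6) = 1/6 * 1/6.
p_both_sixes = Fraction(1, 6) * Fraction(1, 6)               # 1/36

# OR of mutually exclusive events: add. P(sum is 2 or sum is 3) = 1/36 + 2/36.
p_sum_2_or_3 = Fraction(1, 36) + Fraction(2, 36)             # 1/12

# OR of events that can happen together: add, then subtract the intersection once.
# Example: A = "first die is even", B = "second die is even", which are independent here.
p_A, p_B, p_AB = Fraction(1, 2), Fraction(1, 2), Fraction(1, 4)
p_A_or_B = p_A + p_B - p_AB                                  # 3/4

print(p_both_sixes, p_sum_2_or_3, p_A_or_B)
```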

Permutations and Combinations

Permutations
arrangement of objects without repetition where order is important. Permutations using all the objects: n objects, arranged into a group of size n without repetition, and order being important – P(n, n) = n!
Example: Find all permutations of A, B, C.

Permutations of some of the objects: n objects, group size r, order is important. P(n, r) = n!/(n-r)! Example: Find all 2-letter permutations using the letters A, B, C.

Distinguishable permutations
if a word has n letters, of which k are distinct, let n1, n2, ..., nk be the frequencies of the k distinct letters (so n1 + n2 + ... + nk = n). The number of distinguishable permutations is n!/((n1!)(n2!)...(nk!)).
Combinations
arrangement of objects without repetition, where order is NOT important. A combination of n objects, arranged in groups of size r, without repetition, and order NOT being important – C(n, r) = n!/(r!(n-r)!).

Another way to write a combination of n things, r at a time, is using the binomial coefficient notation (the same coefficients that appear in the binomial distribution), often read as "n choose r".
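The worked examples above can be enumerated directly with Python's itertools, and math.perm and math.comb give the counts; the word "BANANA" used for distinguishable permutations is just an illustrative choice.

```python
from itertools import combinations, permutations
from math import comb, factorial, perm

letters = ["A", "B", "C"]

print(list(permutations(letters)))        # all 3! = 6 orderings of A, B, C
print(list(permutations(letters, 2)))     # all P(3, 2) = 6 ordered 2-letter arrangements
print(list(combinations(letters, 2)))     # all C(3, 2) = 3 unordered pairs

# Distinguishable permutations of the letters of "BANANA": 6!/(1!*3!*2!) = 60.
print(factorial(6) // (factorial(1) * factorial(3) * factorial(2)))

print(perm(3, 2), comb(3, 2))             # 6 and 3, matching the enumerations above
```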

Counting Rules

Rule 1
If any one of K mutually exclusive and exhaustive events can occur on each of N trials, there are K^N different sequences that may result from a set of such trials
Example: Flip a coin three times, finding the number of possible sequences. N=3, K=2, therefore K^N = 2^3 = 8
Rule 2
If K1, K2, ....KN are the numbers of distinct events that can occur on trials 1,....N in a series, the number of different sequences of N events that can occur is (K1)(K2)...(KN)
Example: Flip a coin and roll a die, finding the number of possible sequences. Therefore, (K1)(K2) = (2)(6) = 12
Rule 3
The number of different ways that N distinct things may be arranged in order is N! = (1)(2)(3)....(N-1)(N), where 0! = 1. An arrangement in order is called a permutation, so that the total number of permutations of N objects is N! (the symbol N! is called N-factorial)
Example: Arrange 10 items in order, finding the number of possible ways. Therefore, 10! = 10x9x8x7x6x5x4x3x2x1 = 3628800
Rule 4
The number of ways of selecting and arranging r objects from among N distinct objects is: N!/(N-r)! [nPr]
Example: pick 3 things from 10 items, and arrange them in order. Therefore N=10, r=3, so 10!/(10-3)! = 10!/7! = 720
Rule 5
The total number of ways of selecting r distinct combinations of N objects, irrespective of order (ie order NOT important), is: N!/r!(N-r)! [nCr]
Example: Pick 3 items from 10 in any order, where N=10, r=3. Therefore, 10!/3!(7!) = 720/6 = 120
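Each rule's worked example can be checked in a line of Python (math.perm and math.comb require Python 3.8 or later); this is only a verification sketch of the numbers above.

```python
from math import comb, factorial, perm

print(2 ** 3)            # Rule 1: K^N sequences for three coin flips = 8
print(2 * 6)             # Rule 2: a coin and a die give (2)(6) = 12 sequences
print(factorial(10))     # Rule 3: 10! = 3628800 orderings of 10 items
print(perm(10, 3))       # Rule 4: 10!/7! = 720 ordered selections of 3 from 10
print(comb(10, 3))       # Rule 5: 10!/(3!*7!) = 120 unordered selections of 3 from 10
```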

Consequences

We can now give some basic theorems using our axiomatic probability space.

Theorem 1

Given a probability space (Ω, S, P), for any events A and B in S:


Axioms of Probability

A probability function has to satisfy the following three basic axioms or constraints: to each event a, a measure (number) P(a), called the probability of event a, is assigned. P(a) is subject to the following three axioms:

  1. P(a) ≥ 0
  2. P(Ω) = 1
  3. If a ∩ b = ∅, then P(a ∪ b) = P(a) + P(b)
Corollaries
  • P(∅) = 0
  • P(a) = 1 - P(a') ≤ 1
  • For any events a and b, P(a ∪ b) = P(a) + P(b) - P(a ∩ b)
  • If b ⊂ a, then P(a) = P(b) + P(a ∩ b') ≥ P(b)