# Probability/Probability Spaces

## Terminologies

This chapter's title, probability space, refers to a mathematical construct that models a random experiment. More precisely:

Definition. (Probability space) A probability space is a mathematical triplet ${\displaystyle (\Omega ,{\mathcal {F}},\mathbb {P} )}$ consisting of three elements: a sample space ${\displaystyle \Omega }$, an event space ${\displaystyle {\mathcal {F}}}$, and a probability measure ${\displaystyle \mathbb {P} :{\mathcal {F}}\to [0,1]}$.

Let us first give the definitions related to the sample space. The definitions of event space and probability measure are discussed in later sections.

Definition. (Sample space and sample point) The sample space of a random experiment, denoted by ${\displaystyle \Omega }$, is a nonempty set consisting of all possible and indecomposable outcomes of the random experiment, called sample points.

Remark.

• A sample point is often denoted by ${\displaystyle \omega }$ (lowercase omega), in contrast with the uppercase omega ${\displaystyle \Omega }$ used for the notation of sample space.
• By an indecomposable outcome, we mean an outcome that cannot be split into multiple more specific outcomes ("smaller pieces"). In other words, indecomposable outcomes are outcomes expressed as specifically as possible. Furthermore, the random experiment should result in exactly one of the indecomposable outcomes.
• For instance, when rolling a die and considering the number facing up, we may say the outcomes are "odd number" and "even number". But such outcomes are decomposable: we can split "odd number" into three more specific outcomes, 1, 3, 5, and "even number" into three more specific outcomes, 2, 4, 6.
• On the other hand, we may regard the outcomes as the numbers 1, 2, 3, 4, 5, 6, which are indecomposable and expressed in the most specific way.

Example. A sample space of tossing a coin is ${\displaystyle \Omega =\{H,T\}}$ (${\displaystyle H}$: heads come up, ${\displaystyle T}$: tails come up), and the sample points are ${\displaystyle H,T}$.

Definition. (Event) An event is a subset of the sample space.

Remark.

• So, an event is an aggregate of sample points.
• An event is called a simple event if it consists of exactly one sample point.
• An event is called a compound event if it consists of more than one sample point.

Definition. (Occurrence of events) An event occurs if it contains the outcome from the random experiment.

Example. We take the sample space of rolling a die to be ${\displaystyle \Omega =\{1,2,3,4,5,6\}}$. Then, the sets ${\displaystyle E_{1}=\varnothing ,E_{2}=\{1,2,3\},E_{3}=\Omega }$ are events, while the set ${\displaystyle \{0\}}$ is not.

Suppose we get the number 3 from rolling the dice. Then, ${\displaystyle E_{2}}$ and ${\displaystyle E_{3}}$ occur, but ${\displaystyle E_{1}}$ does not occur.
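The sample space, events, and occurrence of events above can be sketched with Python sets. This is a toy illustration only; the variable and function names are our own.

```python
# Modeling the die-rolling sample space and events as Python sets,
# and checking which events occur for a given outcome.

omega = {1, 2, 3, 4, 5, 6}   # sample space of rolling a die
e1 = set()                    # the empty event
e2 = {1, 2, 3}                # a compound event
e3 = set(omega)               # the whole sample space

def occurs(event, outcome):
    """An event occurs iff it contains the observed outcome."""
    return outcome in event

outcome = 3                   # suppose we roll a 3
print(occurs(e1, outcome))    # False: the empty event never occurs
print(occurs(e2, outcome))    # True
print(occurs(e3, outcome))    # True: the whole sample space always occurs
```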

## Probability interpretations

In this chapter, we will discuss probability mathematically and give an axiomatic, abstract definition of probability (as a function). By an axiomatic definition, we mean defining probability to be a function satisfying certain axioms, called the probability axioms. Such an axiomatic definition does not tell us how we should interpret the term "probability", so the definition is said to be independent of the interpretation of probability. This independence makes the formal definition applicable no matter how you interpret probability.

However, the axiomatic definition does not suggest a way to construct a probability measure (i.e., to assign probabilities to events): it just states that probability is a function satisfying certain axioms, but how can we construct such a function in the first place? In this section, we will discuss two main types of probability interpretations, subjectivism and frequentism, each of which suggests a method of assigning probabilities to events.

### Subjectivism

Intuitively and naturally, the probability of an event is often regarded as a numerical measure of the "chance" of the occurrence of the event (that is, how likely the event is to occur). So, it is natural for us to assign a probability to an event based on our own assessment of this "chance". (For the probability to be valid according to the axiomatic definition, the assignment needs to satisfy the probability axioms.) But different people may assess the "chance" differently, depending on their personal opinions. Such an interpretation of probability is therefore somewhat subjective, since different people may assign different probabilities to the same event. Hence, this interpretation is called subjectivism (also known as Bayesian probability).

Example. Amy and Bob assign the probabilities of winning the top prize from a lucky draw according to subjectivism:

• Amy thinks that she is lucky, and thus assigns 0.7 as the probability of winning the top prize.
• Bob thinks that he is unlucky, and thus assigns 0.1 as the probability of winning the top prize.

The main issue with subjectivism is the lack of objectivity, since different probabilities can be assigned to the same event based on personal opinion. We may then have difficulty choosing which of the probabilities should be used for that event. To mitigate this lack of objectivity, we may adjust our degree of belief in an event from time to time as more data are observed, through Bayes' theorem (discussed in a later chapter), so that the value is assigned in a more objective way. However, even after the adjustment, the assignment is still not entirely objective, since the adjusted value (known as the posterior probability) still depends on the initial value (known as the prior probability), which is assigned subjectively.

### Frequentism

Another probability interpretation, which is objective, is called frequentism. We denote by ${\displaystyle n(E)}$ the number of occurrences of an event ${\displaystyle E}$ in ${\displaystyle n}$ repetitions of an experiment. (An experiment is any action or process with an outcome that is subject to uncertainty or randomness.) We then call ${\displaystyle {\frac {n(E)}{n}}}$ the relative frequency of the event ${\displaystyle E}$. Intuitively, we expect that the relative frequency fluctuates less and less as ${\displaystyle n}$ gets larger and larger, and approaches a constant limiting value (called the limiting relative frequency) as ${\displaystyle n}$ tends to infinity; that is, the limiting relative frequency is ${\displaystyle \lim _{n\to \infty }{\frac {n(E)}{n}}}$. It is thus natural to take the limiting relative frequency as the probability of the event ${\displaystyle E}$, and this is exactly the definition of probability in frequentism. In particular, the existence of such a limiting relative frequency is an assumption, or axiom, of frequentism. (As a side result, when ${\displaystyle n}$ is large enough, the relative frequency of the event ${\displaystyle E}$ may be used to approximate its probability.)
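The frequentist idea can be illustrated with a short simulation. This is a toy sketch (the helper `relative_frequency` is our own): the relative frequency of the event "the roll is even" for a fair die fluctuates less and less as ${\displaystyle n}$ grows, settling near ${\displaystyle 1/2}$.

```python
# Simulating repeated die rolls and tracking the relative frequency n(E)/n.

import random

def relative_frequency(event, n, seed=0):
    """Roll a fair die n times and return the fraction of rolls landing in `event`."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.randint(1, 6) in event)
    return hits / n

even = {2, 4, 6}
for n in (100, 10_000, 1_000_000):
    # The printed values should cluster ever closer to 0.5 as n grows.
    print(n, relative_frequency(even, n))
```

Rerunning with a different seed changes the early fluctuations but not the limiting behavior.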

However, an issue with frequentism is that it may be infeasible to repeat the experiment many times for some events. For those events, no probability can be assigned, which is clearly a limitation of frequentism.

Example. Suppose John is taking a course in probability theory, and the course can only be taken once. Consider the event "John passes the course". For this event, it is infeasible to repeat the experiment many times, since John can take the course only once, so there can be only one experiment. (Notice that we cannot use the experiences of other students taking the course as experiments for this event, since we specified that the person taking the course must be John, not anyone else.)

Example. Consider a couple, Amy and Bob, and the event "this couple divorce eventually". For this event, it is again infeasible to repeat the experiment many times, since there is only one such couple, and thus there can be only one experiment.

Because of these issues, we will instead use a modern axiomatic and abstract approach to define probability, proposed by the Russian mathematician Andrey Nikolaevich Kolmogorov in 1933. By an axiomatic approach, we mean defining probability quite broadly and abstractly as something that satisfies certain axioms (called probability axioms). These probability axioms form the mathematical foundation of modern probability theory.

## Probability axioms

Since we want to use the probability measure ${\displaystyle \mathbb {P} }$ to assign a probability ${\displaystyle \mathbb {P} (E)}$ to every event ${\displaystyle E}$ in the sample space, it seems natural to set the domain of the probability measure ${\displaystyle \mathbb {P} }$ to be the set of all subsets of ${\displaystyle \Omega }$, i.e., the power set of ${\displaystyle \Omega }$, ${\displaystyle {\mathcal {P}}(\Omega )}$. Unfortunately, the situation is not that simple: there are technical difficulties with this choice of domain when the sample space ${\displaystyle \Omega }$ is uncountable.

Remark.

• A set is countable if, intuitively and informally, we can "count" the elements of the set. E.g., every finite set is countable, since we can "count" its elements one by one. Also, ${\displaystyle \mathbb {N} =\{1,2,\dotsc \},\mathbb {Z} =\{0,1,-1,2,-2,\dotsc \}}$ are countable, since we can still "count" the elements in these sets one by one. (A countable infinite set is also called a countably infinite set.)
• A set is uncountable if it is not countable.
• Some examples of uncountable sets include ${\displaystyle \mathbb {R} }$ and real intervals ${\displaystyle (a,b),(a,b],[a,b),[a,b]}$ (${\displaystyle b>a}$).

This is because the power set of such an uncountable sample space includes some "badly behaved" sets, which cause problems when assigning probabilities to them. (Here, we will not discuss those sets and these technical difficulties in detail.) Thus, instead of setting the domain of the probability measure to be ${\displaystyle {\mathcal {P}}(\Omega )}$, we set the domain to be a ${\displaystyle \sigma }$-algebra (sigma-algebra) containing only "sufficiently well-behaved" events:

Definition. (${\displaystyle \sigma }$-algebra) Let ${\displaystyle S}$ be a set (considered to be the universal set), and ${\displaystyle {\mathcal {P}}(S)}$ be its power set. A set ${\displaystyle \Sigma \subseteq {\mathcal {P}}(S)}$ is a ${\displaystyle \sigma }$-algebra on the set ${\displaystyle S}$, if the following three properties are satisfied:

1. ${\displaystyle S\in \Sigma }$.
2. For every set ${\displaystyle A}$, if the set ${\displaystyle A\in \Sigma }$, then its complement ${\displaystyle A^{c}\in \Sigma }$. (closure under complementation)
3. For every infinite sequence of sets ${\displaystyle A_{1},A_{2},\dotsc }$, if the sets ${\displaystyle A_{1},A_{2},\dotsc \in \Sigma }$, then ${\displaystyle \bigcup _{i=1}^{\infty }A_{i}\in \Sigma }$. (closure under countable unions)

Remark.

• Elements of the ${\displaystyle \sigma }$-algebra are sometimes called measurable sets. Hence, whether a set is measurable or not actually depends on how we construct the ${\displaystyle \sigma }$-algebra. However, we usually construct ${\displaystyle \sigma }$-algebra to include "well-behaved" sets only, and exclude "badly behaved" sets. So, in this case, only "well-behaved" sets are measurable.
• A ${\displaystyle \sigma }$-algebra on the sample space ${\displaystyle \Omega }$, which consists of a family (or set) of events, is also called an event space, denoted by ${\displaystyle {\mathcal {F}}}$.
• We call ${\displaystyle (S,\Sigma )}$ as a measurable space. In the context of probability, the measurable space is ${\displaystyle (\Omega ,{\mathcal {F}})}$.
• When property 3 is weakened to closure under finite unions, that is, "for every finite sequence of sets ${\displaystyle A_{1},A_{2},\dotsc ,A_{n}}$, if the sets ${\displaystyle A_{1},A_{2},\dotsc ,A_{n}\in \Sigma }$, then ${\displaystyle \bigcup _{i=1}^{n}A_{i}\in \Sigma }$", we call ${\displaystyle \Sigma }$ an algebra over the set ${\displaystyle S}$. As we will see, property 3 above implies this weakened property. Thus, a ${\displaystyle \sigma }$-algebra on the set ${\displaystyle S}$ is necessarily an algebra over the set ${\displaystyle S}$ (but not vice versa).
• In terms of probabilities, an intuitive meaning of each of the above three properties is
• property 1: we should be able to assign a probability to the entire sample space (this actually corresponds to one of the probability axioms);
• property 2: if we are able to assign probability to an event ("chance" of the occurrence of the event), then we should also be able to assign a probability to its complement ("chance" of the non-occurrence of the event);
• property 3: if we are able to assign probabilities to the events ${\displaystyle E_{1},E_{2},\dotsc }$, then we should be able to assign a probability to their union ("chance" of the occurrence of one of the events).
• Actually, from these three properties, we can deduce similar results for finitely many sets and for intersections.
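For a finite universal set, closure under countable unions reduces to closure under finite (indeed pairwise) unions, so the defining properties can be checked by brute force. Below is a minimal sketch under that assumption; the helper `is_sigma_algebra` is our own, not a standard library function.

```python
# Brute-force check of the sigma-algebra properties on a finite universal set.

from itertools import combinations

def is_sigma_algebra(sigma, s):
    """Check the three defining properties, with countable unions reduced
    to pairwise unions (valid because the universal set is finite)."""
    sigma = {frozenset(a) for a in sigma}
    s = frozenset(s)
    if s not in sigma:                              # property 1: S is in Sigma
        return False
    if any(s - a not in sigma for a in sigma):      # property 2: complements
        return False
    # property 3 (finite case): closure under pairwise unions
    return all(a | b in sigma for a, b in combinations(sigma, 2))

omega = {1, 2, 3, 4, 5, 6}
print(is_sigma_algebra([set(), {1, 2}, {3, 4, 5, 6}, omega], omega))  # True
print(is_sigma_algebra([{1, 2, 3}, {4, 5, 6}, omega], omega))         # False
```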

Proposition. (Further properties of ${\displaystyle \sigma }$-algebra) Let ${\displaystyle \Sigma }$ be a ${\displaystyle \sigma }$-algebra on a set ${\displaystyle S}$. Then, we have

1. ${\displaystyle \varnothing \in \Sigma }$.
2. For every finite sequence of sets ${\displaystyle A_{1},A_{2},\dotsc ,A_{n}}$, if the sets ${\displaystyle A_{1},A_{2},\dotsc ,A_{n}\in \Sigma }$, then ${\displaystyle \bigcup _{i=1}^{n}A_{i}\in \Sigma }$. (closure under finite unions)
3. For every infinite sequence of sets ${\displaystyle A_{1},A_{2},\dotsc }$, if the sets ${\displaystyle A_{1},A_{2},\dotsc \in \Sigma }$, then ${\displaystyle \bigcap _{i=1}^{\infty }A_{i}\in \Sigma }$. (closure under countable intersections)
4. For every finite sequence of sets ${\displaystyle A_{1},A_{2},\dotsc ,A_{n}}$, if the sets ${\displaystyle A_{1},A_{2},\dotsc ,A_{n}\in \Sigma }$, then ${\displaystyle \bigcap _{i=1}^{n}A_{i}\in \Sigma }$. (closure under finite intersections)

Proof.

Property 1: By the closure under complementation, since ${\displaystyle S\in \Sigma }$, it follows that ${\displaystyle \varnothing =S^{c}\in \Sigma }$.

Property 2: By the closure under countable unions, we have for every infinite sequence of sets ${\displaystyle A_{1},A_{2},\dotsc }$, if the sets ${\displaystyle A_{1},A_{2},\dotsc \in \Sigma }$, then ${\displaystyle \bigcup _{i=1}^{\infty }A_{i}\in \Sigma }$. So, in particular, we can choose the sequence to be ${\displaystyle A_{1},A_{2},\dotsc ,A_{n},\varnothing ,\varnothing ,\dotsc }$ (${\displaystyle \varnothing \in \Sigma }$) where ${\displaystyle A_{1},A_{2},\dotsc ,A_{n}}$ is an arbitrary sequence such that ${\displaystyle A_{1},A_{2},\dotsc ,A_{n}\in \Sigma }$. Then, ${\displaystyle \bigcup _{i=1}^{n}A_{i}=\bigcup _{i=1}^{\infty }A_{i}\in \Sigma .}$ Thus, we have the desired result.

Property 3: For every infinite sequence of sets ${\displaystyle A_{1},A_{2},\dotsc \in \Sigma }$, by the closure under complementation, we have ${\displaystyle A_{1}^{c},A_{2}^{c},\dotsc \in \Sigma .}$ Then, by the closure under countable unions, we have ${\displaystyle \bigcup _{i=1}^{\infty }A_{i}^{c}\in \Sigma }$. After that, we use De Morgan's law: ${\displaystyle \left(\bigcap _{i=1}^{\infty }A_{i}\right)^{c}=\bigcup _{i=1}^{\infty }A_{i}^{c}\in \Sigma .}$ Using the closure under complementation again, we have ${\displaystyle \bigcap _{i=1}^{\infty }A_{i}\in \Sigma }$ as desired.

Property 4: The proof is similar to that of property 2, and hence left as an exercise.

${\displaystyle \Box }$

Remark.

• From these properties, it follows that finite or countably infinite unions and intersections of (complements of) sets in a ${\displaystyle \sigma }$-algebra are also in the ${\displaystyle \sigma }$-algebra. So, even after many such set operations on sets in a ${\displaystyle \sigma }$-algebra, the resulting set remains in the ${\displaystyle \sigma }$-algebra.

Exercise. Prove the property 4 in the proposition above.

Proof. By the closure under countable intersections (property 3 in proposition above), we have for every infinite sequence of sets ${\displaystyle A_{1},A_{2},\dotsc }$, if the sets ${\displaystyle A_{1},A_{2},\dotsc \in \Sigma }$, then ${\displaystyle \bigcap _{i=1}^{\infty }A_{i}\in \Sigma }$. So, in particular, we can choose the sequence to be ${\displaystyle A_{1},A_{2},\dotsc ,A_{n},\varnothing ,\varnothing ,\dotsc }$ (${\displaystyle \varnothing \in \Sigma }$) where ${\displaystyle A_{1},A_{2},\dotsc ,A_{n}}$ is an arbitrary sequence such that ${\displaystyle A_{1},A_{2},\dotsc ,A_{n}\in \Sigma }$. Then, ${\displaystyle \bigcap _{i=1}^{n}A_{i}=\bigcap _{i=1}^{\infty }A_{i}\in \Sigma .}$ Thus, we have the desired result.

${\displaystyle \Box }$

Example. (Smallest and largest ${\displaystyle \sigma }$-algebra) Let ${\displaystyle \Omega }$ be a sample space.

(a) Prove that the event space ${\displaystyle {\mathcal {F}}=\{\varnothing ,\Omega \}}$ (called trivial ${\displaystyle \sigma }$-algebra) is a ${\displaystyle \sigma }$-algebra on ${\displaystyle \Omega }$. This is the smallest ${\displaystyle \sigma }$-algebra on ${\displaystyle \Omega }$. (To be more precise, ${\displaystyle {\mathcal {F}}\subseteq \Sigma }$ for every ${\displaystyle \sigma }$-algebra ${\displaystyle \Sigma }$ on ${\displaystyle \Omega }$.)

(b) Prove that the event space ${\displaystyle {\mathcal {F}}={\mathcal {P}}(\Omega )}$ is a ${\displaystyle \sigma }$-algebra on ${\displaystyle \Omega }$. This is the largest ${\displaystyle \sigma }$-algebra on ${\displaystyle \Omega }$. (To be more precise, ${\displaystyle {\mathcal {F}}\supseteq \Sigma }$ for every ${\displaystyle \sigma }$-algebra ${\displaystyle \Sigma }$ on ${\displaystyle \Omega }$.)

Solution.

(a)

Proof. To prove that ${\displaystyle {\mathcal {F}}}$ is a ${\displaystyle \sigma }$-algebra, we need to show that ${\displaystyle {\mathcal {F}}}$ satisfies the three defining properties.

Property 1: Since ${\displaystyle \Omega \in {\mathcal {F}}}$, the property 1 is satisfied.

Property 2 (closure under complementation): Since ${\displaystyle {\mathcal {F}}}$ contains only ${\displaystyle \varnothing }$ and ${\displaystyle \Omega }$, it suffices to show that the complement of each of them also belongs to ${\displaystyle {\mathcal {F}}}$: ${\displaystyle \varnothing ^{c}=\Omega \in {\mathcal {F}}}$ and ${\displaystyle \Omega ^{c}=\varnothing \in {\mathcal {F}}}$. So, the property 2 is satisfied.

Property 3 (closure under countable unions): Since ${\displaystyle {\mathcal {F}}}$ contains only ${\displaystyle \varnothing }$ and ${\displaystyle \Omega }$, the union of countably infinitely many events ("${\displaystyle A_{1},A_{2},\dotsc }$") in ${\displaystyle {\mathcal {F}}}$ (each of them is either ${\displaystyle \varnothing }$ or ${\displaystyle \Omega }$) is either ${\displaystyle \varnothing }$ or ${\displaystyle \Omega }$. In either case, the union belongs to ${\displaystyle {\mathcal {F}}}$. Thus, the property 3 is satisfied.

${\displaystyle \Box }$

(b)

Proof.

Property 1: Since ${\displaystyle \Omega \subseteq \Omega }$, it follows that ${\displaystyle \Omega \in {\mathcal {P}}(\Omega )}$. Thus, the property is satisfied.

Property 2: Assume that ${\displaystyle A\in {\mathcal {F}}={\mathcal {P}}(\Omega )}$. Then, ${\displaystyle A\subseteq \Omega }$. By the definition of complement, we have ${\displaystyle A^{c}=\Omega \setminus A\subseteq \Omega }$. Hence, ${\displaystyle A^{c}\in {\mathcal {F}}}$, and thus the property is satisfied.

Property 3: For every infinite sequence of sets ${\displaystyle A_{1},A_{2},\dotsc \in {\mathcal {F}}}$, we have ${\displaystyle A_{1},A_{2},\dotsc \subseteq \Omega }$. It follows by the property of union that ${\displaystyle \bigcup _{i=1}^{\infty }A_{i}\subseteq \Omega }$, and hence ${\displaystyle \bigcup _{i=1}^{\infty }A_{i}\in {\mathcal {F}}}$. Thus, the property is satisfied.

${\displaystyle \Box }$

Exercise.

1. Let the sample space be ${\displaystyle \Omega =\{1,2,3,4,5,6\}}$. Which of the following is/are ${\displaystyle \sigma }$-algebras on ${\displaystyle \Omega }$?

• ${\displaystyle \Omega }$
• ${\displaystyle \{\Omega \}}$
• ${\displaystyle \{\{1\},\{2\},\{3\},\{4\},\{5\},\{6\}\}}$
• ${\displaystyle \{\varnothing ,\{1,2\},\{3,4,5,6\},\Omega \}}$
• ${\displaystyle \{\{1,2,3\},\{4,5,6\},\Omega \}}$

2. Let the sample space be ${\displaystyle \Omega =\mathbb {R} }$. Which of the following is/are ${\displaystyle \sigma }$-algebras on ${\displaystyle \Omega }$?

• ${\displaystyle \{\varnothing ,(-\infty ,0],(0,\infty ),\mathbb {R} \}}$
• ${\displaystyle \{\varnothing ,(-\infty ,0),\{0\},(0,\infty ),\mathbb {R} \}}$
• ${\displaystyle \{\varnothing ,\mathbb {Q} ,\mathbb {I} ,\mathbb {R} \}}$ (where ${\displaystyle \mathbb {Q} }$ and ${\displaystyle \mathbb {I} }$ are the sets of all rational and irrational numbers respectively)
• ${\displaystyle {\mathcal {P}}(\mathbb {R} )}$

We have seen two examples of ${\displaystyle \sigma }$-algebras in the example above. Often, the "smallest" ${\displaystyle \sigma }$-algebra is not chosen as the domain of the probability measure, since we are usually interested in events other than ${\displaystyle \varnothing }$ and ${\displaystyle \Omega }$.

The "largest" ${\displaystyle \sigma }$-algebra, on the other hand, contains every event, but we may not be interested in all of them. In particular, we are usually interested in events that are "well-behaved", rather than "badly behaved" ones (indeed, it may even be impossible to assign probabilities to the latter properly; such events are called nonmeasurable).

Fortunately, when the sample space ${\displaystyle \Omega }$ is countable, every set in ${\displaystyle {\mathcal {P}}(\Omega )}$ is "well-behaved", so we can take this power set to be the ${\displaystyle \sigma }$-algebra serving as the domain of the probability measure.

However, when the sample space ${\displaystyle \Omega }$ is uncountable, even though the power set ${\displaystyle {\mathcal {P}}(\Omega )}$ is a ${\displaystyle \sigma }$-algebra, it contains "too many" events; in particular, it includes some "badly behaved" events. Therefore, we do not choose the power set as the domain of the probability measure. Instead, we choose a ${\displaystyle \sigma }$-algebra that includes only "well-behaved" events to be the domain, so that we are able to assign a probability properly to every event in it. In particular, those "well-behaved" events are often the events of interest, so all events of interest are contained in that ${\displaystyle \sigma }$-algebra, that is, the domain of the probability measure.

To motivate the probability axioms, we consider some properties that the "probability" in frequentism (as a limiting relative frequency) possesses:

1. The limiting relative frequency must be nonnegative. (We call this property nonnegativity.)
2. The limiting relative frequency of the whole sample space ${\displaystyle \Omega }$ (which is itself an event) must be 1 (since by definition ${\displaystyle \Omega }$ contains all sample points, this event occurs in every repetition). (We call this property unitarity.)
3. If the events ${\displaystyle E_{1},E_{2},\dotsc }$ are pairwise disjoint (i.e., ${\displaystyle E_{i}\cap E_{j}=\varnothing }$ for every ${\displaystyle i,j}$ with ${\displaystyle i\neq j}$), then the limiting relative frequency of the event ${\displaystyle \bigcup _{i=1}^{\infty }E_{i}{\overset {\text{ def }}{=}}E_{1}\cup E_{2}\cup \dotsb }$ (a union of subsets of ${\displaystyle \Omega }$ is a subset of ${\displaystyle \Omega }$, so it is an event) is

{\displaystyle {\begin{aligned}\lim _{n\to \infty }{\frac {n\left(\bigcup _{i=1}^{\infty }E_{i}\right)}{n}}&=\lim _{n\to \infty }{\frac {n(E_{1})+n(E_{2})+\dotsb }{n}}&({\text{the events are pairwise disjoint}})\\&=\lim _{n\to \infty }{\frac {n(E_{1})}{n}}+\lim _{n\to \infty }{\frac {n(E_{2})}{n}}+\dotsb &({\text{every limit exists by the axiom in frequentism}})\\&=\sum _{i=1}^{\infty }\lim _{n\to \infty }{\frac {n(E_{i})}{n}},\end{aligned}}}

which is the sum of the limiting relative frequencies of the events ${\displaystyle E_{1},E_{2},\dotsc }$. (We call this property countable additivity.)

It is thus very natural to set the probability axioms to be the three properties mentioned above:

Definition. (Probability measure)

Let ${\displaystyle \Omega }$ be the sample space of a random experiment, and ${\displaystyle {\mathcal {F}}}$ be an associated event space (which is a ${\displaystyle \sigma }$-algebra on ${\displaystyle \Omega }$). A probability measure is a function ${\displaystyle \mathbb {P} :{\mathcal {F}}\to [0,1]}$ satisfying the following probability axioms:

(P1) for every event ${\displaystyle E\in {\mathcal {F}}}$, the probability of the event ${\displaystyle E}$, ${\displaystyle \mathbb {P} (E)\geq 0}$; (nonnegativity)
(P2) ${\displaystyle \mathbb {P} (\Omega )=1}$; (unitarity)
(P3) for every infinite sequence of pairwise disjoint events ${\displaystyle E_{1},E_{2},\dotsc }$ (each of which is an element of ${\displaystyle {\mathcal {F}}}$), ${\displaystyle \mathbb {P} \left(\bigcup _{i=1}^{\infty }E_{i}\right)=\sum _{i=1}^{\infty }\mathbb {P} (E_{i})}$. (countable additivity)

Remark.

• For axiom P3, we consider infinite sequences instead of finite sequences to add generality for infinite sample spaces. As we will see, a result similar to axiom P3 holds for finitely many pairwise disjoint events.
• When ${\displaystyle \Omega }$ is countable, we just set the associated event space to be ${\displaystyle {\mathcal {F}}={\mathcal {P}}(\Omega )}$.
• When ${\displaystyle \Omega }$ is uncountable, the event space is usually not set to be ${\displaystyle {\mathcal {P}}(\Omega )}$. Instead, it is common to set the event space to be the Borel ${\displaystyle \sigma }$-algebra on ${\displaystyle \Omega }$ (which is defined when ${\displaystyle \Omega }$ is a topological space). The details are omitted here, but the event space contains all "well-behaved" sets and most sets we can think of in the power set ${\displaystyle {\mathcal {P}}(\Omega )}$ anyway.
• The probability axiom P3 relates the events ${\displaystyle E_{1},E_{2},\dotsc }$ to their union ${\displaystyle \bigcup _{i=1}^{\infty }E_{i}}$. This matches with the closure under countable unions in the definition of ${\displaystyle \sigma }$-algebra.
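To make the definition concrete: on a countable sample space, assigning a nonnegative weight to each sample point, with the weights summing to 1, and setting ${\displaystyle \mathbb {P} (E)}$ to be the total weight of ${\displaystyle E}$, yields a probability measure. Below is a minimal sketch of this construction for a fair die; the helper names are our own.

```python
# Building a probability measure on a finite sample space by summing
# per-point weights, then spot-checking the probability axioms.

from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
weight = {w: Fraction(1, 6) for w in omega}   # fair die: equal weights

def prob(event):
    """P(E) as the total weight of the sample points in E."""
    return sum(weight[w] for w in event)

# P1 (nonnegativity) and P2 (unitarity)
assert all(prob({w}) >= 0 for w in omega)
assert prob(omega) == 1
# additivity on pairwise disjoint events
e1, e2 = {1, 2}, {5}
assert prob(e1 | e2) == prob(e1) + prob(e2)
print(prob({1, 2, 3}))  # 1/2
```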

Using the probability axioms alone, we can prove many well-known properties of probability.

### Basic properties of probability

Let us start the discussion with some simple properties of probability.

Theorem. Let ${\displaystyle (\Omega ,{\mathcal {F}},\mathbb {P} )}$ be a probability space. Then, ${\displaystyle \mathbb {P} (\varnothing )=0}$.

Proof. Consider the infinite sequence of events ${\displaystyle \Omega ,\varnothing ,\varnothing ,\dotsc }$ (recall that ${\displaystyle \varnothing }$ and ${\displaystyle \Omega }$ must be in the ${\displaystyle \sigma }$-algebra ${\displaystyle {\mathcal {F}}}$). We can see that the events are pairwise disjoint. Also, the union of these events is ${\displaystyle \Omega }$. Hence, by the countable additivity of probability, we have ${\displaystyle \underbrace {\mathbb {P} (\Omega )} _{=1}=\underbrace {\mathbb {P} (\Omega )} _{=1}+\mathbb {P} (\varnothing )+\mathbb {P} (\varnothing )+\dotsb \implies \mathbb {P} (\varnothing )+\mathbb {P} (\varnothing )+\dotsb =1-1=0\implies \sum _{i=1}^{\infty }\mathbb {P} (\varnothing )=0.}$ It can then be shown that ${\displaystyle \mathbb {P} (\varnothing )=0}$. [1]

${\displaystyle \Box }$

Using this result, we can obtain finite additivity from the countable additivity of probability:

Theorem. (Finite additivity) Let ${\displaystyle (\Omega ,{\mathcal {F}},\mathbb {P} )}$ be a probability space. Then, for every event ${\displaystyle E_{1},E_{2},\dotsc ,E_{n}\in {\mathcal {F}}}$, if the events ${\displaystyle E_{1},E_{2},\dotsc ,E_{n}}$ are pairwise disjoint, then ${\displaystyle \mathbb {P} \left(\bigcup _{i=1}^{n}E_{i}\right)=\sum _{i=1}^{n}\mathbb {P} (E_{i}).}$

Proof. Consider the infinite sequence of events ${\displaystyle E_{1},E_{2},\dotsc ,E_{n},\varnothing ,\varnothing ,\dotsc }$ (recall that ${\displaystyle \varnothing \in {\mathcal {F}}}$ always); that is, set ${\displaystyle E_{i}=\varnothing }$ for every ${\displaystyle i>n}$. Then, {\displaystyle {\begin{aligned}\mathbb {P} \left(\bigcup _{i=1}^{n}E_{i}\right)&=\mathbb {P} \left(\bigcup _{i=1}^{\infty }E_{i}\right)\\&=\sum _{i=1}^{\infty }\mathbb {P} (E_{i})&({\text{countable additivity}})\\&=\sum _{i=1}^{n}\mathbb {P} (E_{i})+\sum _{i=n+1}^{\infty }\mathbb {P} (\varnothing )\\&=\sum _{i=1}^{n}\mathbb {P} (E_{i}).\end{aligned}}} (The last equality follows since ${\displaystyle \mathbb {P} (\varnothing )=0}$, and it can be shown that ${\displaystyle \sum _{i=n+1}^{\infty }\mathbb {P} (\varnothing )=0}$ using some concepts from limits, to be mathematically rigorous.)

${\displaystyle \Box }$

Finite additivity makes the proofs of some of the following results simpler.

Theorem. Let ${\displaystyle (\Omega ,{\mathcal {F}},\mathbb {P} )}$ be a probability space, and ${\displaystyle A}$ and ${\displaystyle B}$ be sets in the event space ${\displaystyle {\mathcal {F}}}$. Then,

1. (complementary event) ${\displaystyle \mathbb {P} (A^{c})=1-\mathbb {P} (A)}$. (The complement is taken with respect to the sample space ${\displaystyle \Omega }$.)
2. (numeric bound) ${\displaystyle 0\leq \mathbb {P} (A)\leq 1}$.
3. ${\displaystyle \mathbb {P} (B)=\mathbb {P} (B\cap A)+\mathbb {P} (B\setminus A)}$.
4. ${\displaystyle \mathbb {P} (A\cup B)=\mathbb {P} (A)+\mathbb {P} (B)-\mathbb {P} (A\cap B)}$.
5. (monotonicity) If ${\displaystyle A\subseteq B}$, then ${\displaystyle \mathbb {P} (A)\leq \mathbb {P} (B)}$.

Proof.

Property 1:

First, notice that by definition ${\displaystyle \Omega =A\cup A^{c}}$. Furthermore, since ${\displaystyle A\in {\mathcal {F}}}$, we have ${\displaystyle A^{c}\in {\mathcal {F}}}$ by the closure under complementation of ${\displaystyle \sigma }$-algebra. Also, the sets ${\displaystyle A}$ and ${\displaystyle A^{c}}$ are disjoint. Thus, by the finite additivity, we have ${\displaystyle \mathbb {P} (A\cup A^{c})=\mathbb {P} (A)+\mathbb {P} (A^{c}).}$ On the other hand, ${\displaystyle \mathbb {P} (A\cup A^{c})=\mathbb {P} (\Omega ){\overset {\text{ P2 }}{=}}1.}$ Thus, we have the desired result.

Property 2: By property 1, we have ${\displaystyle \mathbb {P} (A)=1-\underbrace {\mathbb {P} (A^{c})} _{\geq 0{\text{ by P1}}}\leq 1}$. We then have the desired numeric bound on ${\displaystyle \mathbb {P} (A)}$ since ${\displaystyle \mathbb {P} (A)\geq 0}$ also by the nonnegativity of probability.

Property 3: {\displaystyle {\begin{aligned}\mathbb {P} (B)&=\mathbb {P} (B\cap \Omega )&(B\subseteq \Omega ,{\text{ so }}B=B\cap \Omega )\\&=\mathbb {P} {\big (}B\cap (A\cup A^{c}){\big )}&({\text{definition}})\\&=\mathbb {P} {\big (}(B\cap A)\cup (B\cap A^{c}){\big )}&({\text{distributive law}})\\&=\mathbb {P} (B\cap A)+\mathbb {P} (B\setminus A)&(B\cap A,B\cap A^{c}\in {\mathcal {F}},{\text{ and are disjoint. Also, }}B\setminus A=B\cap A^{c})\end{aligned}}}

Property 4: By property 3, we have {\displaystyle {\begin{aligned}\mathbb {P} (A\cup B)&=\mathbb {P} {\big (}(A\cup B)\cap A{\big )}+\mathbb {P} {\big (}(A\cup B)\setminus A{\big )}&({\text{property 3}})\\&=\mathbb {P} (A)+{\color {blue}\mathbb {P} (B\setminus A)}&((A\cup B)\cap A=A{\text{ and }}(A\cup B)\setminus A=B\setminus A)\\&=\mathbb {P} (A)+{\color {blue}\mathbb {P} (B)-\mathbb {P} (B\cap A)}.&({\text{property 3}})\\\end{aligned}}}

Property 5: Assume that ${\displaystyle A\subseteq B}$. Then, ${\displaystyle A\cap B=A}$. Hence, by property 3, ${\displaystyle \mathbb {P} (B)=\mathbb {P} (B\cap A)+\underbrace {\mathbb {P} (B\setminus A)} _{\geq 0{\text{ by P1}}}\geq \mathbb {P} (A).}$

${\displaystyle \Box }$
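As a quick numerical sanity check (a sketch with hypothetical point masses, not part of the formal development), the five properties can be verified on a small discrete probability space:

```python
from fractions import Fraction

# A small discrete probability space: Omega = {1,...,6} with a loaded measure.
# (The weights here are hypothetical, chosen only for illustration; they sum to 1.)
weights = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 8),
           4: Fraction(1, 8), 5: Fraction(1, 8), 6: Fraction(1, 8)}
Omega = set(weights)

def P(event):
    """Probability of an event (a subset of Omega) by summing point masses."""
    return sum(weights[w] for w in event)

A, B = {1, 2, 3}, {2, 3, 4, 5}
# Property 1: complement rule
assert P(Omega - A) == 1 - P(A)
# Property 2: numeric bound
assert 0 <= P(A) <= 1
# Property 3: P(B) = P(B ∩ A) + P(B \ A)
assert P(B) == P(B & A) + P(B - A)
# Property 4: inclusion-exclusion for two events
assert P(A | B) == P(A) + P(B) - P(A & B)
# Property 5: monotonicity, with {1} ⊆ A
assert P({1}) <= P(A)
```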

Remark.

• The numeric bound property ensures that the probability measure ${\displaystyle \mathbb {P} }$ is indeed a well-defined function (no event is assigned a probability outside the codomain ${\displaystyle [0,1]}$).

Example. Let ${\displaystyle C}$ and ${\displaystyle F}$ be the events of winning the championship of a competition and entering the final of the competition respectively. Then, ${\displaystyle C\subseteq F}$ (to win the championship, we must enter the final), and so ${\displaystyle \mathbb {P} (C)\leq \mathbb {P} (F)}$ by monotonicity. That is, the probability of winning the championship of a competition is less than or equal to that of entering the final of the competition.

Example. Let ${\displaystyle (\Omega ,{\mathcal {F}},\mathbb {P} )}$ be a probability space, and ${\displaystyle A,B\in {\mathcal {F}}}$. Show that if ${\displaystyle \mathbb {P} (B)>0}$, then ${\displaystyle 0\leq {\frac {\mathbb {P} (A\cap B)}{\mathbb {P} (B)}}\leq 1}$.

Proof. Assume ${\displaystyle \mathbb {P} (B)>0}$. Then, we have ${\displaystyle {\frac {\mathbb {P} (A\cap B)}{\mathbb {P} (B)}}\geq 0}$ since ${\displaystyle \mathbb {P} (A\cap B)\geq 0}$ by the nonnegativity of probability. On the other hand, we have ${\displaystyle A\cap B\subseteq B}$. So, by monotonicity, ${\displaystyle \mathbb {P} (A\cap B)\leq \mathbb {P} (B)}$, and thus ${\displaystyle {\frac {\mathbb {P} (A\cap B)}{\mathbb {P} (B)}}\leq {\frac {\mathbb {P} (B)}{\mathbb {P} (B)}}=1}$. Combining these inequalities yields the desired result.

${\displaystyle \Box }$

Exercise. Let ${\displaystyle (\Omega ,{\mathcal {F}},\mathbb {P} )}$ be a probability space, and ${\displaystyle A,B\in {\mathcal {F}}}$. Show that if ${\displaystyle \mathbb {P} (B)>0}$ and ${\displaystyle B\subseteq A}$, then ${\displaystyle {\frac {\mathbb {P} (A\cap B)}{\mathbb {P} (B)}}=1}$.

Proof

Proof. Assume ${\displaystyle \mathbb {P} (B)>0}$ and ${\displaystyle B\subseteq A}$. Then, ${\displaystyle A\cap B=B}$. Hence, ${\displaystyle {\frac {\mathbb {P} (A\cap B)}{\mathbb {P} (B)}}={\frac {\mathbb {P} (B)}{\mathbb {P} (B)}}=1.}$

${\displaystyle \Box }$

Exercise. Let ${\displaystyle (\Omega ,{\mathcal {F}},\mathbb {P} )}$ be a probability space, and ${\displaystyle A,B\in {\mathcal {F}}}$. Show that ${\displaystyle \mathbb {P} (A\cap B)\geq \mathbb {P} (A)+\mathbb {P} (B)-1}$. (Hint: consider the property ${\displaystyle \mathbb {P} (A\cup B)=\mathbb {P} (A)+\mathbb {P} (B)-\mathbb {P} (A\cap B)}$.)

Proof

Proof. By the hint, we have ${\displaystyle \underbrace {\mathbb {P} (A\cup B)} _{\leq 1}=\mathbb {P} (A)+\mathbb {P} (B)-\mathbb {P} (A\cap B)\implies \mathbb {P} (A)+\mathbb {P} (B)-\mathbb {P} (A\cap B)\leq 1\implies \mathbb {P} (A\cap B)\geq \mathbb {P} (A)+\mathbb {P} (B)-1.}$

${\displaystyle \Box }$

Example. Suppose you roll a loaded dice, and the corresponding probability space is ${\displaystyle (\Omega ,{\mathcal {F}},\mathbb {P} )}$. It is given that ${\displaystyle \mathbb {P} (\{1\})=\mathbb {P} (\{3\})=\mathbb {P} (\{5\})}$, and ${\displaystyle \mathbb {P} (\{2\})=\mathbb {P} (\{4\})=\mathbb {P} (\{6\})}$, and ${\displaystyle \mathbb {P} (\{{\text{odd integer}}\})=2\mathbb {P} (\{{\text{even integer}}\})}$. Calculate the probability of getting a 1 from the dice, i.e., ${\displaystyle \mathbb {P} (\{1\})}$.

Solution. First, notice that ${\displaystyle \{{\text{odd integer}}\}=\{1,3,5\}}$ and ${\displaystyle \{{\text{even integer}}\}=\{2,4,6\}}$. Furthermore, ${\displaystyle \{1,3,5\}\cup \{2,4,6\}=\Omega }$. Also, ${\displaystyle \{1,3,5\}}$ and ${\displaystyle \{2,4,6\}}$ are disjoint. Hence, ${\displaystyle \mathbb {P} (\{1,3,5\})+\mathbb {P} (\{2,4,6\})=1\implies \mathbb {P} (\{1,3,5\})+{\frac {1}{2}}\mathbb {P} (\{1,3,5\})=1\implies \mathbb {P} (\{1,3,5\})={\frac {2}{3}}.}$

Also, ${\displaystyle \{1\},\{3\},\{5\}}$ are disjoint, and hence ${\displaystyle \mathbb {P} (\{1,3,5\})=\mathbb {P} (\{1\})+\mathbb {P} (\{3\})+\mathbb {P} (\{5\})=3\mathbb {P} (\{1\})\implies \mathbb {P} (\{1\})={\frac {2}{9}}.}$

Exercise. Calculate the probability of getting a 2 from the dice.

Solution

From the example above, we have ${\displaystyle \mathbb {P} (\{2,4,6\})={\frac {1}{2}}\mathbb {P} (\{1,3,5\})={\frac {1}{3}}}$. Since ${\displaystyle \{2\},\{4\},\{6\}}$ are disjoint, it follows that ${\displaystyle \mathbb {P} (\{2,4,6\})=\mathbb {P} (\{2\})+\mathbb {P} (\{4\})+\mathbb {P} (\{6\})=3\mathbb {P} (\{2\})\implies \mathbb {P} (\{2\})={\frac {1}{9}}.}$
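The computation in the example and exercise above can be reproduced with exact rational arithmetic (a sketch; the variable names are ours):

```python
from fractions import Fraction

# Solve the loaded-dice example: P(odd) + P(even) = 1 together with
# P(odd) = 2 * P(even) gives P(odd) = 2/3.
p_odd = Fraction(2, 3)
p_even = 1 - p_odd                  # = 1/3
p1 = p_odd / 3                      # the three odd faces are equally likely
p2 = p_even / 3                     # the three even faces are equally likely
assert p1 == Fraction(2, 9) and p2 == Fraction(1, 9)
assert 3 * p1 + 3 * p2 == 1         # the six probabilities sum to 1
```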

Remark.

• For simplicity, we may sometimes omit the curly braces ${\displaystyle \{\}}$ inside the probability measure. For example, we may write ${\displaystyle \mathbb {P} ({\text{odd integer}})}$ instead of ${\displaystyle \mathbb {P} (\{{\text{odd integer}}\})}$.

Example. Let ${\displaystyle (\Omega ,{\mathcal {F}},\mathbb {P} )}$ be a probability space. Suppose we have ${\displaystyle \mathbb {P} (A)=0.6}$ and ${\displaystyle \mathbb {P} (B^{c})=0.55}$ for events ${\displaystyle A,B\in {\mathcal {F}}}$. Can the events ${\displaystyle A}$ and ${\displaystyle B}$ be disjoint?

Solution. They cannot be disjoint. We may prove it by contradiction.

Proof. Assume to the contrary that ${\displaystyle A}$ and ${\displaystyle B}$ are disjoint. First, ${\displaystyle \mathbb {P} (B)=1-\mathbb {P} (B^{c})=0.45}$. Then, by the finite additivity of probability, we have ${\displaystyle \mathbb {P} (A\cup B)=\mathbb {P} (A)+\mathbb {P} (B)=0.6+0.45=1.05>1,}$ which is a contradiction, since it violates the numeric bound property (and hence the probability axioms).

${\displaystyle \Box }$

Exercise. What is the minimum possible value of ${\displaystyle \mathbb {P} (A\cap B)}$ such that ${\displaystyle \mathbb {P} }$ can possibly be a valid probability measure? (Hint: it suffices to consider the bound on ${\displaystyle \mathbb {P} (A\cap B)}$ that holds whenever ${\displaystyle \mathbb {P} }$ is a valid probability measure. In logical terms, the bound is a necessary (but possibly not sufficient) condition for ${\displaystyle \mathbb {P} }$ to be a valid probability measure.)

Solution

When ${\displaystyle \mathbb {P} }$ is a valid probability measure, we have ${\displaystyle \mathbb {P} (A\cup B)=\mathbb {P} (A)+\mathbb {P} (B)-\mathbb {P} (A\cap B)=1.05-\mathbb {P} (A\cap B).}$ Since ${\displaystyle \mathbb {P} }$ is a valid probability measure, it follows that ${\displaystyle \mathbb {P} (A\cup B)\leq 1\implies \mathbb {P} (A\cap B)\geq 0.05}$, meaning that the minimum probability is 0.05.

Exercise.

Define ${\displaystyle A\triangle B}$ to be the symmetric difference of sets ${\displaystyle A}$ and ${\displaystyle B}$, that is, ${\displaystyle (A\setminus B)\cup (B\setminus A)}$.

Propose a formula for ${\displaystyle \mathbb {P} (A\triangle B)}$ in terms of ${\displaystyle \mathbb {P} (A),\mathbb {P} (B),\mathbb {P} (A\cap B)}$, and prove it. (Note: the occurrence of the event ${\displaystyle A\triangle B}$ corresponds to the occurrence of exactly one of the events ${\displaystyle A}$ and ${\displaystyle B}$.) (Hint: You may use without proof the fact that ${\displaystyle (A\setminus B)\cap (B\setminus A)=\varnothing }$. Also, you may use the property that ${\displaystyle \mathbb {P} (A\setminus B)=\mathbb {P} (A)-\mathbb {P} (A\cap B)}$.)

Solution

Proposition: ${\displaystyle \mathbb {P} (A\triangle B)=\mathbb {P} (A)+\mathbb {P} (B)-2\mathbb {P} (A\cap B)}$.

Proof. We have ${\displaystyle \mathbb {P} (A\triangle B)=\mathbb {P} (A\setminus B)+\mathbb {P} (B\setminus A)-\mathbb {P} ((A\setminus B)\cap (B\setminus A))=\mathbb {P} (A)-\mathbb {P} (A\cap B)+\mathbb {P} (B)-\mathbb {P} (B\cap A)-\mathbb {P} (\varnothing )=\mathbb {P} (A)+\mathbb {P} (B)-2\mathbb {P} (A\cap B).}$

${\displaystyle \Box }$
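The proposed formula can be sanity-checked on a small uniform sample space (an illustrative sketch; the sets `A` and `B` are arbitrary choices):

```python
from fractions import Fraction

# Check P(A △ B) = P(A) + P(B) - 2 P(A ∩ B) on a uniform 10-point space.
Omega = set(range(10))
def P(E):
    return Fraction(len(E), len(Omega))   # equally likely outcomes

A, B = {0, 1, 2, 3, 4}, {3, 4, 5, 6}
sym_diff = (A - B) | (B - A)              # the symmetric difference A △ B
assert P(sym_diff) == P(A) + P(B) - 2 * P(A & B)
```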

Example. Let ${\displaystyle (\Omega ,{\mathcal {F}},\mathbb {P} )}$ be a probability space, and ${\displaystyle A,B,C\in {\mathcal {F}}}$ be events such that ${\displaystyle A\cup B\cup C=\Omega }$. It is given that ${\displaystyle \mathbb {P} (A)=0.5,\quad \mathbb {P} (B)=0.5,\quad \mathbb {P} (C)=0.6,\quad \mathbb {P} (A\cap B)=0.2,\quad \mathbb {P} (B\cap C)=0.3,\quad \mathbb {P} (A\cap C)=0.2}$ Calculate

(a) ${\displaystyle \mathbb {P} (A\cup B),\mathbb {P} (B\cup C),\mathbb {P} (A\cup C)}$;

(b) ${\displaystyle \mathbb {P} (A\cup B\cup C),\mathbb {P} (A\cap B\cap C)}$ (without using inclusion-exclusion principle for three events directly).

Solution.

(a)

• ${\displaystyle \mathbb {P} (A\cup B)=\mathbb {P} (A)+\mathbb {P} (B)-\mathbb {P} (A\cap B)=0.5+0.5-0.2=0.8}$.
• ${\displaystyle \mathbb {P} (B\cup C)=\mathbb {P} (B)+\mathbb {P} (C)-\mathbb {P} (B\cap C)=0.5+0.6-0.3=0.8}$.
• ${\displaystyle \mathbb {P} (A\cup C)=\mathbb {P} (A)+\mathbb {P} (C)-\mathbb {P} (A\cap C)=0.5+0.6-0.2=0.9}$.

(b) First, ${\displaystyle \mathbb {P} (A\cup B\cup C)=\mathbb {P} (\Omega )=1}$. Then, we write {\displaystyle {\begin{aligned}\mathbb {P} (A\cup B\cup C)&=\mathbb {P} (A\cup (B\cup C))\\&=\mathbb {P} (A)+\mathbb {P} (B\cup C)-\mathbb {P} (A\cap (B\cup C))\\&=0.5+0.8-\mathbb {P} ((A\cap B)\cup (A\cap C))\\&=1.3-{\big (}\mathbb {P} (A\cap B)+\mathbb {P} (A\cap C)-\mathbb {P} (A\cap B\cap C){\big )}\\&=1.3-0.2-0.2+\mathbb {P} (A\cap B\cap C)\\&=0.9+\mathbb {P} (A\cap B\cap C).\\\end{aligned}}} It follows that ${\displaystyle \mathbb {P} (A\cap B\cap C)=1-0.9=0.1}$.

Note: to keep track of the probabilities of different events more easily, we may draw a Venn diagram:

|-------------| <--------- A
|             |
|        |----|----|
|  0.2   |    |    |
|        | 0.1| 0.2| <---- C
|        |    |    |
|--------|----|----|------|
|        |0.1 | 0.2|      |
|   0.1  |    |    |  0.1 | <---- B
|        |----|----|      |
|-------------|-----------|


It also allows us to calculate the probabilities in a more systematic manner. (E.g., we know that ${\displaystyle \mathbb {P} (A\cap B\cap C)=0.1}$ and ${\displaystyle \mathbb {P} (A\cap B)=0.2}$, so the "remaining piece" in ${\displaystyle A\cap B}$ has probability 0.1, i.e., ${\displaystyle \mathbb {P} ((A\cap B)\setminus C)=0.2-0.1=0.1}$.)
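The Venn-diagram bookkeeping can also be done programmatically: if we store the probability of each region, the probability of any event is the sum over the regions it contains (a sketch; the string-based region encoding is ours):

```python
from fractions import Fraction

# Region probabilities read off the Venn diagram above; each key names which
# of the events A, B, C the region lies in (e.g. 'AB' is (A ∩ B) \ C).
tenth = Fraction(1, 10)
regions = {'A': 2 * tenth, 'B': tenth, 'C': 2 * tenth,
           'AB': tenth, 'AC': tenth, 'BC': 2 * tenth, 'ABC': tenth}
assert sum(regions.values()) == 1         # the regions partition Omega

def P(*events):
    """Probability that all of the named events occur, by summing regions."""
    return sum(p for key, p in regions.items()
               if all(e in key for e in events))

assert P('A') == P('B') == Fraction(1, 2)
assert P('C') == Fraction(3, 5)
assert P('A', 'B') == Fraction(1, 5)      # P(A ∩ B) = 0.2
assert P('A', 'B', 'C') == Fraction(1, 10)
```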

Exercise. Calculate

(a) ${\displaystyle \mathbb {P} (A\setminus B),\mathbb {P} (B\setminus A),\mathbb {P} (A\setminus C),\mathbb {P} (C\setminus A),\mathbb {P} (B\setminus C),\mathbb {P} (C\setminus B)}$;

(b) ${\displaystyle \mathbb {P} (A\setminus (B\cap C)),\mathbb {P} (A\setminus (B\cup C)),\mathbb {P} ((B\cap C)\setminus A),\mathbb {P} ((B\cup C)\setminus A)}$.

Solution

(a)

• ${\displaystyle \mathbb {P} (A\setminus B)=\mathbb {P} (A)-\mathbb {P} (A\cap B)=0.5-0.2=0.3}$.
• ${\displaystyle \mathbb {P} (B\setminus A)=\mathbb {P} (B)-\mathbb {P} (A\cap B)=0.5-0.2=0.3}$.
• ${\displaystyle \mathbb {P} (A\setminus C)=\mathbb {P} (A)-\mathbb {P} (A\cap C)=0.5-0.2=0.3}$.
• ${\displaystyle \mathbb {P} (C\setminus A)=\mathbb {P} (C)-\mathbb {P} (A\cap C)=0.6-0.2=0.4}$.
• ${\displaystyle \mathbb {P} (B\setminus C)=\mathbb {P} (B)-\mathbb {P} (B\cap C)=0.5-0.3=0.2}$.
• ${\displaystyle \mathbb {P} (C\setminus B)=\mathbb {P} (C)-\mathbb {P} (B\cap C)=0.6-0.3=0.3}$.

(b)

• ${\displaystyle \mathbb {P} (A\setminus (B\cap C))=\mathbb {P} (A)-\mathbb {P} (A\cap B\cap C)=0.5-0.1=0.4}$.
• We have ${\displaystyle \mathbb {P} (A\cap (B\cup C))=\mathbb {P} (A)+\mathbb {P} (B\cup C)-\mathbb {P} (A\cup B\cup C)=0.5+0.8-1=0.3}$. Thus, ${\displaystyle \mathbb {P} (A\setminus (B\cup C))=\mathbb {P} (A)-\mathbb {P} (A\cap (B\cup C))=0.5-0.3=0.2}$.
• ${\displaystyle \mathbb {P} ((B\cap C)\setminus A)=\mathbb {P} (B\cap C)-\mathbb {P} (B\cap C\cap A)=0.3-0.1=0.2}$.
• ${\displaystyle \mathbb {P} ((B\cup C)\setminus A)=\mathbb {P} (B\cup C)-\mathbb {P} ((B\cup C)\cap A)=0.8-0.3=0.5}$.

### Constructing a probability measure

As we have said, the axiomatic definition does not suggest a way to construct a probability measure. In fact, even for the same experiment, there can be many ways to construct a probability measure satisfying the above probability axioms if there is not sufficient information provided:

Example. (Coin tossing) Consider a random experiment where we toss a coin (fair or unfair). Suppose we want to define a probability space for this random experiment. Three elements are needed: sample space, event space, and probability measure.

Here, we define the sample space as ${\displaystyle \Omega =\{H,T\}}$ (${\displaystyle H}$ and ${\displaystyle T}$ stand for "heads comes up" and "tails comes up" respectively). Since ${\displaystyle \Omega }$ is finite, the event space is chosen to be its power set, that is, ${\displaystyle {\mathcal {F}}=\{\varnothing ,\{H\},\{T\},\{H,T\}\}}$. It then remains to construct the probability measure ${\displaystyle \mathbb {P} }$, subject to the probability axioms. Since the domain of the probability measure is ${\displaystyle {\mathcal {F}}}$, we need to define the "output" of ${\displaystyle \mathbb {P} }$ (that is, assign a probability) for every event in ${\displaystyle {\mathcal {F}}}$.

For the empty set, by a property of probability, the probability assigned to it has to be zero. Similarly, for the set ${\displaystyle \{H,T\}}$, the probability assigned to it has to be one by the probability axiom P2. But for the remaining two sets ${\displaystyle \{H\},\{T\}}$, there are no properties/axioms of probability that tell us what probability we must assign to each of them. Despite this, the sum of the two probabilities assigned has to be one, by the finite additivity property: ${\displaystyle \mathbb {P} (\{H\})+\mathbb {P} (\{T\})=\mathbb {P} (\{H,T\}){\overset {\text{ P2 }}{=}}1.}$ Also, by the numeric bound of probability, the two probabilities assigned have to be between zero and one.

So, without further information, we can set ${\displaystyle \mathbb {P} (\{H\})=x}$ where ${\displaystyle x\in [0,1]}$ (and thus ${\displaystyle \mathbb {P} (\{T\})=1-x}$). Hence, we can see that there are infinitely many ways to construct a probability measure for this random experiment!

However, we have previously mentioned that we may assign probabilities to events subjectively (as in subjectivism), or according to their limiting relative frequencies (as in frequentism). Through these two probability interpretations, we may provide some background information for a random experiment by assigning probabilities to some of the events before constructing the probability measure, to the extent that there is exactly one way to construct the probability measure. Consider the coin tossing example again:

Example. (Coin tossing, continued) Suppose we are told that the coin is fair, so based on this message, we assign the probability of "head comes up" and "tail comes up" to be ${\displaystyle {\frac {1}{2}}}$ respectively. In this case, the way of constructing the probability measure is fixed: ${\displaystyle \mathbb {P} (\varnothing )=0,\quad \mathbb {P} (\{H\})={\frac {1}{2}},\quad \mathbb {P} (\{T\})={\frac {1}{2}},\quad \mathbb {P} (\{H,T\})=1.}$ This is the only way to construct a probability measure (satisfying the probability axioms) with this further information.
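This unique fair-coin measure can be written out explicitly as a lookup table on the four events (an illustrative sketch; representing events as frozensets is our choice):

```python
from fractions import Fraction

# Build the unique probability measure on F = P({H, T}) for a fair coin.
# Events are represented as frozensets of sample points.
x = Fraction(1, 2)                       # P({H}) for a fair coin
P = {
    frozenset(): Fraction(0),            # the impossible event
    frozenset({'H'}): x,
    frozenset({'T'}): 1 - x,
    frozenset({'H', 'T'}): Fraction(1),  # the certain event (axiom P2)
}
# Finite additivity: {H} and {T} are disjoint and their union is Omega.
assert P[frozenset({'H'})] + P[frozenset({'T'})] == P[frozenset({'H', 'T'})]
```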

In general, it is not necessary to assign a probability to every event in the event space in the background information for us to be able to construct the probability measure in exactly one way. Consider the following example.

Example. (Rolling dice) Consider a random experiment where we roll a dice (loaded or unloaded). Then, we define the sample space as ${\displaystyle \Omega =\{1,2,3,4,5,6\}}$ where each number represents the number coming up. It follows that the event space is ${\displaystyle {\mathcal {F}}={\mathcal {P}}(\Omega )}$ since ${\displaystyle \Omega }$ is finite. (Since there are ${\displaystyle 2^{6}=64}$ sets in the event space, we will not list them out.) Now, we proceed to construct a probability measure ${\displaystyle \mathbb {P} }$ (satisfying the probability axioms). Similarly, we need to consider all 64 events in the event space ${\displaystyle {\mathcal {F}}}$.

First, for the empty set and the sample space, we have ${\displaystyle \mathbb {P} (\varnothing )=0}$ and ${\displaystyle \mathbb {P} (\Omega )=1}$. Now, let us consider the singletons (i.e., sets consisting of one element) in the event space: ${\displaystyle \{1\},\{2\},\{3\},\{4\},\{5\},\{6\}}$. Similarly, without further information, there are infinitely many possible ways to assign a probability to each of them, subject to the constraint that their sum has to be one, by finite additivity and the probability axiom P2. For now, let us denote ${\displaystyle \mathbb {P} (\{1\})=p_{1},\quad \mathbb {P} (\{2\})=p_{2},\quad \mathbb {P} (\{3\})=p_{3},\quad \mathbb {P} (\{4\})=p_{4},\quad \mathbb {P} (\{5\})=p_{5},\quad \mathbb {P} (\{6\})=p_{6}.}$

If we think more carefully, we will find that once these six probabilities are fixed, the probabilities of all other ${\displaystyle 64-2-6=56}$ events are fixed as well, by the property of finite additivity (which any probability measure satisfying the probability axioms must obey): every one of the other 56 events is a finite union of some of these six singleton events, so its probability is obtained by adding up the corresponding probabilities above.

We can see from this example that to provide sufficient background information to the extent that the probability measure can be constructed in exactly one way, we just need the probability of each of the singleton events (which should be nonnegative and sum to one to satisfy the probability axioms). After that, we can calculate the probability for each of the other events in the event space, and hence construct the only possible probability measure.

In general, this is true whenever the sample space is countable:

Theorem. Let ${\displaystyle (\Omega ,{\mathcal {F}},\mathbb {P} )}$ be a probability space, where the sample space ${\displaystyle \Omega }$ is countable. Assume that the probability of each of the singleton events (i.e., ${\displaystyle \mathbb {P} (\{\omega \})}$ for every ${\displaystyle \omega \in \Omega }$) is given such that they are nonnegative and their sum is one (to satisfy the probability axioms). Then, the probability measure ${\displaystyle \mathbb {P} :{\mathcal {F}}\to [0,1]}$ is given by ${\displaystyle \mathbb {P} (E)=\sum _{\omega \in E}^{}\mathbb {P} (\{\omega \})}$ for every event ${\displaystyle E\in {\mathcal {F}}}$.

Proof.

Case 1: ${\displaystyle \Omega }$ is finite. Then, we can write ${\displaystyle \Omega =\{\omega _{1},\omega _{2},\dotsc ,\omega _{n}\}}$. It follows that every event ${\displaystyle E\in {\mathcal {F}}}$ can be expressed as ${\displaystyle E=\bigcup _{i}\{\omega _{i}\}}$ (the indices ${\displaystyle i}$ appearing in the union depend on the event ${\displaystyle E}$). Notice also that the sets "${\displaystyle \{\omega _{i}\}}$"s are disjoint. (Each set contains a different sample point, and so the intersection of any pair of them is the empty set.) Then, by the finite additivity of probability, we have for every event ${\displaystyle E\in {\mathcal {F}}}$, ${\displaystyle \mathbb {P} (E)=\sum _{i}^{}\mathbb {P} (\{\omega _{i}\})=\sum _{\omega \in E}^{}\mathbb {P} (\{\omega \}).}$

Case 2: ${\displaystyle \Omega }$ is countably infinite. Then, we can write ${\displaystyle \Omega =\{\omega _{1},\omega _{2},\dotsc \}}$. It follows that every event ${\displaystyle E\in {\mathcal {F}}}$ can be expressed as ${\displaystyle E=\bigcup _{i}\{\omega _{i}\}}$ (the indices ${\displaystyle i}$ appearing in the union depend on the event ${\displaystyle E}$). Notice also that the sets "${\displaystyle \{\omega _{i}\}}$"s are disjoint. Then, by the countable/finite additivity of probability, we have for every event ${\displaystyle E\in {\mathcal {F}}}$, ${\displaystyle \mathbb {P} (E)=\sum _{i}^{}\mathbb {P} (\{\omega _{i}\})=\sum _{\omega \in E}^{}\mathbb {P} (\{\omega \}).}$

${\displaystyle \Box }$

Example. Consider a random experiment where we draw a positive integer. Then, the probability space is ${\displaystyle (\Omega ,{\mathcal {F}},\mathbb {P} )}$ where ${\displaystyle \Omega =\mathbb {N} ,{\mathcal {F}}={\mathcal {P}}(\mathbb {N} )}$. Suppose it is given that ${\displaystyle \mathbb {P} (\{n\})=2^{-n}}$ for every ${\displaystyle n\in \mathbb {N} }$. Since the probabilities are nonnegative, and also their sum is ${\displaystyle 2^{-1}+2^{-2}+\dotsb ={\frac {1/2}{1-1/2}}=1,}$ it follows that the probability measure is given by ${\displaystyle \mathbb {P} (E)=\sum _{n\in E}^{}\mathbb {P} (\{n\})}$ for every event ${\displaystyle E\in {\mathcal {F}}}$.

Exercise. Calculate the probability ${\displaystyle \mathbb {P} (\{{\text{odd integer}}\})}$.

Solution

By the above formula, we have ${\displaystyle \mathbb {P} (\{{\text{odd integer}}\})=\mathbb {P} (\{1\})+\mathbb {P} (\{3\})+\dotsb =2^{-1}+2^{-3}+\dotsb ={\frac {2^{-1}}{1-2^{-2}}}={\frac {1/2}{3/4}}={\frac {2}{3}}.}$
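Numerically, the partial sums of the two geometric series above behave as claimed (a quick sketch using exact rational arithmetic):

```python
from fractions import Fraction

# Partial sums of P({odd integer}) = 2^-1 + 2^-3 + 2^-5 + ... approach 2/3.
partial = sum(Fraction(1, 2 ** n) for n in range(1, 40, 2))
assert abs(partial - Fraction(2, 3)) < Fraction(1, 10 ** 9)
# The singleton probabilities 2^-n themselves sum to (nearly) 1, as required.
total = sum(Fraction(1, 2 ** n) for n in range(1, 40))
assert abs(total - 1) < Fraction(1, 10 ** 9)
```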

The following is an important special case for the above theorem.

Corollary. (Formula for combinatorial probability) Let ${\displaystyle (\Omega ,{\mathcal {F}},\mathbb {P} )}$ be a probability space, where the sample space is finite. Assume that the probability of each of the singleton events (${\displaystyle |\Omega |}$ of them) is the same: ${\displaystyle {\frac {1}{|\Omega |}}}$ (we say that all outcomes are equally likely in this case). Then, the probability measure ${\displaystyle \mathbb {P} :{\mathcal {F}}\to [0,1]}$, called the combinatorial probability (or classical probability), is given by ${\displaystyle \mathbb {P} (E)={\frac {|E|}{|\Omega |}}={\frac {{\text{number of sample points in }}E}{{\text{number of sample points in }}\Omega }}}$ for every event ${\displaystyle E\in {\mathcal {F}}}$.

Proof. Under the assumptions, the probability of every singleton event is nonnegative. Also, the sum of the probabilities is ${\displaystyle \underbrace {{\frac {1}{|\Omega |}}+{\frac {1}{|\Omega |}}+\dotsb +{\frac {1}{|\Omega |}}} _{|\Omega |{\text{ times}}}=1.}$ Thus, for every event ${\displaystyle E}$, we have by the previous theorem ${\displaystyle \mathbb {P} (E)=\sum _{\omega \in E}^{}\mathbb {P} (\{\omega \})=\underbrace {{\frac {1}{|\Omega |}}+{\frac {1}{|\Omega |}}+\dotsb +{\frac {1}{|\Omega |}}} _{|E|{\text{ times}}}={\frac {|E|}{|\Omega |}}.}$

${\displaystyle \Box }$
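In practice, the corollary amounts to a one-line function (a sketch; here `Omega` models a single roll of a fair dice):

```python
from fractions import Fraction

# Combinatorial probability on a finite sample space with equally likely
# outcomes: P(E) = |E| / |Omega|.
Omega = {1, 2, 3, 4, 5, 6}

def P(E):
    return Fraction(len(E), len(Omega))

assert P({2, 4, 6}) == Fraction(1, 2)   # an even number comes up
assert P({1}) == Fraction(1, 6)         # a particular face comes up
assert P(Omega) == 1 and P(set()) == 0  # axiom P2 and the empty set
```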

Remark.

• By the principle of indifference, unless there is evidence showing that the outcomes are not equally likely (e.g., it is given that a coin is biased, so that heads is more likely to come up), we should assume that the outcomes are equally likely.

Example. Suppose we roll a fair red dice and a fair blue dice. Then, we can define the sample space to be ${\displaystyle \Omega =\{(a,b):a,b=1,2,\dotsc ,{\text{ or }}6\}}$, where ${\displaystyle (a,b)}$ means that ${\displaystyle a}$ comes up for the red dice and ${\displaystyle b}$ comes up for the blue dice.

Calculate the probability ${\displaystyle \mathbb {P} (\{(1,1)\})}$.

Solution. The number of sample points is ${\displaystyle \underbrace {6} _{\text{red}}\times \underbrace {6} _{\text{blue}}=36}$. Since the outcomes should be equally likely, the desired probability is ${\displaystyle {\frac {1}{36}}}$ (there is only one sample point for this event).

Exercise. Suppose the blue dice is colored red, such that the two dice become indistinguishable.

Then, student A claims that the probability ${\displaystyle \mathbb {P} (\{(1,1)\})={\frac {1}{21}}}$, and he reasons as follows:

Recall that when the two dice become indistinguishable, the number of possible distinct pairs of numbers facing up becomes ${\displaystyle _{6}H_{2}=21}$ (shown in the previous chapter about combinatorics). So, there are only 21 sample points in the sample space, and hence the probability is ${\displaystyle {\frac {1}{21}}}$.

On the other hand, student B claims that the probability should still be ${\displaystyle {\frac {1}{36}}}$ and he reasons as follows:

Notice that in this case, the number 1 comes up for both dice correspond to exactly one outcome in the distinguishable case: ${\displaystyle (1,1)}$. Also, changing the color of the dice should not affect the randomness of the experiment. So, the probability in this case should be the same as the probability ${\displaystyle \mathbb {P} (\{(1,1)\})={\frac {1}{36}}}$.

Which of the students is correct? Also, explain why the other student is wrong.

Solution

Student B is correct. Student A is wrong, since in this case, not all outcomes in the experiment are equally likely. In particular, some of the sample points in this case actually correspond to two original sample points, e.g. "1 comes up for one dice, 2 comes up for the other" corresponds to both ${\displaystyle (1,2)}$ and ${\displaystyle (2,1)}$.

Example. Consider a random experiment where a fair coin is tossed twice. Calculate the probability of getting two heads in a row.

Solution. There are ${\displaystyle 2\times 2=4}$ outcomes in this experiment. They should be equally likely, and hence the probability of getting two heads is ${\displaystyle {\frac {1}{4}}}$.

Exercise. Suppose we toss a fair coin five times.

(a) Calculate the probability ${\displaystyle \mathbb {P} (\{HHHHH\})}$ (${\displaystyle HHHHH}$ means 5 heads in a row).

(b) Calculate the probability ${\displaystyle \mathbb {P} (\{HHHHT\})}$ (${\displaystyle HHHHT}$ means 4 heads in a row, then 1 tails).

(c) Calculate the probability ${\displaystyle \mathbb {P} (\{{\text{4 heads and 1 tails}}\})}$. (Answer: ${\displaystyle {\frac {5}{32}}}$)

(d) Calculate the probability ${\displaystyle \mathbb {P} (\{{\text{3 heads and 2 tails}}\})}$. (Answer: ${\displaystyle {\frac {10}{32}}}$)

Solution

There should be ${\displaystyle 2^{5}=32}$ equally likely outcomes.

(a) The probability is ${\displaystyle {\frac {1}{32}}}$. (The event contains one sample point only.)

(b) The probability is ${\displaystyle {\frac {1}{32}}}$. (The event contains one sample point only.)

(c) The event ${\displaystyle \{{\text{4 heads and 1 tails}}\}=\{HHHHT,HHHTH,HHTHH,HTHHH,THHHH\}}$, containing five sample points. Thus, the probability is ${\displaystyle {\frac {5}{32}}}$.

(d) We can count the number of sample points in the event by regarding the five outcomes from tossing five times as five distinguishable cells, and 2 tails as 2 indistinguishable balls. Then, we place 2 indistinguishable balls into 5 distinguishable cells (the remaining empty cells are for heads). So, the number is ${\displaystyle {\binom {5}{2}}=10}$. Hence, the probability is ${\displaystyle {\frac {10}{32}}}$.
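All four answers can be confirmed by brute-force enumeration of the 32 outcomes (a sketch; the predicate-based helper `P` is ours):

```python
from fractions import Fraction
from itertools import product

# Enumerate all 2^5 = 32 equally likely outcomes of five fair coin tosses.
outcomes = list(product('HT', repeat=5))
assert len(outcomes) == 32

def P(predicate):
    """Combinatorial probability of the event described by the predicate."""
    favorable = sum(1 for o in outcomes if predicate(o))
    return Fraction(favorable, len(outcomes))

assert P(lambda o: o == ('H',) * 5) == Fraction(1, 32)                 # (a)
assert P(lambda o: o == ('H', 'H', 'H', 'H', 'T')) == Fraction(1, 32)  # (b)
assert P(lambda o: o.count('H') == 4) == Fraction(5, 32)               # (c)
assert P(lambda o: o.count('H') == 3) == Fraction(10, 32)              # (d)
```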

Example. (Capture-mark-recapture) Suppose we are fishing in a lake, containing ${\displaystyle N}$ distinct fishes. First, we catch ${\displaystyle k}$ fishes from the lake (capture), and give them each a marker (mark), and after that place them back into the lake.

Then, we catch fishes from the lake again (recapture), catching ${\displaystyle n}$ fishes this time. Show that the probability that there are ${\displaystyle r}$ marked fishes among the ${\displaystyle n}$ fishes caught is ${\displaystyle {\frac {{\binom {k}{r}}\times {\binom {N-k}{n-r}}}{\binom {N}{n}}},}$ assuming that the outcomes in the sample space are equally likely.

Proof. We regard the ${\displaystyle N}$ fishes as ${\displaystyle N}$ distinguishable cells, and ${\displaystyle n}$ catches (in the second time) as ${\displaystyle n}$ indistinguishable balls (the order of the catch is not important, we are considering what ${\displaystyle n}$ fishes are caught only). Then, there are ${\displaystyle {\binom {N}{n}}}$ equally likely sample points in the sample space.

It now remains to consider the number of sample points in the event. We now count the number of ways to have ${\displaystyle r}$ marked fishes in the ${\displaystyle n}$ fishes. We regard the ${\displaystyle k}$ marked distinct fishes as ${\displaystyle k}$ distinguishable cells, and ${\displaystyle N-k}$ non-marked fishes as ${\displaystyle N-k}$ distinguishable cells. (These two groups of cells are put separately.)

• Step 1: placing ${\displaystyle r}$ indistinguishable balls (catches) into ${\displaystyle k}$ distinguishable cells. (${\displaystyle {\binom {k}{r}}}$ ways)
• Step 2: placing ${\displaystyle n-r}$ indistinguishable balls (catches) into ${\displaystyle N-k}$ distinguishable cells. (${\displaystyle {\binom {N-k}{n-r}}}$ ways)

Hence, the number of sample points in the event is ${\displaystyle {\binom {k}{r}}\times {\binom {N-k}{n-r}}}$. The result follows.

${\displaystyle \Box }$
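A small sketch of the formula follows (the concrete values of ${\displaystyle N,k,n}$ are illustrative, not from the text); as a consistency check, the probabilities over all ${\displaystyle r}$ sum to one, by Vandermonde's identity:

```python
from fractions import Fraction
from math import comb

def p_marked(N, k, n, r):
    """P(exactly r marked fishes among the n fishes recaptured)."""
    return Fraction(comb(k, r) * comb(N - k, n - r), comb(N, n))

# Illustrative numbers: N = 50 fishes in the lake, k = 10 marked, n = 5 recaptured.
N, k, n = 50, 10, 5
# The probabilities over all possible r form a distribution (they sum to 1).
assert sum(p_marked(N, k, n, r) for r in range(n + 1)) == 1
```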

Example. There are 9 distinguishable balls in a box, consisting of 3 red balls, 2 blue balls and 4 green balls.

(a) Calculate the probability that a red ball is drawn from the box if 1 ball is drawn from the box.

(b) Calculate the probability that exactly 2 red balls and exactly 3 green balls are drawn from the box if 6 balls are drawn from the box.

Solution.

(a) Since one ball is drawn from the box, there are 9 outcomes in the sample space, corresponding to the 9 different draws. We shall assume that the outcomes are equally likely. Then, the probability that a red ball is drawn is ${\displaystyle {\frac {3}{9}}={\frac {1}{3}}}$ since there are 3 different draws (3 sample points) in the event.

(b) Although the question does not state whether the drawing of the balls is ordered or unordered, the probability obtained under either assumption is actually the same when the 9 balls are distinguishable. This is because every unordered draw of 6 balls corresponds to ${\displaystyle 6!}$ ordered draws of 6 balls. So if the ordered draws are equally likely, then the unordered draws must also be equally likely, and vice versa. [2] Here, let us regard the draws as unordered for simplicity.

Then, the number of sample points in the sample space is ${\displaystyle {\binom {9}{6}}=84}$. (9 balls: 9 distinguishable cells with capacity 1; 6 draws: 6 indistinguishable "balls"). For the event, there are ${\displaystyle \underbrace {\binom {3}{2}} _{\text{red}}\times \underbrace {\binom {4}{3}} _{\text{green}}\times \underbrace {2} _{\text{blue}}=24}$ sample points. It follows that the probability is ${\displaystyle {\frac {24}{84}}={\frac {2}{7}}}$.
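Part (b) can be checked directly with binomial coefficients (a sketch):

```python
from fractions import Fraction
from math import comb

# Part (b): unordered draws of 6 balls out of the 9 distinguishable balls.
total = comb(9, 6)                                # 84 sample points
favorable = comb(3, 2) * comb(4, 3) * comb(2, 1)  # choose 2 red, 3 green, 1 blue
assert (total, favorable) == (84, 24)
assert Fraction(favorable, total) == Fraction(2, 7)
```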

Exercise. Three (distinguishable) orange balls are added to the box. Calculate the probability that 2 red balls and 3 green balls are drawn from the box if 6 balls are drawn from the box again. (Answer: ${\displaystyle {\frac {5}{77}}}$)

Solution

With 3 additional orange balls, the number of sample points in the sample space becomes ${\displaystyle {\binom {12}{6}}=924}$. The number of sample points in the event becomes ${\displaystyle \underbrace {\binom {3}{2}} _{\text{red}}\times \underbrace {\binom {4}{3}} _{\text{green}}\times \underbrace {5} _{\text{blue/orange}}=60}$ sample points. Thus, the probability is ${\displaystyle {\frac {60}{924}}={\frac {5}{77}}}$.

Example. Suppose there are 20 couples in a room, and we pick two people randomly from the room (such that all outcomes in the sample space are equally likely). Calculate the probability that the two people picked are a couple.

Solution.

We regard the two picks to be indistinguishable (or unordered). Since there are 20 couples, there are 40 people in the room. So, there are ${\displaystyle {\binom {40}{2}}=780}$ equally likely sample points in the sample space (place 2 indistinguishable picks into 40 people with capacity 1). Also, the event that the two people picked are a couple has 20 sample points since there are 20 couples.

It follows that the probability is ${\displaystyle {\frac {20}{780}}={\frac {1}{39}}}$.

Exercise. Regard the two picks as ordered and calculate the probability again (which should also be ${\displaystyle {\frac {1}{39}}}$).

Solution

In this case, there are ${\displaystyle 40\times 39=1560}$ equally likely sample points in the sample space. Also, the event has ${\displaystyle 20\times 2=40}$ sample points (20 couples, two possible orderings for each couple). Hence, the probability is ${\displaystyle {\frac {40}{1560}}={\frac {1}{39}}}$.

Example. Amy and Bob are playing a game where each of them rolls the same fair dice once, and the player who gets the greater number from the roll wins. If both players get the same number from the roll, then they draw.

(a) Calculate the probability that Amy and Bob draw.

(b) Hence or otherwise, calculate the probability that Amy wins.

Solution.

(a) There are ${\displaystyle \underbrace {6} _{\text{Amy's roll}}\times \underbrace {6} _{\text{Bob's roll}}=36}$ outcomes in total. There are 6 outcomes where Amy and Bob draw (both get 1,2,3,4,5, or 6). It follows that the probability is ${\displaystyle {\frac {6}{36}}={\frac {1}{6}}}$.

(b) To calculate the probability, one can of course count the number of outcomes where Amy wins. But here we offer an alternative and more convenient approach, which makes use of the symmetry of the game. Notice that Amy and Bob are in exactly the same situation in the game: each of them faces another player doing the same thing, with the same winning condition. Thus, by this symmetry, the probability that Amy wins and the probability that Bob wins are equal.

Notice also that the events that Amy wins, Bob wins, and they draw are pairwise disjoint, and their union is the whole sample space. It follows that the sum of their probabilities is 1. Hence, letting ${\displaystyle p}$ be the probability that Amy wins, we have ${\displaystyle p+p+{\frac {1}{6}}=1\implies p={\frac {5}{12}}.}$
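Both probabilities can be confirmed by enumerating the 36 equally likely outcomes (a sketch in Python):

```python
from itertools import product
from fractions import Fraction

# All 36 equally likely (Amy's roll, Bob's roll) outcomes.
outcomes = list(product(range(1, 7), repeat=2))
draw = sum(a == b for a, b in outcomes)
amy_wins = sum(a > b for a, b in outcomes)
print(Fraction(draw, len(outcomes)))      # 1/6
print(Fraction(amy_wins, len(outcomes)))  # 5/12
```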

Exercise. Consider a modified version of the above game, where if both players get the same number from the roll, then each of them rolls the same fair dice once more, and so on. (The game could in principle go on with draws forever, but the probability of this happening is 0, so the game ends after a finite number of dice rolls with probability 1.)

Calculate the probability that Amy wins (eventually).

Solution

In this case, the probability of the game ending in a draw is 0, since any draw leads to another round of rolls, and the probability of drawing in every round forever is 0. By the symmetry of the game, the desired probability is thus ${\displaystyle {\frac {1}{2}}}$.

Example. Consider a simple game for children. Initially, a child is given a paper with 4 dots on it, with a rectangular shape:

1 *     * 2

3 *     * 4


Then, the child is allowed to draw (exactly) 3 different line segments between two dots on the paper. The aim for the child is to draw a triangle, formed by the 3 line segments, where each vertex of the triangle has to be a dot on the paper. For instance, in this case:

*    *
 \  /
  \/
  /\
 /  \
*----*


the lower triangle is not counted as a valid one, since one of its vertices is not a dot on the paper (instead, it is formed by the intersection of two line segments).

Suppose we draw the 3 different line segments randomly. Calculate the probability that there is a (valid) triangle in the resulting diagram.

Solution.

Let us regard the 3 draws of line segment as unordered. Notice that there are ${\displaystyle {\binom {4}{2}}=6}$ places to draw a line segment. (To draw a line segment, we choose two dots from the four dots (without considering the order).) Then, there are ${\displaystyle {\binom {6}{3}}=20}$ ways to draw 3 different line segments (which should be equally likely).

Among the 20 ways, there are ${\displaystyle {\binom {4}{3}}=4}$ ways where a valid triangle is formed (joining dots 123, 124, 134, or 234: we choose 3 dots from the 4 dots to form a triangle, without considering order, and there are four ways to do so). Hence, the probability is ${\displaystyle {\frac {4}{20}}={\frac {1}{5}}}$.

Exercise. Suppose there are 5 dots on the paper, forming a shape like a regular pentagon.

      * 1

5 *       * 2

  4 *   * 3


Suppose we draw 3 different line segments randomly. Calculate the probability that there is a (valid) triangle in the resulting diagram. (Answer: ${\displaystyle {\frac {1}{12}}}$)

Solution

Similarly, we regard the 3 draws as unordered. In this case, there are ${\displaystyle {\binom {5}{2}}=10}$ places to draw a line segment. Thus, there are ${\displaystyle {\binom {10}{3}}=120}$ ways to draw 3 different line segments.

Among the 120 ways, there are ${\displaystyle {\binom {5}{3}}=10}$ ways where a valid triangle is formed. Hence, the probability is ${\displaystyle {\frac {10}{120}}={\frac {1}{12}}}$.
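Both the example and the exercise can be verified by enumerating every way to draw 3 different segments (a sketch; `n` is the number of dots on the paper):

```python
from itertools import combinations
from fractions import Fraction

def triangle_probability(n):
    """Probability that 3 randomly drawn distinct segments between n dots
    form a triangle whose vertices are all dots."""
    segments = list(combinations(range(n), 2))   # C(n, 2) possible segments
    draws = list(combinations(segments, 3))      # unordered triples of segments
    # 3 distinct segments form a valid triangle exactly when their
    # endpoints cover exactly 3 dots.
    hits = sum(len({d for seg in t for d in seg}) == 3 for t in draws)
    return Fraction(hits, len(draws))

print(triangle_probability(4))  # 1/5
print(triangle_probability(5))  # 1/12
```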

Example. Suppose we are drawing 5 cards from a poker deck consisting of 52 cards.

(a) Calculate the probability for getting a flush, that is, all 5 cards drawn are of the same suit, and not all are of sequential rank. (Note: sequential rank means A2345,23456,..., or 10JQKA. JQKA2, etc. are not counted as sequential rank.)

(b) Calculate the probability that all 5 cards drawn are of the same color.

(c) Calculate the probability for getting a four of a kind, that is, 4 of the 5 cards drawn are of the same rank.

Solution. We regard the 5 draws to be unordered, so there are in total ${\displaystyle {\binom {52}{5}}=2598960}$ outcomes, which should be equally likely.

(a) To form a flush, we first consider the 2-step process:

1. Choose a suit from the 4 suits. (4 ways)
2. Placing the 5 indistinguishable draws into the 13 cards with the suit chosen in step 1. (${\displaystyle {\binom {13}{5}}=1287}$ ways)

But among these ways, for each of the 4 ways in step 1, there are 10 ways in step 2 that are of sequential rank. So, we need to subtract ${\displaystyle 4\times 10=40}$ from the above ways. Hence, the probability is ${\displaystyle {\frac {4\times 1287-40}{2598960}}\approx 0.00197}$.

(b) For the 5 cards to be of the same color, we consider this as a 2-step process:

1. Choose a color from the 2 colors. (2 ways)
2. Placing the 5 indistinguishable draws into the 26 cards with the color chosen in step 1. (${\displaystyle {\binom {26}{5}}=65780}$ ways)

Hence, the probability is ${\displaystyle {\frac {2\times 65780}{2598960}}\approx 0.0506}$.

(c) To form a four of a kind, we consider this as a 2-step process:

1. Choose a rank from the 13 ranks for the four cards of the same rank. (13 ways)
2. Choose a card from the remaining 48 cards in the deck (48 ways)

Then, the probability is ${\displaystyle {\frac {13\times 48}{2598960}}\approx 0.00024}$.
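The three counts can be checked against `math.comb` (a sketch following the same 2-step processes as above):

```python
from math import comb

deck = comb(52, 5)                 # 2598960 equally likely unordered hands
flush = 4 * comb(13, 5) - 40       # same suit, minus the 40 sequential-rank hands
same_color = 2 * comb(26, 5)
four_of_a_kind = 13 * 48

print(flush / deck)                # ≈ 0.00197
print(same_color / deck)           # ≈ 0.0506
print(four_of_a_kind / deck)       # ≈ 0.00024
```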

Exercise.

(a) Calculate the probability for getting a straight, that is, the 5 cards drawn consist of five cards of sequential rank, and not all are of the same suit. (Note: A2345, 23456, ..., 10JQKA are counted as straight, but e.g., JQKA2 is not counted as straight. This also applies to straight flush below.) (Answer: approximately 0.00392)

(b) Calculate the probability for getting a straight flush, that is, the 5 cards drawn consist of five cards of sequential rank, and all are of the same suit. (Answer: approximately 0.0000154)

(c) Calculate the probability for getting a full house, that is, the 5 cards drawn contain 3 cards of one rank, and 2 cards of another rank. (Answer: approximately 0.00144)

Solution

(a) To form a straight, we first consider the following 6-step process:

1. Choose a sequential rank from 10 possible sequential ranks (A2345,23456, ..., 10JQKA) for the straight. (10 ways)
2. Choose a suit from the 4 suits for the first card in the sequence. (4 ways)
3. Choose a suit from the 4 suits for the second card in the sequence. (4 ways)
4. Choose a suit from the 4 suits for the third card in the sequence. (4 ways)
5. Choose a suit from the 4 suits for the fourth card in the sequence. (4 ways)
6. Choose a suit from the 4 suits for the fifth card in the sequence. (4 ways)

But in these steps, it is possible that all cards drawn are also of the same suit. So, we need to exclude those cases, and there are ${\displaystyle 10\times 4=40}$ of them, since for each of the 10 ways in step 1, we may choose the same suit from 4 suits throughout steps 2-6 (4 ways). Hence, the probability is ${\displaystyle {\frac {10\times 4^{5}-40}{2598960}}\approx 0.00392.}$

(b) To form a straight flush, the number of ways is the same as the number of cases excluded in (a) (i.e., 40). So, the probability is ${\displaystyle {\frac {40}{2598960}}\approx 0.0000154.}$

(c) To form a full house, we consider this as a 4-step process:

1. Choose a rank from the 13 ranks for the 3 cards with the same rank. (13 ways)
2. Place 3 indistinguishable draws into the 4 cards with that rank, for the rank chosen in step 1. (${\displaystyle {\binom {4}{3}}=4}$ ways)
3. Choose a rank from the remaining 12 ranks for the 2 cards with the same rank. (12 ways)
4. Place 2 indistinguishable draws into the 4 cards with that rank, for the rank chosen in step 3. (${\displaystyle {\binom {4}{2}}=6}$ ways)

Thus, the probability is ${\displaystyle {\frac {13\times 4\times 12\times 6}{2598960}}\approx 0.00144}$.
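As with the previous example, the three answers can be reproduced directly (a sketch):

```python
from math import comb

deck = comb(52, 5)
straight = 10 * 4 ** 5 - 40      # 10 rank windows, free suits, minus straight flushes
straight_flush = 40
full_house = 13 * comb(4, 3) * 12 * comb(4, 2)

print(straight / deck)           # ≈ 0.00392
print(straight_flush / deck)     # ≈ 0.0000154
print(full_house / deck)         # ≈ 0.00144
```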

Example. (Raffle) Suppose there are 10000 distinct raffle tickets for a raffle. After all 10000 raffle tickets are sold out, 3 different tickets from the 10000 raffle tickets will be chosen randomly as 3 winning tickets, and the holder of each of these 3 winning tickets will receive a prize (it is possible that one person receives multiple prizes, if the person owns multiple winning tickets).

(a) If Amy purchases 10 of the 10000 raffle tickets, then what is the probability for Amy to win at least one prize?

(b) If Amy purchases 100 of the 10000 raffle tickets, then what is the probability for Amy to win at least one prize?

(c) If Amy purchases 1000 of the 10000 raffle tickets, then what is the probability for Amy to win at least one prize?

(Hint: Consider complementary events.)

Solution.

All outcomes for the raffle should be equally likely. Let us regard the winning-ticket choices as indistinguishable.

• The number of outcomes in total is ${\displaystyle {\binom {10000}{3}}=166616670000}$. (Placing 3 indistinguishable balls (winning ticket choices) into 10000 distinguishable cells (raffle tickets) with capacity one.)
• For Amy to not win any prize, we are placing 3 indistinguishable balls (winning ticket choices) into ${\displaystyle 10000-k}$ distinguishable cells (raffle tickets that are not purchased by Amy) with capacity one, assuming Amy purchases ${\displaystyle k}$ tickets.
• This is the complementary event of "Amy wins at least one prize".

(a) The probability is ${\displaystyle 1-{\frac {\binom {10000-10}{3}}{\binom {10000}{3}}}=1-{\frac {166117269780}{166616670000}}\approx 0.002997}$.

(b) The probability is ${\displaystyle 1-{\frac {\binom {10000-100}{3}}{\binom {10000}{3}}}=1-{\frac {161667498300}{166616670000}}\approx 0.0297}$.

(c) The probability is ${\displaystyle 1-{\frac {\binom {10000-1000}{3}}{\binom {10000}{3}}}=1-{\frac {121459503000}{166616670000}}\approx 0.2710243}$.
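The three answers follow the same complement formula, so they can be computed in one loop (a sketch):

```python
from math import comb

total = comb(10000, 3)               # all ways to choose the 3 winning tickets
for k in (10, 100, 1000):
    # Complement: none of Amy's k tickets is among the 3 winning tickets.
    p = 1 - comb(10000 - k, 3) / total
    print(k, p)                      # ≈ 0.002997, 0.0297, 0.2710243
```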

Exercise. Suppose a single raffle ticket may be chosen to be a winning ticket multiple times. That is, some of the 3 winning tickets chosen may be the same.

Calculate the probability for Amy to win at least one prize if Amy purchases 1000 of 10000 raffle tickets. (Answer: 0.271 exactly)

Solution

In this case, let us regard the winning-ticket choices as distinguishable and ordered for convenience. Then, the number of outcomes in total is ${\displaystyle 10000^{3}}$. (Placing 3 distinguishable balls (winning ticket choices) into 10000 distinguishable cells (raffle tickets) with unlimited capacity.) The number of sample points in the event for Amy not winning any prize is ${\displaystyle 9000^{3}}$. (Placing 3 distinguishable balls (winning ticket choices) into 9000 distinguishable cells (raffle tickets not purchased by Amy) with unlimited capacity.) So, the probability is ${\displaystyle 1-{\frac {9000^{3}}{10000^{3}}}=0.271.}$ (Notice that the answer is exactly 0.271 this time.)
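Since each of the three winning-ticket choices now independently misses Amy's 1000 tickets with probability ${\displaystyle 9000/10000}$, the answer reduces to ${\displaystyle 1-(9/10)^{3}}$, which can be checked exactly (a sketch):

```python
from fractions import Fraction

# Each of the 3 independent choices misses Amy's 1000 tickets
# with probability 9000/10000 = 9/10.
p = 1 - Fraction(9000, 10000) ** 3
print(p)      # 271/1000, i.e. exactly 0.271
```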

Remark.

• The answer is expected to be quite close to the one in (c): although a single raffle ticket may be chosen to be a winning ticket more than once, it is quite unlikely for that to happen, so we expect the effect on the probabilities to be quite small.

Example. Consider a game which involves repeated tossing of a fair coin. There are two players in the game: player A and player B. The fair coin is tossed repeatedly until one of the players wins. Player A wins if the outcome pattern "THH" appears first, and player B wins if the outcome pattern "HHH" appears first. (H: heads come up; T: tails come up) For example, if the outcome sequence is "HHTHTHTTHTTHH", then player A wins after the last toss. On the other hand, if the outcome sequence is "HHH", then player B wins after the last toss.

(a) Can the game continue forever?

(b) Calculate the probability that player A wins.

Solution.

(a) In principle, yes: for example, if heads and tails alternate forever as "HTHTHT...", then neither pattern ever appears. However, the probability of any such infinite sequence is 0, so with probability 1, one of the outcome patterns appears eventually and the game terminates.

(b) Notice that player B wins if and only if a head comes up on each of the first three tosses.

Proof. "if" part: It is easy to see that player B wins if the first three tosses result in "HHH". This is just the winning condition for player B.

"only if" part: We prove it by contrapositive. So, we start with assuming that a tail comes up for some of the first three tosses. Then, we focus on the toss with outcome "T". If "HH" appears afterward, then player A wins (so player B does not win). Otherwise, if the next toss is with outcome "T", then we focus on the outcome "T". If "HT" appears afterward, we focus on the toss with outcome "T". Through this argument, we can see that the only way to terminate the game is that player A wins. So, player B never win in this case.

${\displaystyle \Box }$

It follows that the probability that player B wins is the same as the probability that a head comes up for each of the first three tosses, which is ${\displaystyle {\frac {1}{2^{3}}}={\frac {1}{8}}}$. (2 outcomes in each toss)

From (a), the game terminates with probability 1, that is, one of the players must win eventually. Hence, the probability that player A wins is ${\displaystyle 1-{\frac {1}{8}}={\frac {7}{8}}}$.
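A Monte Carlo simulation gives a quick sanity check of the answer (a sketch, not part of the argument; the seed and trial count are arbitrary choices):

```python
import random

def amy_wins(rng):
    """Toss a simulated fair coin until 'THH' or 'HHH' appears."""
    s = ""
    while True:
        s += rng.choice("HT")
        if s.endswith("THH"):
            return True       # player A wins
        if s.endswith("HHH"):
            return False      # player B wins

rng = random.Random(0)        # fixed seed for reproducibility
trials = 100_000
freq = sum(amy_wins(rng) for _ in range(trials)) / trials
print(freq)                   # close to 7/8 = 0.875
```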

Example. Consider an experiment where we draw two integers from the integers 1,2,3 with replacement. Here, we define the sample space to be ${\displaystyle \Omega =\{(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3)\}}$, where the ordered pair ${\displaystyle (a,b)}$ indicates that the integers ${\displaystyle a}$ and ${\displaystyle b}$ are drawn in the first and second draws respectively.

It is given that the probability ${\displaystyle \mathbb {P} (\{(m,n)\})={\frac {m+n}{36}}}$ for every ${\displaystyle (m,n)\in \Omega }$.

(a) Verify that the sum of the probabilities of all singleton events is 1.

(b) Calculate the probability for the sum of the two integers drawn to be even.

Solution.

(a) The sum is ${\displaystyle {\frac {1+1+1+2+1+3+2+1+2+2+2+3+3+1+3+2+3+3}{36}}=1.}$

(b) Notice that the event for the sum of the two integers drawn to be even is ${\displaystyle \{(1,1),(1,3),(2,2),(3,1),(3,3)\}}$. So, the probability of this event is ${\displaystyle \mathbb {P} (\{(1,1)\})+\mathbb {P} (\{(1,3)\})+\mathbb {P} (\{(2,2)\})+\mathbb {P} (\{(3,1)\})+\mathbb {P} (\{(3,3)\})={\frac {2+4+4+4+6}{36}}={\frac {20}{36}}={\frac {5}{9}}.}$
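Both parts can be verified with exact rational arithmetic (a sketch):

```python
from itertools import product
from fractions import Fraction

omega = list(product([1, 2, 3], repeat=2))        # the 9 ordered pairs
prob = {pt: Fraction(pt[0] + pt[1], 36) for pt in omega}

print(sum(prob.values()))                         # 1  (part (a))
even = sum(p for pt, p in prob.items() if (pt[0] + pt[1]) % 2 == 0)
print(even)                                       # 5/9  (part (b))
```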

Exercise. Suppose we disregard the order of the draws, and hence define the sample space to be ${\displaystyle \Omega ={\big \{}\{1,1\},\{1,2\},\{1,3\},\{2,2\},\{2,3\},\{3,3\}{\big \}}}$, where ${\displaystyle \{a,b\}}$ means the integers ${\displaystyle a}$ and ${\displaystyle b}$ are drawn in the two draws.

(a) Suppose the probability ${\displaystyle \mathbb {P} (\{\{m,n\}\})={\frac {m+n}{c}}}$ for every ${\displaystyle \{m,n\}\in \Omega }$ and for some constant ${\displaystyle c}$. Determine the constant ${\displaystyle c}$ such that the sum of the probabilities of all singleton events is 1.

(b) Calculate the probability for the sum of the two integers drawn to be even with the constant ${\displaystyle c}$ determined in (a). (Answer: ${\displaystyle {\frac {2}{3}}}$)

Solution

(a) Since ${\displaystyle {\frac {1+1+1+2+1+3+2+2+2+3+3+3}{c}}=1\implies c=24,}$ the constant ${\displaystyle c}$ is 24.

(b) The event for the sum of the two integers drawn to be even is ${\displaystyle {\big \{}\{1,1\},\{1,3\},\{2,2\},\{3,3\}{\big \}}}$. So, the probability of this event is ${\displaystyle \mathbb {P} (\{\{1,1\}\})+\mathbb {P} (\{\{1,3\}\})+\mathbb {P} (\{\{2,2\}\})+\mathbb {P} (\{\{3,3\}\})={\frac {2+4+4+6}{24}}={\frac {16}{24}}={\frac {2}{3}}.}$

### More advanced properties of probability

Recall the inclusion-exclusion principle in combinatorics. We have similar results for probability:

Theorem. (Inclusion-exclusion principle) Let ${\displaystyle (\Omega ,{\mathcal {F}},\mathbb {P} )}$ be a probability space, and ${\displaystyle E_{1},E_{2},\dotsc ,E_{n}}$ be sets in the event space ${\displaystyle {\mathcal {F}}}$ (${\displaystyle n}$ is an arbitrary positive integer). Then, ${\displaystyle \mathbb {P} (E_{1}\cup E_{2}\cup \dotsb \cup E_{n})=\sum _{j=1}^{n}(-1)^{j+1}\sum _{i_{1}<\dotsb <i_{j}}\mathbb {P} (E_{i_{1}}\cap \dotsb \cap E_{i_{j}}).}$

Proof. We can prove this by mathematical induction.

Let ${\displaystyle P(n)}$ be the statement ${\displaystyle \mathbb {P} (E_{1}\cup E_{2}\cup \dotsb \cup E_{n})=\sum _{i_{1}}\mathbb {P} (E_{i_{1}})-\sum _{i_{1}<i_{2}}\mathbb {P} (E_{i_{1}}\cap E_{i_{2}})+\dotsb +(-1)^{n+1}\mathbb {P} (E_{1}\cap E_{2}\cap \dotsb \cap E_{n}).}$ We wish to prove that ${\displaystyle P(n)}$ is true for every positive integer ${\displaystyle n}$.

Basis Step: When ${\displaystyle n=1}$, ${\displaystyle P(n)}$ is clearly true since it merely states that ${\displaystyle \mathbb {P} (E_{1})=\mathbb {P} (E_{1})}$.

Inductive Hypothesis: Assume that ${\displaystyle P(k)}$ is true for an arbitrary positive integer ${\displaystyle k}$.

Inductive Step:

Case 1: ${\displaystyle k=1}$. Then, ${\displaystyle P(k+1)=P(2)}$ is true by a property of probability (recall that we have "${\displaystyle \mathbb {P} (A\cup B)=\mathbb {P} (A)+\mathbb {P} (B)-\mathbb {P} (A\cap B)}$").

Case 2: ${\displaystyle k\geq 2}$. We wish to prove that ${\displaystyle P(k+1)}$ is true. The main idea of the steps is to regard ${\displaystyle E_{1}\cup E_{2}\cup \dotsb \cup E_{k}\cup E_{k+1}}$ as ${\displaystyle (E_{1}\cup E_{2}\cup \dotsb \cup E_{k})\cup E_{k+1}}$, then apply the above property of probability, and eventually apply the inductive hypothesis twice, to two probabilities each involving a union of ${\displaystyle k}$ events. The details are as follows (may be omitted): {\displaystyle {\begin{aligned}\mathbb {P} (E_{1}\cup E_{2}\cup \dotsb \cup E_{k}\cup E_{k+1})&=\mathbb {P} {\big (}(E_{1}\cup E_{2}\cup \dotsb \cup E_{k})\cup E_{k+1}{\big )}\\&=\mathbb {P} (E_{1}\cup E_{2}\cup \dotsb \cup E_{k})+\mathbb {P} (E_{k+1})-\mathbb {P} {\big (}(E_{1}\cup E_{2}\cup \dotsb \cup E_{k})\cap E_{k+1}{\big )}\quad ({\text{using the above property of probability again}})\\&=\mathbb {P} (E_{1}\cup E_{2}\cup \dotsb \cup E_{k})+\mathbb {P} (E_{k+1})-\mathbb {P} {\big (}(E_{1}\cap E_{k+1})\cup (E_{2}\cap E_{k+1})\cup \dotsb \cup (E_{k}\cap E_{k+1}){\big )}\quad ({\text{distributive law}})\end{aligned}}} Now apply the inductive hypothesis ${\displaystyle P(k)}$ to the two probabilities of unions of ${\displaystyle k}$ events, namely ${\displaystyle \mathbb {P} (E_{1}\cup \dotsb \cup E_{k})}$ and ${\displaystyle \mathbb {P} {\big (}(E_{1}\cap E_{k+1})\cup \dotsb \cup (E_{k}\cap E_{k+1}){\big )}}$. Grouping the resulting terms by the number of events intersected, each intersection of ${\displaystyle j}$ of the events ${\displaystyle E_{1},\dotsc ,E_{k+1}}$ appears exactly once with sign ${\displaystyle (-1)^{j+1}}$, which is exactly the statement ${\displaystyle P(k+1)}$. So, ${\displaystyle P(k+1)}$ is true.

Hence, by the principle of mathematical induction, ${\displaystyle P(n)}$ is true for every positive integer ${\displaystyle n}$.

${\displaystyle \Box }$

Remark.

• "${\displaystyle \sum _{i_{1}<\dotsb <i_{j}}}$" means summing over every tuple of indices ${\displaystyle (i_{1},\dotsc ,i_{j})}$ such that ${\displaystyle 1\leq i_{1}<\dotsb <i_{j}\leq n}$. In other words, each term in the sum corresponds to an unordered choice of ${\displaystyle j}$ of the events ${\displaystyle E_{1},E_{2},\dotsc ,E_{n}}$ (the indices are sorted ascendingly). Hence, the number of terms in "${\displaystyle \sum _{i_{1}<\dotsb <i_{j}}}$" is ${\displaystyle {\binom {n}{j}}}$ (when the inclusion-exclusion principle is applied to ${\displaystyle n}$ events).

Example. Let ${\displaystyle (\Omega ,{\mathcal {F}},\mathbb {P} )}$ be a probability space. When ${\displaystyle n=3}$, the inclusion-exclusion principle becomes ${\displaystyle \mathbb {P} (E_{1}\cup E_{2}\cup E_{3})=\mathbb {P} (E_{1})+\mathbb {P} (E_{2})+\mathbb {P} (E_{3})-\mathbb {P} (E_{1}\cap E_{2})-\mathbb {P} (E_{1}\cap E_{3})-\mathbb {P} (E_{2}\cap E_{3})+\mathbb {P} (E_{1}\cap E_{2}\cap E_{3})}$ where ${\displaystyle E_{1},E_{2},E_{3}\in {\mathcal {F}}}$.

Exercise. Write out the formula for inclusion-exclusion principle explicitly for 4 events ${\displaystyle E_{1},E_{2},E_{3},E_{4}\in {\mathcal {F}}}$. (Suggestion: check that the number of terms in each sum is correct (see the remark above).)

Solution

{\displaystyle {\begin{aligned}\mathbb {P} (E_{1}\cup E_{2}\cup E_{3}\cup E_{4})&=\mathbb {P} (E_{1})+\mathbb {P} (E_{2})+\mathbb {P} (E_{3})+\mathbb {P} (E_{4})&\left({\binom {4}{1}}=4{\text{ terms}}\right)\\&\quad -\mathbb {P} (E_{1}\cap E_{2})-\mathbb {P} (E_{1}\cap E_{3})-\mathbb {P} (E_{1}\cap E_{4})-\mathbb {P} (E_{2}\cap E_{3})-\mathbb {P} (E_{2}\cap E_{4})-\mathbb {P} (E_{3}\cap E_{4})&\left({\binom {4}{2}}=6{\text{ terms}}\right)\\&\quad +\mathbb {P} (E_{1}\cap E_{2}\cap E_{3})+\mathbb {P} (E_{1}\cap E_{2}\cap E_{4})+\mathbb {P} (E_{1}\cap E_{3}\cap E_{4})+\mathbb {P} (E_{2}\cap E_{3}\cap E_{4})&\left({\binom {4}{3}}=4{\text{ terms}}\right)\\&\quad -\mathbb {P} (E_{1}\cap E_{2}\cap E_{3}\cap E_{4}).&\left({\binom {4}{4}}=1{\text{ term}}\right)\\\end{aligned}}}
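The ${\displaystyle n=4}$ formula can be checked numerically on a small equally likely sample space (a sketch; the four events below are arbitrary choices for illustration):

```python
from fractions import Fraction
from itertools import combinations

omega = set(range(10))                    # 10 equally likely sample points
events = [{0, 1, 2, 3}, {2, 3, 4}, {3, 4, 5, 6}, {0, 6, 7}]

def P(A):
    return Fraction(len(A), len(omega))

# Right-hand side of the inclusion-exclusion formula for n = 4.
rhs = sum((-1) ** (j + 1) * sum(P(set.intersection(*c))
                                for c in combinations(events, j))
          for j in range(1, 5))
print(rhs == P(set.union(*events)))       # True
```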