# Probability/Probability Spaces

## Terminologies[edit | edit source]

The name of this chapter, *probability space*, is a mathematical construct that models a random experiment.
To be more precise:

**Definition.**
(Probability space)
A *probability space* is a mathematical triplet consisting of three elements: a *sample space* , an *event space* , and a *probability measure* .

Let us first give the definitions related to sample space. For the definitions of event space and probability, we will discuss it in later sections.

**Definition.**
(Sample space and sample point)
The *sample space* of a random experiment, denoted by ,
is a nonempty set consisting of *all possible and indecomposable outcomes * of the random experiment, called *sample points*.

**Remark.**

- A sample point is often denoted by (lowercase omega), in contrast with the uppercase omega used for the notation of sample space.
- By an indecomposable outcome, we mean we are unable to split the outcome into multiple more specific outcomes ("smaller pieces"). So, in other words, indecomposable outcomes mean outcomes that are expressed in a way that is as specific as possible. Furthermore, the random experiment should only result in
*exactly one*of the indecomposable outcomes.

- For instance, when rolling a dice and consider the number facing up, we may say the outcomes are "odd number" and "even number". But such outcomes are decomposable: we can split "odd number" to three more specific outcomes: 1,3,5, and "even number" to three more specific outcomes: 2,4,6.
- On the other hand, we may regard the outcomes as the numbers 1,2,3,4,5,6, which are indecomposable, and are expressed in the most specific way.

**Example.**
A sample space of tossing a coin is (: heads come up, : tails come up), and the sample points are .

**Definition.**
(Event)
An *event* is a subset of the sample space.

**Remark.**

- So, an event is an aggregate of sample points.
- An event is said to be
*simple event*if it consists of exactly one sample point. - An event is said to be
*compound event*if it consists of more than one sample point.

**Definition.**
(Occurrence of events)
An event *occurs* if it contains the outcome from the random experiment.

**Example.**
We take the sample space of rolling a dice to be .
Then, the sets are *events*, while the set is not.

Suppose we get the number 3 from rolling the dice. Then, and occur, but does not occur.

## Probability interpretations[edit | edit source]

In this chapter, we will discuss probability mathematically, and we will give an axiomatic and abstract definition to probability (function).
By axiomatic definition, we mean defining probability to be a function that satisfying some axioms, called probability axioms.
But such axiomatic definition does not tell us how should we interpret the term "probability",
so the definition is said to be *independent* from the interpretation of probability.
Such independence make the formal definition always applicable, no matter how you interpret probability.

However, the axiomatic definition does not suggest a way to construct a probability measure (i.e., assigning probabilities to events): it just states that probability is a function satisfying certain axioms, but how can we construct such function in the first place? In this section, we will discuss two main types of probability interpretations: subjectivism and frequentism, where the method of assigning probabilities to events is mentioned in each of them.

### Subjectivism[edit | edit source]

Intuitively and naturally, *probability* of an event is often regarded as a numerical measure of the "chance" of the occurrence of the event (that is, how likely the event will occur).
So, it is natural for us to assign probability to an event based on our own assessment on the "chance". (In order for the probability to be valid according to the axiomatic definition, the assignment needs to satisfy the *probability axioms*.)
But different people may have different assessment on the "chance", depending on their personal opinions.
So, we can see that such interpretation of probability is somewhat *subjective*,
since different people may assign different probabilities to the same event.
Hence, we call such probability interpretation as *subjectivism* (also known as *Bayesian probability*).

**Example.**
Amy and Bob assign the probabilities of winning the top prize from a lucky draw according to subjectivism:

- Amy thinks that she is lucky, and thus assign 0.7 to the probability of winning the top prize.
- Bob thinks that he is unlucky, and thus assign 0.1 to the probability of winning the top prize.

The main issue of the subjectivism is the lack of objectivity, since different probabilities can be assigned to the same event based on personal opinion.
Then, we may have difficulties in choosing which of the probabilities should be used for that event.
To mitigate the issue of the lack of objectivity, we may adjust our degrees of belief on an event from time to time when there are more observed data through *Bayes' theorem*, which will be discussed in later chapter,
so that the value is assigned in a more objective way.
However, even after the adjustment, the assignment of value is still not in an *entirely* objective way,
since the adjusted value (known as *posterior probability*) still depends on the initial value (known as *prior probability*), which is assigned subjectively.

### Frequentism[edit | edit source]

Another probability interpretation, which is objective, is called *frequentism*.
We denote by the number of occurrences of an event in repetitions of experiment.
(An *experiment* is any action or process with an *outcome* that is subject to uncertainty or randomness.)
Then, we call as the *relative frequency* of the event .
Intuitively, we will *expect* that the relative frequency fluctuates less and less as gets larger and larger,
and approach to a constant limiting value (we call this as *limiting relative frequency*) as tends to infinity,
i.e., the limiting relative frequency is .
It is thus natural to take the limiting relative frequency as the probability of the event .
This is exactly what the definition of probability in the frequentism.
In particular, the *existence* of such limiting relative frequency is an assumption or *axiom* in frequentism.
(As a side result, when is large enough, the relative frequency of the event may be used to approximate the probability of the event .)

However, an issue of frequentism is that it may be infeasible to conduct experiments many times for some events. Hence, for those events, no probability can be assigned to them, and this is clearly a limitation for frequentism.

**Example.**
Suppose John is taking a course in probability theory, and the course can only be taken once.
Consider the event "John passes the course". For this event, it is infeasible to repeat the experiment many times,
since John can only take the course once, so there can be only one experiment.
(Notice that we cannot use the experiences for other students taking the course as the experiments for this event, since we specify that the person
taking the course must be John, but not others.)

**Example.**
Consider a couple, Amy and Bob, and the event "this couple divorce eventually".
For this event, it is again infeasible to repeat the experiment many times, since there is only one such pair of couples,
and thus there can only be one experiment.

Because of these issues, we will instead use a modern axiomatic and abstract approach to define probability, which is suggested by a Russian mathematician named Andrey Nikolaevich Kolmogorov in 1933.
By *axiomatic approach*, we mean defining probability quite broadly and abstractly as something that satisfy certain axioms
(called *probability axioms*).
Such probability axioms are the mathematical foundation and the basis of modern probability theory.

## Probability axioms[edit | edit source]

Since we want use the probability measure to assign probability to every event in the sample space, it seems natural for us to set *domain* of the probability measure
to be the set containing *all* subsets of , i.e., the power set of , .
Unfortunately, this situation is not that simple, and there are some technical difficulties if we set the domain like this,
when the sample space is *uncountable*.

**Remark.**

- A set is
*countable*, if, intuitively and informally, we can "count" the elements of the set. E.g., every finite set is countable since we can "count" the elements in every finite set one by one. Also, are countable since we can still "count" the elements in these sets one by one. (A countable infinite set is also called a*countably infinite*set.) - A set is
*uncountable*if it is not countable.

- Some examples of uncountable sets include and real intervals ().

This is because the power set of such uncountable sample space includes some "badly behaved" sets, which causes problems when assigning probabilities
to them. (Here, we will not discuss those sets and these technical difficulties in details.)
Thus, instead of setting the domain of the probability measure to be , we set the domain to be a *-algebra* (sigma-algebra) containing some "sufficiently well-behaved" events:

**Definition.**
(-algebra)
Let be a set (considered to be the universal set), and be its power set.
A set is a *-algebra* on the set , if the following three properties are satisfied:

- .
- For every set , if the set , then its complement . (closure under complementation)
- For every infinite sequence of sets , if the sets , then . (closure under countable unions)

**Remark.**

- Elements of the -algebra are sometimes called
*measurable sets*. Hence, whether a set is measurable or not actually depends on how we construct the -algebra. However, we usually construct -algebra to include "well-behaved" sets only, and exclude "badly behaved" sets. So, in this case, only "well-behaved" sets are measurable. - A -algebra on the sample space , which consists of a family (or set) of events, is also called an
*event space*, denoted by . - We call as a
*measurable space*. In the context of probability, the measurable space is . - When the property 3 is weakened to the closure under
*finite*unions, that is, "For every finite sequence , if the sets , then .", then we call as an*algebra*over the set . As we will see, the property 3 above implies this weakened property. Thus, a -algebra on the set is necessarily an algebra over the set (but not vice versa). - In terms of probabilities, an intuitive meaning of each of the above three properties is

- property 1: we should be able assign the probability to the entire sample space (this comes from the probability axiom actually);
- property 2: if we are able to assign probability to an event ("chance" of the occurrence of the event), then we should also be able to assign a probability to its complement ("chance" of the
*non-occurrence*of the event); - property 3: if we are able to assign probabilities to the events , then we should be able to assign a probability to their union ("chance" of the occurrence of
*one*of the events).

- Actually, from these three properties, we can deduce some similar results for finitely many events and intersections.

**Proposition.**
(Further properties of -algebra)
Let be a -algebra of a set . Then, we have

- .
- For every set , if the sets , then . (closure under finite unions)
- For every infinite sequence of sets , if the sets , then . (closure under countable intersections)
- For every set , if the sets , then . (closure under finite intersections)

**Proof.**

**Property 1**:
By the closure under complementation, since , it follows that .

**Property 2**:
By the closure under countable unions, we have for *every* infinite sequence of sets ,
if the sets , then .
So, in particular, we can choose the sequence to be ()
where is an arbitrary sequence such that .
Then,

**Property 3**:
For every infinite sequence of sets , by the closure under complementation, we have

**Property 4**:
The proof is similar to that of property 2, and hence left as an exercise.

**Remark.**

- From these properties, it follows that (finite/countably infinite) (unions/intersections) of (complements of) (sets in -algebra) is also in -algebra. So, even after many operations on sets in -algebra, the resulting set is also in -algebra.

**Exercise.**
Prove the property 4 in the proposition above.

**Proof.**
By the closure under countable *intersections* (property 3 in proposition above), we have for *every* infinite sequence of sets ,
if the sets , then .
So, in particular, we can choose the sequence to be ()
where is an arbitrary sequence such that .
Then,

**Example.**
(Smallest and largest -algebra)
Let be a sample space.

(a) Prove that the event space (called *trivial -algebra*) is a -algebra on . This is the smallest -algebra on . (To be more precise, for every -algebra on .)

(b) Prove that the event space is a -algebra on . This is the largest -algebra on . (To be more precise, for every -algebra on .)

*Solution*.

**Proof.**
To prove that is a -algebra, we need to show that satisfies the three defining properties.

**Property 1**: Since , the property 1 is satisfied.

**Property 2** (closure under complementation): Since contains only and , it suffices to show that the complement of each of them also belongs to : and . So, the property 2 is satisfied.

**Property 3** (closure under countable unions): Since contains only and , the *union* of countably infinitely many events ("") in
(each of them is either or ) is either or . In either case, the union belongs to . Thus, the property 3 is satisfied.

**Proof.**

**Property 1**: Since , it follows that . Thus, the property is satisfied.

**Property 2**: Assume that . Then, . By the definition of complement, we have . Hence, , and thus the property is satisfied.

**Property 3**: For every infinite sequence of sets , we have . It follows by the property of union that , and hence .
Thus, the property is satisfied.

**Exercise.**

We have seen two examples of -algebra in the example above.
Often, the "smallest" -algebra is not chosen to be the domain of the probability measure, since we usually are
interested in events *other than* and .

For the "largest" -algebra, on the other hand, it contains every event, but we may not be interested in some of them.
Particularly, we are usually interested in events that are "well-behaved", instead of those "badly behaved" events (indeed, it may be
even impossible to assign probabilities to them properly (those events are called *nonmeasurable*)).

Fortunately, when the sample space is *countable*, every set in is "well-behaved", so
we can take this power set to be a -algebra for the domain of probability measure.

However, when the sample space is *uncountable*, even if the power set is a -algebra, it contains "too many" events, particularly, it even includes some "badly behaved" events.
Therefore, we will not choose such power set to the domain of the probability measure.
Instead, we just choose a -algebra that includes the "well-behaved" events to be the domain, so that we are able to assign probability properly to every event in the -algebra of the domain.
Particularly, those "well-behaved" events are often the events of interest, so all events of interest are contained in that -algebra, that is, the domain of the probability measure.

To motivate the probability axioms, we consider some properties that the "probability" in frequentism (as a limiting relative frequency) possess:

- The limiting relative frequency must be nonnegative. (We call this property as
*nonnegativity*.) - The limiting relative frequency of the whole sample space ( is also an event) must be 1 (since by definition contains all sample points, this event must occur in every repetition). (We call this property as
*unitarity*.) - If the events are pairwise disjoint (i.e., for every with ), then the limiting relative frequency of the event (union of subsets of is a subset of , so it can be called an event) is

- which is the sum of the limiting relative frequency of each of the events . (We call this property as
*countable additivity*.)

It is thus very natural to set the probability axioms to be the three properties mentioned above:

**Definition.**
(Probability measure)

Let be the sample space of a random experiment, and be an associated event space (which is a -algebra on ).
A *probability measure* is a *function* , with the following *probability axioms* satisified:

- (P1) for every event , the
*probability*of the event , ; (nonnegativity) - (P2) ; (unitarity)
- (P3) for every infinite sequence of pairwise disjoint events (each of which is an element of ), . (countable additivity)

**Remark.**

- For the axiom P3, we consider infinite sequences instead of finite sequences to add generality for infinite sample spaces. As we will see, we have result similar to the axiom P3 for finitely many pairwise disjoint events.
- When is countable, we just set the associated event space to be .
- When is uncountable, the event space is usually not set to be . Instead, it is common to set the event space to be the Borel -algebra on (which is defined when is a topological space). The details are omitted here, but the event space contains all "well-behaved" sets and most sets we can think of in the power set anyway.
- The probability axiom P3 relates the events to their union . This matches with the closure under countable unions in the definition of -algebra.

Using the probability axioms alone, we can prove many well-known properties of probability.

### Basic properties of probability[edit | edit source]

Let us start the discussion with some simple properties of probability.

**Theorem.**
Let be a probability space and be a set in the event space . Then, .

**Proof.**
Consider the infinite sequence of events (recall that and must be in the -algebra ). We can see that the events are pairwise disjoint.
Also, the union of these events is . Hence, by the countable additivity of probability, we have

^{[1]}

Using this result, we can obtain *finite additivity* from the countable additivity of probability:

**Theorem.**
(Finite additivity)
Let be a probability space. Then, for every event ,
if the events are pairwise disjoint, then

**Proof.**
Consider the infinite sequence of events (recall that always).
Then,

Finite additivity makes the proofs of some of the following results simpler.

**Theorem.**
Let be a probability space, and and be sets in the event space . Then,

- (complementary event) . (The complementary event is taken with respect to the universal set.)
- (numeric bound) .
- .
- .
- (monotonicity) If , then .

**Proof.**

**Property 1**:

First, notice that by definition . Furthermore, since , we have by the closure under complementation of -algebra. Also, the sets and are disjoint. Thus, by the finite additivity, we have

**Property 2**:
By property 1, we have . We then have the desired numeric bound on since also by the nonnegativity of probability.

**Property 3**:

**Property 4**:
By property 3, we have

**Property 5**:
Assume that . Then, . Hence, by property 3,

**Remark.**

- The numeric bound property ensures that the probability measure is indeed a well-defined function (there is not a probability of event that is out of the codomain ).

**Example.**
Let and the event of winning the champion in the competition, and entering the final of the competition respectively.
Then, (when we win the champion, then we must enter the final),
and so by monotonicity.
That is, the probability of winning the champion in a competition is less than or equal to that of entering the final of the competition.

**Example.**
Let be a probability space, and .
Show that if , then .

**Proof.**
Assume . Then, we have since
by the nonnegativity of probability.
On the other hand, we have . So, by monotonicity, , and thus
.
Combining the inequalities we get yields the desired result.

**Exercise.**
Let be a probability space, and .
Show that if *and *, then .

**Proof.**
Assume and . Then, . Hence,

**Exercise.**
Let be a probability space, and .
Show that . (*Hint*: consider the property .)

**Proof.**
By the hint, we have

**Example.**
Suppose you roll a loaded dice, and the corresponding probability space is .
It is given that , and
,
and .
Calculate the probability of getting a 1 from the dice, i.e., .

*Solution*.
First, notice that and .
Furthermore, .
Also, and are disjoint.
Hence,

Also, are disjoint, and hence

**Exercise.**
Calculate the probability of getting a 2 from the dice.

From the example above, we have . Since are disjoint, it follows that

**Remark.**

- For simplicity, we may sometimes omit the curly braces inside the probability measure. For example, we may write instead of .

**Example.**
Let be a probability space.
Suppose we have and for events .
Can the events and be disjoint?

*Solution*.
They cannot be disjoint. We may prove it by *contradiction*.

**Proof.**
Assume to the contrary that and are disjoint. First, .
Then, by the finite additivity of probability, we have

**Exercise.**
What is the minimum probability of such that can possibly be a valid probability measure?
(*Hint*: it suffices to consider the bound on if is a valid probability measure. The bound is a *necessary condition* (but may not be sufficient) for to be a valid probability measure in logic terms.)

When is a valid probability measure, we have

**Exercise.**

Define be the *symmetric difference* of sets and , that is, .

Propose a formula for in terms of , and prove it.
(Note: the occurrence of the event corresponds to the occurrence of *exactly one* of the events and .)
(*Hint*: You may use without proof the fact that . Also, you may use the property that .)

**Proposition**: .

**Proof.**
We have

**Example.**
Let be a probability space, and be events such that
. It is given that

(a) ;

(b) (without using inclusion-exclusion principle for three events directly).

*Solution*.

(a)

- .
- .
- .

(b) First, . Then, we write

Note: to keep track of the probabilities of different events more easily, we may draw a Venn diagram:

|-------------| <--------- A | | | |----|----| | 0.2 | | | | | 0.1| 0.2| <---- C | | | | |--------|----|----|------| | |0.1 | 0.2| | | 0.1 | | | 0.1 | <---- B | |----|----| | |-------------|-----------|

It also allows us to calculate the probabilities in a more systematic manner. (E.g., we know that . Also, . So, the "remaining piece" in has 0.1 probability (i.e., .)

**Exercise.**
Calculate

(a) ;

(b) .

(a)

- .
- .
- .
- .
- .
- .

(b)

- .
- We have . Thus, .
- .
- .

### Constructing a probability measure[edit | edit source]

As we have said, the axiomatic definition does not suggest us a way to construct a probability measure. Actually, even for the same experiment, there can be many ways to construct a probability measure that satisfies the above probability axioms if there are not sufficient information provided:

**Example.**
(Coin tossing)
Consider a random experiment where we toss a coin (fair or unfair).
Suppose we want to define a probability space for this random experiment. Three elements are needed: sample space, event space, and probability measure.

Here, we define the sample space as ( and stands for "heads comes up" and "tails comes up" respectively).
Since is finite, the event space is chosen to be its power set, that is, .
It then remains to construct the probability measure , *subject to the probability axioms*.
Since the domain of the probability measure is , we need to define the "output" of (that is, assigning probability) for every event in .

For the empty set, by a property of probability, the probability assigned to it has to be zero.
Similarly, for the set , the probability assigned to it has to be one by the probability axiom P2.
But for the remaining two sets , there are no properties/axioms of probability that tell us what probability must we assign to each of them.
Despite this, the *sum* of the two probabilities assigned has to be one, by the finite additivity property:

So, without further information, we can set where (and thus ). Hence, we can see that there are infinitely many ways to construct a probability measure for this random experiment!

However, we have previously mentioned that we may assign probabilities to events subjectively (as in subjectivism), or according to its limiting relative frequency (as in frequentism).
Through these two probability interpretations, we may provide some background information for a random experiment, by assigning probabilities to
some of the events before constructing the probability measure, to the extent that there is *exactly one* way to construct a probability measure.
Consider the coin tossing example again:

**Example.**
(Coin tossing, continued)
Suppose we are told that the coin is fair, so based on this message, we assign the probability of "head comes up" and "tail comes up" to be respectively.
In this case, the way of constructing the probability measure is fixed:

In general, it is not necessary to assign probability to *every* event in the event space in the background information for us to able to construct the probability measure in exactly one way.
Consider the following example.

**Example.**
(Rolling dice)
Consider a random experiment where we roll a dice (loaded or unloaded). Then, we define the sample space as where each number represents the number coming up.
It follows that the event space is since is finite. (Since there are sets in the event space, we will not list out them.)
Now, we proceed to construct a probability measure (satisfying the probability axioms). Similarly, we need to consider all 64 events in the event space .

First, for the empty set and the sample space, we have and .
Now, let us consider the *singletons* (i.e., sets consisting one element) in the event space: .
Similarly, without further information, there are infinitely many possible ways to assign probability to each of them, subject to the constraint that
their sum has to be one by the probability axiom P2.
For now, let us denote

If we think more carefully, then we will find out that once these six probabilities are fixed, the probabilities for all other events are fixed, based on the property of finite additivity (which must be satisfied when the probability measure satisfies the probability axioms), since all other 56 events are just finite unions of some of these six events, and thus their probabilities can be obtained by adding up some of the above probabilities.

We can see from this example that to provide sufficient background information to the extent that the probability measure can be constructed in exactly one way, we just need the probability of each of the singleton events (which should be nonnegative and sum to one to satisfy the probability axioms). After that, we can calculate the probability for each of the other events in the event space, and hence construct the only possible probability measure.

This is true when the sample space is countable, in general:

**Theorem.**
Let be a probability space, where the sample space is countable.
Assume that the probability of each of the singleton events (i.e., for every ) is given
such that they are nonnegative and their sum is one (to satisfy the probability axioms).
Then, the probability measure is given by

**Proof.**

**Case 1**: is finite. Then, we can write . It follows that every event can be expressed as (which s are taken over for the union depends on the event ).
Notice also that the sets ""s are disjoint. (Every set contains a different sample point, and so the intersection of any pair of them is an empty set.)
Then, by the finite additivity of probability, we have for every event ,

**Case 2**: is countably infinite. Then, we can write .
It follows that every event can be expressed as (which s are taken over for the union depends on the event ).
Notice also that the sets ""s are disjoint.
Then, by the countable additivity/finite additivity of probability, we have for every event ,

**Example.**
Consider a random experiment where we draw an integer from all positive integers.
Then, the probability space is where .
Suppose it is given that

**Exercise.**
Calculate the probability .

We have by above formula

The following is an important special case for the above theorem.

**Corollary.**
(Formula for combinatorial probability)
Let be a probability space, where the sample space is *finite*.
Assume that the probability of each of the singleton events ( of them) is the same: .
(we say that all outcomes are *equally likely* in this case).
Then, the probability measure , called *combinatorial probability* (or *classical probability*) is given by

**Proof.**
Under the assumptions, the probability of every singleton event is nonnegative. Also, the sum of the probabilities is

**Remark.**

- By
*principle of indifference*, unless there exists evidence showing that all outcomes are*not*equally likely, e.g. it is given that a coin is biased, such that it is more likely that head comes up, we should*assume*that the outcomes are equally likely.

**Example.**
Suppose we roll a fair red dice and a fair blue dice. Then, we can define the sample space be ,
where means comes up for red dice, and comes up for blue dice.

Calculate the probability .

*Solution*.
The number of sample points is .
Since the outcomes should be equally likely, the desired probability is (there is only one sample point for this event).

**Exercise.**
Suppose the blue dice is colored red, such that the two dice become *indistinguishable*.

Then, student A claims that the probability , and he reasons as follows:

- Recall that when the two dices become indistinguishable, the number of possible distinct pairs of numbers facing up becomes (shown in previous chapter about combinatorics). So, there are only 21 sample points in the sample space, and hence the probability is .

On the other hand, student B claims that the probability should still be and he reasons as follows:

- Notice that in this case, the number 1 comes up for both dice correspond to exactly one outcome in the distinguishable case: . Also, changing the color of the dice should not affect the randomness of the experiment. So, the probability in this case should be the same as the probability .

Which of the students is correct? Also, explain why the other student is wrong.

Student B is correct. Student A is wrong, since in this case, not all outcomes in the experiment are equally likely. Particularly, some of the sample points in this case actually correspond to two original sample points, e.g. "1 comes up for one dice, 2 comes up for another" corresponds to and .

**Example.**
Consider a random experiment where a fair coin is tossed twice. Calculate the probability of getting heads in a row.

*Solution*.
There are outcomes in this experiment. They should be equally likely, and hence the probability of getting
two heads is .

**Exercise.**
Suppose we toss a fair coin five times.

(a) Calculate the probability ( means 5 heads in a row).

(b) Calculate the probability ( means 4 heads in a row, then 1 tails).

(c) Calculate the probability . (Answer: )

(d) Calculate the probability . (Answer: )

There should be equally likely outcomes.

(a) The probability is . (The event contains one sample point only.)

(b) The probability is . (The event contains one sample point only.)

(c) The event , containing five sample points. Thus, the probability is .

(d) We can count the number of sample points in the event by regarding the five outcomes from tossing five times as five distinguishable cells, and 2 tails as 2 indistinguishable balls. Then, we place 2 indistinguishable balls into 5 distinguishable cells (the remaining empty cells are for heads). So, the number is . Hence, the probability is .

**Example.**
(Capture-mark-recapture)
Suppose we are fishing in a lake, containing distinct fishes.
First, we catch fishes from the lake (capture), and give them each a marker (mark), and after that place them back into the lake.

Then, we catch fishes from the lake again (recapture), and catch fishes this time. Show that the probability that there is marked fishes in the fishes caught is

**Proof.**
We regard the fishes as distinguishable cells, and catches (in the second time) as indistinguishable balls (the order of the catch is not important, we are considering what fishes are caught only).
Then, there are equally likely sample points in the sample space.

It now remains to consider the number of sample points in the event. We now count the number of ways to have marked fishes in the fishes. We regard the marked distinct fishes as distinguishable cells, and non-marked fishes as distinguishable cells. (These two groups of cells are put separately.)

- Step 1: placing indistinguishable balls (catches) into distinguishable cells. ( ways)
- Step 2: placing indistinguishable balls (catches) into distinguishable cells. ( ways)

Hence, the number of sample points in the event is . The result follows.

**Example.**
There are 9 *distinguishable* balls in a box, consisting of 3 red balls, 2 blue balls and 4 green balls.

(a) Calculate the probability that a red ball is drawn from the box if 1 ball is drawn from the box.

(b) Calculate the probability that exactly 2 red balls and exactly 3 green balls are drawn from the box if 6 balls are drawn from the box.

*Solution*.

(a) Since one ball is drawn from the box, there are 9 outcomes in the sample space, corresponding to the 9 different draws. We shall assume that the outcomes are equally likely. Then, the probability that a red ball is drawn is since there are 3 different draws (3 sample points) in the event.

(b) Although the question does not state whether the drawing of ball is unordered or ordered, the probability obtained with either assumption is actually the same, when the 9 balls are *distinguishable*. This is because every unordered draw of 6 balls corresponds to ordered draw of 6 balls.
So if the ordered draws are equally likely, then the unordered draws must also be equally likely, vice versa.
^{[2]}
Here, let us regard the draws as unordered for simplicity.

Then, the number of sample points in the sample space is . (9 balls: 9 distinguishable cells with capacity 1; 6 draws: 6 indistinguishable "balls"). For the event, there are

**Exercise.**
Three (distinguishable) orange balls are added to the box. Calculate the probability that 2 red balls and 3 green balls are drawn from the box if 6 balls are drawn from the box again. (Answer: )

With 3 additional orange balls, the number of sample points in the sample space becomes . The number of sample points in the event becomes

**Example.**
Suppose there are 20 couples in a room, and we pick two people randomly from the room (such that all outcomes in the sample space are equally likely).
Calculate the probability that the two people picked are a couple.

*Solution*.

We regard the two picks to be indistinguishable (or unordered). Then, there are equally likely sample points in the sample space (place 2 indistinguishable picks into 20 people with capacity 1). Also, the event that the two people picked are a couple has 20 sample points since there are 20 couples.

It follows that the probability is .

**Exercise.**
Regard the two picks as ordered and calculate the probability again (which should also be ).

In this case, there are equally likely sample points in the sample space. Also, the event has sample points (20 couples, two possible orderings for each couple). Hence, the probability is .

**Example.**
Amy and Bob are playing a game where each of them roll the same fair dice once,
and the player who gets a greater number from the roll wins.
If both players get the same number from the roll, then they draw.

(a) Calculate the probability that Amy and Bob draw.

(b) Hence or otherwise, calculate the probability that Amy wins.

*Solution*.

(a) There are outcomes in total. There are 6 outcomes where Amy and Bob draw (both get 1,2,3,4,5, or 6). It follows that the probability is .

(b) To calculate the probability, clearly one can count the number of outcomes where Amy wins. But here we offer an alternative and more convenient approach, which makes use the *symmetry* of the game.
Notice that the situation for Amy and Bob is basically the same in the game (for each of them, there is another player doing the same thing as him/her, and the winning conditions are the same).
So, the game is kind of *symmetric*. Thus, by the natural symmetry of the game,
the probability that Amy wins and the probability that Bob wins are equal.

But we can notice that the events that Amy wins, Bob wins, and they draw are disjoint. Also, their union is the whole sample space. It follows that the sum of their probabilities is 1. Hence, letting be the probability that Amy wins, we have

**Exercise.**
Consider a modified version of the above game, where if both players get the same number from the roll, then each of them roll
the same fair dice once more. (We can see that the game must end after a finite number of dice rolling. It is impossible to keep having draw infinitely.)

Calculate the probability that Amy wins (eventually).

In this case, the probability of ending up with a draw is 0, since the game mechanism does not allow a draw. By the symmetry of the game, the desired probability is thus .

**Example.**
Consider a simple game for children. Initially, a child is given a paper with 4 dots on it, with a rectangular shape:

* * 1 2 3 4 * *

Then, the child is allowed to draw (exactly) 3 different line segments between two dots on the paper. The aim for the child is to draw a triangle, formed by the 3 line segments, where each vertex of the triangle has to be a dot on the paper. For instance, in this case:

* * \ / \/ /\ / \ *----*

the lower triangle is *not* counted as a valid one, since one of its vertices is not a dot on the paper (instead, it is made by intersection of two line segments).

Suppose we draw the 3 different line segments randomly. Calculate the probability that there is a (valid) triangle in the resulting diagram.

*Solution*.

Let us regard the 3 draws of line segment as unordered. Notice that there are places to draw a line segment. (To draw a line segment, we choose two dots from the four dots (without considering the order).) Then, there are ways to draw 3 different line segments (which should be equally likely).

Among the 20 ways, there are ways where a valid triangle is formed. (joining 123, 124, 134, or 234. We choose 3 dots from the 4 dots to form a triangle (without considering order), and there are four ways.) Hence, the probability is .

**Exercise.**
Suppose there are 5 dots on the paper, forming a shape like a regular pentagon.

* 1 5 * * 2 * * 4 3

Suppose we draw 3 different line segments randomly. Calculate the probability that there is a (valid) triangle in the resulting diagram. (Answer: )

Similarly, we regard the 3 draws as unordered. In this case, there are places to draw a line segment. Thus, there are ways to draw 3 different line segments.

Among the 120 ways, there are ways where a valid triangle is formed. Hence, the probability is .

**Example.**
Suppose we are drawing 5 cards from a poker deck consisting of 52 cards.

(a) Calculate the probability for getting a *flush*, that is, all 5 cards drawn are of the same suit, and not all are of sequential rank.
(Note: sequential rank means A2345,23456,..., or 10JQKA. JQKA2, etc. are not counted as sequential rank.)

(b) Calculate the probability that all 5 cards drawn are of the same color.

(c) Calculate the probability for getting a *four of a kind*, that is, 4 cards are of the same kind among the 5 cards drawn.

*Solution*.
We regard the 5 draws to be unordered, so there are in total outcomes, which should be equally likely.

(a) To form a flush, we first consider the 2-step process:

- Choose a suit from the 4 suits. (4 ways)
- Placing the 5 indistinguishable draws into the 13 cards with the suit chosen in step 1. ( ways)

But among these ways, for each of the 4 ways in step 1, there are 10 ways in step 2 that are of sequential rank. So, we need to subtract from the above ways. Hence, the probability is .

(b) For the 5 cards to be of the same color, we consider this as a 2-step process:

- Choose a color from the 2 colors. (2 ways)
- Placing the 5 indistinguishable draws into the 26 cards with the color chosen in step 1. ( ways)

Hence, the probability is .

(c) To form a four of a kind, we consider this as a 2-step process:

- Choose a kind from 13 kinds for the four cards with the same kind (13 ways)
- Choose a card from the remaining 48 cards in the deck (48 ways)

Then, the probability is .

**Exercise.**

(a) Calculate the probability for getting a *straight*, that is, the 5 cards drawn consists of five cards of sequential rank, and not all are of the same suit. (Note: A2345, 23456, ..., 10JQKA are counted as straight, but e.g., JQKA2 is not counted as straight. This also applies to straight flush below.) (Answer: approximately 0.00392)

(b) Calculate the probability for getting a *straight flush*, that is, the 5 cards drawn consists of five cards of sequential rank, and all are of the same suit. (Answer: approximately 0.0000154)

(c) Calculate the probability for getting a *full house*, that is, the 5 cards drawn contains 3 cards of a rank, and 2 cards of another rank. (Answer: approximately 0.00144)

(a) To form a straight, we first consider the following 6-step process:

- Choose a sequential rank from 10 possible sequential ranks (A2345,23456, ..., 10JQKA) for the straight. (10 ways)
- Choose a suit from the 4 suits for the first card in the sequence. (4 ways)
- Choose a suit from the 4 suits for the second card in the sequence. (4 ways)
- Choose a suit from the 4 suits for the third card in the sequence. (4 ways)
- Choose a suit from the 4 suits for the fourth card in the sequence. (4 ways)
- Choose a suit from the 4 suits for the fifth card in the sequence. (4 ways)

But in these steps, it is possible that all cards drawn are also of the same suit. So, we need to exclude those cases, and there are of them, since for each of the 10 ways in step 1, we may choose the same suit from 4 suits throughout the steps 2-6 (4 ways). Hence, the probability is

- Choose a rank from the 13 ranks for the 3 cards with the same rank. (13 ways)
- Place 3 indistinguishable draws into the 4 cards with that rank, for the rank chosen in step 1. ( ways)
- Choose a rank from the remaining 12 ranks for the 2 cards with the same rank. (12 ways)
- Place 2 indistinguishable draws into the 4 cards with that rank, for the rank chosen in step 1. ( ways)

Thus, the probability is

**Example.**
(Raffle)
Suppose there are 10000 distinct raffle tickets for a *raffle*. After all 10000 raffle tickets are sold out,
3 different tickets from the 10000 raffle tickets will be chosen randomly as 3 winning tickets,
and the holder of each of these 3 winning tickets will receive a prize
(it is possible that one person receives multiple prizes, if the person owns multiple winning tickets).

(a) If Amy purchases 10 of the 10000 raffle tickets, then what is the probability for Amy to win at least one prize?

(b) If Amy purchases 100 of the 10000 raffle tickets, then what is the probability for Amy to win at least one prize?

(c) If Amy purchases 1000 of the 10000 raffle tickets, then what is the probability for Amy to win at least one prize?

(*Hint*: Consider complementary events.)

*Solution*.

All outcomes for the raffle should be equally likely. Let us regard the winning tickets choices as indistinguishable.

- The number of outcomes in total is . (Placing 3 indistinguishable balls (winning ticket choices) into 10000 distinguishable cells (raffle tickets) with capacity one.)
- For Amy to
*not*win any prize, we are placing 3 indistinguishable balls (winning ticket choices) into distinguishable cells (raffle tickets that are*not*purchased by Amy) with capacity one, assuming Amy purchases tickets.

- This is the complementary event of "Amy wins at least one prize".

(a) The probability is .

(b) The probability is .

(c) The probability is .

**Exercise.**
Suppose a single raffle ticket may be chosen to be a winning ticket multiple times.
That is, some of the 3 winning tickets chosen may be the same.

Calculate the probability for Amy to win at least one prize if Amy purchases 1000 of 10000 raffle tickets. (Answer: 0.271 exactly)

In this case, let us regard the winning tickets choices as distinguishable and ordered for convenience. Then, the number of outcomes in total is . (Placing 3 distinguishable balls (winning ticket choices) into 10000 distinguishable cells (raffle tickets) with unlimited capacity.) The number of sample points in the event for Amy not winning any prize is (Placing 3 distinguishable balls (winning ticket choices) into 9000 distinguishable cells (raffle tickets not purchased by Amy) with unlimited capacity.) So, the probability is

**Remark.**

- The answer is expected to be quite close to the one in (c), since even a single raffle ticket may be chosen to be a winning ticket more than once, it is quite unlikely for that to happen. So, we will expect that the effect on the randomness should be quite small.

**Example.**
Consider a game which involves repeated tossing of a fair coin.
There are two players in the game: player A and B. The fair coin is tossed repeatedly until one of the player wins.
Player A wins if the outcome pattern "THH" appears first, and player B wins if the outcome pattern "HHH" appear first.
(H: heads come up; T: tails come up)
For example, if the outcome sequence is "HHTHTHTTHTTHH", then player A wins after the last toss.
On the other hand, if the outcome sequence is "HHH", then player B wins after the last toss.

(a) Can the game continue forever?

(b) Calculate the probability that player A wins.

*Solution*.

(a) No. Either outcome pattern must appear eventually.

(b) Notice that the player B wins *if and only if* a head comes up for each of the first three tosses.

**Proof.**
"if" part: It is easy to see that player B wins if the first three tosses result in "HHH". This is just the winning condition for player B.

"only if" part: We prove it by *contrapositive*. So, we start with assuming that a tail comes up for some of the first three tosses.
Then, we focus on the toss with outcome "T". If "HH" appears afterward, then player A wins (so player B does not win). Otherwise, if the next toss is with outcome "T", then we focus on the outcome "T". If "HT" appears afterward, we focus on the toss with outcome "T".
Through this argument, we can see that the only way to terminate the game is that player A wins. So, player B never win in this case.

It follows that the probability that player B wins is the same as the probability that a head comes up for each of the first three tosses, which is . (2 outcomes in each toss)

From (a), we know that the game must terminate eventually, that is, either player must win eventually. Hence, the probability that player A wins is .

**Example.**
Consider a experiment where we draw two integers from the integers 1,2,3 with replacement.
Here, we define the sample space be ,
where the ordered pair indicates that the integer and is drawn in
first and second draw respectively.

It is given that the probability

(a) Verify that the sum of the probabilities of all singleton events is 1.

(b) Calculate the probability for the sum of the two integers drawn to be even.

*Solution*.

(a) The sum is

**Exercise.**
Suppose we disregard the order of the draws, and hence define the sample space to be
,
where means the integers and are drawn in the two draws.

(a) Suppose the probability

(b) Calculate the probability for the sum of the two integers drawn to be even with the constant determined in (a). (Answer: )

(a) Since

(b) The event for the sum of the two integers drawn to be even is . So, the probability of this event is

### More advanced properties of probability[edit | edit source]

Recall the *inclusion-exclusion principle* in combinatorics. We have similar results for probability:

**Theorem.**
(Inclusion-exclusion principle)
Let be a probability space, and be sets in the event space ( is an arbitrary positive integer).
Then,

**Proof.**
We can prove this by mathematical induction.

Let be the statement

**Basis Step**: When , is clearly true since it merely states that .

**Inductive Hypothesis**: Assume that is true for an arbitrary positive integer .

**Inductive Step**:

**Case 1**: . Then, is true by a property of probability (recall that we have "").

**Case 2**: . We wish to prove that is true.
The main idea of the steps is to regard as , and then we apply the above property of probability, and eventually we will apply the inductive hypothesis twice, on two probabilities involving union of events. Ultimately, through some (somewhat complicated) algebraic manipulations, we finally get the desired result. The details are as follows (may be omitted):

Hence, by the principle of mathematical induction, is true for every positive integer .

**Remark.**

- "" means summing over every vector such that . It follows that every term in the sum is chosen from where for every sum, we place indistinguishable choices (we sort them ascendingly)into the events. Hence, the number of terms in the "" is (when the inclusion-exclusion principle applies to events).

**Example.**
Let be a probability space.
When , the inclusion-exclusion principle becomes

**Exercise.**
Write out the formula for inclusion-exclusion principle explicitly for 4 events . (*Suggestion*: check that the number of terms in each sum is correct (see the remark above).)

**Example.**
We select a student from some university students. It is given that

- the selected student has a major in mathematics with a probability 0.4;
- the selected student has a major in statistics with a probability 0.55;
- the selected student has a major in accounting with a probability 0.3;
- the selected student has a major in statistics and accounting with a probability 0.2;
- the selected student has a major in accounting and mathematics with a probability 0.15;
- the selected student has a major in mathematics and statistics with a probability 0.2;
- the selected student has a major in mathematics, statistics and accounting with a probability 0.1.

Then, the probability that the selected student does not have any of these majors is .