Probability/Conditional Probability
Motivation
In some situations we need a new kind of probability.
Consider the Monty Hall problem:
Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice? (from Wikipedia)^{[1]}
There are some (implicit) assumptions:
 The host must open a door that is not picked by us.
 The host must open a door with a goat, but not a car, behind it.
To determine whether it is to our advantage to switch our choice, we need to know the probability that the car is behind the door we get after switching, given that we pick door No. 1 and the host opens door No. 3 (revealing a goat).
This probability is a conditional probability (the conditions are that the host opens door No. 3 and that we pick door No. 1), and we will compute its value later in this chapter.
Definition
Let's motivate the definition of conditional probability by considering the following Venn diagram.
(Venn diagram: the sample space $\Omega$ contains the events $A$ and $B$, which overlap in $A\cap B$; $A$ is split into $A\setminus B$ and $A\cap B$, and $B$ becomes the new sample space $\Omega'$.)
Without any condition, the probability of $A$ is illustrated by the region consisting of both $A\setminus B$ and $A\cap B$. In the Venn diagram, the ratio of the area of the region for $A$ to the area of the whole sample space $\Omega$ is the ratio of $\Pr(A)$ to $\Pr(\Omega)=1$ (or simply $\Pr(A)$). So, $\Pr(A)=\frac{\Pr(A)}{\Pr(\Omega)}$.
If we are given $B$ (implying that $\Pr(B)>0$), then we can regard $B$ as the new sample space, say $\Omega'$. Then, intuitively, the probability of $A$ given $B$ should be the ratio of the area occupied by $A$ within the region for $B$ (i.e. the area of $A\cap B$) to the area of $B$. So, the probability of $A$ given $B$ should be $\frac{\Pr(A\cap B)}{\Pr(B)}$.
Definition. (Conditional probability) The conditional probability of event $A$ given event $B$, with $\Pr(B)>0$, is $\Pr(A\mid B)=\frac{\Pr(A\cap B)}{\Pr(B)}$.
Remark.
 The assumption $\Pr(B)>0$ prevents the above formula from giving an undefined value.
 Also, it does not make sense to consider the probability of an event conditional on an impossible event: an impossible event can never happen, so how can it be given to have happened?
 It follows that $\Pr(A\cap B)=\Pr(A\mid B)\Pr(B)$ for all events $A$ and $B$ with $\Pr(B)>0$ (the simplified multiplication rule of probability).
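For finite sample spaces with equally likely outcomes, the definition reduces to counting: $\Pr(A\mid B)=\frac{|A\cap B|}{|B|}$. Below is a minimal Python sketch of this counting view; the helper name cond_prob is made up here for illustration, not from the text:

from fractions import Fraction

def cond_prob(a, b, omega):
    # Pr(A | B) = Pr(A n B)/Pr(B) = |A n B|/|B| for equally likely outcomes.
    a, b = set(a) & set(omega), set(b) & set(omega)
    if not b:
        raise ValueError("Pr(B) = 0: conditional probability is undefined")
    return Fraction(len(a & b), len(b))

# Quick check with a fair six-faced dice: Pr(at least 5 | even) = 1/3.
print(cond_prob({5, 6}, {2, 4, 6}, range(1, 7)))  # 1/3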
Example. (Conditional probability is a probability) Conditional probability is a probability, since it satisfies all 3 probability axioms.
Proof.
 (P1) since the numerator and denominator in the formula are both probabilities (i.e. they satisfy the 3 probability axioms), both are nonnegative. In particular, the denominator is positive, by the assumption $\Pr(B)>0$. It follows that the fraction is nonnegative, i.e. $\Pr(A\mid B)\ge 0$.
 (P2) it suffices to prove that $\Pr(\Omega\mid B)=1$ for each event $B$ with $\Pr(B)>0$, which is true since $\Pr(\Omega\mid B)=\frac{\Pr(\Omega\cap B)}{\Pr(B)}=\frac{\Pr(B)}{\Pr(B)}=1$ ($\Omega\cap B=B$ since $B\subseteq\Omega$ by definition of event).
 (P3) for each infinite sequence of disjoint events $A_1,A_2,\dotsc$, the events $A_1\cap B,A_2\cap B,\dotsc$ are also disjoint, and so $\Pr\left(\bigcup_{i=1}^{\infty}A_i\;\middle|\;B\right)=\frac{\Pr\left(\bigcup_{i=1}^{\infty}(A_i\cap B)\right)}{\Pr(B)}=\frac{\sum_{i=1}^{\infty}\Pr(A_i\cap B)}{\Pr(B)}=\sum_{i=1}^{\infty}\Pr(A_i\mid B)$.
Example. (Special cases for conditional probability) If $B\subseteq A$ ($B$ implies $A$), then $\Pr(A\mid B)=\frac{\Pr(A\cap B)}{\Pr(B)}=\frac{\Pr(B)}{\Pr(B)}=1$, as expected (since given $B$, the event $A$, which is implied by $B$, is certain).
If $A$ and $B$ are disjoint, then $\Pr(A\mid B)=\frac{\Pr(\varnothing)}{\Pr(B)}=0$.
Example. (Even and prime numbers) We roll a fair five-faced dice one time. Let $A$ and $B$ be the events that an even number comes up and that a prime number comes up respectively. Then, $\Pr(A\mid B)=\frac13$ and $\Pr(B\mid A)=\frac12$.
Proof. The result follows from observing that, among 1, 2, 3, 4 and 5,
 there are 3 prime numbers, namely 2, 3 and 5;
 there are 2 even numbers, namely 2 and 4;
 there is 1 number that is both prime and even, namely 2.
So, $\Pr(A\mid B)=\frac{\Pr(A\cap B)}{\Pr(B)}=\frac{1/5}{3/5}=\frac13$ and $\Pr(B\mid A)=\frac{\Pr(A\cap B)}{\Pr(A)}=\frac{1/5}{2/5}=\frac12$.
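As a sanity check, the two conditional probabilities can be obtained by brute-force counting; a short sketch, assuming equally likely faces:

from fractions import Fraction

omega = {1, 2, 3, 4, 5}                      # one roll of a fair five-faced dice
A = {n for n in omega if n % 2 == 0}         # even numbers: {2, 4}
B = {2, 3, 5}                                # prime numbers among the faces

pr = lambda e: Fraction(len(e), len(omega))  # equally likely outcomes
print(pr(A & B) / pr(B))                     # Pr(A | B) = (1/5)/(3/5) = 1/3
print(pr(A & B) / pr(A))                     # Pr(B | A) = (1/5)/(2/5) = 1/2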
Example. Amy rolls two fair six-faced dice, one colored red and the other colored blue (so that they are distinguishable), without looking at the dice. After Amy rolls the two dice, Bob tells Amy that there is at least one 6 coming up (assume Bob tells the truth). Then, the probability that 6 comes up for both dice is $\frac{1}{11}$ after hearing the information from Bob.
Proof. The condition is that there is at least one 6 coming up, and the probability of this condition can be calculated by the inclusion-exclusion principle: $\Pr(\text{at least one }6)=\frac{6}{36}+\frac{6}{36}-\frac{1}{36}=\frac{11}{36}$. Hence, the desired probability is $\frac{\Pr(\text{both are }6)}{\Pr(\text{at least one }6)}=\frac{1/36}{11/36}=\frac{1}{11}$.
Exercise.
 Chris claims that the desired probability in the example should be $\frac16$: given there is at least one 6 coming up, we know that 6 comes up on one dice. Considering the other dice, which has six equally likely possible outcomes for the number coming up, namely 1, 2, 3, 4, 5 and 6, we can regard these outcomes as the new sample space. The desired event is that 6 comes up for both dice, and thus the desired outcome for the other dice is 6. It follows that the probability is $\frac16$, since the number of outcomes in the desired event is 1, while that in the sample space is 6.
We know that the correct answer is $\frac{1}{11}$, and not $\frac16$, but why is this claim wrong? (Credit: the idea of this question comes from this discussion)
Answer. The six outcomes considered only include the outcomes in which 6 comes up on one particular dice, either the red or the blue one. In either case, the outcomes in which 6 comes up only on the dice of the other color are missed (there are 5 such cases in either case), and so the sample space is not complete, yielding the wrong answer.
Remark.
 denoting the numbers coming up in the form of an ordered pair $(r,b)$, in which $r$ is the number coming up for the red dice and $b$ is the number coming up for the blue dice, the new sample space is $\{(6,1),(6,2),(6,3),(6,4),(6,5),(6,6),(5,6),(4,6),(3,6),(2,6),(1,6)\}$,
 consisting of 11 equally likely outcomes; among these, only $(6,6)$ is the desired outcome, and so the probability is $\frac{1}{11}$, regarding the above set as the new sample space;
 this matches the motivation for the definition of conditional probability;
 if Bob instead tells Amy that 6 comes up for the red dice, then the new sample space is $\{(6,1),(6,2),(6,3),(6,4),(6,5),(6,6)\}$, consisting of 6 equally likely outcomes, and then the probability that 6 comes up for both dice is indeed $\frac16$.
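The counting above can be verified by enumerating all 36 ordered outcomes; a brief sketch:

from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # (red, blue): 36 outcomes
at_least_one_6 = [o for o in omega if 6 in o]  # the new sample space: 11 outcomes
print(Fraction(sum(o == (6, 6) for o in at_least_one_6), len(at_least_one_6)))  # 1/11

red_is_6 = [o for o in omega if o[0] == 6]     # if Bob names the red dice: 6 outcomes
print(Fraction(sum(o == (6, 6) for o in red_is_6), len(red_is_6)))              # 1/6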
Proposition. (Multiplication rule of probability) For all events $A_1,A_2,\dotsc,A_n$ with $\Pr(A_1\cap\dotsb\cap A_{n-1})>0$, $\Pr(A_1\cap A_2\cap\dotsb\cap A_n)=\Pr(A_1)\Pr(A_2\mid A_1)\Pr(A_3\mid A_1\cap A_2)\dotsm\Pr(A_n\mid A_1\cap\dotsb\cap A_{n-1})$.
Proof. By the definition of conditional probability, the right-hand side is a telescoping product: $\Pr(A_1)\cdot\frac{\Pr(A_1\cap A_2)}{\Pr(A_1)}\cdot\frac{\Pr(A_1\cap A_2\cap A_3)}{\Pr(A_1\cap A_2)}\dotsm\frac{\Pr(A_1\cap\dotsb\cap A_n)}{\Pr(A_1\cap\dotsb\cap A_{n-1})}=\Pr(A_1\cap\dotsb\cap A_n)$.
Remark.
 It is also known as the chain rule of probability.
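As a simple numerical illustration of the chain rule (this deck-of-cards example is our own, not from the text): the probability that two cards drawn without replacement from a standard 52-card deck are both aces is $\Pr(A_1)\Pr(A_2\mid A_1)=\frac{4}{52}\cdot\frac{3}{51}=\frac{1}{221}$.

from fractions import Fraction

pr_first_ace = Fraction(4, 52)                   # 4 aces among 52 cards
pr_second_ace_given_first = Fraction(3, 51)      # 3 aces left among 51 cards
print(pr_first_ace * pr_second_ace_given_first)  # 1/221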
Two important theorems related to conditional probability, namely the law of total probability and Bayes' theorem, will be discussed in the following sections.
Law of total probability and Bayes' theorem
Theorem. (Law of total probability) Assume that $\Omega=B_1\cup B_2\cup\dotsb$, in which the events $B_1,B_2,\dotsc$ are disjoint and have nonzero probabilities. Then, for each event $A$, $\Pr(A)=\sum_i\Pr(A\mid B_i)\Pr(B_i)$.
Proof. Illustration (finite case):
(Venn diagram: $\Omega$ is partitioned into $B_1,B_2,B_3$, and the event $A$ is cut into the disjoint pieces $A\cap B_1$, $A\cap B_2$ and $A\cap B_3$.)
Since $B_1,B_2,\dotsc$ are disjoint, $A\cap B_1,A\cap B_2,\dotsc$ are also disjoint (by observing that $(A\cap B_i)\cap(A\cap B_j)=A\cap(B_i\cap B_j)=\varnothing$ for $i\ne j$). It follows that $\Pr(A)=\Pr\left(\bigcup_i(A\cap B_i)\right)=\sum_i\Pr(A\cap B_i)=\sum_i\Pr(A\mid B_i)\Pr(B_i)$.
Remark.
 It follows from the definition of conditional probability that $\Pr(A)=\sum_i\Pr(A\cap B_i)$ also, but the form in the theorem is more commonly used.
 The number of $B_i$'s may be infinite or finite.
 The assumption is equivalent to '$A$ occurring implies that one and only one of the $B_i$'s occurs'.
Theorem. (Bayes' theorem) Assume that $\Omega=B_1\cup B_2\cup\dotsb$, in which the events $B_1,B_2,\dotsc$ are disjoint and have nonzero probabilities. Then, for each event $A$ with $\Pr(A)>0$ and each $k$, $\Pr(B_k\mid A)=\frac{\Pr(A\mid B_k)\Pr(B_k)}{\sum_i\Pr(A\mid B_i)\Pr(B_i)}$.
Proof. It follows from the definition of conditional probability (for the numerator) and the law of total probability (for the denominator). To be more precise, $\Pr(B_k\mid A)=\frac{\Pr(A\cap B_k)}{\Pr(A)}=\frac{\Pr(A\mid B_k)\Pr(B_k)}{\sum_i\Pr(A\mid B_i)\Pr(B_i)}$.
Illustration (finite case):
(Venn diagram: $\Pr(B_3\mid A)$ is the ratio of $\Pr(A\cap B_3)$ to $\Pr(A\cap B_1)+\Pr(A\cap B_2)+\Pr(A\cap B_3)$.)
Example. Assume that the weather at a certain day can either be sunny or rainy, with equal probability. Amy has a probability of $p$ (resp. $q$) to bring an umbrella at that day if the weather of that day is rainy (resp. sunny).
Let $R$, $S$ and $U$ be the events that the weather at that day is rainy, that it is sunny, and that Amy brings an umbrella at that day, respectively. Then, by the law of total probability, the probability that Amy brings an umbrella at that day is $\Pr(U)=\Pr(U\mid R)\Pr(R)+\Pr(U\mid S)\Pr(S)=\frac{p}{2}+\frac{q}{2}=\frac{p+q}{2}$.
Given that Amy brings an umbrella at that day, the probability for that day to be rainy is, by Bayes' theorem, $\Pr(R\mid U)=\frac{\Pr(U\mid R)\Pr(R)}{\Pr(U)}=\frac{p/2}{(p+q)/2}=\frac{p}{p+q}$.
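Plugging in concrete numbers makes the two formulas easy to check; the values $p=0.9$ and $q=0.2$ below are made up for illustration:

p, q = 0.9, 0.2                  # hypothetical Pr(U|R) and Pr(U|S)
pr_R = pr_S = 0.5                # rainy / sunny with equal probability

pr_U = p * pr_R + q * pr_S       # law of total probability: (p + q)/2
pr_R_given_U = p * pr_R / pr_U   # Bayes' theorem: p/(p + q)
print(pr_U)                      # 0.55
print(pr_R_given_U)              # 0.818... = 9/11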
Independence
Motivation
Intuitively, if events are independent, then we expect that the occurrence or nonoccurrence of some of the events does not affect the occurrence or nonoccurrence of the others. How do we express this meaning with probability expressions?
If there are only two events involved, it is quite simple: using the notion of conditional probability, we can define events $A$ and $B$ to be independent if $\Pr(A\mid B)=\Pr(A)$ and $\Pr(B\mid A)=\Pr(B)$, or using just one equation, $\Pr(A\cap B)=\Pr(A)\Pr(B)$ (by observing that $\Pr(A\cap B)=\Pr(A\mid B)\Pr(B)$).
We can also define independence for more events: e.g. for three events $A$, $B$ and $C$, we would like to define them to be independent if all of the following hold:
 $\Pr(A\mid B)=\Pr(A)$ and $\Pr(A\mid C)=\Pr(A)$;
 $\Pr(A\mid B\cap C)=\Pr(A)$;
 $\Pr(B\mid A)=\Pr(B)$ and $\Pr(B\mid C)=\Pr(B)$;
 $\Pr(B\mid A\cap C)=\Pr(B)$;
 $\Pr(C\mid A)=\Pr(C)$ and $\Pr(C\mid B)=\Pr(C)$;
 $\Pr(C\mid A\cap B)=\Pr(C)$.
We can see that when more events are involved, the requirements become clumsier if we use conditional probabilities as the definition.
Having all of the above hold is actually equivalent to having only the following requirement hold:
 For each subset of $\{A,B,C\}$ containing at least two events, the probability of the intersection of the events in the subset equals the product of their probabilities, e.g. $\Pr(A\cap B)=\Pr(A)\Pr(B)$ and $\Pr(A\cap B\cap C)=\Pr(A)\Pr(B)\Pr(C)$.
So, we can use this more compact expression for the definition.
Indeed, we have similar results when more events are involved, and so we have the following definition for independence.
Definition[edit  edit source]
Definition. (Independence) The events $A_1,A_2,\dotsc$ are independent if for each finite subset $\{A_{i_1},A_{i_2},\dotsc,A_{i_k}\}$ of them, $\Pr(A_{i_1}\cap A_{i_2}\cap\dotsb\cap A_{i_k})=\Pr(A_{i_1})\Pr(A_{i_2})\dotsm\Pr(A_{i_k})$.
Remark.
 Pairwise independence does not imply independence (but the converse is true, and thus independence is 'stronger' than pairwise independence).
 We can use $A\perp B$ to denote the independence of $A$ and $B$.
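For a finite sample space with equally likely outcomes, the definition can be checked mechanically by testing the product rule on every subset of at least two events; a sketch (the helper name independent is made up here):

from fractions import Fraction
from itertools import combinations

def independent(events, omega):
    # Test Pr(intersection) == product of Pr's for every subset of >= 2 events.
    pr = lambda e: Fraction(len(set(e)), len(omega))
    for k in range(2, len(events) + 1):
        for subset in combinations(events, k):
            inter = set(omega)
            prod = Fraction(1)
            for e in subset:
                inter &= set(e)
                prod *= pr(e)
            if Fraction(len(inter), len(omega)) != prod:
                return False
    return True

# Two fair coin flips: "first is heads" and "second is heads" are independent.
omega = ["HH", "HT", "TH", "TT"]
A = {o for o in omega if o[0] == "H"}
B = {o for o in omega if o[1] == "H"}
print(independent([A, B], omega))  # True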
Example. (Events that are pairwise independent but not independent) Consider two balls, one bigger than the other. Each ball is colored either red or blue, with equal chance, independently of the other. Define
 $A$ to be the event that the bigger ball is red;
 $B$ to be the event that the smaller ball is red;
 $C$ to be the event that both balls have the same color.
Then, $A$, $B$ and $C$ are pairwise independent but not independent.
Proof. Consider the following table containing the relevant probabilities, writing the colors as (bigger, smaller):

outcome  probability  events occurring
(R,R)    1/4          A, B, C
(R,B)    1/4          A
(B,R)    1/4          B
(B,B)    1/4          C

 Then, $A$, $B$ and $C$ are pairwise independent since $\Pr(A\cap B)=\frac14=\Pr(A)\Pr(B)$, $\Pr(A\cap C)=\frac14=\Pr(A)\Pr(C)$, and $\Pr(B\cap C)=\frac14=\Pr(B)\Pr(C)$ (each of $\Pr(A)$, $\Pr(B)$ and $\Pr(C)$ equals $\frac12$).
 However, $\Pr(A\cap B\cap C)=\frac14\ne\frac18=\Pr(A)\Pr(B)\Pr(C)$.
 Thus, $A$, $B$ and $C$ are not independent.
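These computations can be reproduced by enumerating the four color outcomes (or by feeding the three events into a subset checker like the independent sketch above):

from fractions import Fraction
from itertools import product

omega = list(product("RB", repeat=2))        # (bigger, smaller) colors
A = {o for o in omega if o[0] == "R"}        # bigger ball is red
B = {o for o in omega if o[1] == "R"}        # smaller ball is red
C = {o for o in omega if o[0] == o[1]}       # both balls have the same color

pr = lambda e: Fraction(len(e), len(omega))
print(pr(A & B) == pr(A) * pr(B))            # True: 1/4 = 1/2 * 1/2
print(pr(A & C) == pr(A) * pr(C))            # True
print(pr(B & C) == pr(B) * pr(C))            # True
print(pr(A & B & C), pr(A) * pr(B) * pr(C))  # 1/4 vs 1/8: not independent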
Remark.
 If we know the occurrence or nonoccurrence of any two of $A$, $B$ and $C$, then we know the colors of the two balls.
 So, the remaining unknown event becomes either certain or impossible.
 E.g., if we know $A$ occurs and $B$ does not occur, then we know that
 the bigger ball is red,
 the smaller ball is blue (since $B$ does not occur).
 So, $C$ becomes impossible (the two balls have different colors).
 Thus, intuitively, $A$, $B$ and $C$ should not be independent.
Example. (Monty Hall problem) Recall the Monty Hall problem in the motivation section. Let $A$, $B$ and $C$ be the events that door No. 1 is picked, that the car is behind door No. 2, and that the host opens door No. 3, respectively. Given $A$ and $C$, the probability that the car is behind door No. 2 is $\Pr(B\mid A\cap C)=\frac23$, so switching to door No. 2 is to our advantage.
Proof. Let $B_1$, $B_2$ ($=B$) and $B_3$ be the events that the car is behind door No. 1, No. 2 and No. 3 respectively. Then:
 since $A$ is given, $A$ is certain, and all the probabilities below are conditional on $A$;
 $\Pr(B_k\mid A)=\Pr(B_k)$ for each $k$, since the probability that the car is behind a given door is the same regardless of the door picked;
 $\Pr(B_1)=\Pr(B_2)=\Pr(B_3)=\frac13$, since the car is equally likely to be put behind each door, by the principle of insufficient reason;
 by the assumptions, the host cannot open door No. 2 when it has the car behind it (condition), and cannot open door No. 1, which is picked by us (condition);
 $\Pr(C\mid A\cap B_1)=\frac12$, since the host cannot open door No. 1 (picked), and is equally likely to open door No. 2 and door No. 3 by the principle of insufficient reason;
 $\Pr(C\mid A\cap B_2)=1$, since the host cannot open door No. 1 (picked) and door No. 2 (with the car behind it), and so the host certainly opens door No. 3;
 $\Pr(C\mid A\cap B_3)=0$, since the host cannot open door No. 3 (with the car behind it);
 having these probabilities, the result follows by applying the definition of conditional probability and the multiplication rule of probability (i.e. Bayes' theorem), as above: $\Pr(B\mid A\cap C)=\frac{\Pr(C\mid A\cap B_2)\Pr(B_2)}{\Pr(C\mid A\cap B_1)\Pr(B_1)+\Pr(C\mid A\cap B_2)\Pr(B_2)+\Pr(C\mid A\cap B_3)\Pr(B_3)}=\frac{1\cdot\frac13}{\frac12\cdot\frac13+1\cdot\frac13+0\cdot\frac13}=\frac{1/3}{1/2}=\frac23$.
Remark.
 For the other cases, in which another door is picked or opened, the same result holds by symmetry (the notation in the expressions can be changed accordingly).
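The value $\frac23$ can also be estimated by simulation; a minimal Monte Carlo sketch under the stated assumptions (the host never opens the picked door or the car's door, and chooses uniformly at random when two doors are available):

import random

def monty_hall(trials=100_000, switch=True):
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)    # door hiding the car
        pick = random.randrange(3)   # our initial pick
        # The host opens a door that is neither picked nor hiding the car.
        opened = random.choice([d for d in range(3) if d != pick and d != car])
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print(monty_hall(switch=True))   # approximately 2/3
print(monty_hall(switch=False))  # approximately 1/3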
Related results
Proposition. Some events are independent if and only if they remain independent when some of them are changed to their complements.
Proof. We can prove it inductively. E.g., assume $A$ and $B$ are independent. Then, $\Pr(A^c\cap B)=\Pr(B)-\Pr(A\cap B)=\Pr(B)-\Pr(A)\Pr(B)=\big(1-\Pr(A)\big)\Pr(B)=\Pr(A^c)\Pr(B)$, so $A^c$ and $B$ are independent. Repeating this argument changes the events to their complements one at a time, and reversing it (noting that $(A^c)^c=A$) gives the converse.
Example. (Events that are pairwise independent but not independent (cont'd)) Recall the three events in a previous example:
 $A$ is the event that the bigger ball is red;
 $B$ is the event that the smaller ball is red;
 $C$ is the event that both balls have the same color.
They are not independent under the conditions in that example. It follows that $A$, $B$ and $C^c$ (namely the event that the two balls have different colors, which appears in an exercise for that example) are not independent.
Example. (Special cases for independence) A certain event is independent of an arbitrary event. The same holds for an impossible event.
Proof.
 The empty set $\varnothing$ is the impossible event, since $\Pr(\varnothing)=0$.
 For each event $A$, $\varnothing\cap A=\varnothing$.
 Also, $\Pr(\varnothing)\Pr(A)=0=\Pr(\varnothing)$.
 So, $\Pr(\varnothing\cap A)=\Pr(\varnothing)\Pr(A)$, i.e. $\varnothing\perp A$.
 The sample space $\Omega$ is the certain event, since $\Pr(\Omega)=1$.
 Since $\Omega=\varnothing^c$, and $\varnothing\perp A$ for each event $A$, it follows from the proposition about independence of complement events that $\Omega\perp A$.
Remark.
 The meaning of this result is that knowledge of an arbitrary event does not make a certain event less certain, and does not make an impossible event possible, which is intuitive.
Conditional independence
Conditional independence is a conditional version of independence, and has the following definition which is similar to that of independence.
Definition. (Conditional independence) The events $A_1,A_2,\dotsc$ are conditionally independent given an event $D$ with $\Pr(D)>0$ if for each finite subset $\{A_{i_1},A_{i_2},\dotsc,A_{i_k}\}$ of them, $\Pr(A_{i_1}\cap A_{i_2}\cap\dotsb\cap A_{i_k}\mid D)=\Pr(A_{i_1}\mid D)\Pr(A_{i_2}\mid D)\dotsm\Pr(A_{i_k}\mid D)$.
Remark.
 In particular, if events $A$ and $B$ are conditionally independent given $C$ (assuming $\Pr(C)>0$ and $\Pr(B\cap C)>0$), then $\Pr(A\mid B\cap C)=\frac{\Pr(A\cap B\mid C)}{\Pr(B\mid C)}=\Pr(A\mid C)$.
 This means that, given $C$, knowing that $B$ happens does not affect the occurrence or nonoccurrence of $A$.
 In general, that some events are conditionally independent given an event $C$ neither implies nor is implied by their being conditionally independent given another event $D$.
 Conditional independence of some events neither implies nor is implied by (unconditional) independence of them. These two concepts are not directly related.
Example. Define
 $A$ be the event that the birthday of Amy is June 1st;
 $B$ be the event that the birthday of Bob is July 1st;
 $C$ be the event that Amy and Bob are twins.
Events $A$ and $B$ are conditionally independent given $C^c$, but not conditionally independent given $C$. Also, events $A$ and $B$ are independent (unconditionally). (Assume, for simplicity, that the birthday of each of Amy and Bob is equally likely to be any one of the 365 dates in a year (not including February 29th).)
Proof.
 $A$ and $B$ are conditionally independent given $C^c$ since
 $\Pr(A\mid C^c)\Pr(B\mid C^c)=\frac{1}{365}\cdot\frac{1}{365}=\frac{1}{365^2}$;
 $\Pr(A\cap B\mid C^c)=\frac{1}{365^2}$ (there are $365^2$ equally likely (by the principle of insufficient reason) distinct pairs of birthdays).
 $A$ and $B$ are not conditionally independent given $C$ since
 $\Pr(A\mid C)\Pr(B\mid C)=\frac{1}{365}\cdot\frac{1}{365}=\frac{1}{365^2}\ne 0$;
 $\Pr(A\cap B\mid C)=0$ (twins must have the same birthday, but June 1st and July 1st are different dates).
 $A$ and $B$ are independent (unconditionally) since
 $\Pr(A)\Pr(B)=\frac{1}{365}\cdot\frac{1}{365}=\frac{1}{365^2}$;
 $\Pr(A\cap B)=\frac{1}{365^2}$ (there are $365^2$ equally likely (by the principle of insufficient reason) distinct pairs of birthdays).
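The two conditional computations can be checked with exact fractions under each condition; a small sketch:

from fractions import Fraction

N = 365  # equally likely birthday dates (no February 29th)

# Given "not twins" (C^c): birthdays form a uniform pair among N * N outcomes.
pr_A = pr_B = Fraction(1, N)
pr_A_and_B = Fraction(1, N * N)   # exactly one pair: (June 1st, July 1st)
print(pr_A_and_B == pr_A * pr_B)  # True: conditionally independent

# Given "twins" (C): one shared birthday, uniform over the N dates.
pr_A_given_C = pr_B_given_C = Fraction(1, N)
pr_A_and_B_given_C = Fraction(0)  # June 1st != July 1st
print(pr_A_and_B_given_C == pr_A_given_C * pr_B_given_C)  # False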
References and footnotes
 ↑ If we pick the door with the car behind it, then we win the car; we win nothing otherwise.