Probability/Conditional Probability

Motivation

In some situations we need a new kind of probability.

Consider the Monty Hall problem:

Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?[1] (from Wikipedia)

Illustration of the situation

There are some (implicit) assumptions:

  • The host must open a door that is not picked by us.
  • The host must open a door with a goat, not the car, behind it.

To determine whether it is to our advantage to switch our choice, we need to know the probability that the car is behind door No. 2 (the door chosen after switching), given that we pick door No. 1 and the host opens door No. 3 to reveal a goat.

This probability is a conditional probability (the conditions are that the host opens door No. 3 and that we pick door No. 1), and we will discuss the value of this probability later in this chapter.

Definition

Let's motivate the definition of conditional probability by considering the following Venn diagram.

*-------------------------*
|        *---------*      |
|        |   B\A   |      |              *---------*
|   *----*----*    |<-- B |              |   B\A   |
|   |    |    |    |      |    ---->     *----*    |
|   |    |AnB |    |      |              |    |    | <--- B=Omega'
|   |    *----*----*      |              |AnB |    |
|   | A\B     | <-- A     | <--- Omega   *----*----*
|   *---------*           |
*-------------------------*

Without any condition, the probability of $B$ is illustrated by the rectangular region consisting of both $B\setminus A$ and $A\cap B$. In the Venn diagram, the ratio of the area of this region to the area of the whole sample space $\Omega$ is the ratio of $\Pr(B)$ to $\Pr(\Omega)$ (or simply $\Pr(B)$, since $\Pr(\Omega)=1$).

If we are given $B$ (implying that $\Pr(B)>0$), then we can regard $B$ as the new sample space (RHS), say $\Omega'$. Then, intuitively, the probability of $A$ given $B$ should be the ratio of the area occupied by $A$ in the region for $B$ (i.e. the area of $A\cap B$) to the area of $B$. So, the probability of $A$ given $B$ should be
$$\frac{\Pr(A\cap B)}{\Pr(B)},$$
which is exactly the definition of conditional probability.

Definition. (Conditional probability) The conditional probability of event $A$ given event $B$ is
$$\Pr(A\mid B)=\frac{\Pr(A\cap B)}{\Pr(B)},$$
assuming $\Pr(B)>0$.

Remark.

  • The assumption $\Pr(B)>0$ prevents the above formula from giving an undefined value.
  • Also, it does not make sense to consider the probability of an event conditional on an impossible event: an impossible event can never happen, so it cannot be given to have happened.
  • It follows that $\Pr(A\cap B)=\Pr(A\mid B)\Pr(B)$ for each pair of events $A$ and $B$ with $\Pr(B)>0$ (simplified multiplication rule of probability).

Example. (Conditional probability is a probability) Conditional probability is a probability, since it satisfies all three probability axioms.

Proof.

(P1) Since the numerator and denominator in the formula are both probabilities (i.e. they satisfy the three probability axioms), both are nonnegative. In particular, the denominator is positive, by the assumption. It follows that the fraction is nonnegative.
(P2) It suffices to prove that $\Pr(\Omega\mid B)=1$ for each event $B$ with $\Pr(B)>0$, which is true since $\Pr(\Omega\mid B)=\frac{\Pr(\Omega\cap B)}{\Pr(B)}=\frac{\Pr(B)}{\Pr(B)}=1$ ($\Omega\cap B=B$ since $B\subseteq\Omega$ by definition of event).
(P3) For each infinite sequence of disjoint events $A_1,A_2,\dots$ (so that $A_1\cap B,A_2\cap B,\dots$ are also disjoint),
$$\Pr\left(\bigcup_{i=1}^{\infty}A_i\,\middle|\,B\right)=\frac{\Pr\left(\left(\bigcup_{i=1}^{\infty}A_i\right)\cap B\right)}{\Pr(B)}=\frac{\Pr\left(\bigcup_{i=1}^{\infty}(A_i\cap B)\right)}{\Pr(B)}=\frac{\sum_{i=1}^{\infty}\Pr(A_i\cap B)}{\Pr(B)}=\sum_{i=1}^{\infty}\Pr(A_i\mid B).$$


Example. (Special cases for conditional probability) If $B\subseteq A$ (i.e. $B$ implies $A$), then
$$\Pr(A\mid B)=\frac{\Pr(A\cap B)}{\Pr(B)}=\frac{\Pr(B)}{\Pr(B)}=1,$$
as expected (since given $B$, the event $A$, which is implied by $B$, is certain).

If $A$ and $B$ are disjoint, then $\Pr(A\mid B)=\dfrac{\Pr(\varnothing)}{\Pr(B)}=0$.

Example. (Even and prime numbers) We roll a fair five-faced die one time. Let $A$ and $B$ be the events that an even number comes up and that a prime number comes up, respectively. Then, $\Pr(A\mid B)=1/3$ and $\Pr(B\mid A)=1/2$.

Proof. The result follows from observing that among 1,2,3,4 and 5,

  • there are 3 prime numbers, namely 2,3 and 5;
  • there are 2 even numbers, namely 2 and 4;
  • there is 1 number that is both prime and even, namely 2.
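
These counts can be checked by direct enumeration; the following is a small sketch in Python (the event names $A$ and $B$ follow the example):

```python
from fractions import Fraction

outcomes = range(1, 6)                        # faces of a fair five-faced die
A = {n for n in outcomes if n % 2 == 0}       # even numbers: {2, 4}
B = {n for n in outcomes if n in (2, 3, 5)}   # prime numbers: {2, 3, 5}

def pr(event):
    # each face is equally likely, so Pr(E) = |E| / 5
    return Fraction(len(event), 5)

pr_A_given_B = pr(A & B) / pr(B)   # Pr(A | B) = Pr(A ∩ B) / Pr(B)
pr_B_given_A = pr(A & B) / pr(A)   # Pr(B | A) = Pr(A ∩ B) / Pr(A)
print(pr_A_given_B, pr_B_given_A)  # 1/3 1/2
```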

Exercise. Suppose the die is now six-faced.

1 Calculate $\Pr(A\mid B)$.

2 Calculate $\Pr(B\mid A)$.



Example. Amy rolls two fair six-faced dice, with one colored red and the other colored blue (so that they are distinguishable), without looking at the dice. After Amy rolls the two dice, Bob tells Amy that there is at least one 6 coming up (assume Bob tells the truth). Then, after hearing the information from Bob, the probability that 6 comes up for both dice is $1/11$.

Proof. The condition is that there is at least one 6 coming up, and the probability of this condition can be calculated by the inclusion-exclusion principle:
$$\Pr(\text{at least one 6})=\Pr(\text{red 6})+\Pr(\text{blue 6})-\Pr(\text{both 6})=\frac{1}{6}+\frac{1}{6}-\frac{1}{36}=\frac{11}{36}.$$
Also,
$$\Pr(\text{both 6}\cap\text{at least one 6})=\Pr(\text{both 6})=\frac{1}{36}.$$
The result follows:
$$\frac{1/36}{11/36}=\frac{1}{11}.$$
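
The conditional probability can likewise be checked by enumerating the 36 equally likely ordered outcomes (a sketch in Python):

```python
from fractions import Fraction

# ordered pairs (red, blue) from a roll of two fair six-faced dice
pairs = [(r, b) for r in range(1, 7) for b in range(1, 7)]
at_least_one_six = [p for p in pairs if 6 in p]
both_six = [p for p in at_least_one_six if p == (6, 6)]

pr_condition = Fraction(len(at_least_one_six), len(pairs))      # 11/36
pr_both_given = Fraction(len(both_six), len(at_least_one_six))  # 1/11
print(pr_condition, pr_both_given)
```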

Exercise.

Calculate the probability again if the blue die is instead colored red, so that the two dice are not distinguishable.

Chris claims that the desired probability in the example should be $1/6$: given there is at least one 6 coming up, we know that 6 comes up on one die. Consider the other die, which has six equally likely possible outcomes for the number coming up, namely 1, 2, 3, 4, 5 and 6, and regard these as the new sample space. The desired event is that 6 comes up for both dice, so the desired outcome for the other die is 6. It follows that the probability is $1/6$, since the number of outcomes in the desired event is 1, while that in the sample space is 6.

We know that the correct answer is $1/11$, and not $1/6$, but why is this claim wrong? (Credit: the idea of this question comes from this discussion)


Remark.

  • Denoting the numbers coming up in the form of an ordered pair $(r,b)$, in which $r$ is the number coming up for the red die and $b$ is the number coming up for the blue die, the new sample space is
$$\{(6,1),(6,2),(6,3),(6,4),(6,5),(6,6),(1,6),(2,6),(3,6),(4,6),(5,6)\},$$
consisting of 11 equally likely outcomes; among these, only $(6,6)$ is the desired outcome, and so the probability is $1/11$, regarding the above set as the new sample space.
  • This matches the motivation for the definition of conditional probability.
  • If Bob instead tells Amy that 6 comes up for the red die, then the new sample space is $\{(6,1),(6,2),(6,3),(6,4),(6,5),(6,6)\}$, consisting of 6 (equally likely) outcomes, and only then is the desired probability $1/6$.

Proposition. (Multiplication rule of probability) For each sequence of events $A_1,A_2,\dots,A_n$ with $\Pr(A_1\cap A_2\cap\cdots\cap A_{n-1})>0$,
$$\Pr(A_1\cap A_2\cap\cdots\cap A_n)=\Pr(A_1)\Pr(A_2\mid A_1)\Pr(A_3\mid A_1\cap A_2)\cdots\Pr(A_n\mid A_1\cap\cdots\cap A_{n-1}).$$

Proof. By the definition of conditional probability, the right-hand side is the telescoping product
$$\Pr(A_1)\cdot\frac{\Pr(A_1\cap A_2)}{\Pr(A_1)}\cdot\frac{\Pr(A_1\cap A_2\cap A_3)}{\Pr(A_1\cap A_2)}\cdots\frac{\Pr(A_1\cap\cdots\cap A_n)}{\Pr(A_1\cap\cdots\cap A_{n-1})},$$
which equals $\Pr(A_1\cap\cdots\cap A_n)$ after cancellation. (Every denominator is positive, since each of the intersections involved is a superset of $A_1\cap\cdots\cap A_{n-1}$, whose probability is positive by assumption.)

Remark.

  • It is also known as chain rule of probability.
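
As an illustration of the multiplication rule, consider drawing three cards from a standard 52-card deck without replacement (an added example, not from the original text): the probability that all three are aces can be computed one draw at a time and checked against a direct count.

```python
from fractions import Fraction
from math import comb

# A_i = "the i-th card drawn is an ace"; by the multiplication rule,
# Pr(A1 ∩ A2 ∩ A3) = Pr(A1) Pr(A2 | A1) Pr(A3 | A1 ∩ A2).
chain = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)

# direct count: number of 3-ace hands over all 3-card hands
direct = Fraction(comb(4, 3), comb(52, 3))

assert chain == direct
print(chain)  # 1/5525
```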

Two important theorems related to conditional probability, namely law of total probability and Bayes' theorem, will be discussed in the following sections.

Law of total probability and Bayes' theorem

Theorem. (Law of total probability) Assume that $A\subseteq B_1\cup B_2\cup\cdots$, in which the events $B_1,B_2,\dots$ are disjoint and have nonzero probabilities. Then,
$$\Pr(A)=\sum_i\Pr(A\mid B_i)\Pr(B_i).$$

Proof. Illustration (finite case):

*-----------------------------------------*
|      B_1    B_2     B_3                 |
|       |      |       |                  |
|       v      v       v                  |
|   *-------*-----*----------*            |
|   |       |     |          |            |
|   |       |     |          |            |
|   |       |     |          | <---- B    |
|   | *-----*-----*--------* |            |
|   | |AnB_1|AnB_2|AnB_3   |<----- A      | <---- Omega
|   | *-----*-----*--------* |            |
|   |       |     |          |            |
|   |       |     |          |            |
|   *-------*-----*----------*            |
|                                         |
|                                         |
|                                         |
*-----------------------------------------*

Since $B_1,B_2,\dots$ are disjoint, the events $A\cap B_1,A\cap B_2,\dots$ are also disjoint (by observing that $(A\cap B_1)\cap(A\cap B_2)=A\cap(B_1\cap B_2)=A\cap\varnothing=\varnothing$, and the other intersections have similar results). It follows that
$$\Pr(A)=\Pr\big(A\cap(B_1\cup B_2\cup\cdots)\big)=\Pr\big((A\cap B_1)\cup(A\cap B_2)\cup\cdots\big)=\sum_i\Pr(A\cap B_i)=\sum_i\Pr(A\mid B_i)\Pr(B_i),$$
in which $A=A\cap(B_1\cup B_2\cup\cdots)$ since $A\subseteq B_1\cup B_2\cup\cdots$, and $\Pr(A\cap B_i)=\Pr(A\mid B_i)\Pr(B_i)$ since $\Pr(B_i)>0$ (simplified multiplication rule of probability).

Remark.

  • It follows from the definition of conditional probability that $\Pr(A)=\sum_i\Pr(A\cap B_i)$ also, but the form in the theorem is more commonly used.
  • The number of $B_i$'s may be infinite or finite.
  • The assumption is equivalent to '$A$ occurs implies that one and only one of the $B_i$'s occurs'.

Theorem. (Bayes' theorem) Assume that $A\subseteq B_1\cup B_2\cup\cdots$, in which the events $B_1,B_2,\dots$ are disjoint and have nonzero probabilities, and that $\Pr(A)>0$. Then, for each $k$,
$$\Pr(B_k\mid A)=\frac{\Pr(A\mid B_k)\Pr(B_k)}{\sum_i\Pr(A\mid B_i)\Pr(B_i)}.$$

Proof. It follows from the definition of conditional probability (for the numerator, together with the simplified multiplication rule) and the law of total probability (for the denominator). To be more precise,
$$\Pr(B_k\mid A)=\frac{\Pr(A\cap B_k)}{\Pr(A)}=\frac{\Pr(A\mid B_k)\Pr(B_k)}{\sum_i\Pr(A\mid B_i)\Pr(B_i)}.$$

Illustration (finite case):

*-----------------------------------------*              
|      B_1    B_2     B_3                 |
|       |      |       |                  |                     Pr(B_3|A)=
|       v      v       v                  |                
|   *-------*-----*----------*            |                     *--------*    
|   |       |     |          |            |   ------>           |AnB_3   |    <----- Pr(AnB_3) 
|   |       |     |          |            |                     *--------*     
|   |       |     |          | <---- B    |            -------------------------------
|   | *-----*-----*--------* |            |                 *-----*-----*--------*
|   | |AnB_1|AnB_2|AnB_3   |<----- A      | <---- Omega     |AnB_1|AnB_2|AnB_3   | <---- Pr(AnB_1)+Pr(AnB_2)+Pr(AnB_3)
|   | *-----*-----*--------* |            |                 *-----*-----*--------*
|   |       |     |          |            |              
|   |       |     |          |            |                  
|   *-------*-----*----------*            |                                       
|                                         |            
|                                         |
|                                         |
*-----------------------------------------*

Example. Assume that the weather on a certain day can either be sunny or rainy, with equal probability. Amy has a probability of $p$ ($q$) of bringing an umbrella that day if the weather of that day is rainy (sunny).

Let $R$, $S$ and $U$ be the events that the weather that day is rainy, that it is sunny, and that Amy brings an umbrella that day, respectively. Then, the probability that Amy brings an umbrella that day is
$$\Pr(U)=\Pr(U\mid R)\Pr(R)+\Pr(U\mid S)\Pr(S)=\frac{p+q}{2}$$
by the law of total probability.

Given that Amy brings an umbrella that day, the probability for that day to be rainy is
$$\Pr(R\mid U)=\frac{\Pr(U\mid R)\Pr(R)}{\Pr(U)}=\frac{p/2}{(p+q)/2}=\frac{p}{p+q}$$
(by Bayes' theorem).
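
The computation can be sketched in Python with hypothetical values for the two umbrella probabilities (9/10 if rainy, 1/5 if sunny; these numbers are assumptions for illustration only, as the original figures are not shown here):

```python
from fractions import Fraction

# Hypothetical values (not from the text): Amy brings an umbrella
# with probability 9/10 on a rainy day and 1/5 on a sunny day.
pr_U_given_R = Fraction(9, 10)
pr_U_given_S = Fraction(1, 5)
pr_R = pr_S = Fraction(1, 2)   # rainy and sunny are equally likely

# law of total probability
pr_U = pr_U_given_R * pr_R + pr_U_given_S * pr_S
# Bayes' theorem
pr_R_given_U = pr_U_given_R * pr_R / pr_U
print(pr_U, pr_R_given_U)  # 11/20 9/11
```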

Exercise.

1 Assume that the weather can also be cloudy, such that the weather is twice as likely to be cloudy compared with sunny and rainy, and that Amy has some probability of bringing an umbrella that day if the weather is cloudy. Calculate this probability such that the probability for that day to be rainy, given that Amy brings an umbrella that day, equals the specified value instead.

2 Continue from the previous question. Calculate the probability such that the specified equality holds.




Independence

Motivation

Intuitively, if events are independent, then we expect that occurrence or non-occurrence of some events does not affect the occurrence or non-occurrence of the others. How do we express this meaning by probability expressions?

If there are only two events involved, it is quite simple: using the notion of conditional probability, we can define events $A$ and $B$ to be independent if $\Pr(A\mid B)=\Pr(A)$ and $\Pr(B\mid A)=\Pr(B)$, or using just one equation, $\Pr(A\cap B)=\Pr(A)\Pr(B)$ (by observing that $\Pr(A\cap B)=\Pr(A\mid B)\Pr(B)=\Pr(B\mid A)\Pr(A)$).

We can also define independence for more events: e.g. for three events $A$, $B$ and $C$, we would like to define them to be independent if all of the following hold:

  • $\Pr(A\mid B)=\Pr(A)$ and $\Pr(B\mid A)=\Pr(B)$;
  • $\Pr(A\mid C)=\Pr(A)$ and $\Pr(C\mid A)=\Pr(C)$;
  • $\Pr(B\mid C)=\Pr(B)$ and $\Pr(C\mid B)=\Pr(C)$;
  • $\Pr(A\mid B\cap C)=\Pr(A)$;
  • $\Pr(B\mid A\cap C)=\Pr(B)$;
  • $\Pr(C\mid A\cap B)=\Pr(C)$.

We can see that when more events are involved, the requirement becomes more clumsy, if we use the conditional probabilities as the definition.

Having all of the above hold is actually equivalent (when the conditioning events have nonzero probabilities) to having only the following requirement hold:

  • For each finite subset $\{i_1,\dots,i_k\}$ of $\{1,2,3\}$ with $k\ge 2$, $\Pr(A_{i_1}\cap\cdots\cap A_{i_k})=\Pr(A_{i_1})\cdots\Pr(A_{i_k})$ (writing $A_1=A$, $A_2=B$ and $A_3=C$).

So, we can use this more compact expression for the definition.

Indeed, we have similar results when more events are involved, and so we have the following definition for independence.

Definition

Definition. (Independence) The events $A_1,A_2,\dots$ are independent if for each finite subset of indices $\{i_1,\dots,i_k\}$,
$$\Pr(A_{i_1}\cap\cdots\cap A_{i_k})=\Pr(A_{i_1})\cdots\Pr(A_{i_k}).$$

Remark.

  • Pairwise independence does not imply independence (but the converse is true, and thus independence is 'stronger' than pairwise independence).
  • We can use $A\perp B$ to denote the independence of $A$ and $B$.

Example. (Events that are pairwise independent but not independent) Consider two balls, in which one is bigger than the other. Each ball is either red or blue, with equal chance. Define

  • $A$ to be the event that the bigger ball is red;
  • $B$ to be the event that the smaller ball is red;
  • $C$ to be the event that both balls have the same color.

Then, $A$, $B$ and $C$ are pairwise independent but not independent.

Proof. The four equally likely color combinations (bigger ball, smaller ball) are (red, red), (red, blue), (blue, red) and (blue, blue), so $\Pr(A)=\Pr(B)=\Pr(C)=1/2$ and $\Pr(A\cap B)=\Pr(A\cap C)=\Pr(B\cap C)=\Pr(A\cap B\cap C)=\Pr(\{(\text{red},\text{red})\})=1/4$.

  • Then, $A$, $B$ and $C$ are pairwise independent since $\Pr(A\cap B)=\Pr(A)\Pr(B)$, $\Pr(A\cap C)=\Pr(A)\Pr(C)$ and $\Pr(B\cap C)=\Pr(B)\Pr(C)$ (each equality reads $1/4=1/2\cdot 1/2$).
  • However, $\Pr(A\cap B\cap C)=1/4\ne 1/8=\Pr(A)\Pr(B)\Pr(C)$.
  • Thus, $A$, $B$ and $C$ are not independent.
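
The claimed probabilities can be checked by enumerating the four equally likely color combinations (a sketch in Python):

```python
from fractions import Fraction
from itertools import product

# color pairs (bigger ball, smaller ball), each equally likely
pairs = list(product(("red", "blue"), repeat=2))
A = {p for p in pairs if p[0] == "red"}   # bigger ball is red
B = {p for p in pairs if p[1] == "red"}   # smaller ball is red
C = {p for p in pairs if p[0] == p[1]}    # both balls have the same color

def pr(event):
    return Fraction(len(event), len(pairs))

# pairwise independent ...
assert pr(A & B) == pr(A) * pr(B)
assert pr(A & C) == pr(A) * pr(C)
assert pr(B & C) == pr(B) * pr(C)
# ... but not independent
assert pr(A & B & C) != pr(A) * pr(B) * pr(C)
print(pr(A & B & C), pr(A) * pr(B) * pr(C))  # 1/4 1/8
```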


Exercise.

1 Define $D$ to be the event that the two balls have different colors. Are $A$, $B$ and $D$ (a) pairwise independent; (b) independent?

(a) Yes; (b) Yes.
(a) Yes; (b) No.
(a) No; (b) Yes.
(a) No; (b) No.

2 Assume that both balls have a probability $p$ to be red and $1-p$ to be blue. $A$, $B$ and $C$ are pairwise independent for which of the following value(s) of $p$?

0
0.25
0.5
0.75
1
None of the above.



Remark.

  • If we know the occurrence or non-occurrence of any two of $A$, $B$ and $C$, then we know the colors of the two balls.
  • So, the remaining unknown event becomes either certain or impossible.
  • E.g., if we know $A$ occurs and $C$ does not occur, then we know that
  • the bigger ball is red,
  • the smaller ball is blue (since the two balls have different colors, while the bigger ball is red).
  • So, $B$ becomes impossible.
  • Thus, intuitively, $A$, $B$ and $C$ should not be independent.

Example. (Monty Hall problem) Recall the Monty Hall problem in the motivation section. Let $A$, $B$ and $C$ be the events that door No. 1 is picked, that the car is behind door No. 2, and that the host opens door No. 3, respectively. The probability that the car is behind door No. 2, given $A$ and $C$, is
$$\Pr(B\mid A\cap C)=\frac{2}{3},$$
and so it is to our advantage to switch our choice.

Proof. Write $B_i$ for the event that the car is behind door No. $i$ (so $B=B_2$).

  • $\Pr(A)=1$ since $A$ is given, and so $A$ is certain.
  • $\Pr(B_i\mid A)=\Pr(B_i)$ since the probability that the car is behind a given door is the same regardless of the door picked.
  • $\Pr(B_1)=\Pr(B_2)=\Pr(B_3)=1/3$, since the car is equally likely to be put behind each door, by the principle of insufficient reason.
  • By the assumptions, the host cannot open a door with the car behind it, and cannot open door No. 1, which is picked by us.
  • $\Pr(C\mid A\cap B_1)=1/2$, since the host cannot open door No. 1 (picked), and is equally likely to open door No. 2 and door No. 3, by the principle of insufficient reason.
  • $\Pr(C\mid A\cap B_2)=1$, since the host can open neither door No. 1 (picked) nor door No. 2 (with the car behind it), and so the host certainly opens door No. 3.
  • $\Pr(C\mid A\cap B_3)=0$, since the host cannot open door No. 3 (with the car behind it).
  • Having these probabilities, the result follows by the definition of conditional probability and the law of total probability:
$$\Pr(B_2\mid A\cap C)=\frac{\Pr(C\mid A\cap B_2)\Pr(B_2)}{\sum_{i=1}^{3}\Pr(C\mid A\cap B_i)\Pr(B_i)}=\frac{1\cdot\frac{1}{3}}{\frac{1}{2}\cdot\frac{1}{3}+1\cdot\frac{1}{3}+0\cdot\frac{1}{3}}=\frac{2}{3}.$$

Tree diagram.
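
The value $2/3$ can also be checked by exact enumeration of the host's behaviour under the stated assumptions (a sketch in Python; doors are numbered 0, 1, 2 and we always pick door 0):

```python
from fractions import Fraction

# weights[(car, host)] = probability that the car is behind `car`
# and the host opens `host`; the host never opens our door 0 nor
# the car door, and chooses uniformly among the remaining doors.
weights = {}
for car in range(3):
    allowed = [d for d in (1, 2) if d != car]
    for host in allowed:
        weights[(car, host)] = Fraction(1, 3) / len(allowed)

# condition on the host opening door 2 (door No. 3 in the text)
pr_condition = sum(w for (car, host), w in weights.items() if host == 2)
pr_win_switch = sum(w for (car, host), w in weights.items()
                    if host == 2 and car == 1)
print(pr_win_switch / pr_condition)  # 2/3
```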

Exercise.

1 Suppose the host opens the door randomly, such that the host is equally likely to open each door. Calculate the probability again.


2 Suppose there are $n$ doors ($n\ge 3$) instead of 3 doors. Without changing other given information, calculate the probability again.




Remark.

  • For other cases in which another door is picked, the same result holds by symmetry (notations can be changed in the expression).

Related results

Proposition. Some events are independent if and only if they are still independent when part of them are changed to their complements.

Proof. We can prove it inductively, replacing one event at a time. E.g., assume $A$, $B$ and $C$ are independent. Then,
$$\Pr(A^c\cap B\cap C)=\Pr(B\cap C)-\Pr(A\cap B\cap C)=\Pr(B)\Pr(C)-\Pr(A)\Pr(B)\Pr(C)=\big(1-\Pr(A)\big)\Pr(B)\Pr(C)=\Pr(A^c)\Pr(B)\Pr(C),$$
and similar results hold for the other events and the other subsets of the events.

Example. (Events that are pairwise independent but not independent (cont'd)) Recall the three events in a previous example:

  • $A$ is the event that the bigger ball is red;
  • $B$ is the event that the smaller ball is red;
  • $C$ is the event that both balls have the same color.

They are not independent under the conditions in that example. It follows that $A$, $B$ and $C^c$ (namely the event that the two balls have different colors, which is the event $D$ in an exercise for that example) are not independent.

Example. (Special cases for independence) A certain event is independent of an arbitrary event. This also holds for an impossible event.

Proof.

  • The empty set $\varnothing$ is the impossible event, since $\Pr(\varnothing)=0$.
  • For each event $A$, $\Pr(A\cap\varnothing)=\Pr(\varnothing)=0$.
  • Also, $\Pr(A)\Pr(\varnothing)=0$.
  • So, $\Pr(A\cap\varnothing)=\Pr(A)\Pr(\varnothing)$, i.e. $A\perp\varnothing$.
  • The sample space $\Omega$ is the certain event, since $\Pr(\Omega)=1$.
  • Since $\Omega=\varnothing^c$, and $A\perp\varnothing$ for each event $A$, it follows from the proposition about independence of complement events that $A\perp\Omega$.


Remark.

  • The meaning of this result is that knowledge of an arbitrary event does not make a certain event less certain, and also does not make an impossible event possible, which is intuitive.

Conditional independence

Conditional independence is a conditional version of independence, and has the following definition which is similar to that of independence.

Definition. (Conditional independence) The events $A_1,A_2,\dots$ are conditionally independent given an event $B$ (with $\Pr(B)>0$) if
$$\Pr(A_{i_1}\cap\cdots\cap A_{i_k}\mid B)=\Pr(A_{i_1}\mid B)\cdots\Pr(A_{i_k}\mid B)$$
for each finite subset of indices $\{i_1,\dots,i_k\}$.

Remark.

  • In particular, if events $A$ and $B$ are conditionally independent given $C$ (assuming $\Pr(C)>0$ and $\Pr(B\cap C)>0$), then $\Pr(A\mid B\cap C)=\Pr(A\mid C)$.
  • This means that, given $C$, additionally knowing that $B$ happens does not affect the probability of the occurrence or non-occurrence of $A$.
  • In general, conditional independence of some events given an event $C$ neither implies nor is implied by their conditional independence given $C^c$.
  • Conditional independence of some events neither implies nor is implied by independence of them. These two concepts are not directly related.

Example. Define

  • $A$ to be the event that the birthday of Amy is June 1st;
  • $B$ to be the event that the birthday of Bob is July 1st;
  • $C$ to be the event that Amy and Bob are twins.

Events $A$ and $B$ are conditionally independent given $C^c$, but not conditionally independent given $C$. Also, events $A$ and $B$ are independent (unconditionally). (Assume, for simplicity, that the birthdays of Amy and Bob are each equally likely to be one of the 365 dates in a year (not including February 29th).)

Proof.

  • $A$ and $B$ are conditionally independent given $C^c$ since
  • $\Pr(A\cap B\mid C^c)=\frac{1}{365^2}$;
  • $\Pr(A\mid C^c)\Pr(B\mid C^c)=\frac{1}{365}\cdot\frac{1}{365}=\frac{1}{365^2}$ (there are $365^2$ equally likely (by the principle of insufficient reason) distinct pairs of birthdays).
  • $A$ and $B$ are not conditionally independent given $C$ since
  • $\Pr(A\cap B\mid C)=0$ (twins must have the same birthday, but June 1st and July 1st are different dates);
  • $\Pr(A\mid C)\Pr(B\mid C)=\frac{1}{365}\cdot\frac{1}{365}\ne 0$.
  • $A$ and $B$ are independent (unconditionally) since
  • $\Pr(A\cap B)=\frac{1}{365^2}$;
  • $\Pr(A)\Pr(B)=\frac{1}{365}\cdot\frac{1}{365}=\frac{1}{365^2}$ (there are $365^2$ equally likely (by the principle of insufficient reason) distinct pairs of birthdays).
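
The first two claims can be checked by enumeration under the example's uniformity assumption (a sketch in Python, with the 365 dates coded 0–364, and June 1st and July 1st coded 0 and 1 for brevity):

```python
from fractions import Fraction

N = 365
# Given C^c (not twins), model all N*N birthday pairs as equally likely.
pairs = [(a, b) for a in range(N) for b in range(N)]
A = [p for p in pairs if p[0] == 0]                 # Amy born June 1st
B = [p for p in pairs if p[1] == 1]                 # Bob born July 1st
AB = [p for p in pairs if p[0] == 0 and p[1] == 1]

pr = lambda e: Fraction(len(e), N * N)
assert pr(AB) == pr(A) * pr(B)   # conditionally independent given C^c

# Given C (twins), only the N same-birthday pairs are possible, so
# Pr(A ∩ B | C) = 0 while Pr(A | C) Pr(B | C) = (1/N)**2 > 0.
twins = [(a, a) for a in range(N)]
pr_AB_given_C = Fraction(len([p for p in twins if p == (0, 1)]), N)
assert pr_AB_given_C == 0
```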


Exercise.

Are and conditionally independent given (a) ; (b) ?

(a) Yes; (b) Yes.
(a) Yes; (b) No.
(a) No; (b) Yes.
(a) No; (b) No.




References and footnotes

  1. If we pick the door with the car behind it, then we win the car; we win nothing otherwise.