Probability/Conditional Distributions




Motivation

Suppose there is an earthquake. Let $X$ be the number of casualties and $Y$ be the magnitude of the earthquake on the Richter scale.

(a) Without being given anything, what is the distribution of $X$?

(b) Given that $Y = 1$, what is the distribution of $X$?

(c) Given that $Y = 9$, what is the distribution of $X$?

Remark.

  • $Y = 1$ means the earthquake is micro, and $Y = 9$ means the earthquake is great.

Are your answers to (a),(b),(c) different?

In (b) and (c), we have the conditional distribution of $X$ given $Y = 1$, and the conditional distribution of $X$ given $Y = 9$, respectively.

In general, we have the conditional distribution of $X$ given $Y$ (before observing the value of $Y$), or given $Y = y$ (after observing the value of $Y$).

Conditional distributions

Recall the definition of conditional probability:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)},$$

in which $A, B$ are events, with $P(B) \ne 0$. Applying this definition to discrete random variables $X$ and $Y$, we have
$$P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{f_{X,Y}(x,y)}{f_Y(y)},$$
where $f_{X,Y}$ is the joint pmf of $X$ and $Y$, and $f_Y$ is the marginal pmf of $Y$. It is natural to call such a conditional probability a conditional pmf. We will denote it by $f_{X|Y}(x|y)$. Then, this is basically the definition of the conditional pmf: the conditional pmf of $X$ given $Y = y$ is the conditional probability $P(X = x \mid Y = y)$. Naturally, we expect that the conditional pdf is defined similarly. This is indeed the case:

Definition. (Conditional probability function) Let $X$ and $Y$ be random variables that are both discrete or both continuous. The conditional probability (mass or density) function of $X$ given $Y = y$, in which $y$ is a real number, is
$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}, \quad \text{provided that } f_Y(y) > 0.$$

Remark.

  • The marginal pdf $f_Y(y)$ can be interpreted as a normalizing constant, which makes the integral $\int_{-\infty}^{\infty} f_{X|Y}(x|y)\,dx$ equal 1, since $\int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx = f_Y(y)$ (we integrate over the region in which $Y$ is fixed to be $y$ (the region in which the condition is satisfied), so we only integrate over the corresponding interval of $x$ ($x$ is still a variable)).
  • This is similar to the denominator $P(B)$ in the definition of conditional probability, which makes the conditional probability of the whole sample space equal one, to satisfy the probability axioms.

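To see the discrete case in action, the following is a minimal Python sketch (the joint pmf table and the values of $X$ and $Y$ are made up purely for illustration): it computes $f_{X|Y}(x|1)$ by dividing the column of the joint pmf with $y = 1$ by the marginal $f_Y(1)$.

    import numpy as np

    # Hypothetical joint pmf of (X, Y): rows indexed by x in {0, 1, 2}, columns by y in {0, 1}.
    joint = np.array([
        [0.10, 0.20],   # P(X=0, Y=0), P(X=0, Y=1)
        [0.25, 0.15],   # P(X=1, Y=0), P(X=1, Y=1)
        [0.05, 0.25],   # P(X=2, Y=0), P(X=2, Y=1)
    ])

    marginal_Y = joint.sum(axis=0)                 # f_Y(y) = sum over x of f_{X,Y}(x, y)
    cond_X_given_Y1 = joint[:, 1] / marginal_Y[1]  # f_{X|Y}(x | 1) = f_{X,Y}(x, 1) / f_Y(1)

    print("f_Y(1)       =", marginal_Y[1])         # 0.6
    print("f_{X|Y}(x|1) =", cond_X_given_Y1)       # a proper pmf over x
    print("sums to one? ", np.isclose(cond_X_given_Y1.sum(), 1.0))
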
To understand the definition more intuitively for the continuous case, consider the following diagram.

Top view:
     
        |
        |
        *---------------* 
        |               |
        |               |
fixed y *===============* <--- corresponding interval
        |               |
        |               |
        *---------------*
        |
        *---------------- x

Side view:

          *  
         / \ 
        *\  *  /                                           
       /|#\   \
   |  / |##\ / *---------*
   | *  |###\            /\
   | |\ |##/#\----------/--\     
   | | \|#/###*--------*   /                             
   | |  \/############/#\ /                              
   | |y *\===========/===*                               
   | | /  *---------*   /                                
   | |/              \ /                                 
   | *----------------*                                  
   |/                                                    
   *------------------------- x                          


Front view:
             
    |
    |
    |               
    *\     
    |#\    
    |##\   
    |###\             
    |####\   <------ Area: f_Y(y)
    |#####*--------*  
    |###############\ 
    *================*-------------- x

*---*
|###| : corresponding cross section from joint pdf
*---*   

We can see that when we condition on $Y = y$, we take a "slice" out of the region under the joint pdf, and the area of the "whole slice" is the area between the univariate function $f_{X,Y}(x,y)$ (with $y$ fixed and $x$ variable) and the $x$-axis. This area is given by $\int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx = f_Y(y)$, while according to the probability axioms, the area under a pdf should equal 1. Hence, we scale down the area of the "slice" by a factor of $f_Y(y)$, by dividing the univariate function by $f_Y(y)$. After that, the curve at the top of the scaled "slice" is the graph of the conditional pdf $f_{X|Y}(x|y)$.
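
This "slice and rescale" picture can be checked numerically. The following Python sketch uses an assumed standard bivariate normal with correlation 0.5 (an arbitrary choice for illustration): it fixes $y$, treats $f_{X,Y}(x,y)$ as a function of $x$ alone, integrates it to recover $f_Y(y)$, and verifies that dividing the slice by $f_Y(y)$ gives a curve with total area 1 that matches the known conditional pdf.

    import numpy as np
    from scipy.stats import multivariate_normal, norm

    rho = 0.5                                  # assumed correlation, for illustration only
    joint = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

    y = 1.2                                    # the fixed value we condition on
    xs = np.linspace(-8.0, 8.0, 4001)
    dx = xs[1] - xs[0]
    slice_vals = joint.pdf(np.column_stack([xs, np.full_like(xs, y)]))  # f_{X,Y}(x, y), y fixed

    f_Y_y = (slice_vals * dx).sum()            # area of the slice, approximately f_Y(y)
    cond_pdf = slice_vals / f_Y_y              # rescaled slice, approximately f_{X|Y}(x|y)

    print("f_Y(y) from slice:", f_Y_y, " exact:", norm.pdf(y))
    print("area under conditional pdf:", (cond_pdf * dx).sum())   # ~ 1
    # For this joint pdf, X | Y = y ~ N(rho*y, 1 - rho^2); compare the curves at x = 0:
    print("conditional pdf at 0:", np.interp(0.0, xs, cond_pdf),
          " exact:", norm.pdf(0.0, loc=rho * y, scale=np.sqrt(1.0 - rho ** 2)))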

Now, we have discussed the cases where both random variables are discrete or both are continuous. How about the case where one of them is discrete and the other is continuous? In this case, there is no "joint probability function" of these two random variables, since one is discrete and the other is continuous! But we can still define the conditional probability function in some other way. To motivate the following definition, let $X$ be continuous and $Y$ be discrete, and let $F(x)$ be the conditional probability $P(X \le x \mid Y = y)$. Then, differentiating $F(x)$ with respect to $x$ should yield the conditional pdf $f_{X|Y}(x|y)$. So, we have
$$f_{X|Y}(x|y) = \frac{d}{dx} P(X \le x \mid Y = y) = \frac{d}{dx} \frac{\int_{-\infty}^{x} P(Y = y \mid X = t) f_X(t)\,dt}{P(Y = y)} = \frac{P(Y = y \mid X = x)\, f_X(x)}{P(Y = y)}.$$

Thus, it is natural to have the following definition.

Definition. (Conditional probability density function when $X$ is continuous and $Y$ is discrete) Let $X$ be a continuous random variable and $Y$ be a discrete random variable. The conditional probability density function of $X$ given $Y = y$, where $y$ is a real number, is
$$f_{X|Y}(x|y) = \frac{P(Y = y \mid X = x)\, f_X(x)}{P(Y = y)}.$$

Now, how about the case where $X$ is discrete and $Y$ is continuous? In this case, let us use the above definition as the motivation for the definition. However, we should interchange $X$ and $Y$ so that the assumptions are still satisfied. Then, we get
$$f_{Y|X}(y|x) = \frac{P(X = x \mid Y = y)\, f_Y(y)}{P(X = x)}.$$

In this case, $X$ is discrete, so it is natural to define the conditional pmf of $X$ given $Y = y$ as the conditional probability $P(X = x \mid Y = y)$ appearing in the expression. Now, after rearranging the terms, we get
$$P(X = x \mid Y = y) = \frac{f_{Y|X}(y|x)\, P(X = x)}{f_Y(y)}.$$
Thus, we have the following definition.

Definition. (Conditional probability mass function when $X$ is discrete and $Y$ is continuous) Let $X$ be a discrete random variable and $Y$ be a continuous random variable. The conditional probability mass function of $X$ given $Y = y$, where $y$ is a real number, is
$$f_{X|Y}(x|y) = P(X = x \mid Y = y) = \frac{f_{Y|X}(y|x)\, P(X = x)}{f_Y(y)}.$$
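
As a concrete illustration of the mixed case, suppose (purely as a made-up example) that $X \sim \mathrm{Bernoulli}(0.3)$ is discrete and $Y \mid X = x \sim \mathcal{N}(x, 1)$ is continuous. The sketch below computes the conditional pmf $P(X = 1 \mid Y = y)$ from the definition, using $f_Y(y) = \sum_x f_{Y|X}(y|x)\,P(X = x)$, and compares it with a crude Monte Carlo estimate.

    import numpy as np
    from scipy.stats import norm

    p = 0.3                                   # assumed P(X = 1)
    y_obs = 0.8                               # assumed observed value of Y

    # f_Y(y) = sum over x of f_{Y|X}(y|x) P(X = x)
    f_Y = norm.pdf(y_obs, loc=1.0) * p + norm.pdf(y_obs, loc=0.0) * (1.0 - p)
    # conditional pmf: P(X = 1 | Y = y) = f_{Y|X}(y|1) P(X = 1) / f_Y(y)
    cond_pmf_1 = norm.pdf(y_obs, loc=1.0) * p / f_Y
    print("P(X = 1 | Y = 0.8) =", cond_pmf_1)

    # Monte Carlo check: among samples whose Y lands near y_obs, count the fraction with X = 1.
    rng = np.random.default_rng(0)
    n = 1_000_000
    X = rng.binomial(1, p, size=n)
    Y = rng.normal(loc=X, scale=1.0)
    near = np.abs(Y - y_obs) < 0.02
    print("Monte Carlo estimate =", X[near].mean())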

Based on the definitions of conditional probability functions, it is natural to define the conditional cdf as follows.

Definition. (Conditional cumulative distribution function) Let $X$ and $Y$ be discrete or continuous random variables. The conditional cumulative distribution function (cdf) of $X$ given $Y = y$, in which $y$ is a real number, is
$$F_{X|Y}(x|y) = P(X \le x \mid Y = y) = \begin{cases} \displaystyle\sum_{t \le x} f_{X|Y}(t|y), & X \text{ discrete}; \\ \displaystyle\int_{-\infty}^{x} f_{X|Y}(t|y)\,dt, & X \text{ continuous}. \end{cases}$$

Remark.

  • We should be aware that when $Y$ is continuous, the event $\{Y = y\}$ has probability zero. So, according to the definition of conditional probability, the conditional cdf in this case should be undefined. However, in this context, we still define the conditional probability $P(X \le x \mid Y = y)$ as above, as an expression that makes sense and is defined.

Graphical illustration of the definition (continuous random variables):

Top view:
     
        |
        |
        *---------------* 
        |               |
        |               |
fixed y *=========@=====* <--- corresponding interval
        |         x     |
        |               |
        *---------------*
        |
        *---------------- 

Side view:

          *  
         / \ 
        *\  *  /                                           
       /|#\   \
   |  / |##\ / *---------*
   | *  |###\            /\
   | |\ |##/#\----------/--\     
   | | \|#/###*--------*   /                             
   | |  \/#########   / \ /                              
   | |y *\========@==/===*                               
   | | /  *-------x-*   /                                
   | |/              \ /                                 
   | *----------------*                                  
   |/                                                    
   *------------------------- x                          


Front view:

    |
    |
    |
    *\      
    |#\    
    |##\              
    |###\             
    |####\   <------------- Area: f_Y(y)         
    |#####*--------*  
    |###########    \ 
    *==========@=====*--------------  
               x
*---*
|###| : the desired region from the cross section from joint pdf, whose area is the probability from the cdf
*---*   

If we condition on an event $B$ with $P(B) > 0$ (instead of $\{Y = y\}$), we have some special notations for simplicity:

  • the conditional probability function of $X$ given $B$ becomes $f_{X|B}(x)$;

  • the conditional cdf of $X$ given $B$ becomes $F_{X|B}(x) = P(X \le x \mid B)$.

Proposition. (Determining independence of two random variables) Random variables $X$ and $Y$ are independent if and only if $f_{X|Y}(x|y) = f_X(x)$ for each $x$ and $y$ (with $f_Y(y) > 0$).

Proof. Recall the definition of independence between two random variables:

$X$ and $Y$ are independent if
$$f_{X,Y}(x,y) = f_X(x)\, f_Y(y)$$
for each $x$ and $y$.

Since
$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = f_X(x) \iff f_{X,Y}(x,y) = f_X(x)\, f_Y(y)$$
for each $x$ and $y$ (with $f_Y(y) > 0$), we have the desired result.

Remark.

  • This is expected, since conditioning on an independent event should not affect the occurrence of another independent event.
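
A quick numerical sanity check of this proposition, using a made-up joint pmf built from two independent discrete random variables: every conditional pmf $f_{X|Y}(\cdot|y)$ coincides with the marginal pmf $f_X$.

    import numpy as np

    # For independent X and Y, the joint pmf is the outer product of the marginal pmfs.
    f_X = np.array([0.2, 0.5, 0.3])          # assumed marginal pmf of X
    f_Y = np.array([0.4, 0.6])               # assumed marginal pmf of Y
    joint = np.outer(f_X, f_Y)               # f_{X,Y}(x, y) = f_X(x) f_Y(y)

    for j in range(len(f_Y)):
        cond = joint[:, j] / f_Y[j]          # f_{X|Y}(x | y_j)
        print("f_{X|Y}(. | y_%d) =" % j, cond, " equals f_X:", np.allclose(cond, f_X))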


We can extend the definition of conditional probability function and cdf to groups of random variables, for joint cdf's and joint probability functions, as follows:

Definition. (Conditional joint probability function) Let $\mathbf{X} = (X_1, \dots, X_m)$ and $\mathbf{Y} = (Y_1, \dots, Y_n)$ be two random vectors. The conditional joint probability function of $\mathbf{X}$ given $\mathbf{Y} = \mathbf{y}$ is
$$f_{\mathbf{X}|\mathbf{Y}}(\mathbf{x}|\mathbf{y}) = \frac{f_{\mathbf{X},\mathbf{Y}}(\mathbf{x},\mathbf{y})}{f_{\mathbf{Y}}(\mathbf{y})}.$$

Then, we also have a similar proposition for determining independence of two random vectors.

Proposition. (Determining independence of two random vectors) Random vectors $\mathbf{X}$ and $\mathbf{Y}$ are independent if and only if $f_{\mathbf{X}|\mathbf{Y}}(\mathbf{x}|\mathbf{y}) = f_{\mathbf{X}}(\mathbf{x})$ for each $\mathbf{x}$ and $\mathbf{y}$.

Proof. The definition of independence between two random vectors is

  • $\mathbf{X}$ and $\mathbf{Y}$ are independent if
$$f_{\mathbf{X},\mathbf{Y}}(\mathbf{x},\mathbf{y}) = f_{\mathbf{X}}(\mathbf{x})\, f_{\mathbf{Y}}(\mathbf{y})$$
for each $\mathbf{x}$ and $\mathbf{y}$.

Since
$$f_{\mathbf{X}|\mathbf{Y}}(\mathbf{x}|\mathbf{y}) = \frac{f_{\mathbf{X},\mathbf{Y}}(\mathbf{x},\mathbf{y})}{f_{\mathbf{Y}}(\mathbf{y})} = f_{\mathbf{X}}(\mathbf{x}) \iff f_{\mathbf{X},\mathbf{Y}}(\mathbf{x},\mathbf{y}) = f_{\mathbf{X}}(\mathbf{x})\, f_{\mathbf{Y}}(\mathbf{y})$$
for each $\mathbf{x}$ and $\mathbf{y}$ (with $f_{\mathbf{Y}}(\mathbf{y}) > 0$), we have the desired result.

Conditional distributions of bivariate normal distribution

Recall from the Probability/Important Distributions chapter that the joint pdf of $(X, Y) \sim \mathcal{N}(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho)$ is

$$f_{X,Y}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\left( -\frac{1}{2(1-\rho^2)} \left[ \frac{(x-\mu_X)^2}{\sigma_X^2} - \frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2} \right] \right),$$
and $E[X] = \mu_X$, $E[Y] = \mu_Y$, $\operatorname{Var}(X) = \sigma_X^2$, $\operatorname{Var}(Y) = \sigma_Y^2$ and $\rho(X, Y) = \rho$ in this case, in which $\sigma_X$ and $\sigma_Y$ are positive (and $-1 < \rho < 1$).

Proposition. (Conditional distributions of bivariate normal distribution) Let $(X, Y) \sim \mathcal{N}(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho)$. Then,
$$X \mid Y = y \sim \mathcal{N}\!\left( \mu_X + \rho\frac{\sigma_X}{\sigma_Y}(y - \mu_Y),\; (1-\rho^2)\sigma_X^2 \right) \quad \text{and} \quad Y \mid X = x \sim \mathcal{N}\!\left( \mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X),\; (1-\rho^2)\sigma_Y^2 \right)$$

(abuse of notation: when we say the distribution of "$X \mid Y = y$", we mean the conditional distribution of $X$ given $Y = y$).

Proof.

  • First, the conditional pdf
$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{1}{\sqrt{2\pi(1-\rho^2)}\,\sigma_X} \exp\left( -\frac{\left( x - \mu_X - \rho\frac{\sigma_X}{\sigma_Y}(y - \mu_Y) \right)^2}{2(1-\rho^2)\sigma_X^2} \right)$$
(after completing the square in $x$ and cancelling the marginal pdf $f_Y(y)$ of the $\mathcal{N}(\mu_Y, \sigma_Y^2)$ distribution).
  • Then, we can see that $X \mid Y = y \sim \mathcal{N}\!\left( \mu_X + \rho\frac{\sigma_X}{\sigma_Y}(y - \mu_Y),\; (1-\rho^2)\sigma_X^2 \right)$,
  • and by symmetry (interchanging $x$ and $y$, and also interchanging the subscripts $X$ and $Y$), $Y \mid X = x \sim \mathcal{N}\!\left( \mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X),\; (1-\rho^2)\sigma_Y^2 \right)$.
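
The proposition can also be checked by simulation. The sketch below uses arbitrary parameter values: it draws many bivariate normal samples, keeps those whose $Y$ lies in a thin window around a fixed $y$, and compares the sample mean and variance of the retained $X$ values with $\mu_X + \rho\frac{\sigma_X}{\sigma_Y}(y - \mu_Y)$ and $(1 - \rho^2)\sigma_X^2$.

    import numpy as np

    rng = np.random.default_rng(1)
    mu_X, mu_Y = 2.0, -1.0
    sigma_X, sigma_Y, rho = 1.5, 0.8, 0.6          # assumed parameters, for illustration
    cov = [[sigma_X**2, rho * sigma_X * sigma_Y],
           [rho * sigma_X * sigma_Y, sigma_Y**2]]

    n = 3_000_000
    XY = rng.multivariate_normal([mu_X, mu_Y], cov, size=n)
    X, Y = XY[:, 0], XY[:, 1]

    y = -0.5                                        # condition on Y approximately equal to y
    sel = np.abs(Y - y) < 0.01
    print("empirical  E[X|Y=y]   =", X[sel].mean())
    print("theoretical           =", mu_X + rho * sigma_X / sigma_Y * (y - mu_Y))
    print("empirical  Var(X|Y=y) =", X[sel].var())
    print("theoretical           =", (1 - rho**2) * sigma_X**2)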

Conditional version of concepts

We can obtain the conditional version of concepts previously established for 'unconditional' distributions analogously for conditional distributions, by substituting the 'unconditional' cdf, pdf or pmf, i.e. $F_X$ or $f_X$, by their conditional counterparts, i.e. $F_{X|Y}$ or $f_{X|Y}$.

Conditional independence

Definition. Random variables $X_1, X_2, \dots, X_n$ are conditionally independent given $Y$ if and only if
$$F_{X_1,\dots,X_n \mid Y}(x_1, \dots, x_n \mid y) = F_{X_1 \mid Y}(x_1 \mid y) \cdots F_{X_n \mid Y}(x_n \mid y)$$
or
$$f_{X_1,\dots,X_n \mid Y}(x_1, \dots, x_n \mid y) = f_{X_1 \mid Y}(x_1 \mid y) \cdots f_{X_n \mid Y}(x_n \mid y)$$
for each real number $x_1, \dots, x_n$ and for each positive integer $n$, in which $F_{X_1,\dots,X_n \mid Y}$ and $f_{X_1,\dots,X_n \mid Y}$ denote the joint cdf and joint probability function of $X_1, \dots, X_n$ conditional on $Y = y$ respectively.

Remark.

  • For random variables, conditional independence and independence are not related, i.e. one of them does not imply the other.

Example. (Conditional independence does not imply independence) TODO

Example. (Independence does not imply conditional independence) TODO

Conditional expectation

Definition. (Conditional expectation) Let $f_{X|Y}(x|y)$ be the conditional probability function of $X$ given $Y = y$. Then, the conditional expectation of $X$ given $Y = y$ is
$$E[X \mid Y = y] = \begin{cases} \displaystyle\sum_{x} x\, f_{X|Y}(x|y), & X \text{ discrete}; \\ \displaystyle\int_{-\infty}^{\infty} x\, f_{X|Y}(x|y)\,dx, & X \text{ continuous}. \end{cases}$$

Remark.

  • $E[X \mid Y = y]$ is a function of $y$;
  • the random variable obtained by evaluating this function at $Y$, which is a function of $Y$ after computing the expectation, is written as $E[X \mid Y]$ for brevity, in which the two $Y$'s are the same term;
  • $E[X \mid Y = y]$ is a realization of $E[X \mid Y]$ when $Y$ is observed to be $y$, in which the two $y$'s are the same term.
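
Continuing the made-up joint pmf from the earlier sketch, $E[X \mid Y = y]$ is just the average of the $x$ values weighted by the conditional pmf $f_{X|Y}(x|y)$; computing it for every $y$ gives the function of $y$ whose value at $Y$ is the random variable $E[X \mid Y]$.

    import numpy as np

    x_vals = np.array([0, 1, 2])
    joint = np.array([[0.10, 0.20],
                      [0.25, 0.15],
                      [0.05, 0.25]])          # same hypothetical joint pmf as before
    f_Y = joint.sum(axis=0)

    # E[X | Y = y] = sum over x of x * f_{X|Y}(x|y), computed for each y
    cond_exp = np.array([(x_vals * joint[:, j] / f_Y[j]).sum() for j in range(len(f_Y))])
    print("E[X | Y = 0] =", cond_exp[0])      # 0.875
    print("E[X | Y = 1] =", cond_exp[1])      # about 1.083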

Similarly, we have a conditional version of the law of the unconscious statistician.

Proposition. (Law of the unconscious statistician (conditional version)) Let $f_{X|Y}(x|y)$ be the conditional probability function of $X$ given $Y = y$. Then, for each function $g$,
$$E[g(X) \mid Y = y] = \begin{cases} \displaystyle\sum_{x} g(x)\, f_{X|Y}(x|y), & X \text{ discrete}; \\ \displaystyle\int_{-\infty}^{\infty} g(x)\, f_{X|Y}(x|y)\,dx, & X \text{ continuous}. \end{cases}$$

Proposition. (Conditional expectation under independence) If random variables $X$ and $Y$ are independent,
$$E[g(X) \mid Y = y] = E[g(X)]$$
for each function $g$.

Proof. Since $X$ and $Y$ are independent, $f_{X|Y}(x|y) = f_X(x)$, so (in the discrete case, and similarly with integrals in the continuous case)
$$E[g(X) \mid Y = y] = \sum_{x} g(x)\, f_{X|Y}(x|y) = \sum_{x} g(x)\, f_X(x) = E[g(X)].$$

Remark.

  • This equality may not hold if $X$ and $Y$ are not independent.

Example. Suppose we have a random vector $(X, Y)$ in which $X$ and $Y$ are independent random variables, and $Z = XY$. Then,
$$E[Z \mid Y = y] = E[Xy \mid Y = y] = y\, E[X \mid Y = y] = y\, E[X]$$

($y$ is treated as a constant, because of the conditioning: it is constant after the realization of $Y$) but
$$E[Z] = E[XY] = E[X]\, E[Y],$$
which is in general different from $y\, E[X]$: the proposition applies to functions of $X$ alone, not to functions involving $Y$.

The properties of expectation still hold for conditional expectations, with every 'unconditional' expectation replaced by a conditional expectation and some suitable modifications, as follows:

Proposition. (Properties of conditional expectation) For each random variable $X$,

  • (linearity) $E[a g_1(X) + b g_2(X) \mid Y = y] = a E[g_1(X) \mid Y = y] + b E[g_2(X) \mid Y = y]$
for all constants $a, b$, for all functions $g_1, g_2$ of $X$, and for each random variable $Y$;
  • (nonnegativity) if $X \ge 0$, $E[X \mid Y = y] \ge 0$;
  • (monotonicity) if $X \le Z$, $E[X \mid Y = y] \le E[Z \mid Y = y]$ for each random variable $Z$;
  • (triangle inequality) $\big| E[X \mid Y = y] \big| \le E\big[ |X| \mid Y = y \big]$;

  • (multiplicativity under independence) if $X$ and $Z$ are conditionally independent given $Y = y$,
$$E[XZ \mid Y = y] = E[X \mid Y = y]\, E[Z \mid Y = y].$$

Proof. The proof is similar to the one for 'unconditional' expectations.

Remark.

  • Functions of $Y$ are treated as constants given $Y = y$, since after observing the value of $Y$, they cannot be changed.
  • Each result also holds with $Y$ replaced by a random vector $\mathbf{Y}$.

The following theorem about conditional expectation is quite important.

Theorem. (Law of total expectation) For each function $g$ and for all random variables $X$ and $Y$,
$$E\big[ E[g(X) \mid Y] \big] = E[g(X)].$$

Proof. Consider the continuous case (the discrete case is similar, with sums replacing integrals):
$$E\big[ E[g(X) \mid Y] \big] = \int_{-\infty}^{\infty} E[g(X) \mid Y = y]\, f_Y(y)\,dy = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x)\, f_{X|Y}(x|y)\, f_Y(y)\,dx\,dy = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x)\, f_{X,Y}(x,y)\,dx\,dy = E[g(X)].$$

Remark.

  • We can replace $g(X)$ by $X$ and get $E\big[ E[X \mid Y] \big] = E[X]$.
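
A simulation sketch of the law of total expectation, under an assumed hierarchical model chosen only for illustration: $Y \sim \mathrm{Exponential}(1)$ and $X \mid Y = y \sim \mathrm{Poisson}(y)$, so that $E[X \mid Y] = Y$ and the theorem predicts $E[X] = E[Y] = 1$.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 1_000_000

    Y = rng.exponential(scale=1.0, size=n)    # Y ~ Exponential(1)
    X = rng.poisson(lam=Y)                    # X | Y = y ~ Poisson(y), so E[X | Y] = Y

    print("E[X] (empirical)   =", X.mean())
    print("E[E[X | Y]] = E[Y] =", Y.mean())   # both should be close to 1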

Corollary. (Generalized law of total probability) For each event $A$ and each random variable $Y$,
$$P(A) = E\big[ P(A \mid Y) \big].$$

Proof.

  • First, by the fundamental bridge between probability and expectation,
$$P(A \mid Y = y) = E[\mathbf{1}_A \mid Y = y].$$

  • Then, using the law of total expectation,
$$E\big[ P(A \mid Y) \big] = E\big[ E[\mathbf{1}_A \mid Y] \big] = E[\mathbf{1}_A] = P(A).$$

Remark.

  • The expectation is taken with respect to $Y$, so we may write it as $E_Y\big[ P(A \mid Y) \big]$. We will use similar notations to denote the random variable with respect to which the expectation is taken, if needed.
  • We can replace $Y$ by $\mathbf{Y}$, which is a random vector.
  • If $Y$ is discrete, then the expanded form of the result is $P(A) = \sum_{y} P(A \mid Y = y)\, P(Y = y)$ (discrete case of the law of total probability).
  • If $Y$ is continuous, then the expanded form of the result is $P(A) = \int_{-\infty}^{\infty} P(A \mid Y = y)\, f_Y(y)\,dy$ (continuous case of the law of total probability).

Corollary. (Expectation version of law of total probability) Suppose the sample space $\Omega = B_1 \cup B_2 \cup \cdots$, in which $B_1, B_2, \dots$ are mutually exclusive events with nonzero probability. Then,
$$E[X] = \sum_{i} E[X \mid B_i]\, P(B_i).$$

Proof. Define $Y = i$ if $B_i$ occurs, in which $i$ is a positive integer. Then,
$$E[X] = E\big[ E[X \mid Y] \big] = \sum_{i} E[X \mid Y = i]\, P(Y = i) = \sum_{i} E[X \mid B_i]\, P(B_i).$$

Remark.

  • the number of events $B_i$ can be finite, as long as they are mutually exclusive and their union is the whole sample space;
  • if $X = \mathbf{1}_A$ for an event $A$, it reduces to the law of total probability.

Example. Let $H$ be the human height in m. A person is randomly selected from a population consisting of the same number of men and women. Given that the mean height of a man is 1.8 m, and that of a woman is 1.7 m, the mean height of the entire population is
$$E[H] = E[H \mid \text{man}]\, P(\text{man}) + E[H \mid \text{woman}]\, P(\text{woman}) = 1.8 \times 0.5 + 1.7 \times 0.5 = 1.75 \text{ m}.$$

Corollary. (Formula of expectation conditional on event) For each random variable $X$ and event $B$ with $P(B) > 0$,
$$E[X \mid B] = \frac{E[X\, \mathbf{1}_B]}{P(B)}.$$

Proof. By the formula of expectation computed by a weighted average of conditional expectations (the previous corollary applied to $X\,\mathbf{1}_B$ with the partition $\{B, B^c\}$),
$$E[X\, \mathbf{1}_B] = E[X\, \mathbf{1}_B \mid B]\, P(B) + E[X\, \mathbf{1}_B \mid B^c]\, P(B^c) = E[X \mid B]\, P(B) + 0,$$
and the result follows if $P(B) > 0$.

Remark.

  • if $X = \mathbf{1}_A$ for an event $A$, it reduces to the definition of the conditional probability $P(A \mid B)$, by the fundamental bridge between probability and expectation.
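
A small simulation check of this corollary, with an arbitrary choice of $X \sim \mathcal{N}(0, 1)$ and $B = \{X > 1\}$: the sample mean of $X$ over the outcomes in $B$ should match $E[X\,\mathbf{1}_B]/P(B)$ (both are close to the exact value $\varphi(1)/(1 - \Phi(1)) \approx 1.525$).

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.normal(size=1_000_000)
    B = X > 1.0                                # indicator of the conditioning event

    print("E[X | B] (direct) =", X[B].mean())
    print("E[X 1_B] / P(B)   =", (X * B).mean() / B.mean())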

After defining conditional expectation, we can also have conditional variance, covariance and correlation coefficient, since variance, covariance, and correlation coefficient are built upon expectation.

Conditional expectations of bivariate normal distribution

Proposition. (Conditional expectations of bivariate normal distribution) Let $(X, Y) \sim \mathcal{N}(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho)$. Then,
$$E[X \mid Y = y] = \mu_X + \rho\frac{\sigma_X}{\sigma_Y}(y - \mu_Y) \quad \text{and} \quad E[Y \mid X = x] = \mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X).$$

Proof.

  • The result follows from the proposition about conditional distributions of bivariate normal distribution readily.


Conditional variance

Definition. (Conditional variance) The conditional variance of a random variable $X$ given $Y = y$ is
$$\operatorname{Var}(X \mid Y = y) = E\big[ (X - E[X \mid Y = y])^2 \mid Y = y \big].$$

Similarly, we have properties of conditional variance which are similar to those of variance.

Proposition. (Properties of conditional variance) For each random variable $X$,

  • (alternative formula of conditional variance) $\operatorname{Var}(X \mid Y = y) = E[X^2 \mid Y = y] - \big( E[X \mid Y = y] \big)^2$;
  • (invariance under change in location parameter) $\operatorname{Var}(X + c \mid Y = y) = \operatorname{Var}(X \mid Y = y)$ for each constant $c$;
  • (homogeneity of degree two) $\operatorname{Var}(cX \mid Y = y) = c^2 \operatorname{Var}(X \mid Y = y)$ for each constant $c$;
  • (nonnegativity) $\operatorname{Var}(X \mid Y = y) \ge 0$;
  • (zero variance implies non-randomness) if $\operatorname{Var}(X \mid Y = y) = 0$, then $P(X = c \mid Y = y) = 1$ for some constant $c$, i.e. for some function $c$ of $y$;
  • (additivity under independence) if $X$ and $Z$ are conditionally independent given $Y = y$, $\operatorname{Var}(X + Z \mid Y = y) = \operatorname{Var}(X \mid Y = y) + \operatorname{Var}(Z \mid Y = y)$.

Proof. The proof is similar to the one for properties of variance.

Besides the law of total expectation, we also have the law of total variance, as follows:

Proposition. (Law of total variance) For all random variables $X$ and $Y$,
$$\operatorname{Var}(X) = E\big[ \operatorname{Var}(X \mid Y) \big] + \operatorname{Var}\big( E[X \mid Y] \big).$$

Proof.
$$\begin{aligned} \operatorname{Var}(X) &= E[X^2] - (E[X])^2 \\ &= E\big[ E[X^2 \mid Y] \big] - \big( E\big[ E[X \mid Y] \big] \big)^2 && \text{(law of total expectation)} \\ &= E\big[ \operatorname{Var}(X \mid Y) + (E[X \mid Y])^2 \big] - \big( E\big[ E[X \mid Y] \big] \big)^2 && \text{(alternative formula of conditional variance)} \\ &= E\big[ \operatorname{Var}(X \mid Y) \big] + E\big[ (E[X \mid Y])^2 \big] - \big( E\big[ E[X \mid Y] \big] \big)^2 \\ &= E\big[ \operatorname{Var}(X \mid Y) \big] + \operatorname{Var}\big( E[X \mid Y] \big). \end{aligned}$$

Remark.

  • We can replace $Y$ by $\mathbf{Y}$, a random vector.
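
A simulation sketch of the law of total variance, reusing the assumed hierarchical model from the law-of-total-expectation sketch ($Y \sim \mathrm{Exponential}(1)$, $X \mid Y = y \sim \mathrm{Poisson}(y)$): here $E[\operatorname{Var}(X \mid Y)] = E[Y] = 1$ and $\operatorname{Var}(E[X \mid Y]) = \operatorname{Var}(Y) = 1$, so $\operatorname{Var}(X)$ should be about 2.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 1_000_000

    Y = rng.exponential(scale=1.0, size=n)    # Y ~ Exponential(1)
    X = rng.poisson(lam=Y)                    # X | Y = y ~ Poisson(y): Var(X | Y) = E[X | Y] = Y

    print("Var(X) (empirical)                    =", X.var())
    print("E[Var(X|Y)] + Var(E[X|Y]) (empirical) =", Y.mean() + Y.var())   # both ~ 2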

Conditional variances of bivariate normal distribution

Proposition. (Conditional variances of bivariate normal distribution) Let $(X, Y) \sim \mathcal{N}(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho)$. Then,
$$\operatorname{Var}(X \mid Y = y) = (1 - \rho^2)\sigma_X^2 \quad \text{and} \quad \operatorname{Var}(Y \mid X = x) = (1 - \rho^2)\sigma_Y^2.$$

Proof.

  • The result follows readily from the proposition about conditional distributions of bivariate normal distribution.

Remark.

  • It can be observed that the exact values of $x$ and $y$ in the conditions do not matter: the result is the same for different values of them.


Conditional covariance

Definition. (Conditional covariance) The conditional covariance of $X$ and $Z$ given $Y = y$ is
$$\operatorname{Cov}(X, Z \mid Y = y) = E\big[ (X - E[X \mid Y = y])(Z - E[Z \mid Y = y]) \mid Y = y \big].$$

Proposition. (Properties of conditional covariance)

(i) (symmetry) for all random variables $X$ and $Z$, $\operatorname{Cov}(X, Z \mid Y = y) = \operatorname{Cov}(Z, X \mid Y = y)$;

(ii) for each random variable $X$, $\operatorname{Cov}(X, X \mid Y = y) = \operatorname{Var}(X \mid Y = y)$;
(iii) (alternative formula of covariance) $\operatorname{Cov}(X, Z \mid Y = y) = E[XZ \mid Y = y] - E[X \mid Y = y]\, E[Z \mid Y = y]$;
(iv) for all constants $a, b$ and for all random variables $X, W, Z$, $\operatorname{Cov}(aX + bW, Z \mid Y = y) = a\operatorname{Cov}(X, Z \mid Y = y) + b\operatorname{Cov}(W, Z \mid Y = y)$;
(v) for all random variables $X$ and $Z$, $\operatorname{Var}(X + Z \mid Y = y) = \operatorname{Var}(X \mid Y = y) + \operatorname{Var}(Z \mid Y = y) + 2\operatorname{Cov}(X, Z \mid Y = y)$.


Conditional correlation coefficient

Definition. (Conditional correlation coefficient) The conditional correlation coefficient of random variables $X$ and $Z$ given $Y = y$ is
$$\rho(X, Z \mid Y = y) = \frac{\operatorname{Cov}(X, Z \mid Y = y)}{\sqrt{\operatorname{Var}(X \mid Y = y)\operatorname{Var}(Z \mid Y = y)}}.$$

Remark.

  • Similar to the 'unconditional' correlation coefficient, the conditional correlation coefficient also lies between $-1$ and $1$ inclusive. The proof is similar, with every unconditional term replaced by its conditional counterpart.


Conditional quantile

Definition. (Conditional quantile) The conditional $p$th quantile of $X$ given $Y = y$ is
$$\inf\{ x \in \mathbb{R} : F_{X|Y}(x|y) \ge p \}.$$

Remark.

  • Then, we can have the conditional median, interquartile range, etc., which are defined using conditional quantiles in the same way as the unconditional ones.