Probability/Transformation of Random Variables



Transformation of random variables

Underlying principle

Let $X_1, \ldots, X_n$ be random variables, $Y_1, \ldots, Y_n$ be another set of random variables, and $\mathbf{X} = (X_1, \ldots, X_n)^T$ and $\mathbf{Y} = (Y_1, \ldots, Y_n)^T$ be random (column) vectors.

Suppose the vector-valued function $g: \mathbb{R}^n \to \mathbb{R}^n$[1] is bijective (it is also called a one-to-one correspondence in this case). Then, its inverse $g^{-1}$ exists.

After that, we can transform $\mathbf{X}$ to $\mathbf{Y}$ by applying the transformation $g$, i.e. by $\mathbf{Y} = g(\mathbf{X})$, and transform $\mathbf{Y}$ back to $\mathbf{X}$ by applying the inverse transformation $g^{-1}$, i.e. by $\mathbf{X} = g^{-1}(\mathbf{Y})$.

We are often interested in deriving the joint probability function of $\mathbf{Y}$, given the joint probability function of $\mathbf{X}$. We will examine the discrete and continuous cases one by one in the following.

Transformation of discrete random variables

Proposition. (transformation of discrete random variables) For each discrete random vector $\mathbf{X}$ with joint pmf $f_{\mathbf{X}}$, the corresponding joint pmf of the transformed random vector $\mathbf{Y} = g(\mathbf{X})$, where $g$ is bijective, is
$$f_{\mathbf{Y}}(\mathbf{y}) = f_{\mathbf{X}}\big(g^{-1}(\mathbf{y})\big).$$

Proof. Considering the original pmf $f_{\mathbf{X}}$, we have
$$f_{\mathbf{Y}}(\mathbf{y}) = \mathbb{P}(\mathbf{Y} = \mathbf{y}) = \mathbb{P}\big(g(\mathbf{X}) = \mathbf{y}\big) = \mathbb{P}\big(\mathbf{X} = g^{-1}(\mathbf{y})\big) = f_{\mathbf{X}}\big(g^{-1}(\mathbf{y})\big).$$

In particular, the inverse $g^{-1}$ exists since $g$ is bijective.
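
As a quick numerical check (an addition to the text; the map $g(x) = 2x + 1$ and the parameter $\lambda = 3$ are illustrative choices), the following Python sketch compares the empirical pmf of $Y = g(X)$ for $X \sim \operatorname{Pois}(3)$ with the formula $f_Y(y) = f_X(g^{-1}(y))$:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    lam = 3.0
    x = rng.poisson(lam, size=200_000)   # X ~ Pois(3)
    y = 2 * x + 1                        # bijective map g(x) = 2x + 1 on the support

    for y_val in [1, 3, 5, 7, 9]:
        empirical = np.mean(y == y_val)
        # f_Y(y) = f_X(g^{-1}(y)) with g^{-1}(y) = (y - 1) / 2
        theoretical = stats.poisson.pmf((y_val - 1) // 2, lam)
        print(f"y = {y_val}: empirical {empirical:.4f}, formula {theoretical:.4f}")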


Transformation of continuous random variables

For continuous random variables, the situation is more complicated.

Let us first investigate the univariate case, which is simpler.

Theorem. (Transformation of continuous random variable (univariate case)) Let $X$ be a continuous random variable with pdf $f_X$. Assume that the function $g: \mathbb{R} \to \mathbb{R}$ is differentiable and strictly monotone. Then, the pdf of the transformed random variable $Y = g(X)$ is
$$f_Y(y) = f_X\big(g^{-1}(y)\big)\left|\frac{d}{dy} g^{-1}(y)\right|.$$

Proof. Under the assumption that $g$ is differentiable and strictly monotone, the cdf of $Y$ is
$$F_Y(y) = \mathbb{P}\big(g(X) \le y\big) = \begin{cases} \mathbb{P}\big(X \le g^{-1}(y)\big) = F_X\big(g^{-1}(y)\big), & g \text{ strictly increasing}, \\ \mathbb{P}\big(X \ge g^{-1}(y)\big) = 1 - F_X\big(g^{-1}(y)\big), & g \text{ strictly decreasing} \end{cases}$$
($g^{-1}$ exists since $g$ is strictly monotone). Differentiating both sides of the above equation (assuming the cdf's involved are differentiable) gives
$$f_Y(y) = \begin{cases} f_X\big(g^{-1}(y)\big)\,\frac{d}{dy} g^{-1}(y), & g \text{ strictly increasing}, \\ -f_X\big(g^{-1}(y)\big)\,\frac{d}{dy} g^{-1}(y), & g \text{ strictly decreasing}. \end{cases}$$
Since $\frac{d}{dy} g^{-1}(y) > 0$ when $g$ is strictly increasing and $\frac{d}{dy} g^{-1}(y) < 0$ when $g$ is strictly decreasing, we can summarize the above case-defined function into a single expression by applying the absolute value function to both sides:
$$f_Y(y) = f_X\big(g^{-1}(y)\big)\left|\frac{d}{dy} g^{-1}(y)\right|,$$
where the absolute value sign is only applied to $\frac{d}{dy} g^{-1}(y)$, since the pdf's must be nonnegative, and thus we do not need to apply the sign to them.

Remark.

  • To explain this theorem in a more intuitive manner, we rewrite the equation in the theorem as
$$f_Y(y)\,|dy| = f_X(x)\,|dx|,$$
where both sides of the equation can be regarded as differential areas, which are nonnegative due to the absolute value signs.
  • This equation should intuitively hold since both sides represent areas under the pdf's, which represent probabilities. The quantity $f_X(x)\,|dx|$ is the area of the region under the pdf of $X$ over an "infinitesimal" interval $[x, x + dx]$, which represents the probability for $X$ to lie in this infinitesimal interval. After the transformation, we get the pdf of $Y$, and the original region is transformed to a region under the pdf of $Y$ over an infinitesimal interval $[y, y + dy]$ with area $f_Y(y)\,|dy|$. Since $g$ is a bijective function (its strict monotonicity implies this), $[y, y + dy]$ "corresponds" to $[x, x + dx]$ in some sense: the values in $[y, y + dy]$ "originate" from the values in $[x, x + dx]$, and so does the randomness. It follows that the probability of $X$ lying in $[x, x + dx]$ and the probability of $Y$ lying in $[y, y + dy]$ should be the same, and hence the two differential areas are equal.
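
To illustrate the theorem numerically (an addition; the distribution and the map are chosen for convenience), the sketch below transforms $X \sim N(0, 1)$ by $g(x) = e^x$, so that $g^{-1}(y) = \ln y$ and the formula gives $f_Y(y) = \phi(\ln y)/y$, the standard lognormal density:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    x = rng.standard_normal(200_000)
    y = np.exp(x)                       # g(x) = e^x, strictly increasing

    # f_Y(y) = f_X(g^{-1}(y)) * |d/dy g^{-1}(y)| = phi(ln y) * (1/y)
    grid = np.linspace(0.1, 5.0, 5)
    formula = stats.norm.pdf(np.log(grid)) / grid

    # compare with a histogram-based density estimate
    hist, edges = np.histogram(y, bins=200, range=(0, 6), density=True)
    centers = (edges[:-1] + edges[1:]) / 2
    for v, f in zip(grid, formula):
        est = hist[np.argmin(np.abs(centers - v))]
        print(f"y = {v:.1f}: histogram {est:.4f}, formula {f:.4f}")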

Let us define the Jacobian matrix, and introduce several notations in the definition.

Definition. (Jacobian matrix) Suppose the function $g^{-1}: \mathbb{R}^n \to \mathbb{R}^n$ is differentiable (then it follows that each of its component functions is differentiable). The Jacobian matrix is
$$\frac{\partial(x_1, \ldots, x_n)}{\partial(y_1, \ldots, y_n)} \overset{\text{def}}{=} \begin{pmatrix} \dfrac{\partial x_1}{\partial y_1} & \cdots & \dfrac{\partial x_1}{\partial y_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial x_n}{\partial y_1} & \cdots & \dfrac{\partial x_n}{\partial y_n} \end{pmatrix},$$

in which $g_i^{-1}$ is the $i$th component function of $g^{-1}$ for each $i \in \{1, \ldots, n\}$, i.e. $x_i = g_i^{-1}(y_1, \ldots, y_n)$.

Remark.

  • We have $\frac{\partial(x_1, \ldots, x_n)}{\partial(y_1, \ldots, y_n)} = \left(\frac{\partial(y_1, \ldots, y_n)}{\partial(x_1, \ldots, x_n)}\right)^{-1}$, and hence $\det \frac{\partial(x_1, \ldots, x_n)}{\partial(y_1, \ldots, y_n)} = \left(\det \frac{\partial(y_1, \ldots, y_n)}{\partial(x_1, \ldots, x_n)}\right)^{-1}$, by the inverse function theorem.

Example. Suppose $Y_1 = X_1 + X_2$, $Y_2 = X_1 - X_2$, and $g(x_1, x_2) = (x_1 + x_2, x_1 - x_2)$. Then, $x_1 = (y_1 + y_2)/2$, $x_2 = (y_1 - y_2)/2$, and
$$\frac{\partial(x_1, x_2)}{\partial(y_1, y_2)} = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{pmatrix}, \qquad \det \frac{\partial(x_1, x_2)}{\partial(y_1, y_2)} = -\frac{1}{2}.$$

Also, $\frac{\partial(y_1, y_2)}{\partial(x_1, x_2)} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$, with determinant $-2$. Then, $\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}^{-1} = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{pmatrix}$, and $-\frac{1}{2} = \frac{1}{-2}$, agreeing with the remark above.
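
The Jacobian in this example can also be checked symbolically; below is a minimal sketch using sympy (an addition; the variable names are arbitrary):

    import sympy as sp

    y1, y2 = sp.symbols('y1 y2')

    # inverse transformation: x = g^{-1}(y)
    x1 = (y1 + y2) / 2
    x2 = (y1 - y2) / 2

    J = sp.Matrix([[sp.diff(x1, y1), sp.diff(x1, y2)],
                   [sp.diff(x2, y1), sp.diff(x2, y2)]])
    print(J)            # Matrix([[1/2, 1/2], [1/2, -1/2]])
    print(J.det())      # -1/2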


Theorem. (Transformation of continuous random variables) Let $\mathbf{X}$ be a continuous random vector with joint pdf $f_{\mathbf{X}}$, and assume $g$ is differentiable and bijective. The corresponding joint pdf of the transformed random vector $\mathbf{Y} = g(\mathbf{X})$ is
$$f_{\mathbf{Y}}(y_1, \ldots, y_n) = f_{\mathbf{X}}\big(g^{-1}(y_1, \ldots, y_n)\big)\left|\det \frac{\partial(x_1, \ldots, x_n)}{\partial(y_1, \ldots, y_n)}\right|.$$

Proof. Partial proof: Assume $g^{-1}$ is differentiable and bijective.

First, for each (measurable) set $B$,
$$\mathbb{P}(\mathbf{Y} \in B) = \int_B f_{\mathbf{Y}}(\mathbf{y})\, d\mathbf{y}. \tag{*}$$

On the other hand, we have
$$\mathbb{P}(\mathbf{Y} \in B) = \mathbb{P}(\mathbf{X} \in A) = \int_A f_{\mathbf{X}}(\mathbf{x})\, d\mathbf{x},$$

where $A = g^{-1}(B) = \{g^{-1}(\mathbf{y}) : \mathbf{y} \in B\}$, which is the preimage of the set $B$ under $g$.

Applying the change of variable formula to this integral (whose proof is advanced and uses our assumptions), we get
$$\mathbb{P}(\mathbf{Y} \in B) = \int_B f_{\mathbf{X}}\big(g^{-1}(\mathbf{y})\big)\left|\det \frac{\partial(x_1, \ldots, x_n)}{\partial(y_1, \ldots, y_n)}\right| d\mathbf{y}. \tag{**}$$

Comparing the integrals in $(*)$ and $(**)$, we can observe the desired result.
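
As an illustrative check (an addition; the standard normal inputs are an arbitrary choice), for $Y_1 = X_1 + X_2$, $Y_2 = X_1 - X_2$ with $X_1, X_2$ i.i.d. $N(0,1)$, the theorem together with the Jacobian determinant $-1/2$ from the example above gives $f_{\mathbf{Y}}(y_1, y_2) = \frac{1}{2} f_{\mathbf{X}}\big(\frac{y_1 + y_2}{2}, \frac{y_1 - y_2}{2}\big)$:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)

    x1, x2 = rng.standard_normal((2, 500_000))
    y1, y2 = x1 + x2, x1 - x2

    # theorem: f_Y(y1, y2) = f_X((y1+y2)/2, (y1-y2)/2) * |det J|, with |det J| = 1/2
    def f_Y(a, b):
        return stats.norm.pdf((a + b) / 2) * stats.norm.pdf((a - b) / 2) * 0.5

    # empirical probability of a small box vs. density times box area
    a, b, h = 0.5, -0.5, 0.2
    box = (np.abs(y1 - a) < h / 2) & (np.abs(y2 - b) < h / 2)
    print("empirical:", np.mean(box))
    print("formula  :", f_Y(a, b) * h * h)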


Moment generating function

Definition. (Moment generating function) The moment generating function (mgf) for the distribution of a random variable $X$ is $M_X(t) = \mathbb{E}\big[e^{tX}\big]$.

Remark.

  • For comparison: the cdf is $F_X(x) = \mathbb{P}(X \le x)$.
  • The mgf, similar to the pmf, pdf and cdf, gives a complete description of a distribution, so it can also similarly uniquely identify a distribution, provided that the mgf exists (the expectation may be infinite);
  • i.e., we can recover the probability function from the mgf.
  • The proof of this result is complicated, and thus omitted.

Proposition. (Moment generating property of mgf) Assuming the mgf $M_X(t)$ exists for $-h < t < h$, in which $h$ is a positive number, we have
$$\mathbb{E}\big[X^k\big] = M_X^{(k)}(0)$$

for each nonnegative integer $k$, where $M_X^{(k)}$ denotes the $k$th derivative of $M_X$.

Proof.

  • Since $e^{tX} = \sum_{k=0}^{\infty} \frac{(tX)^k}{k!}$, we have
$$M_X(t) = \mathbb{E}\big[e^{tX}\big] = \mathbb{E}\left[\sum_{k=0}^{\infty} \frac{t^k X^k}{k!}\right] = \sum_{k=0}^{\infty} \frac{t^k\, \mathbb{E}\big[X^k\big]}{k!}.$$
  • The result follows from simplifying the above expression by differentiating both sides $k$ times with respect to $t$ and then setting $t = 0$: each term of degree less than $k$ vanishes after differentiation, and each term of degree greater than $k$ vanishes at $t = 0$, leaving $M_X^{(k)}(0) = \mathbb{E}\big[X^k\big]$.
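
To illustrate the moment generating property (an addition; the exponential distribution is just a convenient choice), differentiating the mgf $M_X(t) = \lambda/(\lambda - t)$ of $X \sim \operatorname{Exp}(\lambda)$ recovers $\mathbb{E}[X^k] = k!/\lambda^k$:

    import sympy as sp

    t = sp.Symbol('t')
    lam = sp.Symbol('lambda', positive=True)

    # mgf of Exp(lambda), valid for t < lambda
    M = lam / (lam - t)

    for k in range(1, 4):
        # k-th moment = k-th derivative of the mgf at t = 0
        print(f"E[X^{k}] =", sp.simplify(sp.diff(M, t, k).subs(t, 0)))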

Proposition. (Relationship between independence and mgf) If $X$ and $Y$ are independent, then
$$M_{X+Y}(t) = M_X(t)\, M_Y(t).$$

Proof.
$$M_{X+Y}(t) = \mathbb{E}\big[e^{t(X+Y)}\big] = \mathbb{E}\big[e^{tX} e^{tY}\big] \overset{\text{lote}}{=} \mathbb{E}\Big[\mathbb{E}\big[e^{tX} e^{tY} \mid X\big]\Big] = \mathbb{E}\Big[e^{tX}\, \mathbb{E}\big[e^{tY} \mid X\big]\Big] \overset{\text{indep.}}{=} \mathbb{E}\big[e^{tX}\big]\, \mathbb{E}\big[e^{tY}\big] = M_X(t)\, M_Y(t).$$

Similarly, for independent random variables $X_1, \ldots, X_n$,
$$M_{X_1 + \cdots + X_n}(t) = M_{X_1}(t) \cdots M_{X_n}(t).$$

  • lote: law of total expectation

Remark.

  • This equality need not hold if $X$ and $Y$ are not independent.
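
As a numerical illustration of this proposition (an addition; the standard normal choice and $t = 0.7$ are arbitrary), for independent $X, Y \sim N(0, 1)$ both sides equal $e^{t^2}$:

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.standard_normal(1_000_000)
    y = rng.standard_normal(1_000_000)

    t = 0.7
    lhs = np.mean(np.exp(t * (x + y)))                     # Monte Carlo M_{X+Y}(t)
    rhs = np.mean(np.exp(t * x)) * np.mean(np.exp(t * y))  # M_X(t) M_Y(t)
    print(lhs, rhs, np.exp(t**2))                          # all approximately equal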

Joint moment generating function

In the following, we will use $\mathbf{t} \cdot \mathbf{X}$ to denote $t_1 X_1 + \cdots + t_n X_n$, where $\mathbf{t} = (t_1, \ldots, t_n)^T$ and $\mathbf{X} = (X_1, \ldots, X_n)^T$.

Definition. (Joint moment generating function) The joint moment generating function (joint mgf) of a random vector $\mathbf{X}$ is
$$M_{\mathbf{X}}(\mathbf{t}) = \mathbb{E}\big[e^{\mathbf{t} \cdot \mathbf{X}}\big] = \mathbb{E}\big[e^{t_1 X_1 + \cdots + t_n X_n}\big],$$

for each (column) vector $\mathbf{t} = (t_1, \ldots, t_n)^T$, if the expectation exists.

Remark.

  • When $n = 1$, the dot product of the two vectors is the product of two numbers, so the joint mgf reduces to the (univariate) mgf.
  • $M_{\mathbf{X}}(\mathbf{0}) = \mathbb{E}\big[e^{0}\big] = 1$.

Proposition. (Relationship between independence and mgf) Random variables $X_1, \ldots, X_n$ are independent if and only if
$$M_{\mathbf{X}}(\mathbf{t}) = M_{X_1}(t_1) \cdots M_{X_n}(t_n).$$

Proof. 'only if' part: Assume $X_1, \ldots, X_n$ are independent. Then,
$$M_{\mathbf{X}}(\mathbf{t}) = \mathbb{E}\big[e^{t_1 X_1 + \cdots + t_n X_n}\big] = \mathbb{E}\big[e^{t_1 X_1} \cdots e^{t_n X_n}\big] = \mathbb{E}\big[e^{t_1 X_1}\big] \cdots \mathbb{E}\big[e^{t_n X_n}\big] = M_{X_1}(t_1) \cdots M_{X_n}(t_n).$$

The proof for the 'if' part is quite complicated, and thus is omitted.

Analogously, we have marginal mgf.

Definition. (Marginal mgf) The marginal mgf of $X_i$, which is a member of the random variables $X_1, \ldots, X_n$, is
$$M_{X_i}(t) = M_{\mathbf{X}}(0, \ldots, 0, t, 0, \ldots, 0),$$
where $t$ appears in the $i$th position, i.e. the joint mgf evaluated with $t_j = 0$ for each $j \ne i$.

Proposition. (Moment generating function of linear transformation of random variables) For each constant vector $\mathbf{a} = (a_1, \ldots, a_n)^T$ and a real constant $b$, the mgf of $Y = \mathbf{a} \cdot \mathbf{X} + b = a_1 X_1 + \cdots + a_n X_n + b$ is
$$M_Y(t) = e^{bt} M_{\mathbf{X}}(a_1 t, \ldots, a_n t).$$

Proof.
$$M_Y(t) = \mathbb{E}\big[e^{t(a_1 X_1 + \cdots + a_n X_n + b)}\big] = e^{bt}\, \mathbb{E}\big[e^{(a_1 t) X_1 + \cdots + (a_n t) X_n}\big] = e^{bt} M_{\mathbf{X}}(a_1 t, \ldots, a_n t).$$

Remark.

  • If $X_1, \ldots, X_n$ are independent, then
$$M_Y(t) = e^{bt} M_{X_1}(a_1 t) \cdots M_{X_n}(a_n t).$$

  • This provides an alternative, and possibly more convenient, method to derive the distribution of $Y$, compared with deriving it from the probability functions of $X_1, \ldots, X_n$.
  • Special case: if $a_1 = \cdots = a_n = 1$ and $b = 0$, then $Y = X_1 + \cdots + X_n$, which is the sum of the r.v.'s.
  • So, $M_Y(t) = M_{\mathbf{X}}(t, \ldots, t)$.
  • In particular, if $X_1, \ldots, X_n$ are independent, then $M_Y(t) = M_{X_1}(t) \cdots M_{X_n}(t)$.
  • We can use this result to prove the formulas for sums of independent r.v.'s, instead of using the proposition about the convolution of r.v.'s.
  • Special case: if $n = 1$, then the expression for the linear transformation becomes $Y = a_1 X_1 + b$.
  • So, $M_Y(t) = e^{bt} M_{X_1}(a_1 t)$.
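
A quick numerical check of the special case $n = 1$ (an addition; $X \sim N(0,1)$, $a_1 = 2$, $b = 3$ and $t = 0.4$ are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.standard_normal(1_000_000)

    a, b, t = 2.0, 3.0, 0.4
    lhs = np.mean(np.exp(t * (a * x + b)))            # Monte Carlo M_{aX+b}(t)
    rhs = np.exp(b * t) * np.mean(np.exp(a * t * x))  # e^{bt} M_X(at)
    print(lhs, rhs)                                   # both approximately e^{bt + a^2 t^2 / 2}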


Moment generating function of some important distributions

Proposition. (Moment generating function of binomial distribution) The moment generating function of $X \sim \operatorname{Binom}(n, p)$ is $M_X(t) = \big(1 - p + p e^t\big)^n$.

Proof.
$$M_X(t) = \mathbb{E}\big[e^{tX}\big] = \sum_{x=0}^{n} e^{tx} \binom{n}{x} p^x (1-p)^{n-x} = \sum_{x=0}^{n} \binom{n}{x} \big(p e^t\big)^x (1-p)^{n-x} = \big(1 - p + p e^t\big)^n,$$
where the last equality follows from the binomial theorem.

Proposition. (Moment generating function of Poisson distribution) The moment generating function of $X \sim \operatorname{Pois}(\lambda)$ is $M_X(t) = e^{\lambda(e^t - 1)}$.

Proof.
$$M_X(t) = \sum_{x=0}^{\infty} e^{tx}\, \frac{e^{-\lambda} \lambda^x}{x!} = e^{-\lambda} \sum_{x=0}^{\infty} \frac{(\lambda e^t)^x}{x!} = e^{-\lambda} e^{\lambda e^t} = e^{\lambda(e^t - 1)}.$$

Proposition. (Moment generating function of exponential distribution) The moment generating function of $X \sim \operatorname{Exp}(\lambda)$ is $M_X(t) = \frac{\lambda}{\lambda - t}$ for $t < \lambda$.

Proof.

  • We have
$$M_X(t) = \int_0^{\infty} e^{tx} \lambda e^{-\lambda x}\, dx = \frac{\lambda}{\lambda - t} \int_0^{\infty} (\lambda - t) e^{-(\lambda - t)x}\, dx = \frac{\lambda}{\lambda - t} \cdot 1,$$
since the last integrand is the pdf of $\operatorname{Exp}(\lambda - t)$, which integrates to one when $t < \lambda$.
  • The result follows.

Proposition. (Moment generating function of gamma distribution) The moment generating function of $X \sim \operatorname{Gamma}(\alpha, \lambda)$ is $M_X(t) = \left(\frac{\lambda}{\lambda - t}\right)^{\alpha}$ for $t < \lambda$.

Proof.

  • We use a similar proof technique to the proof for the mgf of the exponential distribution:
$$M_X(t) = \int_0^{\infty} e^{tx}\, \frac{\lambda^{\alpha} x^{\alpha - 1} e^{-\lambda x}}{\Gamma(\alpha)}\, dx = \frac{\lambda^{\alpha}}{(\lambda - t)^{\alpha}} \int_0^{\infty} \frac{(\lambda - t)^{\alpha} x^{\alpha - 1} e^{-(\lambda - t)x}}{\Gamma(\alpha)}\, dx = \left(\frac{\lambda}{\lambda - t}\right)^{\alpha} \cdot 1,$$
since the last integrand is the pdf of $\operatorname{Gamma}(\alpha, \lambda - t)$, which integrates to one when $t < \lambda$.
  • The result follows.

Proposition. (Moment generating function of normal distribution) The moment generating function of $X \sim N(\mu, \sigma^2)$ is $M_X(t) = e^{\mu t + \sigma^2 t^2 / 2}$.

Proof.

  • Let $Z = \frac{X - \mu}{\sigma}$. Then, $Z \sim N(0, 1)$ and $X = \mu + \sigma Z$.
  • First, consider the mgf of $Z$:
$$M_Z(t) = \int_{-\infty}^{\infty} e^{tz}\, \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\, dz = e^{t^2/2} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-(z - t)^2/2}\, dz = e^{t^2/2},$$
by completing the square ($tz - z^2/2 = t^2/2 - (z - t)^2/2$), since the last integrand is the pdf of $N(t, 1)$, which integrates to one.
  • It follows that the mgf of $X = \mu + \sigma Z$ is
$$M_X(t) = e^{\mu t} M_Z(\sigma t) = e^{\mu t + \sigma^2 t^2 / 2},$$
by the proposition about the mgf of a linear transformation of random variables.
  • The result follows.
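
The normal case can also be verified symbolically; below is a minimal sketch using sympy (an addition, not part of the original proof), which carries out the completion of the square automatically:

    import sympy as sp

    z, t = sp.symbols('z t', real=True)

    # mgf of Z ~ N(0, 1): integrate e^{tz} against the standard normal pdf
    M_Z = sp.integrate(sp.exp(t * z) * sp.exp(-z**2 / 2) / sp.sqrt(2 * sp.pi),
                       (z, -sp.oo, sp.oo))
    print(sp.simplify(M_Z))   # exp(t**2/2)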


Distribution of linear transformation of random variables

We will prove some propositions about the distributions of linear transformations of random variables using mgf's. Some of them were mentioned in previous chapters. As we will see, proving these propositions using mgf's is quite simple.

Proposition. (Distribution of linear transformation of normal r.v.'s) Let $X \sim N(\mu, \sigma^2)$. Then, $aX + b \sim N(a\mu + b, a^2 \sigma^2)$ for constants $a \ne 0$ and $b$.

Proof.

  • The mgf of $aX + b$ is
$$M_{aX+b}(t) = e^{bt} M_X(at) = e^{bt} e^{\mu a t + \sigma^2 a^2 t^2 / 2} = e^{(a\mu + b)t + (a^2 \sigma^2) t^2 / 2},$$

which is the mgf of $N(a\mu + b, a^2 \sigma^2)$, and the result follows since the mgf identifies a distribution uniquely.

Sum of independent random variables

Proposition. (Sum of independent binomial r.v.'s) Let $X_i \sim \operatorname{Binom}(n_i, p)$ for $i = 1, \ldots, k$, in which $X_1, \ldots, X_k$ are independent. Then, $X_1 + \cdots + X_k \sim \operatorname{Binom}(n_1 + \cdots + n_k, p)$.

Proof.

  • The mgf of $X_1 + \cdots + X_k$ is
$$M_{X_1}(t) \cdots M_{X_k}(t) = \big(1 - p + p e^t\big)^{n_1} \cdots \big(1 - p + p e^t\big)^{n_k} = \big(1 - p + p e^t\big)^{n_1 + \cdots + n_k},$$

which is the mgf of $\operatorname{Binom}(n_1 + \cdots + n_k, p)$, as desired.

Proposition. (Sum of independent Poisson r.v.'s) Let $X_i \sim \operatorname{Pois}(\lambda_i)$ for $i = 1, \ldots, k$, in which $X_1, \ldots, X_k$ are independent. Then, $X_1 + \cdots + X_k \sim \operatorname{Pois}(\lambda_1 + \cdots + \lambda_k)$.

Proof.

  • The mgf of $X_1 + \cdots + X_k$ is
$$M_{X_1}(t) \cdots M_{X_k}(t) = e^{\lambda_1 (e^t - 1)} \cdots e^{\lambda_k (e^t - 1)} = e^{(\lambda_1 + \cdots + \lambda_k)(e^t - 1)},$$

which is the mgf of $\operatorname{Pois}(\lambda_1 + \cdots + \lambda_k)$, as desired.

Proposition. (Sum of independent exponential r.v.'s) Let $X_1, \ldots, X_k$ be i.i.d. r.v.'s following $\operatorname{Exp}(\lambda)$. Then, $X_1 + \cdots + X_k \sim \operatorname{Gamma}(k, \lambda)$.

Proof.

  • The mgf of $X_1 + \cdots + X_k$ is
$$M_{X_1}(t) \cdots M_{X_k}(t) = \left(\frac{\lambda}{\lambda - t}\right)^{k},$$

which is the mgf of $\operatorname{Gamma}(k, \lambda)$, as desired.

Proposition. (Sum of independent gamma r.v.'s) Let $X_i \sim \operatorname{Gamma}(\alpha_i, \lambda)$ for $i = 1, \ldots, k$, in which $X_1, \ldots, X_k$ are independent. Then, $X_1 + \cdots + X_k \sim \operatorname{Gamma}(\alpha_1 + \cdots + \alpha_k, \lambda)$.

Proof.

  • The mgf of $X_1 + \cdots + X_k$ is
$$M_{X_1}(t) \cdots M_{X_k}(t) = \left(\frac{\lambda}{\lambda - t}\right)^{\alpha_1} \cdots \left(\frac{\lambda}{\lambda - t}\right)^{\alpha_k} = \left(\frac{\lambda}{\lambda - t}\right)^{\alpha_1 + \cdots + \alpha_k},$$

which is the mgf of $\operatorname{Gamma}(\alpha_1 + \cdots + \alpha_k, \lambda)$, as desired.

Proposition. (Sum of independent normal r.v.'s) Let $X_i \sim N(\mu_i, \sigma_i^2)$ for $i = 1, \ldots, k$, in which $X_1, \ldots, X_k$ are independent. Then, $X_1 + \cdots + X_k \sim N(\mu_1 + \cdots + \mu_k, \sigma_1^2 + \cdots + \sigma_k^2)$.

Proof.

  • The mgf of $X_1 + \cdots + X_k$ (in which they are independent) is
$$M_{X_1}(t) \cdots M_{X_k}(t) = e^{\mu_1 t + \sigma_1^2 t^2/2} \cdots e^{\mu_k t + \sigma_k^2 t^2/2} = e^{(\mu_1 + \cdots + \mu_k)t + (\sigma_1^2 + \cdots + \sigma_k^2) t^2/2},$$

which is the mgf of $N(\mu_1 + \cdots + \mu_k, \sigma_1^2 + \cdots + \sigma_k^2)$, as desired.
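
As an empirical check of the Poisson case (an addition; the rates $1.5$ and $2.5$ are arbitrary):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)

    lam1, lam2 = 1.5, 2.5
    s = rng.poisson(lam1, 300_000) + rng.poisson(lam2, 300_000)

    for v in range(8):
        # empirical pmf of the sum vs. pmf of Pois(lam1 + lam2)
        print(v, np.mean(s == v).round(4), stats.poisson.pmf(v, lam1 + lam2).round(4))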


Central limit theorem

We will provide a proof of the central limit theorem (CLT) using mgf's here.

Theorem. (Central limit theorem) Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables with finite mean $\mu$ and positive variance $\sigma^2$, and let $\bar{X}_n$ be the sample mean of the first $n$ random variables, i.e. $\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$. Then, the standardized sample mean
$$Z_n = \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}}$$
converges in distribution to a standard normal random variable as $n \to \infty$.

Proof.

  • Define $Y_i = \frac{X_i - \mu}{\sigma}$ for each $i$. Then, we have $\mathbb{E}[Y_i] = 0$, $\operatorname{Var}(Y_i) = 1$, and
$$Z_n = \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} = \frac{Y_1 + \cdots + Y_n}{\sqrt{n}},$$

  • which is in the form of a linear transformation of $Y_1, \ldots, Y_n$ with $a_1 = \cdots = a_n = 1/\sqrt{n}$ and $b = 0$.
  • Therefore, by independence and a Taylor expansion of the mgf of $Y_i$ about $0$ (namely $M_Y(s) = 1 + s^2/2 + o(s^2)$, using $M_Y(0) = 1$, $M_Y'(0) = \mathbb{E}[Y_i] = 0$ and $M_Y''(0) = \mathbb{E}[Y_i^2] = 1$),
$$M_{Z_n}(t) = \left(M_Y\!\left(\frac{t}{\sqrt{n}}\right)\right)^{n} = \left(1 + \frac{t^2}{2n} + o\!\left(\frac{t^2}{n}\right)\right)^{n} \to e^{t^2/2} \quad \text{as } n \to \infty,$$

which is the mgf of the standard normal distribution, and the result follows from the mgf property of identifying a distribution uniquely (assuming the mgf's involved exist).
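
To see the theorem in action (an addition; $\operatorname{Exp}(1)$, which has $\mu = \sigma^2 = 1$, and $n = 400$ are arbitrary choices), the distribution of the standardized sample mean is compared with the standard normal cdf:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)

    n, reps = 400, 100_000
    xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
    z = (xbar - 1.0) / (1.0 / np.sqrt(n))   # standardized sample mean

    # compare a few probabilities with the standard normal cdf
    for c in [-1.0, 0.0, 1.0, 2.0]:
        print(c, np.mean(z <= c).round(4), stats.norm.cdf(c).round(4))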

Remark.

  • Since $\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \overset{d}{\to} N(0, 1)$,
  • the sample mean $\bar{X}_n$ approximately follows the $N(\mu, \sigma^2/n)$ distribution for large $n$.
  • The same result holds exactly for the sample mean of normal r.v.'s with the same mean $\mu$ and the same variance $\sigma^2$,
  • since if $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$ independently, then $\bar{X}_n \sim N(\mu, \sigma^2/n)$.
  • It follows from the proposition about the distribution of linear transformations of normal r.v.'s that the sample sum, i.e. $X_1 + \cdots + X_n = n\bar{X}_n$, approximately follows the $N(n\mu, n\sigma^2)$ distribution for large $n$.
  • The same result holds exactly for the sample sum of normal r.v.'s with the same mean $\mu$ and the same variance $\sigma^2$,
  • since if $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$ independently, then $X_1 + \cdots + X_n \sim N(n\mu, n\sigma^2)$.
  • If a r.v. converges in distribution to a distribution, then we can use that distribution to approximate probabilities involving the r.v.

A special case of using the CLT as an approximation is using the normal distribution to approximate a discrete distribution. To improve accuracy, we should ideally apply a continuity correction, as explained in the following.

Proposition. (Continuity correction) A continuity correction is rewriting the probability expression $\mathbb{P}(X = i)$ ($i$ is an integer) as $\mathbb{P}\big(i - \tfrac{1}{2} \le X \le i + \tfrac{1}{2}\big)$ when approximating a discrete distribution by a normal distribution using the CLT.

Remark.

  • The reason for doing this is to make $i$ lie at the 'middle' of the interval $\big[i - \tfrac{1}{2}, i + \tfrac{1}{2}\big]$, so that the probability is better approximated.

Illustration of continuity correction:

| 
|              /
|             /
|            /
|           /|
|          /#|
|         *##|
|        /|##|
|       /#|##|   
|      /##|##|   
|     /|##|##|   
|    / |##|##|   
|   /  |##|##|
|  /   |##|##|
| /    |##|##|
*------*--*--*---------------------
    i-1/2 i i+1/2

| 
|              /
|             /
|            /
|           / 
|          /  
|         *   
|        /|   
|       /#|      
|      /##|      
|     /###|      
|    /####|      
|   /#####|   
|  /|#####|   
| / |#####|   
*---*-----*------------------------
   i-1    i      

| 
|              /|
|             /#|
|            /##|
|           /###|
|          /####|
|         *#####|
|        /|#####|
|       / |#####|
|      /  |#####|
|     /   |#####|
|    /    |#####|
|   /     |#####| 
|  /      |#####|
| /       |#####|
*---------*-----*------------------
          i     i+1 
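
A quick numerical comparison (an addition; the binomial parameters and the point $i$ are arbitrary choices) of the three interval choices in the figures above, for $X \sim \operatorname{Binom}(50, 0.4)$ approximated by $N(np, np(1-p))$:

    import numpy as np
    from scipy import stats

    n, p, i = 50, 0.4, 23
    mu, sd = n * p, np.sqrt(n * p * (1 - p))

    exact = stats.binom.pmf(i, n, p)
    centered = stats.norm.cdf(i + 0.5, mu, sd) - stats.norm.cdf(i - 0.5, mu, sd)  # (i-1/2, i+1/2)
    left = stats.norm.cdf(i, mu, sd) - stats.norm.cdf(i - 1, mu, sd)              # (i-1, i)
    right = stats.norm.cdf(i + 1, mu, sd) - stats.norm.cdf(i, mu, sd)             # (i, i+1)
    print(exact, centered, left, right)   # the centered interval is closest to the exact pmf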
  1. or equivalently, a transformation between the supports of $\mathbf{X}$ and $\mathbf{Y}$