# Probability/Important Distributions

The mean and variance of those distributions that have simple formulas for them will be discussed in the chapter on properties of distributions.

## Distributions of a discrete random variable

### Preliminary concept: Bernoulli trial

**Definition.**
(Bernoulli trial)
A *Bernoulli trial* is an experiment with only *two* possible outcomes, namely success and failure.

**Remark.**

- 'Success' and 'failure' are acting as labels only, i.e. we can define any one of two outcomes in the experiment as 'success'.

**Definition.**
(Independence of Bernoulli trials)
Let $S_i$ be the event $\{i\text{th Bernoulli trial is a success}\}$ ^{[1]}.
If $S_1,S_2,\dotsc$ are *independent*, then the corresponding Bernoulli trials are *independent*.

**Example.**
If we interpret the outcomes of tossing a coin as 'head comes up' and 'tail comes up', then tossing a coin is a Bernoulli trial.

**Remark.**

- We typically interpret the outcomes of tossing a coin as 'head comes up' and 'tail comes up'.

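### Binomial distribution

#### Motivation

Consider $n$ independent Bernoulli trials with the same success probability $p$. We would like to calculate the probability $\mathbb P(\{r\text{ successes in }n\text{ trials}\})$.

Let $S_i$ be the event $\{i\text{th Bernoulli trial is a success}\}$, as in the previous section. Let's consider a particular sequence of outcomes such that there are $r$ successes in $n$ trials, namely $r$ successes followed by $n-r$ failures. Its probability is ^{[2]}

$$\mathbb P(S_1\cap\dotsb\cap S_r\cap S_{r+1}^c\cap\dotsb\cap S_n^c)\overset{\text{indpt.}}{=}\mathbb P(S_1)\dotsm\mathbb P(S_r)\,\mathbb P(S_{r+1}^c)\dotsm\mathbb P(S_n^c)=p^r(1-p)^{n-r}.$$

Since the probability of every other sequence with the $r$ successes occurring in other trials is the *same*, and there are $\binom nr$ distinct possible sequences ^{[3]},

$$\mathbb P(\{r\text{ successes in }n\text{ trials}\})=\binom nr p^r(1-p)^{n-r}.$$

This is the pmf of a random variable following the *binomial distribution*.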

#### Definition

**Definition.**
(Binomial distribution)

A random variable $X$ follows the *binomial distribution* with $n$ independent Bernoulli trials and success probability $p$, denoted by $X\sim\operatorname{Binom}(n,p)$, if its pmf is

$$f(x;n,p)=\binom nx p^x(1-p)^{n-x},\quad x\in\{0,1,\dotsc,n\}.$$

**Remark.**

- The "$;n,p$" in the pmf emphasizes that the values of the *parameters* of the distribution (which are quantities that describe the distribution) are $n$ and $p$. We can use similar notation for a pdf.
  - There are some alternative notations for emphasizing the parameter values. For example, when the parameter value is $\theta$, the pdf/pmf can be denoted by $f_\theta(x)$ or $f(x\mid\theta)$.
  - Of course, it is not necessary to add these to the pdf/pmf, but doing so makes the parameter values involved explicit and clear.
- The pmf involves a *binomial* coefficient, and hence the name '*binomial* distribution'.

*General remark for each distribution*:

- We may also just write down the notation for the distribution to denote the distribution itself, e.g. $\operatorname{Binom}(n,p)$ stands for the binomial distribution.
- For simplicity, we sometimes say the pmf, pdf, or support of a distribution to mean the pmf, pdf, or support (respectively) of a random variable following that distribution. The same applies to other properties of a distribution discussed in a later chapter, e.g. mean and variance.
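As a quick numerical illustration of the binomial pmf, the following is a minimal Python sketch. It assumes the standard form $\binom nx p^x(1-p)^{n-x}$ on the support $\{0,1,\dotsc,n\}$; the function name `binom_pmf` and the parameter values are chosen here for illustration only.

```python
from math import comb


def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binom(n, p); zero outside the support {0, ..., n}."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Illustrative parameter values (not taken from the text).
n, p = 10, 0.3

# The probabilities over the whole support should sum to 1.
total = sum(binom_pmf(x, n, p) for x in range(n + 1))
print(binom_pmf(3, n, p), total)
```

Setting `n = 1` in the same function gives the two probabilities $1-p$ and $p$ of a single Bernoulli trial.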

### Bernoulli distribution

The Bernoulli distribution is simply a special case of the *binomial* distribution, as follows:

**Definition.**
(Bernoulli distribution)

A random variable $X$ follows the *Bernoulli distribution* with success probability $p$, denoted by $X\sim\operatorname{Ber}(p)$, if its pmf is

$$f(x;p)=p^x(1-p)^{1-x},\quad x\in\{0,1\}.$$

**Remark.**

- $\operatorname{Ber}(p)=\operatorname{Binom}(1,p)$.
- One *Bernoulli* trial is involved, and hence the name '*Bernoulli* distribution'.

### Poisson distribution

#### Motivation

The Poisson distribution can be viewed as the 'limit case' of the binomial distribution.

Consider $n$ independent Bernoulli trials with success probability $p$. By the binomial distribution,

$$\mathbb P(\{r\text{ successes in }n\text{ trials}\})=\binom nr p^r(1-p)^{n-r}.$$

After that, consider a unit time interval, with (positive) *occurrence rate* $\lambda$ of a rare event (i.e. the *mean* number of occurrences of the rare event is $\lambda$). We can divide the unit time interval into $n$ time subintervals of length $1/n$ each.
If $n$ is *large* and $\lambda$ is *relatively small*, such that the probability of two or more occurrences of the *rare event* in a single time subinterval is negligible, then the probability of *exactly one* occurrence of the rare event in each time subinterval is $\lambda/n$ by definition of mean.
Then, we can view the unit time interval as a sequence of $n$ Bernoulli trials ^{[4]} with success probability $\lambda/n$.
After that, we can use $\operatorname{Binom}(n,\lambda/n)$ to model the number of occurrences of the *rare event*. To be more precise,

$$\binom nr\left(\frac\lambda n\right)^r\left(1-\frac\lambda n\right)^{n-r}=\frac{\lambda^r}{r!}\cdot\frac{n(n-1)\dotsm(n-r+1)}{n^r}\cdot\left(1-\frac\lambda n\right)^n\left(1-\frac\lambda n\right)^{-r}\to\frac{\lambda^r}{r!}\cdot 1\cdot e^{-\lambda}\cdot 1=\frac{e^{-\lambda}\lambda^r}{r!}\quad\text{as }n\to\infty.$$

This is the pmf of a random variable following the *Poisson distribution*, and this result is known as the *Poisson limit theorem* (or law of rare events). We will introduce it formally after introducing the definition of the *Poisson distribution*.

#### Definition

**Definition.**
(Poisson distribution)

A random variable $X$ follows the *Poisson distribution* with positive *rate parameter* $\lambda$, denoted by $X\sim\operatorname{Pois}(\lambda)$, if its pmf is

$$f(x;\lambda)=\frac{e^{-\lambda}\lambda^x}{x!},\quad x\in\{0,1,2,\dotsc\}.$$

**Remark.**

- It is named after French mathematician Siméon Denis Poisson.

**Theorem.**
(Poisson limit theorem)
A random variable following $\operatorname{Binom}(n,\lambda/n)$ *converges in distribution* to a random variable following $\operatorname{Pois}(\lambda)$ as $n\to\infty$.

**Proof.**
The result follows from the result proved above: the pmf of $\operatorname{Binom}(n,\lambda/n)$ approaches the pmf of $\operatorname{Pois}(\lambda)$ as $n\to\infty$.

**Remark.**

- As a result, the Poisson distribution can be used as an approximation to the binomial distribution for large $n$ and relatively small $p$.
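The quality of this approximation can be checked numerically. The sketch below assumes the standard pmfs $\binom nx p^x(1-p)^{n-x}$ and $e^{-\lambda}\lambda^x/x!$, and compares $\operatorname{Binom}(n,\lambda/n)$ with $\operatorname{Pois}(\lambda)$ for growing $n$. The helper names and the choice $\lambda=2$ are illustrative assumptions.

```python
from math import comb, exp, factorial


def binom_pmf(x: int, n: int, p: float) -> float:
    """Binomial pmf on {0, ..., n}."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def pois_pmf(x: int, lam: float) -> float:
    """Poisson pmf on {0, 1, 2, ...}."""
    return exp(-lam) * lam**x / factorial(x)


lam = 2.0  # illustrative rate


def max_gap(n: int) -> float:
    """Largest pointwise gap between the two pmfs over x = 0, ..., 10."""
    return max(abs(binom_pmf(x, n, lam / n) - pois_pmf(x, lam)) for x in range(11))


for n in (10, 100, 1000, 10000):
    print(n, max_gap(n))
```

The printed gaps shrink as $n$ grows, in line with the Poisson limit theorem.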

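### Geometric distribution

#### Motivation

Consider a sequence of independent Bernoulli trials with success probability $p$. We would like to calculate the probability $\mathbb P(\{x\text{ failures before the first success}\})$. By considering the sequence of outcomes consisting of $x$ failures followed by one success, we can calculate that

$$\mathbb P(\{x\text{ failures before the first success}\})=(1-p)^xp.$$

^{[5]} This is the pmf of a random variable following the *geometric distribution*.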

#### Definition

**Definition.**
(Geometric distribution)

A random variable $X$ follows the *geometric distribution* with *success probability* $p$, denoted by $X\sim\operatorname{Geo}(p)$, if its pmf is

$$f(x;p)=(1-p)^xp,\quad x\in\{0,1,2,\dotsc\}.$$

**Remark.**

- The sequence of the probabilities starting from $f(0;p)$, with input value increased one by one (i.e. $f(0;p),f(1;p),f(2;p),\dotsc$) is a *geometric sequence*, and hence the name '*geometric* distribution'.
- For an alternative definition, the pmf is instead $(1-p)^{x-1}p$, which is the probability $\mathbb P(\{x\text{ trials are needed to get the first success}\})$, with support $\{1,2,\dotsc\}$.

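**Proposition.**
(Memorylessness of geometric distribution)
If $X\sim\operatorname{Geo}(p)$, then

$$\mathbb P(X>m+n\mid X\ge m)=\mathbb P(X>n)$$

for each *nonnegative* integer $m$ and $n$.

**Proof.**
Since $\mathbb P(X\ge j)=\sum_{x=j}^\infty(1-p)^xp=(1-p)^j$ for each nonnegative integer $j$,

$$\mathbb P(X>m+n\mid X\ge m)=\frac{\mathbb P(\{X>m+n\}\cap\{X\ge m\})}{\mathbb P(X\ge m)}=\frac{\mathbb P(X>m+n)}{\mathbb P(X\ge m)}=\frac{(1-p)^{m+n+1}}{(1-p)^m}=(1-p)^{n+1}=\mathbb P(X>n).$$

- In particular, $\mathbb P(\{X>m+n\}\cap\{X\ge m\})=\mathbb P(X>m+n)$ since $\{X>m+n\}\subseteq\{X\ge m\}$.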

**Remark.**

- $X>m+n$ can be interpreted as 'there are more than $m+n$ failures before the first success';
- $X\ge m$ can be interpreted as '$m$ failures have occurred, so there are more than or equal to $m$ failures before the first success'.
- It implies that the condition $X\ge m$ does *not* affect the distribution of the *remaining* number of failures before the first success (it still follows the geometric distribution with the same success probability).
- So, we can assume the trials start *afresh* after an arbitrary trial in which failure occurs.
  - E.g., if failure occurs in the first trial, then the distribution of the *remaining* number of failures before the first success is not affected.
  - Also, if success occurs in the first trial, then the condition becomes $X=0$, instead of $X\ge m$, so the above formula cannot be applied in this situation.
    - Indeed, $\mathbb P(X>n\mid X=0)=0$, since $X$ cannot exceed zero given that $X=0$.
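The memorylessness property can also be checked numerically. The sketch below assumes the pmf $(1-p)^xp$ on $\{0,1,2,\dotsc\}$ (number of failures before the first success) and checks $\mathbb P(X>m+n\mid X\ge m)=\mathbb P(X>n)$ by summing the pmf directly. The function names, the truncation length, and the values of `p`, `m`, `n` are illustrative assumptions.

```python
def geo_pmf(x: int, p: float) -> float:
    """P(X = x): x failures before the first success."""
    return (1 - p) ** x * p


def geo_tail(start: int, p: float, terms: int = 2000) -> float:
    """Approximate P(X >= start) by summing the pmf over `terms` values."""
    return sum(geo_pmf(x, p) for x in range(start, start + terms))


p, m, n = 0.25, 3, 4  # illustrative values

lhs = geo_tail(m + n + 1, p) / geo_tail(m, p)  # P(X > m+n | X >= m)
rhs = geo_tail(n + 1, p)                       # P(X > n)
print(lhs, rhs)
```

Both numbers agree with the closed form $(1-p)^{n+1}$ up to floating-point error.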

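### Negative binomial distribution

#### Motivation

Consider a sequence of independent Bernoulli trials with success probability $p$. We would like to calculate the probability $\mathbb P(\{x\text{ failures before the }k\text{th success}\})$. By considering the sequence of outcomes consisting of $x$ failures, then $k-1$ successes, then the $k$th success, we obtain the probability $(1-p)^xp^k$ by independence.

Since the probability of every other sequence with the $x$ failures occurring elsewhere among the first $x+k-1$ trials is the *same*, and there are $\binom{x+k-1}{x}$ (or $\binom{x+k-1}{k-1}$, which is the same numerically) distinct possible sequences ^{[6]},

$$\mathbb P(\{x\text{ failures before the }k\text{th success}\})=\binom{x+k-1}{x}(1-p)^xp^k.$$

This is the pmf of a random variable following the *negative binomial distribution*.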

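#### Definition

**Definition.**
(Negative binomial distribution)

A random variable $X$ follows the *negative binomial distribution* with $k$ required successes and *success probability* $p$, denoted by $X\sim\operatorname{NB}(k,p)$, if its pmf is

$$f(x;k,p)=\binom{x+k-1}{x}(1-p)^xp^k,\quad x\in\{0,1,2,\dotsc\}.$$

**Remark.**

- A *negative binomial* coefficient is involved, since $\binom{x+k-1}{x}=(-1)^x\binom{-k}{x}$, and hence the name '*negative binomial* distribution'.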
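A minimal Python sketch of the negative binomial pmf follows. It assumes the parametrization "number of failures $x$ before the $k$th success", i.e. $\binom{x+k-1}{x}(1-p)^xp^k$; the name `nb_pmf` and the parameter values are illustrative.

```python
from math import comb


def nb_pmf(x: int, k: int, p: float) -> float:
    """P(X = x): x failures before the k-th success."""
    return comb(x + k - 1, x) * (1 - p) ** x * p**k


# The pmf should sum to (approximately) 1 over a long enough range.
total = sum(nb_pmf(x, 3, 0.4) for x in range(500))

# With k = 1 the pmf reduces to the geometric pmf (1 - p)^x * p.
geometric_case = nb_pmf(5, 1, 0.4)
print(total, geometric_case)
```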

### Hypergeometric distribution

#### Motivation

Consider a sample of size $n$ drawn without replacement from a population of size $N$, containing $K$ objects of type 1 and $N-K$ of another type. Then, the probability

$$\mathbb P(\{k\text{ type 1 objects are found when }n\text{ objects are drawn}\})=\frac{\binom Kk\binom{N-K}{n-k}}{\binom Nn},\quad k\in\{\max(0,\,n-N+K),\dotsc,\min(n,K)\}$$

^{[7]}.

- $\binom Kk$: unordered selection of $k$ objects of type 1 from $K$ (distinguishable) objects of type 1 without replacement;
- $\binom{N-K}{n-k}$: unordered selection of $n-k$ objects of another type from $N-K$ (distinguishable) objects of another type without replacement;
- $\binom Nn$: unordered selection of $n$ objects from $N$ (distinguishable) objects without replacement.

This is the pmf of a random variable following the *hypergeometric distribution*.

#### Definition

**Definition.** (Hypergeometric distribution)

A random variable $X$ follows the *hypergeometric distribution* with $n$ objects drawn from a collection of $K$ objects of type 1 and $N-K$ of another type, denoted by $X\sim\operatorname{HypGeo}(N,K,n)$, if its pmf is

$$f(k;N,K,n)=\frac{\binom Kk\binom{N-K}{n-k}}{\binom Nn},\quad k\in\{\max(0,\,n-N+K),\dotsc,\min(n,K)\}.$$

**Remark.**

- The pmf is somewhat similar to the *hypergeometric* series ^{[8]}, and hence the name '*hypergeometric* distribution'.
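The counting argument behind the hypergeometric pmf translates directly into code. The sketch below assumes the form $\binom Kk\binom{N-K}{n-k}/\binom Nn$ with the support restriction $\max(0,n-N+K)\le k\le\min(n,K)$; the function name and the card-deck numbers are illustrative assumptions.

```python
from math import comb


def hypergeo_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(k type-1 objects among n drawn without replacement from N, K of type 1)."""
    lo, hi = max(0, n - (N - K)), min(n, K)
    if not lo <= k <= hi:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Illustration: number of aces in a 5-card hand from a 52-card deck.
N, K, n = 52, 4, 5
probs = [hypergeo_pmf(k, N, K, n) for k in range(n + 1)]
print(probs, sum(probs))
```

Values of `k` outside the support (here `k = 5`, since there are only four aces) get probability zero, which is exactly what the restriction on $k$ encodes.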

### Finite discrete distribution

This type of distribution is a generalization of all discrete distributions with finite support, e.g. the Bernoulli distribution and the hypergeometric distribution.

Another special case of this type of distribution is the *discrete uniform distribution*, which is similar to the *continuous uniform distribution* (to be discussed later).

**Definition.**
(Finite discrete distribution)
A random variable $X$ follows the *finite discrete distribution* with vector $\mathbf x=(x_1,\dotsc,x_n)^T$ and probability vector $\mathbf p=(p_1,\dotsc,p_n)^T$,
denoted by $X\sim\operatorname{FD}(\mathbf x,\mathbf p)$, if its pmf is

$$f(x_i;\mathbf x,\mathbf p)=p_i,\quad i=1,\dotsc,n.$$

**Remark.**

- For mean and variance, we can calculate them by definition directly. There are no special formulas for finite discrete distribution.

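**Definition.**
(Discrete uniform distribution)
The *discrete uniform distribution*, denoted by $\mathcal{DU}\{x_1,\dotsc,x_n\}$, is $\operatorname{FD}\big(\mathbf x,(1/n,\dotsc,1/n)^T\big)$.

**Remark.**

- Its pmf is $f(x)=\dfrac1n,\quad x\in\{x_1,\dotsc,x_n\}$.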

**Example.**
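Suppose a r.v. $X\sim\operatorname{FD}\big((1,2,3)^T,(p_1,p_2,p_3)^T\big)$.
Then, its pmf takes the value $p_i$ at $x=i$ for $i=1,2,3$.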

*(Figure: plot of the pmf, with one point above each of the values 1, 2 and 3, at three different heights.)*

**Example.**
Suppose a r.v. $X\sim\mathcal{DU}\{1,2,3\}$. Then, its pmf takes the value $1/3$ at each of $1$, $2$ and $3$.

*(Figure: plot of the pmf, with one point above each of the values 1, 2 and 3, all at the same height.)*
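Since there are no special formulas for the mean and variance of a finite discrete distribution, they are computed directly from the definitions $\sum_ix_ip_i$ and $\sum_i(x_i-\text{mean})^2p_i$. The sketch below does this in Python. The function name `fd_mean_var` and the probability vector $(0.2,0.3,0.5)$ are hypothetical values chosen for illustration; they are not taken from the examples above.

```python
def fd_mean_var(xs, ps):
    """Mean and variance of a finite discrete distribution, by definition."""
    if len(xs) != len(ps) or abs(sum(ps) - 1.0) > 1e-9:
        raise ValueError("ps must be a probability vector matching xs")
    mean = sum(x * p for x, p in zip(xs, ps))
    var = sum((x - mean) ** 2 * p for x, p in zip(xs, ps))
    return mean, var


# Hypothetical finite discrete distribution on {1, 2, 3}.
mean_fd, var_fd = fd_mean_var([1, 2, 3], [0.2, 0.3, 0.5])

# Discrete uniform distribution on {1, 2, 3}: each value has probability 1/3.
mean_du, var_du = fd_mean_var([1, 2, 3], [1 / 3] * 3)
print(mean_fd, var_fd, mean_du, var_du)
```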

## Distributions of a continuous random variable

### Uniform distribution (continuous)

The *continuous uniform distribution* is a model for 'no preference',
i.e. all intervals of the same length on its support are *equally likely* ^{[9]} (it can be seen from the pdf corresponding to continuous uniform distribution).
There is also *discrete* uniform distribution, but it is less important than *continuous* uniform distribution.
So, from now on, simply 'uniform distribution' refers to the *continuous* one, instead of the discrete one.

**Definition.**
(Uniform distribution)

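A random variable $X$ follows the *uniform distribution*, denoted by $X\sim\mathcal U[a,b]$, if its pdf is

$$f(x;a,b)=\frac1{b-a},\quad x\in[a,b].$$

**Remark.**

- The support of $\mathcal U[a,b]$ can alternatively be $(a,b)$, $[a,b)$ or $(a,b]$, without affecting the probabilities of the events involved, since the probability calculated using the pdf at a *single* point is *zero* anyway.
- The distribution $\mathcal U[0,1]$ is the *standard uniform distribution*.

**Proposition.**

(Cdf of uniform distribution) The cdf of $\mathcal U[a,b]$ is

$$F(x)=\begin{cases}0,&x<a;\\[2pt]\dfrac{x-a}{b-a},&a\le x\le b;\\[2pt]1,&x>b.\end{cases}$$

**Proof.**
For $a\le x\le b$, $F(x)=\int_a^x\frac1{b-a}\,dy=\frac{x-a}{b-a}$. For $x<a$, the pdf is zero on $(-\infty,x]$, so $F(x)=0$. For $x>b$, $F(x)=\int_a^b\frac1{b-a}\,dy=1$.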

### Exponential distribution

The *exponential* distribution with *rate* parameter $\lambda$ is often used to describe the *interarrival time* of rare events with rate $\lambda$.

Comparing this with the *Poisson* distribution, the *exponential* distribution describes the interarrival *time* of rare events,
while the *Poisson* distribution describes the *number* of occurrences of rare events within a fixed time interval.

By definition of *rate*, when the *rate* $\lambda\uparrow$, the *interarrival time* $\downarrow$ (i.e. the frequency of the rare event $\uparrow$).

So, we would like the pdf to concentrate more toward the left when $\lambda\uparrow$ (i.e. the pdf has higher value for small $x$ when $\lambda\uparrow$), so that the areas under the pdf for intervals involving small values of $x$ $\uparrow$ when $\lambda\uparrow$.

Also, with a fixed rate $\lambda$, the interarrival time should be less likely to take higher values. So, intuitively, we would also like the pdf to be a strictly *decreasing* function, so that the probability involved (area under the pdf for an interval of fixed length) $\downarrow$ when $x\uparrow$.

As we can see, the pdf of exponential distribution satisfies both of these properties.

**Definition.**
(Exponential distribution)

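A random variable $X$ follows the *exponential distribution* with positive *rate* parameter $\lambda$, denoted by $X\sim\operatorname{Exp}(\lambda)$, if its pdf is

$$f(x;\lambda)=\lambda e^{-\lambda x},\quad x\ge0.$$

**Proposition.**
(Cdf of exponential distribution)

The cdf of $\operatorname{Exp}(\lambda)$ is

$$F(x)=\begin{cases}1-e^{-\lambda x},&x\ge0;\\0,&x<0.\end{cases}$$

**Proof.**
Suppose $X\sim\operatorname{Exp}(\lambda)$. The cdf of $X$ is

$$F(x)=\int_{-\infty}^xf(y;\lambda)\,dy=\int_0^x\lambda e^{-\lambda y}\,dy=\left[-e^{-\lambda y}\right]_0^x=1-e^{-\lambda x}\quad\text{for }x\ge0,$$

and $F(x)=0$ for $x<0$ since the pdf is zero there.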

**Proposition.**
(Memorylessness of exponential distribution)
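If $X\sim\operatorname{Exp}(\lambda)$, then

$$\mathbb P(X>s+t\mid X>s)=\mathbb P(X>t)$$

for each *nonnegative* number $s$ and $t$.

**Proof.**
Since $\mathbb P(X>x)=1-F(x)=e^{-\lambda x}$ for each $x\ge0$, and $\{X>s+t\}\subseteq\{X>s\}$,

$$\mathbb P(X>s+t\mid X>s)=\frac{\mathbb P(X>s+t)}{\mathbb P(X>s)}=\frac{e^{-\lambda(s+t)}}{e^{-\lambda s}}=e^{-\lambda t}=\mathbb P(X>t).$$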

**Remark.**

- $X>s+t$ can be interpreted as 'the rare event will not occur within the next $t$ units of time';
- $X>s$ can be interpreted as 'the rare event has not occurred for the past $s$ units of time'.
- It implies that the condition $X>s$ does not affect the distribution of the *remaining* waiting time for the rare event (it still follows the exponential distribution with the same parameter).
- So, we can assume the arrival process of the event starts *afresh* at an arbitrary time point of observation.
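The cdf and the memorylessness property can be checked with a few lines of Python. This sketch assumes the cdf $1-e^{-\lambda x}$ for $x\ge0$ (and $0$ otherwise); the function names and the values of `lam`, `s`, `t` are illustrative assumptions.

```python
from math import exp


def exp_cdf(x: float, lam: float) -> float:
    """Cdf of Exp(lam): 1 - exp(-lam * x) for x >= 0, and 0 for x < 0."""
    return 1.0 - exp(-lam * x) if x >= 0 else 0.0


def exp_survival(x: float, lam: float) -> float:
    """P(X > x) = 1 - F(x)."""
    return 1.0 - exp_cdf(x, lam)


lam, s, t = 0.5, 2.0, 3.0  # illustrative values

lhs = exp_survival(s + t, lam) / exp_survival(s, lam)  # P(X > s+t | X > s)
rhs = exp_survival(t, lam)                             # P(X > t)
print(lhs, rhs)
```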

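### Gamma distribution

The *gamma* distribution is a generalized *exponential* distribution, in the sense that we can also change the *shape* of the pdf of the *exponential* distribution.

**Definition.**
(Gamma distribution)

A random variable $X$ follows the *gamma distribution* with positive *shape* parameter $\alpha$ and positive *rate* parameter $\lambda$, denoted by $X\sim\operatorname{Gamma}(\alpha,\lambda)$, if its pdf is

$$f(x;\alpha,\lambda)=\frac{\lambda^\alpha x^{\alpha-1}e^{-\lambda x}}{\Gamma(\alpha)},\quad x\ge0.$$

**Remark.**

- $\operatorname{Gamma}(1,\lambda)=\operatorname{Exp}(\lambda)$, since the pdf of $\operatorname{Gamma}(1,\lambda)$ is $\dfrac{\lambda^1x^0e^{-\lambda x}}{\Gamma(1)}=\lambda e^{-\lambda x}$, which is the pdf of $\operatorname{Exp}(\lambda)$.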
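The special case with shape parameter $1$ can be confirmed numerically. The sketch below assumes the rate parametrization of the gamma pdf, $\lambda^\alpha x^{\alpha-1}e^{-\lambda x}/\Gamma(\alpha)$, and compares it with the exponential pdf $\lambda e^{-\lambda x}$ at a few positive points. The function names and test points are illustrative.

```python
from math import exp, gamma


def gamma_pdf(x: float, alpha: float, lam: float) -> float:
    """Gamma(alpha, lam) pdf in the rate parametrization, evaluated at x > 0."""
    if x <= 0:
        return 0.0  # this sketch only evaluates the pdf at strictly positive x
    return lam**alpha * x ** (alpha - 1) * exp(-lam * x) / gamma(alpha)


def exp_pdf(x: float, lam: float) -> float:
    """Exp(lam) pdf, evaluated at x > 0."""
    return lam * exp(-lam * x) if x > 0 else 0.0


lam = 0.7  # illustrative rate
gaps = [abs(gamma_pdf(x, 1.0, lam) - exp_pdf(x, lam)) for x in (0.1, 1.0, 2.5)]
print(gaps)
```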

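### Beta distribution

The *beta* distribution is a generalized $\mathcal U[0,1]$, in the sense that we can also change the *shape* of the pdf, using *two shape parameters*.

**Definition.**
(Beta distribution)

A random variable $X$ follows the *beta distribution* with positive shape parameters $\alpha$ and $\beta$, denoted by $X\sim\operatorname{Beta}(\alpha,\beta)$, if its pdf is

$$f(x;\alpha,\beta)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1},\quad x\in[0,1].$$

**Remark.**

- $\operatorname{Beta}(1,1)=\mathcal U[0,1]$, since the pdf of $\operatorname{Beta}(1,1)$ is $\dfrac{\Gamma(2)}{\Gamma(1)\Gamma(1)}x^0(1-x)^0=1$, which is the pdf of $\mathcal U[0,1]$.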
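As a small check of the beta pdf, the sketch below assumes the form $\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}$ on $[0,1]$. It verifies that both shape parameters equal to $1$ give the constant pdf $1$, and that the pdf integrates to about $1$ for another (arbitrarily chosen) pair of shape parameters, using a simple midpoint rule. All names and values are illustrative.

```python
from math import gamma


def beta_pdf(x: float, a: float, b: float) -> float:
    """Beta(a, b) pdf on [0, 1]; zero outside the support."""
    if not 0.0 <= x <= 1.0:
        return 0.0
    return gamma(a + b) / (gamma(a) * gamma(b)) * x ** (a - 1) * (1 - x) ** (b - 1)


# Shape parameters (1, 1): the pdf is identically 1 on [0, 1].
flat = [beta_pdf(x, 1.0, 1.0) for x in (0.0, 0.25, 0.5, 1.0)]

# Midpoint-rule integral of the Beta(2, 3) pdf over [0, 1].
steps = 1000
area = sum(beta_pdf((i + 0.5) / steps, 2.0, 3.0) for i in range(steps)) / steps
print(flat, area)
```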

### Cauchy distribution

The *Cauchy* distribution is a *heavy-tailed* distribution ^{[10]}.
As a result, it is a 'pathological' distribution, in the sense that it has some counter-intuitive properties, e.g. undefined mean and variance, even though its mean and variance *seem* to be defined when we look at its graph directly.

**Definition.**
(Cauchy distribution)

A random variable $X$ follows the *Cauchy distribution* with *location* parameter $\theta$, denoted by $X\sim\operatorname{Cauchy}(\theta)$, if its pdf is

$$f(x;\theta)=\frac1{\pi\left(1+(x-\theta)^2\right)},\quad x\in\mathbb R.$$

**Remark.**

- This definition refers to a *special case* of the Cauchy distribution. To be more precise, there is also a *scale* parameter in the complete definition of the Cauchy distribution, and it is set to one in the pdf here.
  - This definition is used here for simplicity.
- The pdf is symmetric about $\theta$, since $f(\theta+x;\theta)=f(\theta-x;\theta)$.

### Normal distribution (very important)

The normal or Gaussian distribution is a thing of beauty, appearing in many places in nature. This is probably because sample means or sample sums often follow *normal* distributions *approximately*
by *central limit theorem*.
As a result, the *normal* distribution is important in statistics.

**Definition.**
(Normal distribution)

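A random variable $X$ follows the *normal distribution* with *mean* $\mu$ and *variance* $\sigma^2$,
denoted by $X\sim\mathcal N(\mu,\sigma^2)$, if its pdf is

$$f(x;\mu,\sigma^2)=\frac1{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),\quad x\in\mathbb R.$$

**Remark.**

- The distribution $\mathcal N(0,1)$ is the *standard* normal distribution.
  - For $Z\sim\mathcal N(0,1)$, its pdf is often denoted by $\varphi$, and its cdf is often denoted by $\Phi$.
  - The pdf of $Z$ is $\varphi(z)=\dfrac1{\sqrt{2\pi}}e^{-z^2/2}$.
  - It follows that the pdf of $\mathcal N(\mu,\sigma^2)$ is $\dfrac1\sigma\varphi\left(\dfrac{x-\mu}\sigma\right)$.
- It will be proved that $\mu$ is actually the *mean*, and $\sigma^2$ is actually the *variance*.
- The pdf is symmetric about $\mu$, since $f(\mu+x;\mu,\sigma^2)=f(\mu-x;\mu,\sigma^2)$.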

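**Proposition.**
(Distributions for linear transformation of normally distributed random variables)
If $X\sim\mathcal N(\mu,\sigma^2)$, and
$a$ and $b$ are constants, then
$aX+b\sim\mathcal N(a\mu+b,a^2\sigma^2)$.

**Proof.**
Assume $a>0$ ^{[11]}.
Let $F_X$ and $F_Y$ be the cdfs of $X$ and $Y=aX+b$ respectively, and let $f_X$ and $f_Y$ be the corresponding pdfs.
Since

$$F_Y(y)=\mathbb P(aX+b\le y)=\mathbb P\left(X\le\frac{y-b}a\right)=F_X\left(\frac{y-b}a\right),$$

differentiating with respect to $y$ gives

$$f_Y(y)=\frac1af_X\left(\frac{y-b}a\right)=\frac1{\sqrt{2\pi a^2\sigma^2}}\exp\left(-\frac{\big(y-(a\mu+b)\big)^2}{2a^2\sigma^2}\right),$$

which is the pdf of $\mathcal N(a\mu+b,a^2\sigma^2)$.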

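**Remark.**

- A special case is when $a=1/\sigma$ and $b=-\mu/\sigma$: then $aX+b=\dfrac{X-\mu}\sigma\sim\mathcal N(0,1)$, since
  - $a\mu+b=\dfrac\mu\sigma-\dfrac\mu\sigma=0$;
  - $a^2\sigma^2=\dfrac{\sigma^2}{\sigma^2}=1$.
- This shows that we can transform each normally distributed r.v. to a r.v. following the standard normal distribution.
- This can ease the calculation of probabilities relating to a normally distributed r.v., since we have the *standard normal table*, in which values of $\Phi(z)$ at different $z$ are given.
- For some types of standard normal table, only the values of $\Phi(z)$ at different *nonnegative* $z$ are given.
- Then, we can calculate its values at different negative $z$ using $\Phi(-z)=1-\Phi(z)$.
  - This formula holds since $\Phi(-z)=\mathbb P(Z\le-z)=\mathbb P(Z\ge z)=1-\mathbb P(Z<z)=1-\Phi(z)$, in which the second equality follows from the symmetry of the pdf about zero.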
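In code, the role of the standard normal table can be played by the error function. The sketch below uses the standard identity $\Phi(z)=\tfrac12\big(1+\operatorname{erf}(z/\sqrt2)\big)$, which is not stated in the text above, together with the standardization $(x-\mu)/\sigma$. The function names and numerical inputs are illustrative assumptions.

```python
from math import erf, sqrt


def std_normal_cdf(z: float) -> float:
    """Phi(z), computed via the identity Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Cdf of N(mu, sigma^2), obtained by standardizing to (x - mu) / sigma."""
    return std_normal_cdf((x - mu) / sigma)


# Symmetry: Phi(-z) = 1 - Phi(z).
z = 1.96
print(std_normal_cdf(-z), 1.0 - std_normal_cdf(z))

# P(X <= 13) for X ~ N(10, 3^2) equals Phi(1).
print(normal_cdf(13.0, 10.0, 3.0), std_normal_cdf(1.0))
```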

### Important distributions for statistics especially

The following distributions are especially important in statistics, and they are all related to the normal distribution. We will introduce them briefly.

#### Chi-squared distribution

The *chi-squared* distribution is a special case of the gamma distribution, and is also related to the *standard normal* distribution.

**Definition.**
(Chi-squared distribution)

The *chi-squared* distribution with positive $\nu$ degrees of freedom, denoted by $\chi^2_\nu$,
is the distribution of $Z_1^2+\dotsb+Z_\nu^2$, in which $Z_1,\dotsc,Z_\nu$ are i.i.d., and they all follow $\mathcal N(0,1)$.

**Remark.**

- It can be proved that $\chi^2_1=\operatorname{Gamma}(1/2,1/2)$ and thus $\chi^2_\nu=\operatorname{Gamma}(\nu/2,1/2)$. (Then, we can deduce the pdf of $\chi^2_\nu$ through this.)
- This implies for the random variable , .
- A random variable $X$ following the *chi-squared* distribution with $\nu$ degrees of freedom is denoted by $X\sim\chi^2_\nu$.

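#### Student's *t*-distribution

The *Student's $t$-distribution* is related to the *chi-squared* distribution and the *normal* distribution.

**Definition.**
(Student's $t$-distribution)

The *Student's $t$-distribution* with $\nu$ degrees of freedom, denoted by $t_\nu$, is the distribution of $\dfrac Z{\sqrt{Y/\nu}}$ in which $Z\sim\mathcal N(0,1)$ and $Y\sim\chi^2_\nu$ are independent.

**Remark.**

- $t_1=\operatorname{Cauchy}(0)$ and $t_\infty=\mathcal N(0,1)$ (the $\infty$ is an extended real number).
- The tails of the pdf are heavier as $\nu\downarrow$.
- A random variable $T$ following the *(Student's) $t$-distribution* with $\nu$ degrees of freedom is denoted by $T\sim t_\nu$.
- It can be proved that the pdf of $t_\nu$ is

$$f(x;\nu)=\frac{\Gamma\left(\frac{\nu+1}2\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac\nu2\right)}\left(1+\frac{x^2}\nu\right)^{-(\nu+1)/2},\quad x\in\mathbb R.$$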

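#### *F*-distribution

The $F$-distribution is sort of a generalized Student's $t$-distribution, in the sense that it has one more changeable parameter for another degrees of freedom.

**Definition.**
($F$-distribution)
The *$F$-distribution* with $\nu_1$ and $\nu_2$ degrees of freedom, denoted by $F_{\nu_1,\nu_2}$,
is the distribution of $\dfrac{X_1/\nu_1}{X_2/\nu_2}$ in which $X_1\sim\chi^2_{\nu_1}$ and $X_2\sim\chi^2_{\nu_2}$ are independent.

**Remark.**

- If $T\sim t_\nu$, then $T^2\sim F_{1,\nu}$.
- A random variable $X$ following the *$F$-distribution* with $\nu_1$ and $\nu_2$ degrees of freedom is denoted by $X\sim F_{\nu_1,\nu_2}$.
- It can be proved that the pdf of $F_{\nu_1,\nu_2}$ is

$$f(x;\nu_1,\nu_2)=\frac{\Gamma\left(\frac{\nu_1+\nu_2}2\right)}{\Gamma\left(\frac{\nu_1}2\right)\Gamma\left(\frac{\nu_2}2\right)}\left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2}x^{\nu_1/2-1}\left(1+\frac{\nu_1}{\nu_2}x\right)^{-(\nu_1+\nu_2)/2},\quad x>0.$$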

If you are interested in knowing how the *chi-squared distribution*, the *Student's $t$-distribution*, and the *$F$-distribution* are useful in statistics,
then you may briefly look at, for instance, Statistics/Interval Estimation (applications in confidence interval construction) and Statistics/Hypothesis Testing (applications in hypothesis testing).

## Joint distributions

This section requires the knowledge of joint distributions.

### Multinomial distribution

#### Motivation

The multinomial distribution is a *generalized* binomial distribution,
in the sense that each trial can have more than two outcomes.

Suppose $n$ objects are to be allocated to $k$ cells independently,
for which each object is allocated to *one and only one* cell, with probability $p_i$ to be allocated to the $i$th cell ($i=1,\dotsc,k$) ^{[12]}.
Let $X_i$ be the number of objects allocated to cell $i$.
We would like to calculate the probability $\mathbb P(X_1=x_1,\dotsc,X_k=x_k)$, i.e.
the probability that the $i$th cell has $x_i$ objects for each $i$.

We can regard each allocation as an independent trial with $k$ outcomes (since the object can be allocated to one and only one of the $k$ cells). We can recognize that the allocation of $n$ objects is a partition of $n$ objects into $k$ groups. There are hence $\binom n{x_1,\dotsc,x_k}=\dfrac{n!}{x_1!\dotsm x_k!}$ ways of allocation.

The probability of allocating $x_i$ particular objects to the $i$th cell is $p_i^{x_i}$ by independence, and so that of a particular case of allocation of $n$ objects to $k$ cells is $p_1^{x_1}\dotsm p_k^{x_k}$ by independence. So,

$$\mathbb P(X_1=x_1,\dotsc,X_k=x_k)=\binom n{x_1,\dotsc,x_k}p_1^{x_1}\dotsm p_k^{x_k}.$$

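#### Definition

**Definition.**
(Multinomial distribution)
A random *vector* $\mathbf X=(X_1,\dotsc,X_k)^T$ follows the *multinomial distribution* with $n$ trials and probability vector $\mathbf p=(p_1,\dotsc,p_k)^T$,
denoted by $\mathbf X\sim\operatorname{Multinom}(n,\mathbf p)$, if its joint pmf is

$$f(x_1,\dotsc,x_k;n,\mathbf p)=\binom n{x_1,\dotsc,x_k}p_1^{x_1}\dotsm p_k^{x_k},\quad x_1,\dotsc,x_k\ge0,\ x_1+\dotsb+x_k=n.$$

**Remark.**

- $\operatorname{Multinom}(n,\mathbf p)$ corresponds to $\operatorname{Binom}(n,p_1)$ if $k=2$.
  - In this case, if $\mathbf X=(X_1,X_2)^T$, $X_1$ is the number of successes for the binomial distribution (and $X_2$ is the number of failures).
- Also, $X_i\sim\operatorname{Binom}(n,p_i)$. It can be seen by regarding allocating the object into the $i$th cell as 'success' for each allocation of a single object ^{[13]}. Then, the success probability is $p_i$.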
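The joint pmf can be evaluated with integer factorials. The sketch below assumes the form $\frac{n!}{x_1!\dotsm x_k!}p_1^{x_1}\dotsm p_k^{x_k}$ with $n=x_1+\dotsb+x_k$; the function name `multinom_pmf` and the probability vectors are illustrative assumptions.

```python
from math import factorial


def multinom_pmf(xs, ps):
    """Joint pmf of the cell counts xs, with n = sum(xs) trials and cell probabilities ps."""
    n = sum(xs)
    coef = factorial(n)
    for x in xs:
        coef //= factorial(x)  # exact integer multinomial coefficient
    prob = float(coef)
    for x, p in zip(xs, ps):
        prob *= p**x
    return prob


# With two cells the pmf matches the binomial pmf with n = 10, p = 0.3, x = 3.
two_cells = multinom_pmf([3, 7], [0.3, 0.7])

# With three cells and n = 4, the pmf sums to 1 over all allocations.
ps = [0.2, 0.3, 0.5]
total = sum(
    multinom_pmf([x1, x2, 4 - x1 - x2], ps)
    for x1 in range(5)
    for x2 in range(5 - x1)
)
print(two_cells, total)
```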

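### Multivariate normal distribution

The *multivariate* normal distribution is, as suggested by its name, a multivariate (and also generalized) version of the (univariate) normal distribution.

**Definition.**
(Multivariate normal distribution)
A random *vector* $\mathbf X=(X_1,\dotsc,X_k)^T$ follows the *$k$-dimensional normal distribution*
with *mean vector* $\boldsymbol\mu$ and *covariance matrix* $\boldsymbol\Sigma$, denoted by $\mathbf X\sim\mathcal N_k(\boldsymbol\mu,\boldsymbol\Sigma)$ ^{[14]} if its joint pdf is

$$f(\mathbf x;\boldsymbol\mu,\boldsymbol\Sigma)=\frac1{\sqrt{(2\pi)^k\det\boldsymbol\Sigma}}\exp\left(-\frac12(\mathbf x-\boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\mathbf x-\boldsymbol\mu)\right),\quad\mathbf x\in\mathbb R^k,$$

in which $\boldsymbol\mu=(\mathbb E[X_1],\dotsc,\mathbb E[X_k])^T$ is the *mean vector*, and $\boldsymbol\Sigma$, whose $(i,j)$th entry is $\operatorname{Cov}(X_i,X_j)$, is the *covariance matrix* (with size $k\times k$).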

**Remark.**

- The distribution for the case $k=2$ is more commonly used, and that is called the *bivariate normal* distribution.
- An alternative and equivalent definition is that $\mathbf X\sim\mathcal N_k(\boldsymbol\mu,\boldsymbol\Sigma)$ if
  $$X_i=a_{i1}Z_1+\dotsb+a_{in}Z_n+\mu_i,\quad i=1,\dotsc,k,$$
  for some constants $a_{ij}$ and $\mu_i$, and $Z_1,\dotsc,Z_n$ are i.i.d. standard normal random variables.
- Using the above result, the *marginal* distribution followed by $X_i$ is $\mathcal N(\mu_i,\sigma_i^2)$, as one will expect.
  - By the proposition about the sum of independent normal random variables and the distribution of a linear transformation of normal random variables (see the Probability/Transformation of Random Variables chapter), the mean is $\mu_i$, and the variance is $a_{i1}^2+\dotsb+a_{in}^2$ (this equals $\sigma_i^2$ by definition).

Conditional mean and variance of the bivariate normal distribution will be discussed in the Probability/Conditional Distributions chapter.

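**Proposition.**
(Joint pdf of the bivariate normal distribution)
The joint pdf of $\mathcal N_2(\boldsymbol\mu,\boldsymbol\Sigma)$ is

$$f(x,y)=\frac1{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\left(-\frac1{2(1-\rho^2)}\left[\left(\frac{x-\mu_X}{\sigma_X}\right)^2-2\rho\,\frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y}+\left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right]\right),$$

in which $\mu_X,\mu_Y$ are the means, $\sigma_X^2,\sigma_Y^2$ are the variances, and $\rho$ is the correlation coefficient of the two components $X$ and $Y$.

**Proof.**
For the bivariate normal distribution,

- the *mean vector* is $\boldsymbol\mu=(\mu_X,\mu_Y)^T$;
- the *covariance matrix* is $\boldsymbol\Sigma=\begin{pmatrix}\sigma_X^2&\rho\sigma_X\sigma_Y\\\rho\sigma_X\sigma_Y&\sigma_Y^2\end{pmatrix}$.
- Hence, $\det\boldsymbol\Sigma=\sigma_X^2\sigma_Y^2(1-\rho^2)$ and $\boldsymbol\Sigma^{-1}=\dfrac1{\sigma_X^2\sigma_Y^2(1-\rho^2)}\begin{pmatrix}\sigma_Y^2&-\rho\sigma_X\sigma_Y\\-\rho\sigma_X\sigma_Y&\sigma_X^2\end{pmatrix}$.
- It follows that the joint pdf is obtained by substituting these into the joint pdf of the $k$-dimensional normal distribution with $k=2$, since $(\mathbf x-\boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\mathbf x-\boldsymbol\mu)$ equals the bracketed expression above divided by $1-\rho^2$.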
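The agreement between the explicit bivariate formula and the general matrix form can be checked numerically. The sketch below assumes the standard bivariate normal pdf with means `mx`, `my`, standard deviations `sx`, `sy` and correlation `rho`, and writes out the $2\times2$ determinant and inverse by hand to avoid any external library. All function names and numerical values are illustrative assumptions.

```python
from math import exp, pi, sqrt


def bvn_pdf_formula(x, y, mx, my, sx, sy, rho):
    """Bivariate normal pdf via the explicit formula in terms of rho."""
    zx, zy = (x - mx) / sx, (y - my) / sy
    q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (1.0 - rho * rho)
    return exp(-0.5 * q) / (2.0 * pi * sx * sy * sqrt(1.0 - rho * rho))


def bvn_pdf_matrix(x, y, mx, my, sx, sy, rho):
    """Same pdf via det(Sigma) and Sigma^{-1} of the 2x2 covariance matrix."""
    a, b, d = sx * sx, rho * sx * sy, sy * sy  # Sigma = [[a, b], [b, d]]
    det = a * d - b * b
    dx, dy = x - mx, y - my
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), with Sigma^{-1} = [[d, -b], [-b, a]] / det.
    q = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return exp(-0.5 * q) / (2.0 * pi * sqrt(det))


args = (0.3, -1.2, 1.0, -0.5, 2.0, 1.5, 0.6)  # x, y, mx, my, sx, sy, rho
print(bvn_pdf_formula(*args), bvn_pdf_matrix(*args))
```

With `rho = 0` the explicit formula factorizes into a product of two univariate normal pdfs, which is a useful sanity check.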

1. ↑ Alternatively, we can define the events as $\{i\text{th Bernoulli trial is a failure}\}$.
2. ↑ 'indpt.' stands for independence.
3. ↑ This is because there is unordered selection of $r$ (distinguishable and ordered) trials for 'success' without replacement from $n$ trials (then the remaining positions are for 'failure').
4. ↑ Occurrence of the rare event is viewed as 'success' and non-occurrence of the rare event is viewed as 'failure'.
5. ↑ Unlike the outcomes for the binomial distribution, there is only *one* possible sequence for each $x$.
6. ↑ There is unordered selection of $x$ trials for 'failures' (or $k-1$ trials for 'successes') from $x+k-1$ trials without replacement.
7. ↑ The restriction on $k$ is imposed so that the binomial coefficients are defined, i.e. the expression 'makes sense'. In practice, we rarely use this condition directly. Instead, we usually directly determine whether a specific value of $k$ 'makes sense'.
8. ↑ It is out of scope for this book.
9. ↑ The probability is 'distributed uniformly over an interval'.
10. ↑ A random variable following the *Cauchy* distribution has a relatively high probability to take *extreme values*, compared with *light-tailed* distributions (e.g. the normal distribution). Graphically, the 'tails' (i.e. left end and right end) of the pdf are relatively thick.
11. ↑ The case for $a<0$ holds similarly (the inequality sign is in the opposite direction, and eventually we will have two negative signs cancelling each other). Also, when $a=0$, the r.v. becomes a non-random constant, and so we are not interested in this case.
12. ↑ Then, $p_1+\dotsb+p_k=1$.
13. ↑ If the object is allocated to a cell other than the $i$th cell, then it is a 'failure'.
14. ↑ The subscript $k$ for $\mathcal N_k$ is to emphasize that the distribution is $k$-dimensional, and is optional.