# Econometric Theory/Asymptotic Convergence

## Asymptotic Convergence

### Modes of Convergence

#### Convergence in Probability

Convergence in probability is a very useful tool for deriving asymptotic distributions later in this book. Alongside convergence in distribution, it is the most commonly seen mode of convergence.

##### Definition

A sequence of random variables $\{ X_n ; n=1,2, \cdots \}$ converges in probability to the random variable $X$ if:

 $\forall \epsilon, \delta >0,$ $\exists N \; \operatorname{s.t.} \; \forall n \geq N,$ $\Pr \{ |X_n - X| > \delta \}< \epsilon$

An equivalent statement is:

 $\forall \delta >0,$ $\lim_{n \to \infty} \Pr \{ |X_n - X| > \delta \}=0$

This is written as either $X_n \xrightarrow{p} X$ or $\operatorname{plim} X_n = X$.

##### Example

$X_n = \begin{cases} \eta & \text{with probability } 1- \frac{1}{n} \\ \theta & \text{with probability } \frac{1}{n} \end{cases}$

where $\eta \ne \theta$ are constants. We'll make an intelligent guess that this sequence converges in probability to the degenerate random variable $\eta$. So we have that:

$\forall \delta >0,\; \Pr \{ |X_n - \eta| > \delta \} \leq \Pr \{ |X_n - \eta| > 0 \}= \Pr \{ X_n= \theta \}= \frac{1}{n}$

Therefore the definition of convergence in probability reads, in this case:

 $\forall \epsilon , \delta >0,$ $\exists N \; \operatorname{s.t.} \; \forall n \geq N,$ $\Pr \{ |X_n - \eta | > \delta \} \leq \Pr \{ |X_n - \eta | > 0 \}=\Pr \{ X_n= \theta \}= \frac{1}{n} < \epsilon$

So for any $\epsilon > 0$ we can always find an $N \in \mathbb{N}$ large enough (any $N > 1/\epsilon$ will do) that the definition is satisfied. Therefore we have proved that $X_n \xrightarrow{p} \eta$.
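The argument above can be checked numerically. The following is a minimal sketch, not part of the original derivation: the values `eta=0.0`, `theta=1.0` and `delta=0.5` are arbitrary illustrative choices, and the function names are hypothetical. It estimates $\Pr \{ |X_n - \eta| > \delta \}$ by simulation and computes the smallest $N$ that works for a given $\epsilon$.

```python
import math
import random


def deviation_frequency(n, eta=0.0, theta=1.0, delta=0.5, draws=50_000, seed=0):
    """Monte Carlo estimate of Pr{|X_n - eta| > delta}."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        # X_n equals theta with probability 1/n and eta otherwise.
        x = theta if rng.random() < 1.0 / n else eta
        if abs(x - eta) > delta:
            hits += 1
    return hits / draws


def smallest_N(epsilon):
    """Smallest integer N with 1/n < epsilon for every n >= N."""
    return math.floor(1.0 / epsilon) + 1


for n in (2, 10, 100):
    # The simulated frequency should sit close to the exact value 1/n.
    print(n, deviation_frequency(n), 1.0 / n)
```

The estimated frequencies shrink roughly like $1/n$, which is the bound used in the proof.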

#### Almost-Sure Convergence

Almost-sure convergence has a marked similarity to convergence in probability, but the conditions for this mode of convergence are stronger. As we will see later, almost-sure convergence implies that the sequence also converges in probability.

##### Definition

A sequence of random variables $\{ X_n ; n=1,2, \cdots \}$ converges almost surely to the random variable $X$ if:

 $\forall \delta >0,$ $\lim_{n \to \infty} \Pr \left\{ \bigcup_{m \geq n} \{ |X_m - X| > \delta \} \right\}=0$

Equivalently:

 $\Pr \{ \lim_{n \to \infty} X_n = X \}=1$

Under these conditions we use the notation $X_n \xrightarrow{a.s.} X$ or $\lim_{n \to \infty} X_n = X \; \operatorname{a.s.}$

##### Example

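Let's see whether our example from the convergence in probability section also converges almost surely. Recall:

$X_n = \begin{cases} \eta & \text{with probability } 1- \frac{1}{n} \\ \theta & \text{with probability } \frac{1}{n} \end{cases}$

with $\theta \ne \eta$. We again guess that the limit is $\eta$. Unlike convergence in probability, almost-sure convergence depends on the joint distribution of the whole sequence, which the marginal distributions above do not pin down. Two cases show the difference.

Suppose first that every $X_n$ is built from a single draw $U \sim \operatorname{Uniform}(0,1)$, with $X_n = \theta$ if $U \leq \frac{1}{n}$ and $X_n = \eta$ otherwise. The events $\{ U \leq \frac{1}{m} \}$ shrink as $m$ grows, so:

 $\forall \delta >0,$ $\Pr \left\{ \bigcup_{m \geq n} \{ |X_m - \eta| > \delta \} \right\} \leq \Pr \{ U \leq \tfrac{1}{n} \} = \frac{1}{n} \rightarrow 0$

This satisfies our definition of almost-sure convergence, so $X_n \xrightarrow{a.s.} \eta$.

Suppose instead that the $X_n$ are independent. Since $\sum_n \Pr \{ X_n = \theta \} = \sum_n \frac{1}{n} = \infty$, the second Borel-Cantelli lemma implies that $X_n = \theta$ for infinitely many $n$ with probability one. The sequence then does not converge almost surely, even though it still converges in probability to $\eta$.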
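Whether this example converges almost surely depends on how the $X_n$ are tied together across $n$, which the marginal probabilities alone do not determine. The sketch below compares two assumed constructions, neither of which is specified in the original example: a "coupled" one, where $X_n = \theta$ exactly when a single uniform draw $U$ satisfies $U \leq 1/n$, and an "independent" one, where the $X_n$ are drawn independently. For $0 < \delta < |\theta - \eta|$ it computes exactly the probability that some $X_m$ with $n \leq m \leq$ `horizon` deviates from $\eta$. The function names are hypothetical.

```python
from fractions import Fraction


def tail_prob_coupled(n):
    """Pr{some m >= n has X_m = theta} when X_m = theta iff U <= 1/m.

    The events {U <= 1/m} are nested, so their union over m >= n is {U <= 1/n}.
    """
    return Fraction(1, n)


def tail_prob_independent(n, horizon):
    """Pr{some m in [n, horizon] has X_m = theta} for independent X_m."""
    no_deviation = Fraction(1)
    for m in range(n, horizon + 1):
        # Independence lets the "no deviation" probabilities multiply.
        no_deviation *= 1 - Fraction(1, m)
    return 1 - no_deviation


for horizon in (100, 1_000, 10_000):
    print(horizon, tail_prob_coupled(10), float(tail_prob_independent(10, horizon)))
```

In the coupled case the tail probability is $1/n$ and vanishes as $n$ grows, so the sequence converges almost surely. In the independent case the product telescopes to $(n-1)/\text{horizon}$, so the tail probability tends to one as the horizon grows and almost-sure convergence fails, although convergence in probability still holds.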

#### Convergence in Distribution

Convergence in distribution will appear very frequently in our econometric models through the use of the Central Limit Theorem. So let's define this type of convergence.

##### Definition

A sequence of random variables $\{ X_n ; n=1,2, \cdots \}$ converges in distribution to the random variable $X$ if $F_{X_n}(\zeta ) \rightarrow F_{X}(\zeta )$ as $n \to \infty$ at every point $\zeta$ where $F_X$ is continuous. Here $F_{X_n}(\zeta )$ and $F_{X}(\zeta )$ are the cumulative distribution functions of $X_n$ and $X$ respectively.

It is the distribution of the random variable that we are concerned with here. Think of a Student's t distribution: as the degrees of freedom, $n$, increase, the distribution becomes closer and closer to a standard Gaussian distribution. Therefore the random variable $Y_n \sim t(n)$ converges in distribution to the random variable $Y \sim N(0,1)$. (N.b. writing $Y_n \xrightarrow{d} Y$ for the random variables is a notational crutch; what really converges is the sequence of distribution functions, $F_{Y_n}(\zeta ) \rightarrow F_Y(\zeta )$.)
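The Student's t illustration can be made concrete by comparing distribution functions directly. This is a rough sketch using only the standard library: the t distribution function is obtained by Simpson's rule applied to the density, and the grid, step count and function names are illustrative assumptions.

```python
import math


def t_pdf(x, df):
    """Density of the Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=400):
    """CDF via composite Simpson's rule on [0, |x|], using symmetry about 0."""
    if x == 0:
        return 0.5
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = total * h / 3
    return 0.5 + half if x > 0 else 0.5 - half


def normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def max_gap(df):
    """Largest |F_t(z) - F_N(z)| over a grid of points z in [-4, 4]."""
    grid = [i / 10 for i in range(-40, 41)]
    return max(abs(t_cdf(z, df) - normal_cdf(z)) for z in grid)


for df in (1, 5, 30, 300):
    print(df, max_gap(df))
```

The largest gap between the two distribution functions shrinks as the degrees of freedom grow, which is what convergence in distribution asserts point by point.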

##### Example

Let's consider the random variable $X_n$ that takes the two values $1/n$ and $1$, each with probability $1/2$. Let $X$ be a binomial random variable with a single trial and $p = 1/2$, so that $X$ takes the values 0 and 1 with probability $1/2$ each. Then $X_n$ converges in distribution to $X$.

The proof is simple: we ignore 0 and 1 (where the distribution function of $X$ is discontinuous) and prove that, for all other points $a$, $\lim_{n \to \infty} F_{X_n}(a) = F_X(a)$. Since for $a < 0$ all the $F$s are 0, and for $a > 1$ all the $F$s are 1, it remains to prove the convergence for $0 < a < 1$, where $F_X(a) = \frac{1}{2}$. But $F_{X_n}(a) = \frac{1}{2} ([a \ge \frac{1}{n}] + [a \ge 1])$ (using Iverson brackets), so for any such $a$ choose $N > 1/a$, and for $n > N$ we have:

$n > 1/a \Rightarrow a > 1/n \Rightarrow [a \ge \frac{1}{n}] = 1 \land [a \ge 1] = 0 \Rightarrow F_{X_n}(a) = \frac{1}{2}$

So the sequence $F_{X_n}(a)$ converges to $F_X(a)$ at all points where $F_X$ is continuous.
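The two distribution functions in this example are simple enough to code directly. The sketch below (function names are hypothetical) mirrors the Iverson-bracket formula, relying on Python treating `True` and `False` as 1 and 0 in arithmetic.

```python
def cdf_xn(a, n):
    """CDF of X_n, which puts mass 1/2 on each of 1/n and 1."""
    return 0.5 * ((a >= 1.0 / n) + (a >= 1.0))


def cdf_x(a):
    """CDF of X, which puts mass 1/2 on each of 0 and 1."""
    return 0.5 * ((a >= 0.0) + (a >= 1.0))


a = 0.3
for n in (1, 3, 4, 100):
    # Once n > 1/a the value settles at cdf_x(a).
    print(n, cdf_xn(a, n), cdf_x(a))

# At the discontinuity a = 0 the values never agree, which is why
# the definition only asks for convergence at continuity points.
print(cdf_xn(0.0, 10**6), cdf_x(0.0))
```

For $a = 0.3$ the threshold is $N > 1/a \approx 3.33$, so the values agree from $n = 4$ onwards.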

#### Convergence in r-th Mean

Convergence in r-th mean is not going to be used in this book, but the definition is provided below for completeness.

##### Definition

Let $r \geq 1$ be a real number. A sequence of random variables $\{ X_n ; n=1,2, \cdots \}$ converges in r-th mean (or in the $L^r$ norm) to the random variable $X$ if $E(|X_n|^r) < \infty$ for all $n$ and

$\lim_{n\to \infty }E\left( \left\vert X_n-X\right\vert ^r\right) =0.$
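As a small worked illustration, consider again a two-point variable that equals $\theta$ with probability $1/n$ and $\eta$ otherwise, so that $E(|X_n - \eta|^r) = \frac{1}{n} |\theta - \eta|^r$. The sketch below evaluates this expectation. The `jump` parameter, standing for $|\theta - \eta|$, and the variant in which the jump grows with $n$ are assumptions added for illustration and do not appear in the original text.

```python
def rth_mean_error(n, r, jump=1.0):
    """E|X_n - eta|^r when X_n = eta + jump w.p. 1/n and X_n = eta otherwise."""
    return (1.0 / n) * abs(jump) ** r


for n in (10, 100, 1000):
    fixed = rth_mean_error(n, r=2, jump=3.0)         # fixed jump: shrinks like 1/n
    growing = rth_mean_error(n, r=1, jump=float(n))  # jump of size n: stays near 1
    print(n, fixed, growing)
```

With a fixed jump the expectation vanishes, so the sequence converges in r-th mean for every $r \geq 1$. If the jump is allowed to grow like $n$, the first moment of the error stays at one even though $\Pr \{ X_n \ne \eta \} = 1/n$ still tends to zero, which shows that convergence in probability does not by itself give convergence in r-th mean.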

#### Cramér-Wold Device

The Cramér-Wold device allows us to extend our convergence techniques for random variables from scalars to vectors.

##### Definition

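A sequence of random vectors $\mathbf{X}_n$ converges in distribution to the random vector $\mathbf{X}$ if and only if every fixed linear combination of the components of $\mathbf{X}_n$ converges in distribution to the same linear combination of the components of $\mathbf{X}$:

 $\mathbf{X}_n \xrightarrow{d} \mathbf{X} \; \iff \; \boldsymbol{\lambda}^{\operatorname{T}}\mathbf{X}_n \xrightarrow{d} \boldsymbol{\lambda}^{\operatorname{T}}\mathbf{X} \quad \forall \boldsymbol{\lambda} \ne \mathbf{0}$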

### Central Limit Theorem

Let $X_1, X_2, X_3, \ldots$ be a sequence of random variables which are defined on the same probability space, share the same probability distribution $D$ and are independent. Assume that both the expected value $\mu$ and the standard deviation $\sigma$ of $D$ exist and are finite.

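Consider the sum $S_n = X_1 + \cdots + X_n$. Then the expected value of $S_n$ is $n\mu$ and its standard deviation is $\sigma n^{1/2}$. Furthermore, informally speaking, the distribution of $S_n$ approaches the normal distribution $N(n\mu, \sigma^2 n)$ as $n$ approaches $\infty$. Stated precisely, for $\sigma > 0$ the standardized sum converges in distribution: $\frac{S_n - n\mu}{\sigma \sqrt{n}} \xrightarrow{d} N(0,1)$.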
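A quick simulation gives a feel for the theorem. The sketch below assumes, purely for illustration, that $D$ is the Uniform(0, 1) distribution, so $\mu = 1/2$ and $\sigma = \sqrt{1/12}$. It standardizes $S_n$ as $(S_n - n\mu)/(\sigma \sqrt{n})$ and measures how often the result falls within $\pm 1.96$, a range that holds about 95% of the mass of $N(0,1)$. The function names, seed and sample sizes are hypothetical choices.

```python
import math
import random


def standardized_sum(n, rng):
    """(S_n - n*mu) / (sigma*sqrt(n)) for n i.i.d. Uniform(0, 1) draws."""
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
    s_n = sum(rng.random() for _ in range(n))
    return (s_n - n * mu) / (sigma * math.sqrt(n))


def coverage(n, reps=4000, seed=0):
    """Fraction of simulated standardized sums that land in [-1.96, 1.96]."""
    rng = random.Random(seed)
    inside = sum(1 for _ in range(reps) if abs(standardized_sum(n, rng)) <= 1.96)
    return inside / reps


for n in (1, 2, 10, 50):
    print(n, coverage(n))
```

For $n = 1$ the standardized variable never exceeds $\sqrt{3} \approx 1.73$ in absolute value, so the coverage is exactly one and clearly not normal. As $n$ grows the coverage moves towards the normal value of roughly 0.95.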