# Econometric Theory/Asymptotic Convergence

## Asymptotic Convergence

### Modes of Convergence

#### Convergence in Probability

Convergence in probability will be a very useful tool for deriving asymptotic distributions later in this book. Alongside convergence in distribution, it is the mode of convergence we will encounter most often.

##### Definition

A sequence of random variables ${\displaystyle \{X_{n};n=1,2,\cdots \}}$ converges in probability to the random variable ${\displaystyle X}$ if:

 ${\displaystyle \forall \epsilon ,\delta >0,}$ ${\displaystyle \exists N\;\operatorname {s.t.} \;\forall n\geq N,}$ ${\displaystyle \Pr\{|X_{n}-X|>\delta \}<\epsilon }$

An equivalent statement is:

 ${\displaystyle \forall \delta >0,}$ ${\displaystyle \lim _{n\to \infty }\Pr\{|X_{n}-X|>\delta \}=0}$

This will be written as either ${\displaystyle X_{n}\ {\overset {p}{\longrightarrow }}\ X}$ or ${\displaystyle \operatorname {plim} X_{n}=X}$.

##### Example

${\displaystyle X_{n}={\begin{cases}\eta &{\text{with probability }}1-{\begin{matrix}{\frac {1}{n}}\end{matrix}}\\\theta &{\text{with probability }}{\begin{matrix}{\frac {1}{n}}\end{matrix}}\end{cases}}}$

We'll make an educated guess that this sequence converges in probability to the degenerate random variable ${\displaystyle \eta }$ (assuming ${\displaystyle \theta \neq \eta }$). So we have that:

${\displaystyle \forall \delta >0,\;\Pr\{|X_{n}-\eta |>\delta \}\leq \Pr\{|X_{n}-\eta |>0\}=\Pr\{X_{n}=\theta \}={\begin{matrix}{\frac {1}{n}}\end{matrix}}}$

Therefore, applying the definition of convergence in probability to this case:

 ${\displaystyle \forall \epsilon ,\delta >0,}$ ${\displaystyle \exists N\quad \operatorname {s.t.} \forall n\geq N,}$ ${\displaystyle \Pr\{|X_{n}-\eta |>\delta \}\leq \Pr\{|X_{n}-\eta |>0\}=\Pr\{X_{n}=\theta \}={\begin{matrix}{\frac {1}{n}}\end{matrix}}<\epsilon }$

So for any positive ${\displaystyle \epsilon \in \mathbb {R} }$ we can always find an ${\displaystyle N\in \mathbb {N} }$ large enough (any ${\displaystyle N>1/\epsilon }$ will do) so that the definition is satisfied. Therefore we have proved that ${\displaystyle X_{n}\ {\overset {p}{\longrightarrow }}\ \eta }$.
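To make the ${\displaystyle \epsilon }$-${\displaystyle N}$ bookkeeping concrete, here is a small Python sketch of the argument for the two-point example (the function names `prob_deviation` and `find_N` are our own, purely illustrative):

```python
import math

def prob_deviation(n: int) -> float:
    # In the two-point example, Pr{|X_n - eta| > delta} = Pr{X_n = theta} = 1/n
    # for every delta > 0 (assuming theta != eta).
    return 1.0 / n

def find_N(eps: float) -> int:
    # Smallest N such that 1/n < eps for all n >= N -- the N whose
    # existence the definition demands; any N > 1/eps works.
    return math.floor(1.0 / eps) + 1

eps = 0.01
N = find_N(eps)
# Beyond N the deviation probability stays below eps, as the definition requires.
assert all(prob_deviation(n) < eps for n in range(N, N + 1000))
```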

#### Almost-Sure Convergence

Almost-sure convergence is closely related to convergence in probability, but its conditions are stronger: as we will see later, almost-sure convergence implies convergence in probability.

##### Definition

A sequence of random variables ${\displaystyle \{X_{n};n=1,2,\cdots \}}$ converges almost surely to the random variable ${\displaystyle X}$ if:

 ${\displaystyle \forall \delta >0,}$ ${\displaystyle \lim _{n\to \infty }\Pr\{\bigcup _{m\geq n}\{|X_{m}-X|>\delta \}\}=0}$

or, equivalently,

 ${\displaystyle \Pr\{\lim _{n\to \infty }X_{n}=X\}=1}$

Under these conditions we use the notation ${\displaystyle X_{n}\ {\overset {a.s.}{\longrightarrow }}\ X}$ or ${\displaystyle \lim _{n\to \infty }X_{n}=X\ \operatorname {a.s.} }$.

##### Example

Let's see if our example from the convergence in probability section also converges almost surely. Defining:

${\displaystyle X_{n}={\begin{cases}\eta &1-{\begin{matrix}{\frac {1}{n}}\end{matrix}}\\\theta &{\begin{matrix}{\frac {1}{n}}\end{matrix}}\end{cases}}}$

we again guess that the limit is ${\displaystyle \eta }$. Almost-sure convergence concerns the whole sequence at once, so it depends on the joint distribution of the ${\displaystyle X_{n}}$, which the marginal probabilities above do not determine. Assuming in addition that the sequence converges with probability one (so that the only possible limits are ${\displaystyle \eta }$ and ${\displaystyle \theta }$), we see that:

 ${\displaystyle \Pr\{\lim _{n\to \infty }X_{n}=\eta \}=1-\Pr\{\lim _{n\to \infty }X_{n}=\theta \}\geq 1-\lim _{n\to \infty }{\begin{matrix}{\frac {1}{n}}\end{matrix}}=1}$

where the inequality holds because ${\displaystyle \lim _{n\to \infty }X_{n}=\theta }$ would require ${\displaystyle X_{n}=\theta }$ for all sufficiently large ${\displaystyle n}$, an event of probability zero. This satisfies our definition of almost-sure convergence. Without an assumption on the joint distribution, however, the conclusion can fail: if the ${\displaystyle X_{n}}$ were independent, the second Borel-Cantelli lemma would give ${\displaystyle X_{n}=\theta }$ infinitely often with probability one, and the sequence would not converge almost surely.
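A concrete dependence structure with these marginal probabilities that does converge almost surely couples every ${\displaystyle X_{n}}$ to a single uniform draw. The Python sketch below is our own illustration (names and parameters are arbitrary), not part of the original example:

```python
import random

def sample_path(eta=0.0, theta=1.0, n_max=10_000, seed=0):
    # Couple the whole sequence through one uniform draw U on (0, 1):
    # set X_n = theta if U < 1/n, else eta.  Marginally Pr{X_n = theta} = 1/n,
    # matching the example, yet for every outcome with U > 0 the path equals
    # eta once n > 1/U, so X_n -> eta on (almost) every sample path.
    u = random.Random(seed).random()
    return [theta if u < 1.0 / n else eta for n in range(1, n_max + 1)]

path = sample_path()
# The tail of the path is constant at eta = 0.0, illustrating almost-sure convergence.
assert all(x == 0.0 for x in path[100:])
```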

#### Convergence in Distribution

Convergence in distribution will appear very frequently in our econometric models through the use of the Central Limit Theorem. So let's define this type of convergence.

##### Definition

A sequence of random variables ${\displaystyle \{X_{n};n=1,2,\cdots \}}$ converges in distribution to the random variable ${\displaystyle X}$ if ${\displaystyle F_{X_{n}}(\zeta )\rightarrow F_{X}(\zeta )}$ at every point ${\displaystyle \zeta }$ where ${\displaystyle F_{X}}$ is continuous. Here ${\displaystyle F_{X_{n}}(\zeta )}$ and ${\displaystyle F_{X}(\zeta )}$ are the cumulative distribution functions of ${\displaystyle X_{n}}$ and ${\displaystyle X}$ respectively.

It is the distribution of the random variable that we are concerned with here. Think of a Student's t distribution: as the degrees of freedom, ${\displaystyle n}$, increase, the distribution becomes closer and closer to a standard Gaussian. Therefore the random variable ${\displaystyle Y_{n}\sim t(n)}$ converges in distribution to the random variable ${\displaystyle Y\sim N(0,1)}$ (n.b. we write ${\displaystyle Y_{n}\ {\overset {d}{\longrightarrow }}\ Y}$ as a notational crutch; what is really meant is ${\displaystyle F_{Y_{n}}(\zeta )\rightarrow F_{Y}(\zeta )}$ at continuity points).

##### Example

Let's consider the random variable ${\displaystyle X_{n}}$ which takes the two values ${\displaystyle 1/n}$ and ${\displaystyle 1}$, each with probability ${\displaystyle 1/2}$. Let ${\displaystyle X}$ be a Bernoulli random variable with ${\displaystyle p=1/2}$, taking the values ${\displaystyle 0}$ and ${\displaystyle 1}$. Then ${\displaystyle X_{n}}$ converges in distribution to ${\displaystyle X}$.

The proof is simple: we ignore ${\displaystyle 0}$ and ${\displaystyle 1}$ (where the distribution function of ${\displaystyle X}$ is discontinuous) and prove that, for all other points ${\displaystyle a}$, ${\displaystyle \lim F_{X_{n}}(a)=F_{X}(a)\,}$. Since for ${\displaystyle a<0}$ both distribution functions are ${\displaystyle 0}$, and for ${\displaystyle a>1}$ both are ${\displaystyle 1}$, it remains to prove the convergence for ${\displaystyle 0<a<1}$. But ${\displaystyle F_{X_{n}}(a)={\frac {1}{2}}([a\geq {\frac {1}{n}}]+[a\geq 1])}$ (using Iverson brackets), so for any such ${\displaystyle a}$ choose ${\displaystyle N>1/a}$; then for ${\displaystyle n>N}$ we have:

${\displaystyle n>1/a\;\Rightarrow \;a>1/n\;\Rightarrow \;[a\geq {\frac {1}{n}}]=1\land [a\geq 1]=0\;\Rightarrow \;F_{X_{n}}(a)={\frac {1}{2}}=F_{X}(a)\,}$

So the sequence ${\displaystyle F_{X_{n}}(a)\,}$ converges to ${\displaystyle F_{X}(a)\,}$ for all points where FX is continuous.
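The Iverson-bracket computation above can be checked numerically; this Python sketch (the function names are our own) evaluates both distribution functions:

```python
def F_Xn(a: float, n: int) -> float:
    # CDF of X_n, which puts mass 1/2 on each of the points 1/n and 1.
    return 0.5 * ((a >= 1.0 / n) + (a >= 1.0))

def F_X(a: float) -> float:
    # CDF of the Bernoulli(1/2) limit, with mass 1/2 on each of 0 and 1.
    return 0.5 * ((a >= 0.0) + (a >= 1.0))

# At a continuity point 0 < a < 1 the sequence hits the limit once n > 1/a.
a = 0.25
assert F_Xn(a, n=10) == F_X(a) == 0.5
# At the discontinuity a = 0 convergence fails (F_Xn(0) = 0 for every n,
# while F_X(0) = 1/2) -- which is exactly why the definition only requires
# convergence at continuity points of F_X.
assert F_Xn(0.0, n=10**6) == 0.0 and F_X(0.0) == 0.5
```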

#### Convergence in r-th Mean

Convergence in r-th mean is not going to be used in this book; however, for completeness, the definition is provided below.

##### Definition

A sequence of random variables ${\displaystyle \{X_{n};n=1,2,\cdots \}}$ converges in r-th mean (or in the ${\displaystyle L^{r}}$ norm) to the random variable ${\displaystyle X}$ if, for a given real number ${\displaystyle r\geq 1}$ and provided that ${\displaystyle E(|X_{n}|^{r})<\infty }$ for all ${\displaystyle n}$,

${\displaystyle \lim _{n\to \infty }E\left(\left\vert X_{n}-X\right\vert ^{r}\right)=0.}$
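As a quick illustration (our own, reusing the two-point example from the earlier sections): there ${\displaystyle E(|X_{n}-\eta |^{r})=|\theta -\eta |^{r}/n\to 0}$, so that sequence also converges in r-th mean for every ${\displaystyle r\geq 1}$. In Python:

```python
def rth_mean_error(n: int, r: float = 2.0, eta: float = 0.0, theta: float = 1.0) -> float:
    # E|X_n - eta|^r for the two-point example: the deviation |theta - eta|
    # occurs with probability 1/n, and the deviation is zero otherwise.
    return abs(theta - eta) ** r / n

# The r-th mean error vanishes as n grows.
assert rth_mean_error(10) == 0.1
assert rth_mean_error(10**6) < 1e-5
```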

#### Cramér-Wold Device

The Cramér-Wold device will allow us to extend our convergence techniques for random variables from scalars to vectors.

##### Definition

A random vector ${\displaystyle \mathbf {X} _{n}\ {\overset {d}{\longrightarrow }}\ \mathbf {X} \;\iff \;{\mathbf {\lambda } }^{\operatorname {T} }\mathbf {X} _{n}\ {\overset {d}{\longrightarrow }}\ {\mathbf {\lambda } }^{\operatorname {T} }\mathbf {X} }$ for every fixed vector ${\displaystyle \mathbf {\lambda } \neq \mathbf {0} }$.
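As a sketch of how the device is typically used (a standard argument, stated informally here): suppose ${\displaystyle \mathbf {X} _{1},\mathbf {X} _{2},\cdots }$ are i.i.d. random vectors with mean ${\displaystyle \mu }$ and covariance matrix ${\displaystyle \Sigma }$. For any fixed ${\displaystyle \mathbf {\lambda } \neq \mathbf {0} }$ the scalars ${\displaystyle {\mathbf {\lambda } }^{\operatorname {T} }\mathbf {X} _{i}}$ are i.i.d. with mean ${\displaystyle {\mathbf {\lambda } }^{\operatorname {T} }\mu }$ and variance ${\displaystyle {\mathbf {\lambda } }^{\operatorname {T} }\Sigma \mathbf {\lambda } }$, so the scalar Central Limit Theorem gives

 ${\displaystyle {\sqrt {n}}\,{\mathbf {\lambda } }^{\operatorname {T} }({\bar {\mathbf {X} }}_{n}-\mu )\ {\overset {d}{\longrightarrow }}\ N(0,{\mathbf {\lambda } }^{\operatorname {T} }\Sigma \mathbf {\lambda } )}$

for every such ${\displaystyle \mathbf {\lambda } }$, and the Cramér-Wold device then upgrades this family of scalar results to the vector statement ${\displaystyle {\sqrt {n}}({\bar {\mathbf {X} }}_{n}-\mu )\ {\overset {d}{\longrightarrow }}\ N(\mathbf {0} ,\Sigma )}$.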

### Central Limit Theorem

Let ${\displaystyle X_{1},X_{2},X_{3},\ldots }$ be a sequence of independent random variables which are defined on the same probability space and share the same probability distribution ${\displaystyle D}$. Assume that both the expected value ${\displaystyle \mu }$ and the standard deviation ${\displaystyle \sigma }$ of ${\displaystyle D}$ exist and are finite.

Consider the sum ${\displaystyle S_{n}=X_{1}+...+X_{n}}$. Then the expected value of ${\displaystyle S_{n}}$ is ${\displaystyle n\mu }$ and its standard deviation is ${\displaystyle \sigma {\sqrt {n}}}$. Furthermore, informally speaking, the distribution of ${\displaystyle S_{n}}$ is approximately ${\displaystyle N(n\mu ,n\sigma ^{2})}$ for large ${\displaystyle n}$; formally, the standardized sum ${\displaystyle (S_{n}-n\mu )/(\sigma {\sqrt {n}})}$ converges in distribution to ${\displaystyle N(0,1)}$ as ${\displaystyle n}$ approaches ${\displaystyle \infty }$.
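The theorem is easy to check numerically. The following Python sketch is our own illustration (the Uniform(0,1) summands, sample sizes, and tolerances are arbitrary choices): it standardizes sums of uniforms and verifies that their first two sample moments are close to those of ${\displaystyle N(0,1)}$:

```python
import math
import random
import statistics

def standardized_sums(n: int = 500, reps: int = 2000, seed: int = 1) -> list:
    # Draw S_n = X_1 + ... + X_n for i.i.d. Uniform(0,1) summands, for which
    # mu = 1/2 and sigma^2 = 1/12, then standardize:
    # Z = (S_n - n*mu) / (sigma * sqrt(n)).
    rng = random.Random(seed)
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
    return [
        (sum(rng.random() for _ in range(n)) - n * mu) / (sigma * math.sqrt(n))
        for _ in range(reps)
    ]

zs = standardized_sums()
# By the CLT the standardized sums are approximately N(0,1): sample mean
# near 0 and sample standard deviation near 1.
assert abs(statistics.fmean(zs)) < 0.1
assert abs(statistics.stdev(zs) - 1.0) < 0.1
```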