Econometric Theory/Asymptotic Convergence

From Wikibooks, open books for an open world

Asymptotic Convergence

Modes of Convergence

Convergence in Probability

Convergence in probability is going to be a very useful tool for deriving asymptotic distributions later on in this book. Alongside convergence in distribution it will be the most commonly seen mode of convergence.


A sequence of random variables \{ X_n ; n=1,2, \cdots \} converges in probability to the random variable X if:

\forall \epsilon, \delta >0,
 \exists N \; \operatorname{s.t.} \; \forall n \geq N,
 \Pr \{ |X_n - X| > \delta \}< \epsilon

An equivalent statement is:

\forall \delta >0,
 \lim_{n \to \infty} \Pr \{ |X_n - X| > \delta \}=0

This will be written as either X_n \xrightarrow{p} X or \operatorname{plim} X_n = X.


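As an example, consider the sequence of random variables that takes one of two distinct constants, \eta \ne \theta, with the following probabilities:

X_n = \begin{cases} \eta & \text{with probability } 1- \frac{1}{n} \\ \theta & \text{with probability } \frac{1}{n} \end{cases}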

We'll make an intelligent guess that this sequence converges in probability to the degenerate random variable \eta. So we have that:

\forall \delta >0,\; \Pr \{ |X_n - \eta| > \delta \} \leq \Pr \{ |X_n - \eta| > 0 \}= \Pr \{ X_n= \theta \}= \frac{1}{n}

Therefore our definition for convergence in probability in this case is:

\forall \epsilon , \delta >0,
\exists N \; \operatorname{s.t.} \; \forall n \geq N,
\Pr \{ |X_n - \eta | > \delta \} \leq \Pr \{ |X_n - \eta | > 0 \}=\Pr \{ X_n= \theta \}= \frac{1}{n} < \epsilon

So for any positive value of \epsilon \in \mathbb{R} we can always find an N \in \mathbb{N} large enough that our definition is satisfied: any N > 1/\epsilon will do. Therefore we have proved that X_n \xrightarrow{p} \eta.
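The bound in this example can be checked numerically. The sketch below is only an illustration: the values \eta = 0, \theta = 1, \delta = 0.5, the seed, and the function names are assumptions chosen for the demonstration and do not come from the text. It estimates \Pr \{ |X_n - \eta| > \delta \} by simulation, compares it with 1/n, and computes the smallest N that works for a given \epsilon.

```python
import random

def sample_x(n, eta, theta, rng):
    """Draw one X_n: theta with probability 1/n, otherwise eta."""
    return theta if rng.random() < 1.0 / n else eta

def estimate_exceedance(n, eta=0.0, theta=1.0, delta=0.5, draws=100_000, seed=0):
    """Monte Carlo estimate of Pr{|X_n - eta| > delta} (illustrative parameters)."""
    rng = random.Random(seed)
    hits = sum(abs(sample_x(n, eta, theta, rng) - eta) > delta for _ in range(draws))
    return hits / draws

def smallest_n(epsilon):
    """Smallest integer N such that 1/n < epsilon for every n >= N."""
    return int(1.0 / epsilon) + 1

# The estimate should track 1/n and shrink towards zero as n grows.
for n in (1, 10, 100, 1000):
    print(n, estimate_exceedance(n), 1.0 / n)
```

Since the estimated probability falls like 1/n, any tolerance \epsilon is eventually met, which is what the definition requires.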

Almost-Sure Convergence

Almost-sure convergence has a marked similarity to convergence in probability, but the conditions for this mode of convergence are stronger: as we will see later, almost-sure convergence implies convergence in probability.


A sequence of random variables \{ X_n ; n=1,2, \cdots \} converges almost surely to the random variable X if:

\forall \delta >0,
 \lim_{n \to \infty} \Pr \{ \bigcup_{m \geq n} \{ |X_m - X| > \delta \} \}=0

An equivalent statement is:

\Pr \{ \lim_{n \to \infty} X_n = X \}=1

Under these conditions we use the notation X_n \xrightarrow{a.s.} X or \lim_{n \to \infty} X_n = X \ \operatorname{a.s.}


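Let's see whether our example from the convergence in probability section also converges almost surely. Recall:

X_n = \begin{cases} \eta & \text{with probability } 1- \frac{1}{n} \\ \theta & \text{with probability } \frac{1}{n} \end{cases}

We again guess that the limit is \eta. Unlike convergence in probability, almost-sure convergence is a statement about whole sample paths, so the marginal distributions above are not enough on their own: the answer depends on how the X_n are defined jointly on the probability space.

Suppose first that all the X_n are driven by a single uniform random variable U \sim U(0,1), with X_n = \theta when U < 1/n and X_n = \eta otherwise. The events \{ X_m = \theta \} then shrink as m grows, so for any \delta > 0:

\Pr \{ \bigcup_{m \geq n} \{ |X_m - \eta| > \delta \} \} \leq \Pr \{ \bigcup_{m \geq n} \{ X_m = \theta \} \} = \Pr \{ X_n = \theta \} = \frac{1}{n} \rightarrow 0

This satisfies our definition of almost-sure convergence, so X_n \xrightarrow{a.s.} \eta.

If instead the X_n are independent, then \sum_n \Pr \{ X_n = \theta \} = \sum_n \frac{1}{n} = \infty, and the second Borel-Cantelli lemma implies that X_n = \theta occurs infinitely often with probability one. In that case the sequence still converges in probability to \eta, but it does not converge almost surely.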
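Whether the tail-union probability in the definition vanishes depends on how the X_n are defined jointly, which the marginal probabilities alone do not settle. As a rough sketch, the code below computes that probability exactly under two hypothetical joint constructions (both are assumptions made for illustration): a coupled one, where a single U \sim U(0,1) gives X_m = \theta exactly when U < 1/m, and an independent one. The function names are illustrative.

```python
from fractions import Fraction

def tail_union_coupled(n):
    """Pr{X_m = theta for some m >= n} when X_m = theta iff U < 1/m for one shared U.

    The events are nested (decreasing in m), so the union equals the first event.
    """
    return Fraction(1, n)

def tail_union_independent(n, M):
    """Pr{X_m = theta for some n <= m <= M} when the X_m are independent."""
    prob_none = Fraction(1)
    for m in range(n, M + 1):
        prob_none *= Fraction(m - 1, m)  # Pr{X_m = eta} = 1 - 1/m
    return 1 - prob_none

# Coupled case: the tail probability shrinks like 1/n.
# Independent case: for fixed n it climbs towards 1 as the horizon M grows.
for n in (10, 100):
    print(n, tail_union_coupled(n), [tail_union_independent(n, M) for M in (1_000, 10_000)])
```

In the independent case the product telescopes to (n-1)/M, so the union over all m \geq n has probability one for every n and the almost-sure criterion fails, even though convergence in probability still holds.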

Convergence in Distribution

Convergence in distribution will appear very frequently in our econometric models through the use of the Central Limit Theorem. So let's define this type of convergence.


A sequence of random variables \{ X_n ; n=1,2, \cdots \} converges in distribution to the random variable X if F_{X_n}(\zeta ) \rightarrow F_{X}(\zeta ) at every point \zeta where F_X is continuous. Here F_{X_n}(\zeta ) and F_{X}(\zeta ) are the cumulative distribution functions of X_n and X respectively.

It is the distribution of the random variable that we are concerned with here. Think of a Student's t distribution: as the degrees of freedom, n, increase, the distribution becomes closer and closer to that of a Gaussian distribution. Therefore the random variable Y_n \sim t(n) converges in distribution to the random variable Y \sim N(0,1). (N.b. we say that the random variable Y_n \xrightarrow{d} Y as a notational crutch; what actually converges is the distribution function, F_{Y_n} (\zeta ) \rightarrow F_Y(\zeta ).)


Let's consider a random variable X_n that takes the two values 1/n and 1 with equal probability (1/2). Let X be a Bernoulli random variable with p = 1/2 (a binomial with a single trial), so that X takes the values 0 and 1 with equal probability. Then X_n converges in distribution to X.

The proof is simple: we ignore 0 and 1 (where the distribution function of X is discontinuous) and prove that, for all other points a, \lim_{n \to \infty} F_{X_n}(a) = F_X(a). Since for a < 0 all the distribution functions are 0, and for a > 1 they are all 1, it remains to prove convergence for 0 < a < 1, where F_X(a) = \frac{1}{2}. But F_{X_n}(a) = \frac{1}{2} ([a \ge \frac{1}{n}] + [a \ge 1]) (using Iverson brackets), so for any such a choose N > 1/a, and for n > N we have:

n > 1/a \Rightarrow a > 1/n \Rightarrow [a \ge \frac{1}{n}] = 1 \land [a \ge 1] = 0 \Rightarrow F_{X_n}(a) = \frac{1}{2} = F_X(a)

So the sequence F_{X_n}(a) converges to F_X(a) at all points where F_X is continuous.
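The distribution functions in this example are simple enough to evaluate directly. The following sketch (the function names are illustrative, not from the text) encodes the Iverson-bracket formula and tabulates F_{X_n}(a) for growing n next to F_X(a).

```python
def cdf_xn(a, n):
    """F_{X_n}(a) for X_n taking the values 1/n and 1 with probability 1/2 each."""
    return 0.5 * ((a >= 1.0 / n) + (a >= 1.0))

def cdf_x(a):
    """F_X(a) for X taking the values 0 and 1 with probability 1/2 each."""
    return 0.5 * ((a >= 0.0) + (a >= 1.0))

# Each row: the point a, F_{X_n}(a) for several n, and the candidate limit F_X(a).
for a in (-0.5, 0.0, 0.25, 0.75, 1.0, 1.5):
    print(a, [cdf_xn(a, n) for n in (1, 2, 10, 1000)], cdf_x(a))
```

At a = 0 the values F_{X_n}(0) stay at 0 while F_X(0) = 1/2, which is why the definition only asks for convergence at continuity points of F_X.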

Convergence in r-th Mean

Convergence in r-th mean (of which convergence in mean square is the case r = 2) is not going to be used in this book; for completeness, however, the definition is provided below.


A sequence of random variables \{ X_n ; n=1,2, \cdots \} converges in r-th mean (or in the L^r norm) to the random variable X if, for a given real number r \geq 1 and provided that E(|X_n|^r) < \infty for all n,

\lim_{n\to \infty }E\left( \left\vert X_n-X\right\vert ^r\right) =0.
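As a small worked illustration, the expectation in this definition can be computed in closed form for the two-point example used earlier, where X_n = \theta with probability 1/n and \eta otherwise: E(|X_n - \eta|^r) = |\theta - \eta|^r / n. The sketch below evaluates it, taking \eta = 0 and \theta = 1 as illustrative values. For contrast it also evaluates a hypothetical variant, Y_n = n with probability 1/n and 0 otherwise, which is an assumption introduced here and not part of the text.

```python
def rth_mean_error(n, r, eta=0.0, theta=1.0):
    """E|X_n - eta|^r when X_n = theta with probability 1/n and eta otherwise."""
    return abs(theta - eta) ** r / n

def rth_mean_error_scaled(n, r):
    """E|Y_n - 0|^r for the hypothetical Y_n = n with probability 1/n, else 0."""
    return float(n) ** r / n

# First column shrinks to zero; the second stays at 1 for r = 1.
for n in (1, 10, 100, 1000):
    print(n, rth_mean_error(n, r=2), rth_mean_error_scaled(n, r=1))
```

The first sequence converges in r-th mean for every r \geq 1. The scaled variant converges in probability to 0, since \Pr \{ Y_n \ne 0 \} = 1/n, yet E(|Y_n|) = 1 for all n, so it does not converge in first mean.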

Cramér-Wold Device

The Cramér-Wold device will allow us to extend our convergence techniques for random variables from scalars to vectors.


For a sequence of random vectors \mathbf{X}_n and fixed (non-random) vectors \boldsymbol{\lambda} of the same dimension, \mathbf{X}_n \xrightarrow{d} \mathbf{X} \; \iff \; \boldsymbol{\lambda}^{\operatorname{T}}\mathbf{X}_n \xrightarrow{d} \boldsymbol{\lambda}^{\operatorname{T}}\mathbf{X} \quad \forall \boldsymbol{\lambda} \ne \mathbf{0}.

Relationships between Modes of Convergence

Law of Large Numbers

Central Limit Theorem

Let X_1, X_2, X_3, \ldots be a sequence of independent random variables which are defined on the same probability space and share the same probability distribution D. Assume that both the expected value \mu and the standard deviation \sigma of D exist and are finite.

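Consider the sum S_n = X_1 + \cdots + X_n. Then the expected value of S_n is n\mu and its standard deviation is \sigma n^{1/2}. Furthermore, informally speaking, the distribution of S_n approaches the normal distribution N(n\mu, \sigma^2 n) as n approaches \infty. More precisely, for \sigma > 0 the standardized sum converges in distribution:

\frac{S_n - n\mu}{\sigma \sqrt{n}} \xrightarrow{d} N(0,1)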
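A quick simulation can make the statement concrete. The sketch below assumes, purely for illustration, that D is the Uniform(0,1) distribution (so \mu = 1/2 and \sigma^2 = 1/12). The function names, seed, and replication count are likewise arbitrary choices. It standardizes S_n and reports how often the result falls in [-1.96, 1.96], an interval with probability about 0.95 under N(0,1).

```python
import math
import random

def standardized_sum(n, rng):
    """(S_n - n*mu) / (sigma*sqrt(n)) for n i.i.d. Uniform(0,1) draws."""
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
    s = sum(rng.random() for _ in range(n))
    return (s - n * mu) / (sigma * math.sqrt(n))

def coverage(n, reps=20_000, seed=0):
    """Fraction of simulated standardized sums that land in [-1.96, 1.96]."""
    rng = random.Random(seed)
    inside = sum(abs(standardized_sum(n, rng)) <= 1.96 for _ in range(reps))
    return inside / reps

# As n grows the fraction should settle near the normal value of roughly 0.95.
for n in (1, 5, 30):
    print(n, coverage(n))
```

For n = 1 the standardized uniform never leaves the interval, so the fraction is 1. For larger n it moves towards the normal benchmark, which is the informal content of the theorem.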

Continuous Mapping Theorem

Slutsky's Theorem