# Statistics/Interval Estimation

## Introduction

Previously, we have discussed point estimation, which gives us an estimator ${\displaystyle {\hat {\theta }}}$ for the value of an unknown parameter ${\displaystyle \theta }$. Now, suppose we want to know the size of the error of the point estimator ${\displaystyle {\hat {\theta }}}$, i.e. the difference between ${\displaystyle {\hat {\theta }}}$ and the unknown parameter ${\displaystyle \theta }$. Of course, we can make use of the mean squared error of ${\displaystyle {\hat {\theta }}}$, ${\displaystyle \mathbb {E} [({\hat {\theta }}-\theta )^{2}]}$, among other measures.

However, what if we only have one specific point estimate? We cannot calculate the mean squared error of the corresponding point estimator from this single point estimate alone. So, how do we know the possible size of the error of this point estimate? Indeed, it is impossible to tell: we are only given a particular estimated value of the parameter ${\displaystyle \theta }$, and we do not know the value of the unknown parameter ${\displaystyle \theta }$ itself, so the difference between the point estimate and ${\displaystyle \theta }$ is also unknown.

To illustrate this, consider the following example: suppose we take a random sample of 10 students from one particular course in university to estimate the mean score of the students in the final exam in that course, denoted by ${\displaystyle \mu }$, (assume the score is normally distributed), and the observed value of the sample mean is ${\displaystyle {\overline {x}}=60}$. Then, what is the difference between this point estimate and the true unknown parameter ${\displaystyle \mu }$? Can we be "confident" that this sample mean is close to ${\displaystyle \mu }$, say ${\displaystyle \mu \in [{\overline {x}}-5,{\overline {x}}+5]=[55,65]}$?

It is possible that ${\displaystyle \mu }$ is, say, 90, and somehow the students in the sample are ones with very poor performance. On the other hand, it is also possible that ${\displaystyle \mu }$ is, say, 30, and somehow the students in the sample are ones who perform relatively well. Of course, it is also possible that ${\displaystyle \mu }$ is quite close to 60, say 59. From this example, we can see that a particular value ${\displaystyle {\overline {x}}=60}$ does not tell us the possible size of the error: the error can be very large, and it can also be very small.

In this chapter, we will introduce interval estimation, in which an interval estimator describes the size of the error by providing the probability that the random interval it produces (i.e. an interval with at least one random endpoint) contains the unknown parameter ${\displaystyle \theta }$. This probability measures the "accuracy" of the interval estimator of ${\displaystyle \theta }$, and hence reflects the size of the error.

As suggested by the name interval estimator, the estimator involves some sort of intervals. Also, as one may expect, interval estimation is also based on statistics:

Definition. (Interval estimation) Interval estimation is a process of using the value of a statistic to estimate an interval of plausible values of an unknown parameter.

Of course, we would like the probability for the unknown parameter ${\displaystyle \theta }$ to lie in the interval to be close to 1, so that the interval estimator is very accurate. However, a very accurate interval estimator may have very bad "precision", i.e. the interval covers "too many" plausible values of the unknown parameter, and therefore, even if we know that ${\displaystyle \theta }$ is very likely to be one of these values, there are too many different possibilities. Hence, such an interval estimator is not very "useful". To illustrate this, suppose the interval concerned is ${\displaystyle \mathbb {R} }$, which is the parameter space of ${\displaystyle \theta }$. Then, of course ${\displaystyle \mathbb {P} (\theta \in \mathbb {R} )=1}$ (so the "confidence" is high) since ${\displaystyle \theta }$ must lie in its parameter space. However, such an interval has essentially "zero precision", and is quite "useless", since the "plausible values" of ${\displaystyle \theta }$ in the interval are all possible values of ${\displaystyle \theta }$.

From this, we can observe the need of the "precision" of the interval, that is, we also want the width of the interval to be small, so that we can have some ideas about the "location" of ${\displaystyle \theta }$. However, as the interval becomes smaller, it is more likely that such interval misses ${\displaystyle \theta }$, i.e. does not cover the actual value of ${\displaystyle \theta }$, and therefore the probability for ${\displaystyle \theta }$ to lie in that interval becomes smaller, i.e. the interval becomes less "accurate". To illustrate this, let us consider the extreme case: the interval is so small that it becomes an interval containing a single point (the two end-points of the interval coincide). Then, the "interval estimator" basically becomes a "point estimator" in some sense, and we know that it is very unlikely that the true value of ${\displaystyle \theta }$ equals the value of the point estimator ${\displaystyle {\hat {\theta }}}$ (${\displaystyle \theta }$ lies in that "interval" is equivalent to ${\displaystyle \theta ={\hat {\theta }}}$ in this case). Indeed, if the distribution of ${\displaystyle {\hat {\theta }}}$ is continuous, then ${\displaystyle \mathbb {P} ({\hat {\theta }}=\theta )=0}$.

As we can see from above, although we want the interval to have a very high "confidence" and also "very precise" (i.e. the interval is very narrow), we cannot have both of them, since an increase in confidence causes a decrease in "precision", and an increase in "precision" causes a decrease in confidence. Therefore, we need to make some compromises between them, and pick an interval that gives a sufficiently high confidence, and also is quite precise. In other words, we would like to have a narrow interval that will cover ${\displaystyle \theta }$ with a large probability.
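The confidence/precision trade-off can be made concrete with a small numerical sketch (a hypothetical setup, assuming the estimation error is normally distributed; Python's standard-library `statistics.NormalDist` supplies the normal cdf):

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def coverage(half_width, sd=1.0):
    """P(theta lies in [est - half_width, est + half_width]) when the
    estimation error (est - theta) is N(0, sd^2)."""
    return Z.cdf(half_width / sd) - Z.cdf(-half_width / sd)

# Widening the interval raises confidence but lowers precision.
for k in (0.5, 1.0, 2.0, 3.0):
    print(f"half-width {k}: coverage {coverage(k):.4f}")
```

A wider interval always has a higher coverage probability, so no single interval maximizes both confidence and precision.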

## Terminologies

Now, let us formally define some terminologies related to interval estimation.

Definition. (Interval estimator) Let ${\displaystyle X_{1},\dotsc ,X_{n}}$ be a random sample. An interval estimator of an unknown parameter ${\displaystyle \theta }$ is a random interval ${\displaystyle [L(\mathbf {X} ),U(\mathbf {X} )]}$ where ${\displaystyle L=L(X_{1},\dotsc ,X_{n})}$ and ${\displaystyle U=U(X_{1},\dotsc ,X_{n})}$ are two statistics such that ${\displaystyle L(\mathbf {X} )\leq U(\mathbf {X} )}$ always.

Remark.

• We call the interval ${\displaystyle [L(\mathbf {X} ),U(\mathbf {X} )]}$ a random interval since both endpoints ${\displaystyle L(\mathbf {X} )}$ and ${\displaystyle U(\mathbf {X} )}$ are random variables.
• The interval involved may also be an open interval (${\displaystyle (L(\mathbf {X} ),U(\mathbf {X} ))}$), a half-open and half-closed interval (${\displaystyle (L(\mathbf {X} ),U(\mathbf {X} )]}$ or ${\displaystyle [L(\mathbf {X} ),U(\mathbf {X} ))}$), or a one-sided interval (${\displaystyle (-\infty ,U]}$ or ${\displaystyle [L,\infty )}$), where we may take ${\displaystyle L(\mathbf {X} )=-\infty }$ or ${\displaystyle U(\mathbf {X} )=\infty }$ (in the extended real number sense).
• When we observe that ${\displaystyle X_{1}=x_{1},\dotsc ,X_{n}=x_{n}}$, we call ${\displaystyle [L(x_{1},\dotsc ,x_{n}),U(x_{1},\dotsc ,x_{n})]}$ the interval estimate of ${\displaystyle \theta }$, denoted by ${\displaystyle [L(\mathbf {x} ),U(\mathbf {x} )]}$ (${\displaystyle L(\mathbf {x} )}$ and ${\displaystyle U(\mathbf {x} )}$ are no longer random).

Definition. (Coverage probability) The coverage probability of an interval estimator ${\displaystyle [L(\mathbf {X} ),U(\mathbf {X} )]}$ is ${\displaystyle \mathbb {P} (\theta \in [L(\mathbf {X} ),U(\mathbf {X} )])}$.

Example. Let ${\displaystyle X_{1},X_{2},X_{3},X_{4}}$ be a random sample from the normal distribution ${\displaystyle {\mathcal {N}}(\mu ,1)}$. Consider an interval estimator of ${\displaystyle \mu }$: ${\displaystyle [{\overline {X}}-1,{\overline {X}}+1]}$.

(a) Calculate the probability ${\displaystyle \mathbb {P} ({\overline {X}}=\mu )}$.

(b) Calculate the coverage probability ${\displaystyle \mathbb {P} (\mu \in [{\overline {X}}-1,{\overline {X}}+1])}$.

Solution:

(a) Since the distribution of ${\displaystyle {\overline {X}}}$ is continuous, ${\displaystyle \mathbb {P} ({\overline {X}}=\mu )=0}$.

(b) The coverage probability {\displaystyle {\begin{aligned}\mathbb {P} (\mu \in [{\overline {X}}-1,{\overline {X}}+1])&=\mathbb {P} ({\overline {X}}-1\leq \mu \leq {\overline {X}}+1)\\&=\mathbb {P} (-1\leq \mu -{\overline {X}}\leq 1)\\&=\mathbb {P} (1\geq {\overline {X}}-\mu \geq -1)\\&=\mathbb {P} \left({\frac {-1}{\sqrt {1/4}}}\leq {\frac {{\overline {X}}-\mu }{\sqrt {1/4}}}\leq {\frac {1}{\sqrt {1/4}}}\right)\\&=\mathbb {P} \left(-2\leq Z\leq 2\right)&\left(Z={\frac {{\overline {X}}-\mu }{\sqrt {1/4}}}\sim {\mathcal {N}}(0,1),{\text{ by property of normal distribution}}\right)\\&\approx 0.97725-0.02275&({\text{standard normal table}})\\&=0.9545.\end{aligned}}}
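As a quick check of this computation, we can evaluate the same probability with Python's standard-library normal cdf instead of a standard normal table:

```python
from statistics import NormalDist

Z = NormalDist()                     # standard normal
n, var = 4, 1.0
sd_xbar = (var / n) ** 0.5           # sd of X-bar = 1/2

# P(mu in [X-bar - 1, X-bar + 1]) = P(-1/sd <= Z <= 1/sd) = P(-2 <= Z <= 2)
p = Z.cdf(1 / sd_xbar) - Z.cdf(-1 / sd_xbar)
print(round(p, 4))  # 0.9545
```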

Exercise.

(a) Guess whether the coverage probability ${\displaystyle \mathbb {P} (\mu \in [{\overline {X}}-2,{\overline {X}}+2])}$ is greater than ${\displaystyle \mathbb {P} (\mu \in [{\overline {X}}-1,{\overline {X}}+1])\approx 0.9545}$.

(b) Calculate ${\displaystyle \mathbb {P} (\mu \in [{\overline {X}}-2,{\overline {X}}+2])}$ to see whether your guess in (a) is correct or not.

(c) (construction of interval estimator) Find ${\displaystyle k}$ such that ${\displaystyle \mathbb {P} (\mu \in [{\overline {X}}-k,{\overline {X}}+k])\approx 0.9973}$ (Hint: ${\displaystyle \mathbb {P} (-3\leq Z\leq 3)\approx 0.9973}$ where ${\displaystyle Z\sim {\mathcal {N}}(0,1)}$).

(d) Suppose it is observed that ${\displaystyle X_{1}=1,X_{2}=3,X_{3}=2.5,X_{4}=1.5}$. Find the interval estimate of the given interval estimator ${\displaystyle [{\overline {X}}-1,{\overline {X}}+1]}$.

(e) Suppose the actual parameter ${\displaystyle \mu }$ is 1.2. Does ${\displaystyle \mu }$ lie in the interval estimate in (d)?

Solution

(a) Intuitively, one should guess that it is greater, since the interval is wider.

(b) {\displaystyle {\begin{aligned}\mathbb {P} (\mu \in [{\overline {X}}-{\color {blue}2},{\overline {X}}+{\color {blue}2}])&=\mathbb {P} ({\overline {X}}-{\color {blue}2}\leq \mu \leq {\overline {X}}+{\color {blue}2})\\&=\mathbb {P} (-{\color {blue}2}\leq \mu -{\overline {X}}\leq {\color {blue}2})\\&=\mathbb {P} ({\color {blue}2}\geq {\overline {X}}-\mu \geq -{\color {blue}2})\\&=\mathbb {P} \left({\frac {-{\color {blue}2}}{\sqrt {1/4}}}\leq {\frac {{\overline {X}}-\mu }{\sqrt {1/4}}}\leq {\frac {\color {blue}2}{\sqrt {1/4}}}\right)\\&=\mathbb {P} \left(-{\color {blue}4}\leq Z\leq {\color {blue}4}\right)&\left(Z={\frac {{\overline {X}}-\mu }{\sqrt {1/4}}}\sim {\mathcal {N}}(0,1),{\text{ by property of normal distribution}}\right)\\&\approx 0.99997-0.00003&({\text{standard normal table}})\\&=0.99994.\\\end{aligned}}}

(c) Such ${\displaystyle k}$ is ${\displaystyle {\frac {3}{2}}}$.

Proof. {\displaystyle {\begin{aligned}\mathbb {P} (\mu \in [{\overline {X}}-{\color {blue}3/2},{\overline {X}}+{\color {blue}3/2}])&=\mathbb {P} ({\overline {X}}-{\color {blue}3/2}\leq \mu \leq {\overline {X}}+{\color {blue}3/2})\\&=\mathbb {P} (-{\color {blue}3/2}\leq \mu -{\overline {X}}\leq {\color {blue}3/2})\\&=\mathbb {P} ({\color {blue}3/2}\geq {\overline {X}}-\mu \geq -{\color {blue}3/2})\\&=\mathbb {P} \left({\frac {-{\color {blue}3/2}}{\sqrt {1/4}}}\leq {\frac {{\overline {X}}-\mu }{\sqrt {1/4}}}\leq {\frac {\color {blue}3/2}{\sqrt {1/4}}}\right)\\&=\mathbb {P} \left(-{\color {blue}3}\leq Z\leq {\color {blue}3}\right)&\left(Z={\frac {{\overline {X}}-\mu }{\sqrt {1/4}}}\sim {\mathcal {N}}(0,1),{\text{ by property of normal distribution}}\right)\\&\approx 0.9973.&({\text{hint}})\\\\\end{aligned}}}

${\displaystyle \Box }$

(d) Under this observation, ${\displaystyle {\overline {x}}={\frac {1+3+2.5+1.5}{4}}=2}$. Hence, the interval estimate is ${\displaystyle [1,3]}$.

(e) Since ${\displaystyle 1.2\in [1,3]}$, ${\displaystyle \mu }$ lies in the interval estimate ${\displaystyle [1,3]}$.
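Parts (d) and (e) can be verified with a few lines of Python:

```python
# (d) Interval estimate [x-bar - 1, x-bar + 1] from the observed sample.
sample = [1, 3, 2.5, 1.5]
xbar = sum(sample) / len(sample)         # 2.0
interval = (xbar - 1, xbar + 1)          # (1.0, 3.0)
print(interval)

# (e) Does the (hypothetically known) true mean lie in the interval estimate?
mu = 1.2
print(interval[0] <= mu <= interval[1])  # True
```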

Definition. (Confidence coefficient) For an interval estimator ${\displaystyle [L(\mathbf {X} ),U(\mathbf {X} )]}$ of ${\displaystyle \theta }$, the confidence coefficient of ${\displaystyle [L(\mathbf {X} ),U(\mathbf {X} )]}$, denoted by ${\displaystyle 1-\alpha }$, is the infimum of the (set of) coverage probabilities (over all ${\displaystyle \theta }$ in the parameter space ${\displaystyle \Theta }$), ${\displaystyle {\underset {\theta \in \Theta }{\inf }}\;\mathbb {P} (\theta \in [L(\mathbf {X} ),U(\mathbf {X} )])}$.

Remark.

• Infimum means the greatest lower bound (it is the same as minimum under some conditions). Thus ${\displaystyle {\underset {\theta \in \Theta }{\inf }}\;\mathbb {P} (\theta \in [L(\mathbf {X} ),U(\mathbf {X} )])}$ is the greatest lower bound of the coverage probabilities over all ${\displaystyle \theta \in \Theta }$. Intuitively, this means the confidence coefficient is chosen conservatively: when there is some ${\displaystyle \theta }$ making the coverage probability low, it will decrease the confidence coefficient.
• In simple cases, the value of coverage probability does not depend on the choice of ${\displaystyle \theta }$ (i.e. is a constant function of ${\displaystyle \theta }$) [1]. Hence, the confidence coefficient ${\displaystyle 1-\alpha ={\underset {\theta \in \Theta }{\inf }}\;\mathbb {P} (\theta \in [L(\mathbf {X} ),U(\mathbf {X} )])=\mathbb {P} (\theta \in [L(\mathbf {X} ),U(\mathbf {X} )])}$. Unless otherwise specified, you can assume this is true in the following.
• The reason for choosing the notation to be "${\displaystyle 1-\alpha }$" is related to hypothesis testing, where "${\displaystyle \alpha }$" has some special meanings.
• As we shall see in the next chapter, there is a close relationship between confidence intervals and hypothesis testing, in the sense that one of them can be constructed by using another one.
• An interval estimator equipped with a measure of "confidence" is called a confidence interval. Here, the confidence coefficient is that measure of confidence. Hence, an interval estimator with confidence coefficient ${\displaystyle 1-\alpha }$ is a ${\displaystyle 1-\alpha }$ confidence interval (usually ${\displaystyle 1-\alpha }$ is expressed as a percentage).

Example. (Interpretation of confidence coefficient) Consider an interval estimator of an unknown parameter ${\displaystyle \theta }$: ${\displaystyle [L(\mathbf {X} ),U(\mathbf {X} )]}$. Suppose its confidence coefficient is ${\displaystyle 1-\alpha }$.

• Student A's claim: since the confidence coefficient is ${\displaystyle 1-\alpha }$, the coverage probability ${\displaystyle \mathbb {P} (\theta \in [L(\mathbf {X} ),U(\mathbf {X} )])=1-\alpha }$. It follows that the probability for ${\displaystyle \theta }$ to lie in interval estimate ${\displaystyle [L(\mathbf {x} ),U(\mathbf {x} )]}$ in an experiment is also ${\displaystyle 1-\alpha }$.
• Student B's claim: from an interval estimate ${\displaystyle [L(\mathbf {x} ),U(\mathbf {x} )]}$ coming from an experiment, we know that it either contains ${\displaystyle \theta }$ or does not contain ${\displaystyle \theta }$. In the former case, the coverage probability is 1, and in the latter case, the coverage probability is 0. Hence, student A's claim is wrong.
• Student C's claim: when we perform a large number of experiments, we will expect the interval estimate in ${\displaystyle 1-\alpha }$ of them contains ${\displaystyle \theta }$, and the interval estimate in another ${\displaystyle \alpha }$ of them does not contain ${\displaystyle \theta }$.

Comment on each claim.

Solution:

Student B's claim is correct, since in a single experiment, the interval estimate ${\displaystyle [L(\mathbf {x} ),U(\mathbf {x} )]}$ is already decided (and thus fixed). Also, the unknown parameter ${\displaystyle \theta }$ is fixed (the population distribution is given). This means that whether ${\displaystyle \theta }$ lies or does not lie in the fixed interval estimate ${\displaystyle [L(\mathbf {x} ),U(\mathbf {x} )]}$ is not a random event. Instead, it is already determined by the fixed ${\displaystyle \theta }$ and ${\displaystyle [L(\mathbf {x} ),U(\mathbf {x} )]}$.

Student A's claim is wrong, for the reason student B gives. It may be easier to see why it is wrong if we rephrase the claim a little: "the probability for the fixed ${\displaystyle \theta }$ to lie in the fixed interval estimate ${\displaystyle [L(\mathbf {x} ),U(\mathbf {x} )]}$ is ${\displaystyle 1-\alpha }$." This is incorrect since the event involved is not even random! To see this more clearly, we can consider what happens if we "hypothetically" repeat this particular experiment, with fixed ${\displaystyle \theta }$ and fixed interval estimate ${\displaystyle [L(\mathbf {x} ),U(\mathbf {x} )]}$, many times. The "outcome" in every repetition is the same: either ${\displaystyle \theta }$ lies in ${\displaystyle [L(\mathbf {x} ),U(\mathbf {x} )]}$ in all of them, or it does not lie in ${\displaystyle [L(\mathbf {x} ),U(\mathbf {x} )]}$ in all of them. It then follows from the definition of frequentist probability that the probability is either 1 (former case) or 0 (latter case).

We may modify student A's claim to make it correct: the probability for ${\displaystyle \theta }$ to lie in an interval estimator ${\displaystyle [L(\mathbf {X} ),U(\mathbf {X} )]}$ is ${\displaystyle 1-\alpha }$. This can be interpreted as: the probability for ${\displaystyle \theta }$ to lie in an interval estimate calculated from a future and not yet realized sample (NOT a realized sample, which is a past sample) is ${\displaystyle 1-\alpha }$.

Student C's claim is also correct, since we can interpret the probability from frequentist point of view, i.e. consider the probability as the "long-run" proportion for the interval estimates (for each trial, an interval estimate is observed from the interval estimator ${\displaystyle [L(\mathbf {X} ),U(\mathbf {X} )]}$) that contains the true parameter ${\displaystyle \theta }$.
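Student C's frequentist interpretation can be illustrated by simulation, reusing the earlier example (a sample of size 4 from ${\displaystyle {\mathcal {N}}(\mu ,1)}$ and the interval ${\displaystyle [{\overline {X}}-1,{\overline {X}}+1]}$, whose coverage probability is about 0.9545); the value ${\displaystyle \mu =60}$ here is an arbitrary choice:

```python
import random

# Long-run fraction of interval estimates [x-bar - 1, x-bar + 1] that
# contain the true mean mu; it should be close to 0.9545.
random.seed(0)
mu, n, trials = 60.0, 4, 100_000
hits = 0
for _ in range(trials):
    xbar = sum(random.gauss(mu, 1.0) for _ in range(n)) / n
    if xbar - 1 <= mu <= xbar + 1:
        hits += 1
print(hits / trials)  # close to 0.9545
```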

Remark.

• We may say that we "feel ${\displaystyle (1-\alpha )100\%}$ confident" that ${\displaystyle \theta }$ lies in an interval estimate ${\displaystyle [L(\mathbf {x} ),U(\mathbf {x} )]}$, corresponding to a ${\displaystyle 1-\alpha }$ confidence interval, from an experiment.
• To understand this, we may refer to student C's claim above. When we think about how "confident" we are about the statement that ${\displaystyle \theta }$ lies in ${\displaystyle [L(\mathbf {x} ),U(\mathbf {x} )]}$, we may consider the following:
• we "hypothetically" repeat the generation of interval estimates many times, and we will expect that ${\displaystyle 1-\alpha }$ of them contain ${\displaystyle \theta }$.
• Then, it is natural to "feel" ${\displaystyle (1-\alpha )100\%}$ confident that the interval estimate ${\displaystyle [L(\mathbf {x} ),U(\mathbf {x} )]}$ contains ${\displaystyle \theta }$ based on these hypothetical experiments.
• Alternatively, as suggested above, it is correct to say that the probability for ${\displaystyle \theta }$ to lie in an interval estimate calculated from a future and not yet realized sample is ${\displaystyle 1-\alpha }$.
• Hence, the probability ${\displaystyle 1-\alpha }$ measures the "reliability" of estimation procedure and method (the higher the probability, the higher the reliability).
• Therefore, it is natural to feel ${\displaystyle (1-\alpha )100\%}$ confident that the interval estimate ${\displaystyle [L(\mathbf {x} ),U(\mathbf {x} )]}$ contains ${\displaystyle \theta }$ based on the above reliability.
• We may regard "we feel ${\displaystyle (1-\alpha )100\%}$ confident that ${\displaystyle \theta }$ lies in the interval estimate ${\displaystyle [L(\mathbf {x} ),U(\mathbf {x} )]}$" to be an intuitive and alternative expression of "the interval estimate ${\displaystyle [L(\mathbf {x} ),U(\mathbf {x} )]}$ is a ${\displaystyle 1-\alpha }$ confidence interval".

Example. Continue from the previous example about normal distribution ${\displaystyle {\mathcal {N}}(\mu ,1)}$. The confidence coefficient of interval estimator of ${\displaystyle \mu }$, ${\displaystyle [{\overline {X}}-1,{\overline {X}}+1]}$, is 0.9545, or approximately 95%. Hence, such interval may be called 95% confidence interval.

Exercise. Consider a continuous distribution with an unknown real-valued parameter ${\displaystyle \theta }$, and a random sample ${\displaystyle X_{1},\dotsc ,X_{n}}$ drawn from it. Suppose ${\displaystyle \mathbb {P} (\theta \leq T_{1})=0.025}$ and ${\displaystyle \mathbb {P} (\theta \geq T_{2})=0.025}$ where ${\displaystyle T_{1}}$ and ${\displaystyle T_{2}}$ are statistics of ${\displaystyle \mathbf {X} }$ such that ${\displaystyle T_{2}\geq T_{1}}$ always (Can ${\displaystyle T_{2}=T_{1}}$? [2]) (${\displaystyle \mathbb {R} }$ is the parameter space of ${\displaystyle \theta }$).

1 Which of the following is/are a 90% confidence interval?

 ${\displaystyle [T_{1},T_{2}]}$ ${\displaystyle (T_{1},T_{2})}$ ${\displaystyle (T_{1},\infty )}$ ${\displaystyle (-\infty ,T_{2})}$ None of the above.

2 Which of the following is/are a 95% confidence interval?

 ${\displaystyle [T_{1},T_{2}]}$ ${\displaystyle (T_{1},T_{2})}$ ${\displaystyle (T_{1},\infty )}$ ${\displaystyle (-\infty ,T_{2})}$ None of the above.

3 Which of the following is/are a 97.5% confidence interval?

 ${\displaystyle [T_{1},T_{2}]}$ ${\displaystyle (T_{1},T_{2})}$ ${\displaystyle (T_{1},\infty )}$ ${\displaystyle (-\infty ,T_{2})}$ None of the above.

4. Can you suggest a (i) 0% confidence interval; (ii) 100% confidence interval?

Solution

(i) Since the distributions of the statistics are continuous, one may take the degenerate interval ${\displaystyle [T_{1},T_{1}]}$, for example, as a 0% confidence interval, since ${\displaystyle \mathbb {P} (\theta \in [T_{1},T_{1}])=\mathbb {P} (T_{1}=\theta )=0}$ for every ${\displaystyle \theta }$.

(ii) One may take ${\displaystyle (-\infty ,\infty )}$ (i.e. ${\displaystyle \mathbb {R} }$), which is the parameter space of ${\displaystyle \theta }$ as the 100% confidence interval. This is because ${\displaystyle \mathbb {P} (\theta \in (-\infty ,\infty ))=1}$. (In general, a 100% confidence interval for an unknown parameter is the parameter space of that unknown parameter.)

## Construction of confidence intervals

After understanding what a confidence interval is, we would like to know how to construct one naturally. The main tool for such construction is the pivotal quantity, which is defined below.

Definition. (Pivotal quantity) A random variable ${\displaystyle Q(\mathbf {X} ,\theta )=Q(X_{1},\dotsc ,X_{n},\theta )}$ is a pivotal quantity (of ${\displaystyle \theta }$) (which is a function of the random sample ${\displaystyle X_{1},\dotsc ,X_{n}}$ and the unknown parameter (vector) ${\displaystyle \theta }$) if the distribution of ${\displaystyle Q(\mathbf {X} ,\theta )}$ does not depend on the parameter (vector) ${\displaystyle \theta }$, that is, the distribution is the same for every value of ${\displaystyle \theta }$.

Remark.

• A pivotal quantity may not be a statistic, since a statistic is a function of the random sample ${\displaystyle X_{1},\dotsc ,X_{n}}$ only (not of the unknown parameter(s)), while a pivotal quantity is a function of both the random sample and the unknown parameter (vector) ${\displaystyle \theta }$.
• If the expression of a pivotal quantity does not involve ${\displaystyle \theta }$, then the pivotal quantity is a statistic, and is called an ancillary statistic.
• Here, we focus on the pivotal quantities with expressions involving ${\displaystyle \theta }$, so that we can use them to construct confidence intervals.

After having such pivotal quantity ${\displaystyle Q(\mathbf {X} ,\theta )}$, we can construct a ${\displaystyle 1-\alpha }$ confidence interval for ${\displaystyle \theta }$ by the following steps:

1. For that value of ${\displaystyle \alpha }$, find ${\displaystyle a,b}$ such that ${\displaystyle \mathbb {P} (a\leq Q(\mathbf {X} ,\theta )\leq b)=1-\alpha }$ [3] (${\displaystyle a,b}$ do not involve ${\displaystyle \theta }$ since ${\displaystyle Q(\mathbf {X} ,\theta )}$ is a pivotal quantity).
2. After that, we can transform ${\displaystyle a\leq Q(\mathbf {X} ,\theta )\leq b}$ to ${\displaystyle L(\mathbf {X} )\leq \theta \leq U(\mathbf {X} )}$ since the expression of ${\displaystyle Q(\mathbf {X} ,\theta )}$ involves ${\displaystyle \theta }$, as we have assumed (the resulting inequalities should be equivalent to the original inequalities, that is, ${\displaystyle a\leq Q(\mathbf {X} ,\theta )\leq b{\color {darkgreen}\iff }L(\mathbf {X} )\leq \theta \leq U(\mathbf {X} )}$, so that ${\displaystyle \mathbb {P} (L(\mathbf {X} )\leq \theta \leq U(\mathbf {X} )){\color {darkgreen}=}\mathbb {P} (a\leq Q(\mathbf {X} ,\theta )\leq b)}$).

Example. Consider a random sample ${\displaystyle X_{1},\dotsc ,X_{n}}$ from normal distribution ${\displaystyle {\mathcal {N}}(\mu ,\sigma ^{2})}$ with unknown mean ${\displaystyle \mu }$ and known variance ${\displaystyle \sigma ^{2}}$. Find a pivotal quantity (of ${\displaystyle \mu }$).

Solution: By the property of normal distribution, ${\displaystyle {\frac {{\overline {X}}-\mu }{\sigma /{\sqrt {n}}}}\sim {\mathcal {N}}(0,1)}$. Since ${\displaystyle {\mathcal {N}}(0,1)}$ is independent of the unknown parameter ${\displaystyle \mu }$, ${\displaystyle {\frac {{\overline {X}}-\mu }{\sigma /{\sqrt {n}}}}}$ is a pivotal quantity.

Alternatively, ${\displaystyle {\overline {X}}-\mu \sim {\mathcal {N}}(0,\sigma ^{2}/n)}$ is also a pivotal quantity, since ${\displaystyle {\mathcal {N}}(0,\sigma ^{2}/n)}$ is independent of ${\displaystyle \mu }$ (both ${\displaystyle n}$ and ${\displaystyle \sigma ^{2}}$ are known, so the variance of this distribution, ${\displaystyle \sigma ^{2}/n}$, is known).

Exercise.

(a) Is ${\displaystyle {\overline {X}}}$ a pivotal quantity?

(b) Is ${\displaystyle {\frac {X_{1}}{\mu }}}$ a pivotal quantity?

Solution

(a) No, since ${\displaystyle {\overline {X}}\sim {\mathcal {N}}(\mu ,\sigma ^{2}/n)}$, and this distribution depends on ${\displaystyle \mu }$.

(b) No, since ${\displaystyle {\frac {X_{1}}{\mu }}\sim {\mathcal {N}}(1,\sigma ^{2}/\mu ^{2})}$, and this distribution depends on ${\displaystyle \mu }$ (its variance involves ${\displaystyle \mu }$).
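A quick simulation shows how the spread of ${\displaystyle X_{1}/\mu }$ changes with ${\displaystyle \mu }$ (taking ${\displaystyle \sigma =1}$ for illustration), so its distribution is not free of ${\displaystyle \mu }$:

```python
import random
from statistics import pstdev

# With sigma = 1, X1/mu ~ N(1, 1/mu^2): its standard deviation is 1/|mu|,
# which changes with mu -- so the distribution of X1/mu depends on mu.
random.seed(1)
sds = {}
for mu in (1.0, 2.0, 10.0):
    draws = [random.gauss(mu, 1.0) / mu for _ in range(50_000)]
    sds[mu] = pstdev(draws)
    print(f"mu = {mu}: sd of X1/mu is about {sds[mu]:.3f}")  # roughly 1/mu
```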

Exercise. Consider a random sample ${\displaystyle X_{1},\dotsc ,X_{n}}$ from normal distribution ${\displaystyle {\mathcal {N}}(\mu ,\sigma ^{2})}$ with unknown mean ${\displaystyle \mu }$ and unknown variance ${\displaystyle \sigma ^{2}}$. Apart from ${\displaystyle {\frac {{\overline {X}}-\mu }{\sigma /{\sqrt {n}}}}}$, suggest a pivotal quantity of ${\displaystyle (\mu ,\sigma ^{2})}$.

Solution

A pivotal quantity is ${\displaystyle {\frac {X_{1}-\mu }{\sigma }}}$, since ${\displaystyle {\frac {X_{1}-\mu }{\sigma }}\sim {\mathcal {N}}(0,1)}$, and the distribution is independent from both ${\displaystyle \mu }$ and ${\displaystyle \sigma ^{2}}$.

Example. Consider a random sample ${\displaystyle X_{1},\dotsc ,X_{n}}$ from exponential distribution ${\displaystyle \operatorname {Exp} (\lambda )}$. Find a pivotal quantity. (Hint: ${\displaystyle \sum _{i=1}^{n}X_{i}\sim \operatorname {Gamma} (n,\lambda )}$ (rate parametrization) and if ${\displaystyle Y\sim \operatorname {Gamma} (\alpha ,\lambda )}$, then ${\displaystyle cY\sim \operatorname {Gamma} (\alpha ,\lambda /c)}$.)

Solution: A pivotal quantity is ${\displaystyle \lambda \sum _{i=1}^{n}X_{i}}$, since ${\displaystyle \lambda \sum _{i=1}^{n}X_{i}\sim \operatorname {Gamma} (n,\lambda /\lambda )\equiv \operatorname {Gamma} (n,1)}$, where the distribution is independent from ${\displaystyle \lambda }$.
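We can check by simulation that, with the rate parametrization used by Python's `random.expovariate` (mean ${\displaystyle 1/\lambda }$), the quantity ${\displaystyle \lambda \sum _{i=1}^{n}X_{i}}$ has the same ${\displaystyle \operatorname {Gamma} (n,1)}$ distribution (mean ${\displaystyle n}$) for every ${\displaystyle \lambda }$:

```python
import random

# random.expovariate(lam) draws from Exp(lam) with rate lam (mean 1/lam).
# lam * (X_1 + ... + X_n) ~ Gamma(n, 1), whatever lam is; its mean is n.
random.seed(2)
n, trials = 5, 40_000
means = {}
for lam in (0.5, 1.0, 4.0):
    pivots = [lam * sum(random.expovariate(lam) for _ in range(n))
              for _ in range(trials)]
    means[lam] = sum(pivots) / trials
    print(f"lam = {lam}: mean of pivot is about {means[lam]:.2f}")  # about n = 5
```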

Example. (A pivotal quantity for general distributions) Consider a distribution with unknown parameter (vector) ${\displaystyle \theta }$, where its cdf ${\displaystyle F_{X}}$ is continuous and strictly increasing (so that ${\displaystyle F_{X}^{-1}}$ exists).

(a) Prove that ${\displaystyle F_{X}(X)\sim {\mathcal {U}}[0,1]}$.

(b) Suppose a random sample ${\displaystyle X_{1},\dotsc ,X_{n}}$ is taken from that distribution. Suggest a pivotal quantity.

Solution:

(a)

Proof. Let ${\displaystyle Y=F_{X}(X)}$, and ${\displaystyle F_{Y}(y)}$ be the cdf of ${\displaystyle Y}$. Then, ${\displaystyle F_{Y}(y)=\mathbb {P} (Y\leq y)=\mathbb {P} (F_{X}(X)\leq y)=\mathbb {P} (X\leq F_{X}^{-1}(y))=F_{X}(F_{X}^{-1}(y))=y}$. Differentiating the cdf gives ${\displaystyle f_{Y}(y)={\frac {d}{dy}}F_{Y}(y)={\frac {d}{dy}}y=1}$. This means that the pdf of ${\displaystyle F_{X}(X)}$ is 1. Also, we know that the support of ${\displaystyle F_{X}(X)}$ is ${\displaystyle [0,1]}$ since ${\displaystyle F_{X}(X)}$ is essentially a probability. Hence, we have ${\displaystyle F_{X}(X)\sim {\mathcal {U}}[0,1]}$.

${\displaystyle \Box }$

(b) From (a), we know that ${\displaystyle F_{X}(X)\sim {\mathcal {U}}[0,1]}$ (the cdf ${\displaystyle F_{X}(X)}$ involves the parameter (vector) ${\displaystyle \theta }$), and this distribution is clearly independent from the parameter (vector) ${\displaystyle \theta }$. Hence, a pivotal quantity is ${\displaystyle F_{X}(X_{1})}$ (or ${\displaystyle F_{X_{1}}(X_{1})}$, which is the same since ${\displaystyle X_{1}}$ is taken from the distribution with cdf ${\displaystyle F_{X}}$).

Exercise. Suppose a single observation ${\displaystyle X_{1}}$ is taken from the exponential distribution ${\displaystyle \operatorname {Exp} (\lambda )}$. Find a pivotal quantity using the above method.

Solution

Since the cdf of ${\displaystyle \operatorname {Exp} (\lambda )}$ is ${\displaystyle F_{X}(x)=1-e^{-\lambda x}}$, as suggested by above, a pivotal quantity is ${\displaystyle 1-e^{-\lambda X_{1}}}$, which follows the uniform distribution ${\displaystyle {\mathcal {U}}[0,1]}$.
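A simulation sketch of this probability integral transform: for any rate ${\displaystyle \lambda }$, the transformed draws ${\displaystyle 1-e^{-\lambda X_{1}}}$ should look like ${\displaystyle {\mathcal {U}}[0,1]}$ (mean ${\displaystyle 1/2}$, variance ${\displaystyle 1/12}$):

```python
import math
import random

# F(X1) = 1 - exp(-lam * X1) should be U[0, 1] for every lam:
# mean 1/2 and variance 1/12 (about 0.0833).
random.seed(3)
stats = {}
for lam in (0.5, 2.0):
    u = [1 - math.exp(-lam * random.expovariate(lam)) for _ in range(50_000)]
    mean = sum(u) / len(u)
    var = sum((x - mean) ** 2 for x in u) / len(u)
    stats[lam] = (mean, var)
    print(f"lam = {lam}: mean {mean:.3f}, variance {var:.4f}")
```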

## Confidence intervals for means of normal distributions

In the following, we will use the concept of pivotal quantity to construct confidence intervals for means and variances of normal distributions. After that, because of the central limit theorem, we can construct approximate confidence intervals for means and variances of other types of distributions that are not normal.

### Mean of a normal distribution

Before discussing this confidence interval, let us first introduce a notation:

• ${\displaystyle z_{\alpha }}$ is the upper percentile of ${\displaystyle {\mathcal {N}}(0,1)}$ at level ${\displaystyle \alpha }$, i.e. it satisfies ${\displaystyle \mathbb {P} (Z\geq z_{\alpha })=\alpha }$ where ${\displaystyle Z\sim {\mathcal {N}}(0,1)}$.

We can find (or calculate) the values of ${\displaystyle z_{\alpha }}$ for different ${\displaystyle \alpha }$ from standard normal table.
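In practice, ${\displaystyle z_{\alpha }}$ can also be computed directly, e.g. with Python's standard-library `statistics.NormalDist.inv_cdf` (the quantile function of ${\displaystyle {\mathcal {N}}(0,1)}$):

```python
from statistics import NormalDist

# z_alpha satisfies P(Z >= z_alpha) = alpha, i.e. z_alpha = inv_cdf(1 - alpha).
def z(alpha):
    return NormalDist().inv_cdf(1 - alpha)

print(round(z(0.05), 4))    # 1.6449
print(round(z(0.025), 4))   # 1.96
```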

Theorem. (Confidence interval of ${\displaystyle \mu }$ when ${\displaystyle \sigma ^{2}}$ is known) Let ${\displaystyle X_{1},\dotsc ,X_{n}}$ be a random sample from ${\displaystyle {\mathcal {N}}(\mu ,\sigma ^{2})}$. When ${\displaystyle \sigma ^{2}}$ is known, a ${\displaystyle 1-\alpha }$ confidence interval for ${\displaystyle \mu }$ is ${\displaystyle \left[{\overline {X}}-z_{\alpha /2}{\frac {\sigma }{\sqrt {n}}},{\overline {X}}+z_{\alpha /2}{\frac {\sigma }{\sqrt {n}}}\right].}$

Remark.

• By the definition of interval estimate, the corresponding interval estimate of ${\displaystyle \mu }$ is ${\displaystyle \left[{\overline {x}}-z_{\alpha /2}{\frac {\sigma }{\sqrt {n}}},{\overline {x}}+z_{\alpha /2}{\frac {\sigma }{\sqrt {n}}}\right]}$, with observed value ${\displaystyle {\overline {X}}={\overline {x}}}$. For simplicity, we usually also call such an interval estimate a ${\displaystyle 1-\alpha }$ confidence interval.
• We can know the meaning of ${\displaystyle 1-\alpha }$ confidence interval by referring to the context.
• Usually, when the realization of random sample is given, then ${\displaystyle 1-\alpha }$ confidence interval is referring to the interval estimate (since the interval estimate is more "useful" and "suggestive" in this context).
• Unless otherwise specified, the ${\displaystyle 1-\alpha }$ confidence intervals referred are constructed according to this theorem (if applicable).

Proof. Let ${\displaystyle Z={\frac {{\overline {X}}-\mu }{\sigma /{\sqrt {n}}}}\sim {\mathcal {N}}(0,1)}$. Since ${\displaystyle Z}$ is a pivotal quantity (its distribution does not depend on ${\displaystyle \mu }$), we set ${\displaystyle 1-\alpha =1-\mathbb {P} (Z\geq z_{\alpha /2})-\mathbb {P} (Z\leq -z_{\alpha /2})=\mathbb {P} (-z_{\alpha /2}\leq Z\leq z_{\alpha /2})}$ where ${\displaystyle z_{\alpha /2}}$ is a constant (and does not involve ${\displaystyle \mu }$). Then, we have {\displaystyle {\begin{aligned}1-\alpha &=\mathbb {P} (-z_{\alpha /2}\leq Z\leq z_{\alpha /2})\\&=\mathbb {P} \left(-z_{\alpha /2}\leq {\frac {{\overline {X}}-\mu }{\sigma /{\sqrt {n}}}}\leq z_{\alpha /2}\right)\\&=\mathbb {P} \left(-z_{\alpha /2}{\frac {\sigma }{\sqrt {n}}}\leq {\overline {X}}-\mu \leq z_{\alpha /2}{\frac {\sigma }{\sqrt {n}}}\right)\\&=\mathbb {P} \left(z_{\alpha /2}{\frac {\sigma }{\sqrt {n}}}\geq \mu -{\overline {X}}\geq -z_{\alpha /2}{\frac {\sigma }{\sqrt {n}}}\right)\\&=\mathbb {P} \left(-z_{\alpha /2}{\frac {\sigma }{\sqrt {n}}}\leq \mu -{\overline {X}}\leq z_{\alpha /2}{\frac {\sigma }{\sqrt {n}}}\right)&({\text{rewrite}})\\&=\mathbb {P} \left({\overline {X}}-z_{\alpha /2}{\frac {\sigma }{\sqrt {n}}}\leq \mu \leq {\overline {X}}+z_{\alpha /2}{\frac {\sigma }{\sqrt {n}}}\right).\\\end{aligned}}} The result follows.

${\displaystyle \Box }$

The following graph illustrates ${\displaystyle \mathbb {P} (-z_{\alpha /2}\leq Z\leq z_{\alpha /2})=1-\alpha }$:

[Figure: the standard normal density curve. The central region between ${\displaystyle -z_{\alpha /2}}$ and ${\displaystyle z_{\alpha /2}}$ has area ${\displaystyle 1-\alpha }$, and each of the two tails has area ${\displaystyle \alpha /2}$.]


Example. Consider a random sample ${\displaystyle X_{1},\dotsc ,X_{5}}$ from ${\displaystyle {\mathcal {N}}(\mu ,1)}$. Suppose it is observed that ${\displaystyle X_{1}=0.5,X_{2}=1,X_{3}=-2,X_{4}=0,X_{5}=0.5}$.

Construct a 95% confidence interval for ${\displaystyle \mu }$.

Solution: Since ${\displaystyle {\overline {x}}={\frac {0.5+1-2+0+0.5}{5}}=0}$, and ${\displaystyle z_{0.025}\approx 1.96}$ (from standard normal table, we know that ${\displaystyle \mathbb {P} (Z\leq 1.96)\approx 1-0.025=0.975}$ where ${\displaystyle Z\sim {\mathcal {N}}(0,1)}$), it follows that a 95% confidence interval for ${\displaystyle \mu }$ is ${\displaystyle \left[0-1.96{\frac {\sqrt {1}}{\sqrt {5}}},0+1.96{\frac {\sqrt {1}}{\sqrt {5}}}\right]\approx [-0.8765,0.8765]}$.
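The computation in this example can be scripted; the following is a minimal Python sketch (the helper name `z_interval` is ours, not standard):

```python
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    """1 - alpha confidence interval for mu of N(mu, sigma^2), sigma known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    half = z * sigma / n ** 0.5
    return (xbar - half, xbar + half)

data = [0.5, 1, -2, 0, 0.5]
xbar = sum(data) / len(data)             # 0.0
lo, hi = z_interval(xbar, 1, len(data))  # approximately (-0.8765, 0.8765)
```

The same function with `conf=0.99` or `conf=0.90` reproduces parts (a) and (b) of the exercise below, up to rounding of the table values.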

Exercise.

(a) Construct a 99% confidence interval for ${\displaystyle \mu }$.

(b) Construct a 90% confidence interval for ${\displaystyle \mu }$.

(c) (alternative way of constructing confidence interval) Using a similar argument as in the proof of the previous theorem, another ${\displaystyle 1-\alpha }$ confidence interval for ${\displaystyle \mu }$ is ${\displaystyle \left[{\overline {X}}-z_{4\alpha /5}{\frac {\sigma }{\sqrt {n}}},{\overline {X}}+z_{\alpha /5}{\frac {\sigma }{\sqrt {n}}}\right]}$ since ${\displaystyle 1-\alpha =1-{\frac {4\alpha }{5}}-{\frac {\alpha }{5}}=1-\mathbb {P} (Z\geq z_{4\alpha /5})-\mathbb {P} (Z\leq -z_{\alpha /5})=\mathbb {P} (-z_{\alpha /5}\leq Z\leq z_{4\alpha /5})}$. Construct another 95% confidence interval for ${\displaystyle \mu }$ by this method.

(d) Is the width of the confidence interval (i.e. its upper bound minus its lower bound) constructed in (c) the same as that constructed in the example?

Solution

(a) Since ${\displaystyle z_{0.005}\approx 2.57}$ (from standard normal table), a 99% confidence interval for ${\displaystyle \mu }$ is ${\displaystyle \left[0-2.57{\frac {\sqrt {1}}{\sqrt {5}}},0+2.57{\frac {\sqrt {1}}{\sqrt {5}}}\right]\approx [-1.149,1.149]}$.

(b) Since ${\displaystyle z_{0.05}\approx 1.64}$ (from standard normal table), a 90% confidence interval for ${\displaystyle \mu }$ is ${\displaystyle \left[0-1.64{\frac {\sqrt {1}}{\sqrt {5}}},0+1.64{\frac {\sqrt {1}}{\sqrt {5}}}\right]\approx [-0.733,0.733]}$.

(c) Since ${\displaystyle z_{0.01}\approx 2.33}$ and ${\displaystyle z_{0.04}\approx 1.75}$ from the standard normal table, another 95% confidence interval for ${\displaystyle \mu }$ is ${\displaystyle \left[0-1.75{\frac {\sqrt {1}}{\sqrt {5}}},0+2.33{\frac {\sqrt {1}}{\sqrt {5}}}\right]\approx [-0.783,1.042]}$.

(d) The width of the confidence interval in the example is 1.753 (approximately), while the width of the confidence interval in (c) is 1.825 (approximately). Hence, their widths are different.
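As a numerical check of (d), both widths can be computed with precise ${\displaystyle z}$-values instead of the rounded table entries (so the later decimals differ slightly from the hand computation):

```python
from statistics import NormalDist

q = NormalDist().inv_cdf   # quantile function of N(0, 1)
n = 5
w_sym = 2 * q(0.975) / n ** 0.5          # (z_{0.025} + z_{0.025}) / sqrt(5)
w_asym = (q(0.99) + q(0.96)) / n ** 0.5  # (z_{0.01} + z_{0.04}) / sqrt(5)
# w_sym is about 1.753, w_asym about 1.823: the symmetric split is narrower
```

That the equal-tail split gives the narrowest interval of this form is a general feature of symmetric unimodal pivotal distributions.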

Remark.

• As we can see, when the confidence coefficient is higher, the corresponding confidence interval becomes wider.
• This matches our previous discussion.

Example. An undergraduate student, John, wants to estimate the average daily time spent playing computer games by all teenagers aged 14-16 in the previous week. Clearly, it is infeasible to ask all such teenagers about their time spent. Therefore, John decides to take a random sample of 10 teenagers from the population (all teenagers aged 14-16), and their times spent (in hours) are

3,8,10,5,9,9,1,3,0,4

The distribution of the daily time spent is assumed to be normal, with mean ${\displaystyle \mu }$ and variance ${\displaystyle \sigma ^{2}}$ [4]. Also, based on the past data about the daily time spent, John assumes that the standard deviation of the distribution is ${\displaystyle \sigma =3}$.

(a) Construct a 95% confidence interval for ${\displaystyle \mu }$.

(b) According to John, the computer game addiction problem is serious among teenagers aged 14-16 if the average daily time spent on playing computer games is at least a quarter of a day, i.e. 6 hours, and is not serious otherwise. Can John be (95%) confident that the computer game addiction problem is (i) serious; (ii) not serious among teenagers aged 14-16, based on the 95% confidence interval in (a)?

(c) To be more certain about the time spent, John would like to construct a 99% confidence interval for ${\displaystyle \mu }$, with width not exceeding 1 hour. At least how many teenagers should be in the random sample to satisfy this requirement?

(d) Suppose John takes another random sample from the population, with the number of teenagers suggested in (c). If ${\displaystyle {\overline {x}}=4.7}$ in this random sample, construct a 99% confidence interval for ${\displaystyle \mu }$, and verify that its width does not exceed 1 hour.

(e) Can John be (99%) confident that the computer game addiction problem is not serious among teenagers aged 14-16 based on the 99% confidence interval in (d)?

Solution:

(a) Since the realization of the sample mean is ${\displaystyle {\overline {x}}={\frac {3+8+10+5+9+9+1+3+0+4}{10}}=5.2}$, and ${\displaystyle z_{0.025}\approx 1.96}$, the 95% confidence interval for ${\displaystyle \mu }$ is ${\displaystyle \left[5.2-1.96{\frac {3}{\sqrt {10}}},5.2+1.96{\frac {3}{\sqrt {10}}}\right]\approx [3.34,7.06]}$.

(b) (i) No, since the confidence interval contains some values that are strictly less than 6 and some that are at least 6. Thus, although John is 95% confident that ${\displaystyle \mu }$ lies in ${\displaystyle [3.34,7.06]}$, it is uncertain whether ${\displaystyle \mu }$ is at least 6 when it lies in ${\displaystyle [3.34,7.06]}$.

(b) (ii) No, and the reason is similar to that in (i) (it is uncertain whether ${\displaystyle \mu }$ is lower than 6 when it lies in ${\displaystyle [3.34,7.06]}$).

(c) Since a 99% confidence interval for ${\displaystyle \mu }$ is ${\displaystyle \left[{\overline {x}}-z_{0.005}{\frac {3}{\sqrt {n}}},{\overline {x}}+z_{0.005}{\frac {3}{\sqrt {n}}}\right]}$, its width is ${\displaystyle 2z_{0.005}{\frac {3}{\sqrt {n}}}}$ (which does not depend on ${\displaystyle {\overline {x}}}$). Also, we know that ${\displaystyle z_{0.005}\approx 2.57}$. Thus, to satisfy the requirement, we need ${\displaystyle 2(2.57){\frac {3}{\sqrt {n}}}\leq 1\implies n\geq (3(2)(2.57))^{2}\approx 237.776.}$ Since the sample size ${\displaystyle n}$ must be an integer, the minimum value of ${\displaystyle n}$ is 238. That is, at least 238 teenagers should be in the random sample to satisfy the requirement.

(d) A 99% confidence interval for ${\displaystyle \mu }$ is ${\displaystyle \left[4.7-2.57{\frac {3}{\sqrt {238}}},4.7+2.57{\frac {3}{\sqrt {238}}}\right]\approx [4.20023,5.199765]}$. Its width is approximately 0.999535, which is less than 1.

(e) Yes, since all values in the interval in (d) are strictly less than 6.
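The sample-size calculation in (c) can be sketched in Python (the helper `min_sample_size` is ours; we pass the table value ${\displaystyle z_{0.005}\approx 2.57}$ explicitly, as in the solution — note that the more precise value 2.576 would give 239 instead):

```python
import math

def min_sample_size(z, sigma, width):
    """Smallest integer n with interval width 2 * z * sigma / sqrt(n) <= width."""
    return math.ceil((2 * z * sigma / width) ** 2)

n = min_sample_size(2.57, 3, 1)  # 238, matching (c)
```

Solving ${\displaystyle 2z\sigma /{\sqrt {n}}\leq w}$ for ${\displaystyle n}$ gives ${\displaystyle n\geq (2z\sigma /w)^{2}}$, and `math.ceil` rounds up to the next integer.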

Exercise. Suppose John decides to take another random sample consisting of even more teenagers, 500 of them. If ${\displaystyle {\overline {x}}=5.8}$ in this random sample,

(a) Construct a 99% confidence interval for ${\displaystyle \mu }$.

(b) Can John be (99%) confident that the computer game addiction problem is not serious among teenagers aged 14-16 based on the 99% confidence interval in (a)?

Solution

(a) A 99% confidence interval for ${\displaystyle \mu }$ is ${\displaystyle \left[5.8-2.57{\frac {3}{\sqrt {500}}},5.8+2.57{\frac {3}{\sqrt {500}}}\right]\approx [5.4552,6.1448]}$.

(b) No, since some values in the interval are at least 6.

We have previously discussed a way to construct a confidence interval for the mean when the variance is known. In practice, however, this is not always the case: the variance may well be unknown, and then we cannot use the ${\displaystyle \sigma }$ in the confidence interval from the previous theorem.

Intuitively, one may think that we can use the sample variance ${\displaystyle S^{2}}$ to "replace" ${\displaystyle \sigma ^{2}}$, appealing to the weak law of large numbers. Then, we simply replace the unknown ${\displaystyle \sigma }$ in the confidence interval by ${\displaystyle S}$ (or its realization ${\displaystyle s}$ for the interval estimate). However, the flaw in this argument is that the sample size may not be large enough for the weak law of large numbers to give a good approximation.

Remark.

• A rule of thumb is that we may regard the sample size as large enough for applying this kind of convergence theorem (e.g. the weak law of large numbers and the central limit theorem) for approximation when the sample size is at least 30. Otherwise, the approximation may not be accurate enough, i.e. the error can be quite large, and thus we should not use such theorems for approximation.

So, you may now ask: when the sample size is large enough, can we do such a "replacement" for approximation? The answer is yes, and we will discuss approximate confidence intervals in the last section.

Before that section, the confidence intervals discussed are exact, in the sense that no approximation is used to construct them. Therefore, they "work" for every sample size, no matter how large or small (they work even if the sample size is 1, although the resulting confidence interval may not be very "nice", in the sense that its width may be quite large).

Before discussing how to construct a confidence interval for the mean when the variance is unknown, we first give some results that are useful for deriving such a confidence interval.

Proposition. (Several properties of the sample mean and variance) Let ${\displaystyle X_{1},\dotsc ,X_{n}}$ be a random sample from ${\displaystyle {\mathcal {N}}(\mu ,\sigma ^{2})}$. Also let ${\displaystyle {\overline {X}}={\frac {\sum _{i=1}^{n}X_{i}}{n}}}$ be the sample mean and ${\displaystyle S^{2}={\frac {\sum _{i=1}^{n}(X_{i}-{\overline {X}})^{2}}{n}}}$ be the sample variance, where ${\displaystyle n}$ is the sample size. Then,

(i) ${\displaystyle {\overline {X}}}$ and ${\displaystyle S^{2}}$ are independent.

(ii) ${\displaystyle {\frac {nS^{2}}{\sigma ^{2}}}={\frac {\sum _{i=1}^{n}(X_{i}-{\overline {X}})^{2}}{\sigma ^{2}}}\sim \chi _{n-1}^{2}}$ where ${\displaystyle \chi _{n-1}^{2}}$ is a chi-squared distribution with ${\displaystyle n-1}$ degrees of freedom.

(iii) ${\displaystyle {\frac {{\overline {X}}-\mu }{S/{\sqrt {n-1}}}}\sim t_{n-1}}$ where ${\displaystyle t_{n-1}}$ is a ${\displaystyle t}$-distribution with ${\displaystyle n-1}$ degrees of freedom.

Proof.

(i) One may use Basu's theorem to prove this, but the details about Basu's theorem and the proof are omitted here, since they are a bit complicated.

(ii) We will use the following definition of chi-squared distribution ${\displaystyle \chi _{k}^{2}}$ : ${\displaystyle \sum _{i=1}^{k}Z_{i}^{2}\sim \chi _{k}^{2}}$ where ${\displaystyle Z_{1},Z_{2},\dotsc ,Z_{k}\sim {\mathcal {N}}(0,1)}$ are independent. Also, we will use the fact that the mgf of ${\displaystyle \chi _{k}^{2}}$ is ${\displaystyle M(t)=(1-2t)^{-k/2},\quad t<{\frac {1}{2}}}$.

Now, first let ${\displaystyle W=\sum _{i=1}^{n}\left({\frac {X_{i}-\mu }{\sigma }}\right)^{2}}$ which follows ${\displaystyle \chi _{n}^{2}}$ since ${\displaystyle {\frac {X_{1}-\mu }{\sigma }},\dotsc ,{\frac {X_{n}-\mu }{\sigma }}\sim {\mathcal {N}}(0,1)}$ are independent. Then, we write ${\displaystyle W}$ as {\displaystyle {\begin{aligned}W&=\sum _{i=1}^{n}\left({\frac {X_{i}-\mu }{\sigma }}\right)^{2}\\&=\sum _{i=1}^{n}\left({\frac {X_{i}{\color {darkgreen}-{\overline {X}}}}{\sigma }}+{\frac {{\color {darkgreen}{\overline {X}}}-\mu }{\sigma }}\right)^{2}\\&=\sum _{i=1}^{n}\left({\frac {X_{i}{\color {darkgreen}-{\overline {X}}}}{\sigma }}\right)^{2}+\sum _{i=1}^{n}\left({\frac {{\color {darkgreen}{\overline {X}}}-\mu }{\sigma }}\right)^{2}+0&{\Bigg (}{\color {blue}2}\sum _{i=1}^{n}{\frac {{\color {blue}({\overline {X}}-\mu )}(X_{i}-{\overline {X}})}{\color {blue}\sigma ^{2}}}={\color {blue}{\frac {2({\overline {X}}-\mu )}{\sigma ^{2}}}}{\bigg (}\underbrace {\sum _{i=1}^{n}X_{i}} _{=n{\overline {X}}}-\underbrace {\sum _{i=1}^{n}\overbrace {\overline {X}} ^{{\text{constant wrt }}i}} _{=n{\overline {X}}}{\bigg )}=0{\Bigg )}\\&={\frac {1}{\sigma ^{2}}}\underbrace {\sum _{i=1}^{n}(X_{i}-{\overline {X}})^{2}} _{=nS^{2}}+{\Big (}\underbrace {\frac {{\sqrt {n}}({\overline {X}}-\mu )}{\sigma }} _{\sim {\mathcal {N}}(0,1){\text{ by property}}}{\Big )}^{2}\\&={\frac {nS^{2}}{\sigma ^{2}}}+Z^{2}&\left(Z={\frac {{\sqrt {n}}({\overline {X}}-\mu )}{\sigma }}\sim {\mathcal {N}}(0,1)\right)\end{aligned}}} Applying the definition of chi-squared distribution, we have ${\displaystyle Z^{2}\sim \chi _{1}^{2}}$.

By (i), ${\displaystyle {\overline {X}}}$ and ${\displaystyle S^{2}}$ are independent. Thus, ${\displaystyle {\frac {nS^{2}}{\sigma ^{2}}}}$ (a function of ${\displaystyle S^{2}}$) is independent from ${\displaystyle Z^{2}}$ (a function of ${\displaystyle {\overline {X}}}$). Now, let ${\displaystyle U={\frac {nS^{2}}{\sigma ^{2}}}}$ and ${\displaystyle V=Z^{2}}$. Since ${\displaystyle U}$ and ${\displaystyle V}$ are independent, and also we have ${\displaystyle W=U+V}$ from above derivation, the mgf ${\displaystyle M_{W}(t)=M_{U+V}(t)=M_{U}(t)M_{V}(t).}$ Since ${\displaystyle W\sim \chi _{n}^{2}}$ and ${\displaystyle V\sim \chi _{1}^{2}}$, we can further write ${\displaystyle (1-2t)^{-n/2}=M_{U}(t)(1-2t)^{-1/2},\quad t<{\frac {1}{2}},}$ which implies that the mgf of ${\displaystyle U}$ is ${\displaystyle M_{U}(t)=(1-2t)^{-(n-1)/2},\quad t<{\frac {1}{2}}}$, which is exactly the mgf of ${\displaystyle \chi _{n-1}^{2}}$. Hence, ${\displaystyle U={\frac {nS^{2}}{\sigma ^{2}}}\sim \chi _{n-1}^{2}}$.

(iii) We will use the following definition of ${\displaystyle t}$-distribution ${\displaystyle t_{k}}$: ${\displaystyle {\frac {Z}{\sqrt {Y/k}}}\sim t_{k}}$ where ${\displaystyle Z\sim {\mathcal {N}}(0,1)}$, ${\displaystyle Y\sim \chi _{k}^{2}}$, and ${\displaystyle Z}$ and ${\displaystyle Y}$ are independent.

After using this definition, it is easy to prove (iii) with (ii), as follows: {\displaystyle {\begin{aligned}{\frac {{\overline {X}}-\mu }{S/{\sqrt {n-1}}}}&={\frac {{\color {darkgreen}{\sqrt {n}}}({\overline {X}}-\mu )/{\color {darkgreen}\sigma }}{{\color {darkgreen}{\sqrt {n}}}S/({\color {darkgreen}\sigma }{\sqrt {n-1}})}}\\&={\frac {{\sqrt {n}}({\overline {X}}-\mu )/\sigma }{\sqrt {{\frac {nS^{2}}{\sigma ^{2}}}{\big /}(n-1)}}}.\end{aligned}}} By (ii), ${\displaystyle {\frac {nS^{2}}{\sigma ^{2}}}\sim \chi _{n-1}^{2}}$. Also, we know that ${\displaystyle {\frac {{\sqrt {n}}({\overline {X}}-\mu )}{\sigma }}}$ and ${\displaystyle {\frac {nS^{2}}{\sigma ^{2}}}}$ are independent since ${\displaystyle {\overline {X}}}$ and ${\displaystyle S^{2}}$ are independent by (i). Then, it follows by the above definition that ${\displaystyle {\frac {{\overline {X}}-\mu }{S/{\sqrt {n-1}}}}\sim t_{n-1}}$.

${\displaystyle \Box }$
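The claims in (ii) and (iii) can also be checked empirically by simulation. A rough Monte Carlo sketch in Python (the parameter values, the number of trials, and the tolerance are arbitrary choices of ours; ${\displaystyle t_{0.025,7}\approx 2.365}$ is taken from a ${\displaystyle t}$-table):

```python
import random

random.seed(0)
n, mu, sigma = 8, 5.0, 2.0
count, trials = 0, 4000
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    s = (sum((x - xbar) ** 2 for x in xs) / n) ** 0.5  # S with the /n convention
    t = (xbar - mu) / (s / (n - 1) ** 0.5)             # should follow t_{n-1} = t_7
    count += abs(t) >= 2.365                           # t_{0.025,7} from a t-table
frac = count / trials  # should be close to 2 * 0.025 = 0.05
```

If (iii) holds, about 5% of the simulated statistics should fall outside ${\displaystyle \pm t_{0.025,7}}$, and that is what the simulation shows up to Monte Carlo error.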

Using this proposition, we can prove the following theorem. Again, before discussing this confidence interval, let us introduce a notation:

• ${\displaystyle t_{\alpha ,\nu }}$ is the upper percentile of ${\displaystyle t_{\nu }}$ at level ${\displaystyle \alpha }$, i.e. it satisfies ${\displaystyle \mathbb {P} (T\geq t_{\alpha ,\nu })=\alpha }$ where ${\displaystyle T\sim t_{\nu }}$.

Theorem. (Confidence interval of ${\displaystyle \mu }$ when ${\displaystyle \sigma ^{2}}$ is unknown) Let ${\displaystyle X_{1},\dotsc ,X_{n}}$ be a random sample from ${\displaystyle {\mathcal {N}}(\mu ,\sigma ^{2})}$. When ${\displaystyle \sigma ^{2}}$ is unknown, a ${\displaystyle 1-\alpha }$ confidence interval for ${\displaystyle \mu }$ is ${\displaystyle \left[{\overline {X}}-t_{\alpha /2,n-1}{\frac {S}{\sqrt {n-1}}},{\overline {X}}+t_{\alpha /2,n-1}{\frac {S}{\sqrt {n-1}}}\right].}$

Remark.

• The corresponding interval estimate is ${\displaystyle \left[{\overline {x}}-t_{\alpha /2,n-1}{\frac {s}{\sqrt {n-1}}},{\overline {x}}+t_{\alpha /2,n-1}{\frac {s}{\sqrt {n-1}}}\right]}$, with observed values ${\displaystyle {\overline {X}}={\overline {x}}}$ and ${\displaystyle S=s}$ (the sample standard deviation ${\displaystyle S}$ is nonnegative, so this is equivalent to ${\displaystyle S^{2}=s^{2}}$).
• We can find values of ${\displaystyle t_{\alpha ,\nu }}$ for some values of ${\displaystyle \alpha }$ and ${\displaystyle \nu }$ from a "${\displaystyle t}$-table".
• In this "${\displaystyle t}$-table", the first column indicates the value of ${\displaystyle \nu }$, and the first row (one-sided) indicates ${\displaystyle 1-\alpha }$ (it is "one-sided" since our definition of ${\displaystyle t_{\alpha ,\nu }}$ involves "${\displaystyle T\geq t_{\alpha ,\nu }}$", which is "one-sided"). For instance, if we want to get ${\displaystyle t_{0.05,\nu }}$, we can look at ${\displaystyle 1-0.05=95\%}$ in the first row (one-sided).
• Alternatively, we can look at the second row (two-sided), which indicates the confidence coefficient ${\displaystyle 1-\alpha }$ of the confidence interval, corresponding to ${\displaystyle t_{\alpha /2,\nu }}$. For instance, if we want to get ${\displaystyle t_{0.05/2,\nu }}$, we can look at ${\displaystyle 1-0.05=95\%}$ in the second row (two-sided).
• When ${\displaystyle \nu \to \infty }$, the ${\displaystyle t}$-distribution ${\displaystyle t_{\nu }}$ tends to the standard normal distribution ${\displaystyle {\mathcal {N}}(0,1)}$. Hence, when ${\displaystyle \nu }$ is large, ${\displaystyle t_{\alpha ,\nu }\approx z_{\alpha }}$. Thus, if one cannot find the value of ${\displaystyle t_{\alpha ,\nu }}$ from the ${\displaystyle t}$-table because ${\displaystyle \nu }$ is so large that it does not appear in the table, one can simply get ${\displaystyle z_{\alpha }}$ from the standard normal table for an approximation.

Proof. By (iii) in the previous proposition, we have ${\displaystyle T={\frac {{\overline {X}}-\mu }{S/{\sqrt {n-1}}}}\sim t_{n-1}}$. Since the distribution ${\displaystyle t_{n-1}}$ does not depend on ${\displaystyle \mu }$, ${\displaystyle T}$ is a pivotal quantity for ${\displaystyle \mu }$. Hence, we set ${\displaystyle 1-\alpha =1-\mathbb {P} (T\geq t_{\alpha /2,n-1})-\mathbb {P} (T\leq -t_{\alpha /2,n-1})=\mathbb {P} (-t_{\alpha /2,n-1}\leq T\leq t_{\alpha /2,n-1})}$ where ${\displaystyle t_{\alpha /2,n-1}}$ is a constant (the ${\displaystyle t}$-distribution is symmetric about ${\displaystyle x=0}$, so we have ${\displaystyle \mathbb {P} (T\leq -t_{\alpha /2,n-1})=\alpha /2}$). It follows that {\displaystyle {\begin{aligned}1-\alpha &=\mathbb {P} (-t_{\alpha /2,n-1}\leq T\leq t_{\alpha /2,n-1})\\&=\mathbb {P} \left(-t_{\alpha /2,n-1}\leq {\frac {{\overline {X}}-\mu }{S/{\sqrt {n-1}}}}\leq t_{\alpha /2,n-1}\right)\\&=\mathbb {P} \left(-t_{\alpha /2,n-1}{\frac {S}{\sqrt {n-1}}}\leq {\overline {X}}-\mu \leq t_{\alpha /2,n-1}{\frac {S}{\sqrt {n-1}}}\right)\\&=\mathbb {P} \left(t_{\alpha /2,n-1}{\frac {S}{\sqrt {n-1}}}\geq \mu -{\overline {X}}\geq -t_{\alpha /2,n-1}{\frac {S}{\sqrt {n-1}}}\right)\\&=\mathbb {P} \left(-t_{\alpha /2,n-1}{\frac {S}{\sqrt {n-1}}}\leq \mu -{\overline {X}}\leq t_{\alpha /2,n-1}{\frac {S}{\sqrt {n-1}}}\right)&({\text{rewrite}})\\&=\mathbb {P} \left({\overline {X}}-t_{\alpha /2,n-1}{\frac {S}{\sqrt {n-1}}}\leq \mu \leq {\overline {X}}+t_{\alpha /2,n-1}{\frac {S}{\sqrt {n-1}}}\right).\\\end{aligned}}} The result follows.

${\displaystyle \Box }$

Example. A government officer of country A would like to know the daily average time spent on exercises of all citizens in country A. Suppose the variance of the time spent is unknown, and a random sample of 10 citizens are taken from the population. The following is the time spent on exercises in a particular day for the citizens in that sample (in minutes):

10, 0, 60, 20, 30, 30, 120, 40, 30, 10.

Assuming the time spent follows normal distribution, construct a 95% confidence interval for the daily average time spent on exercises of all citizens in country A, denoted by ${\displaystyle \mu }$.

Solution: First, we have ${\displaystyle {\overline {x}}={\frac {10+0+60+20+30+30+120+40+30+10}{10}}=35}$, and ${\displaystyle s={\sqrt {\frac {(10-35)^{2}+(0-35)^{2}+(60-35)^{2}+(20-35)^{2}+(30-35)^{2}+(30-35)^{2}+(120-35)^{2}+(40-35)^{2}+(30-35)^{2}+(10-35)^{2}}{10}}}={\sqrt {1065}}\approx 32.634}$.

Also, ${\displaystyle t_{0.025,9}\approx 2.262}$ from "97.5% (one-sided) and 9" (or "95% (two-sided) and 9") in ${\displaystyle t}$-table.

Thus, a 95% confidence interval for ${\displaystyle \mu }$ is ${\displaystyle \left[35-2.262\cdot {\frac {32.634}{\sqrt {9}}},35+2.262\cdot {\frac {32.634}{\sqrt {9}}}\right]\approx [10.39,59.61].}$
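This computation can also be scripted. A minimal Python sketch following the text's conventions (the biased ${\displaystyle S^{2}}$ with the ${\displaystyle S/{\sqrt {n-1}}}$ scaling; the critical value ${\displaystyle t_{0.025,9}\approx 2.262}$ is taken from the ${\displaystyle t}$-table, and the helper name `t_interval` is ours):

```python
def t_interval(xbar, s, n, t_crit):
    """1 - alpha CI for mu, sigma unknown: xbar -+ t_crit * s / sqrt(n - 1)."""
    half = t_crit * s / (n - 1) ** 0.5
    return (xbar - half, xbar + half)

data = [10, 0, 60, 20, 30, 30, 120, 40, 30, 10]
n = len(data)
xbar = sum(data) / n                                 # 35.0
s = (sum((x - xbar) ** 2 for x in data) / n) ** 0.5  # sqrt(1065), about 32.634
lo, hi = t_interval(xbar, s, n, 2.262)               # roughly (10.39, 59.61)
```

Only the critical value comes from a table here; with a statistics library one could compute ${\displaystyle t_{\alpha /2,n-1}}$ numerically as well.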

Exercise. The government officer also wants to know the mean monthly wage of all citizens in country A, ${\displaystyle \mu }$. Suppose the standard deviation of the monthly wage is unknown (all wages in this example are in USD). From a salary survey which asks 15 citizens for their monthly wages, the following monthly wages (in USD) are obtained:

1500, 3000, 1200, 4000, 3500, 10000, 5000, 1000, 6000, 3000, 2000, 2000, 1500, 3000, 8000.

(a) Construct a 90% confidence interval for the mean monthly wage ${\displaystyle \mu }$, assuming the underlying distribution for the wage is normal.

(b) For the salary survey, it is found that a respondent gave a wrong monthly wage: he entered one more "0" accidentally, and thus answered 10000 instead of 1000. After the correction, the corrected sample data of the monthly wages are:

1500, 3000, 1200, 4000, 3500, 1000, 5000, 1000, 6000, 3000, 2000, 2000, 1500, 3000, 8000.

Update the confidence interval in (a) to a correct one, based on this corrected data.

Solution

(a) First, we can get ${\displaystyle {\overline {x}}\approx 3646.67}$, and ${\displaystyle s\approx 2526.09}$. Also, ${\displaystyle t_{0.05,14}\approx 1.761}$ (from "95% (one-sided) (or 90% (two-sided)) and 14" in ${\displaystyle t}$-table).

Hence, a 90% confidence interval for ${\displaystyle \mu }$ is ${\displaystyle \left[3646.67-1.761\cdot {\frac {2526.09}{\sqrt {14}}},3646.67+1.761\cdot {\frac {2526.09}{\sqrt {14}}}\right]\approx [2457.77,4835.57]}$

(b) First, we update ${\displaystyle {\overline {x}}}$ and ${\displaystyle s}$: ${\displaystyle {\overline {x}}\approx 3046.67}$ and ${\displaystyle s\approx 1948.63}$. Then, a new 90% confidence interval for ${\displaystyle \mu }$ is ${\displaystyle \left[{\color {darkgreen}3046.67}-1.761\cdot {\frac {\color {darkgreen}1948.63}{\sqrt {14}}},{\color {darkgreen}3046.67}+1.761\cdot {\frac {\color {darkgreen}1948.63}{\sqrt {14}}}\right]\approx [2129.55,3963.79]}$

Example.

A farmer, Tom, owns an apple orchard. He has just harvested a large amount of apples (1000 apples) from his orchard. To assess the "quality" of this batch of apples, he wants to know the mean weight of the apples in this batch, ${\displaystyle \mu }$. However, since there are too many apples, it is cumbersome to weigh every apple in the batch. Hence, Tom decides to take a random sample of 5 apples, and use them to roughly estimate the mean weight of the apples. The following are the weights of the apples in that sample (in g):

100, 120, 200, 220, 80.

Assume the distribution of the weight is normal.

(a) Based on past experiences, Tom knows that the standard deviation of the weight of the apples is 30g. Construct a 95% confidence interval for ${\displaystyle \mu }$.

(b) Tom finds out that in this batch, the apples grown are of new kind, that have not been grown before. Therefore, the standard deviation of the weight based on past experiences cannot be applied to estimation of the mean weight for this batch. Hence, the standard deviation of the weight is now unknown. Construct an updated 95% confidence interval for ${\displaystyle \mu }$.

Solution:

(a) We have ${\displaystyle {\overline {x}}=144}$. Also, ${\displaystyle z_{0.025}\approx 1.96}$ from standard normal table. Hence, a 95% confidence interval for ${\displaystyle \mu }$ is ${\displaystyle \left[144-1.96\cdot {\frac {30}{\sqrt {5}}},144+1.96\cdot {\frac {30}{\sqrt {5}}}\right]\approx [117.70,170.30].}$

(b) We have ${\displaystyle s\approx 55.71}$, and ${\displaystyle t_{0.025,4}\approx 2.776}$ from ${\displaystyle t}$-table. Hence, a 95% confidence interval for ${\displaystyle \mu }$ is ${\displaystyle \left[144-2.776\cdot {\frac {55.71}{\sqrt {4}}},144+2.776\cdot {\frac {55.71}{\sqrt {4}}}\right]\approx [66.67,221.33].}$
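Both intervals can be reproduced numerically; a Python sketch using the text's conventions (the critical values 1.96 and 2.776 are taken from the standard normal table and the ${\displaystyle t}$-table respectively):

```python
data = [100, 120, 200, 220, 80]
n = len(data)
xbar = sum(data) / n                                 # 144.0

# (a): sigma = 30 known, z-interval half-width
half_z = 1.96 * 30 / n ** 0.5

# (b): sigma unknown, t-interval with the S / sqrt(n - 1) scaling
s = (sum((x - xbar) ** 2 for x in data) / n) ** 0.5  # about 55.71
half_t = 2.776 * s / (n - 1) ** 0.5
```

Note how much wider (b) is than (a): with only 5 observations, the heavy tails of ${\displaystyle t_{4}}$ and the large sample standard deviation both inflate the interval.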

Exercise. Tom sells this batch of apples to a nearby shop, and it is known that the shop will pay Tom ${\displaystyle 0.1\mu }$ USD for each apple, where ${\displaystyle \mu }$ is the mean weight of the batch of apples.

(a) Construct a 95% confidence interval for the total revenue of Tom from this transaction (in USD), ${\displaystyle r}$, based on the confidence interval in (b) of the example above.

(b) Suppose the cost for Tom to grow this batch of apples is USD 6000. Can Tom be 95% confident that he can earn a positive profit (i.e. the revenue exceeds the cost) from this transaction?

Solution

(a) We have ${\displaystyle r=1000(0.1\mu )=100\mu }$, and a 95% confidence interval for ${\displaystyle \mu }$ is ${\displaystyle [66.67,221.33]}$ based on (b) of the example. From the construction of the confidence interval, we have ${\displaystyle 1-\alpha =\mathbb {P} \left({\overline {X}}-t_{\alpha /2,n-1}{\frac {S}{\sqrt {n-1}}}\leq \mu \leq {\overline {X}}+t_{\alpha /2,n-1}{\frac {S}{\sqrt {n-1}}}\right)\implies 1-\alpha =\mathbb {P} \left(100\left({\overline {X}}-t_{\alpha /2,n-1}{\frac {S}{\sqrt {n-1}}}\right)\leq r\leq 100\left({\overline {X}}+t_{\alpha /2,n-1}{\frac {S}{\sqrt {n-1}}}\right)\right).}$ Hence, the corresponding 95% confidence interval for ${\displaystyle r}$ is (approximately) ${\displaystyle [6667,22133].}$

(b) Yes, since Tom can be 95% confident that ${\displaystyle r}$ lies in ${\displaystyle [6667,22133]}$, and every value in this interval exceeds the cost of USD 6000.

### Difference in means of two normal distributions

Sometimes, apart from estimating the mean of a single normal distribution, we would like to estimate the difference in means of two normal distributions for making comparisons. For example, apart from estimating the mean amount of time (lifetime) for a bulb until it burns out, we are often interested in estimating the difference between the lifetimes of two different bulbs, so that we know which of the bulbs lasts longer on average, and hence which bulb has a higher "quality".

First, let us discuss the case where the two normal distributions are independent.

Now, the problem is how we should construct a confidence interval for the difference in the two means. It seems that we can just construct two ${\displaystyle 1-\alpha }$ confidence intervals ${\displaystyle [L(\mathbf {X} ),U(\mathbf {X} )],[L(\mathbf {Y} ),U(\mathbf {Y} )]}$ for the two means ${\displaystyle \mu _{X},\mu _{Y}}$ respectively, and then take ${\displaystyle [L(\mathbf {X} )-L(\mathbf {Y} ),U(\mathbf {X} )-U(\mathbf {Y} )]}$ as a ${\displaystyle 1-\alpha }$ confidence interval for ${\displaystyle \mu _{X}-\mu _{Y}}$. However, this is incorrect: even when we have ${\displaystyle \mathbb {P} (L(\mathbf {X} )\leq \mu _{X}\leq U(\mathbf {X} ))=1-\alpha }$ and ${\displaystyle \mathbb {P} (L(\mathbf {Y} )\leq \mu _{Y}\leq U(\mathbf {Y} ))=1-\alpha }$, it does not follow that ${\displaystyle \mathbb {P} (L(\mathbf {X} )-L(\mathbf {Y} )\leq \mu _{X}-\mu _{Y}\leq U(\mathbf {X} )-U(\mathbf {Y} ))=1-\alpha }$ (there is no result in probability that justifies this).

On the other hand, since the events ${\displaystyle \{L(\mathbf {X} )\leq \mu _{X}\leq U(\mathbf {X} )\}}$ and ${\displaystyle \{L(\mathbf {Y} )\leq \mu _{Y}\leq U(\mathbf {Y} )\}}$ are independent (the two normal distributions we are considering are independent), we have ${\displaystyle \mathbb {P} (L(\mathbf {X} )\leq \mu _{X}\leq U(\mathbf {X} ){\text{ and }}L(\mathbf {Y} )\leq \mu _{Y}\leq U(\mathbf {Y} ))=(1-\alpha )^{2}.}$ Then, when ${\displaystyle L(\mathbf {X} )\leq \mu _{X}\leq U(\mathbf {X} )}$ and ${\displaystyle L(\mathbf {Y} )\leq \mu _{Y}\leq U(\mathbf {Y} )}$, we have ${\displaystyle L(\mathbf {X} )-U(\mathbf {Y} )\leq \mu _{X}-\mu _{Y}\leq U(\mathbf {X} )-L(\mathbf {Y} ),}$ so it seems that ${\displaystyle \mathbb {P} (L(\mathbf {X} )-U(\mathbf {Y} )\leq \mu _{X}-\mu _{Y}\leq U(\mathbf {X} )-L(\mathbf {Y} ))=(1-\alpha )^{2},}$ which would mean ${\displaystyle [L(\mathbf {X} )-U(\mathbf {Y} ),U(\mathbf {X} )-L(\mathbf {Y} )]}$ is a ${\displaystyle (1-\alpha )^{2}}$ confidence interval.

However, this is actually also incorrect. The flaw is that "when ${\displaystyle L(\mathbf {X} )\leq \mu _{X}\leq U(\mathbf {X} )}$ and ${\displaystyle L(\mathbf {Y} )\leq \mu _{Y}\leq U(\mathbf {Y} )}$, we have ${\displaystyle L(\mathbf {X} )-U(\mathbf {Y} )\leq \mu _{X}-\mu _{Y}\leq U(\mathbf {X} )-L(\mathbf {Y} )}$" only means ${\displaystyle \{L(\mathbf {X} )\leq \mu _{X}\leq U(\mathbf {X} ){\text{ and }}L(\mathbf {Y} )\leq \mu _{Y}\leq U(\mathbf {Y} )\}\subseteq \{L(\mathbf {X} )-U(\mathbf {Y} )\leq \mu _{X}-\mu _{Y}\leq U(\mathbf {X} )-L(\mathbf {Y} )\}}$ (we do not have the reverse subset inclusion in general). This in turn means ${\displaystyle (1-\alpha )^{2}=\mathbb {P} (L(\mathbf {X} )\leq \mu _{X}\leq U(\mathbf {X} ){\text{ and }}L(\mathbf {Y} )\leq \mu _{Y}\leq U(\mathbf {Y} )){\color {darkgreen}\leq }\mathbb {P} (L(\mathbf {X} )-U(\mathbf {Y} )\leq \mu _{X}-\mu _{Y}\leq U(\mathbf {X} )-L(\mathbf {Y} )).}$ So, ${\displaystyle [L(\mathbf {X} )-U(\mathbf {Y} ),U(\mathbf {X} )-L(\mathbf {Y} )]}$ is actually not a ${\displaystyle (1-\alpha )^{2}}$ confidence interval (in general).
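We can see this inequality numerically: simulating the naive interval ${\displaystyle [L(\mathbf {X} )-U(\mathbf {Y} ),U(\mathbf {X} )-L(\mathbf {Y} )]}$ shows that its coverage is well above ${\displaystyle (1-\alpha )^{2}}$ (and here even above ${\displaystyle 1-\alpha }$). A rough Python sketch with illustrative parameter values of our own choosing:

```python
import random
from statistics import NormalDist

random.seed(1)
z = NormalDist().inv_cdf(0.975)        # z_{0.025}, for two 95% intervals
mu_x, mu_y, sigma, n = 3.0, 1.0, 1.0, 10
h = z * sigma / n ** 0.5               # half-width of each one-sample interval
hits, trials = 0, 4000
for _ in range(trials):
    # sample the two sample means directly from their N(mu, sigma^2 / n) laws
    xbar = random.gauss(mu_x, sigma / n ** 0.5)
    ybar = random.gauss(mu_y, sigma / n ** 0.5)
    lo = (xbar - h) - (ybar + h)       # L(X) - U(Y)
    hi = (xbar + h) - (ybar - h)       # U(X) - L(Y)
    hits += lo <= mu_x - mu_y <= hi
coverage = hits / trials  # noticeably above 0.95 ** 2 = 0.9025
```

The naive interval is too wide for its nominal level, confirming that the subset inclusion above is strict in general.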

So, the above two "methods" of constructing confidence intervals for the difference in means of two independent normal distributions do not actually work. Indeed, we do not use the previously constructed confidence intervals for the individual means to construct a confidence interval for the difference in the two means. Instead, we consider a pivotal quantity for the difference in the two means, which is the standard way of constructing confidence intervals.

Theorem. (Confidence interval of ${\displaystyle \mu _{X}-\mu _{Y}}$ when ${\displaystyle \sigma _{X}^{2}}$ and ${\displaystyle \sigma _{Y}^{2}}$ are known) Let ${\displaystyle X_{1},\dotsc ,X_{\color {darkgreen}n}}$ and ${\displaystyle Y_{1},\dotsc ,Y_{\color {darkgreen}m}}$ be random samples from two independent distributions ${\displaystyle {\mathcal {N}}(\mu _{X},\sigma _{X}^{2})}$ and ${\displaystyle {\mathcal {N}}(\mu _{Y},\sigma _{Y}^{2})}$ (i.e. the random variables ${\displaystyle X\sim {\mathcal {N}}(\mu _{X},\sigma _{X}^{2})}$ and ${\displaystyle Y\sim {\mathcal {N}}(\mu _{Y},\sigma _{Y}^{2})}$ are independent) respectively, where ${\displaystyle \sigma _{X}^{2}}$ and ${\displaystyle \sigma _{Y}^{2}}$ are known. Then, a ${\displaystyle 1-\alpha }$ confidence interval for ${\displaystyle \mu _{X}-\mu _{Y}}$ is ${\displaystyle \left[({\overline {X}}-{\overline {Y}})-z_{\alpha /2}{\sqrt {{\frac {\sigma _{X}^{2}}{n}}+{\frac {\sigma _{Y}^{2}}{m}}}},({\overline {X}}-{\overline {Y}})+z_{\alpha /2}{\sqrt {{\frac {\sigma _{X}^{2}}{n}}+{\frac {\sigma _{Y}^{2}}{m}}}}\right]}$

Remark.

• The corresponding interval estimate is ${\displaystyle \left[({\overline {x}}-{\overline {y}})-z_{\alpha /2}{\sqrt {{\frac {\sigma _{X}^{2}}{n}}+{\frac {\sigma _{Y}^{2}}{m}}}},({\overline {x}}-{\overline {y}})+z_{\alpha /2}{\sqrt {{\frac {\sigma _{X}^{2}}{n}}+{\frac {\sigma _{Y}^{2}}{m}}}}\right]}$ with observed values ${\displaystyle {\overline {X}}={\overline {x}}}$ and ${\displaystyle {\overline {Y}}={\overline {y}}}$.
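In code, the interval estimate above is straightforward to compute. The following Python sketch uses the standard normal quantile from the standard library; the helper name `ci_diff_means_known_var` and the observed values and sample sizes are made up for illustration:

```python
import math
from statistics import NormalDist

def ci_diff_means_known_var(xbar, ybar, var_x, var_y, n, m, alpha=0.05):
    """1 - alpha confidence interval for mu_X - mu_Y when the
    population variances var_x, var_y are known (normal populations)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{alpha/2}
    half = z * math.sqrt(var_x / n + var_y / m)  # half-width of the interval
    centre = xbar - ybar
    return centre - half, centre + half

# made-up observed values and sample sizes
lo, hi = ci_diff_means_known_var(xbar=60, ybar=55, var_x=25, var_y=16, n=30, m=40)
```

Note that the interval is always centred at the point estimate ${\displaystyle {\overline {x}}-{\overline {y}}}$.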

Exercise. Show that ${\displaystyle {\frac {({\overline {X}}-{\overline {Y}})-(\mu _{X}-\mu _{Y})}{\sqrt {\sigma _{X}^{2}/n+\sigma _{Y}^{2}/m}}}\sim {\mathcal {N}}(0,1)}$ (the meaning of the notations follows the above theorem).

Solution

Proof. First, we have ${\displaystyle {\overline {X}}\sim {\mathcal {N}}(\mu _{X},\sigma _{X}^{2}/n)}$ and ${\displaystyle {\overline {Y}}\sim {\mathcal {N}}(\mu _{Y},\sigma _{Y}^{2}/m)}$ by a property of the normal distribution (${\displaystyle X_{1},\dotsc ,X_{n}}$ and ${\displaystyle Y_{1},\dotsc ,Y_{m}}$ are independent random samples). Then, applying the property of the normal distribution again (the two distributions ${\displaystyle {\mathcal {N}}(\mu _{X},\sigma _{X}^{2})}$ and ${\displaystyle {\mathcal {N}}(\mu _{Y},\sigma _{Y}^{2})}$ are independent, and hence ${\displaystyle {\overline {X}}}$ and ${\displaystyle {\overline {Y}}}$ are independent), we have ${\displaystyle {\overline {X}}-{\overline {Y}}\sim {\mathcal {N}}(\mu _{X}-\mu _{Y},\sigma _{X}^{2}/n+(-1)^{2}\sigma _{Y}^{2}/m)\equiv {\mathcal {N}}(\mu _{X}-\mu _{Y},\sigma _{X}^{2}/n+\sigma _{Y}^{2}/m).}$ It follows, by applying the property once more, that ${\displaystyle {\frac {({\overline {X}}-{\overline {Y}}){\color {blue}-(\mu _{X}-\mu _{Y})}}{\color {red}{\sqrt {\sigma _{X}^{2}/n+\sigma _{Y}^{2}/m}}}}\sim {\mathcal {N}}\left({\frac {(\mu _{X}-\mu _{Y}){\color {blue}-(\mu _{X}-\mu _{Y})}}{\color {red}{\sqrt {\sigma _{X}^{2}/n+\sigma _{Y}^{2}/m}}}},{\frac {\sigma _{X}^{2}/n+\sigma _{Y}^{2}/m}{\color {red}\left({\sqrt {\sigma _{X}^{2}/n+\sigma _{Y}^{2}/m}}\right)^{2}}}\right)\equiv {\mathcal {N}}(0,1).}$

${\displaystyle \Box }$

Now, we will prove the above theorem based on the result shown in the previous exercise:

Proof. Let ${\displaystyle Z={\frac {({\overline {X}}-{\overline {Y}})-(\mu _{X}-\mu _{Y})}{\sqrt {\sigma _{X}^{2}/n+\sigma _{Y}^{2}/m}}}\sim {\mathcal {N}}(0,1)}$ (from the previous exercise). Then, ${\displaystyle Z}$ is a pivotal quantity of ${\displaystyle \mu _{X}-\mu _{Y}}$. Hence, we have {\displaystyle {\begin{aligned}1-\alpha &=\mathbb {P} (-z_{\alpha /2}\leq Z\leq z_{\alpha /2})\\&=\mathbb {P} \left(-z_{\alpha /2}\leq {\frac {({\overline {X}}-{\overline {Y}})-(\mu _{X}-\mu _{Y})}{\sqrt {\sigma _{X}^{2}/n+\sigma _{Y}^{2}/m}}}\leq z_{\alpha /2}\right)\\&=\mathbb {P} \left(-z_{\alpha /2}{\sqrt {\sigma _{X}^{2}/n+\sigma _{Y}^{2}/m}}\leq ({\overline {X}}-{\overline {Y}})-(\mu _{X}-\mu _{Y})\leq z_{\alpha /2}{\sqrt {\sigma _{X}^{2}/n+\sigma _{Y}^{2}/m}}\right)\\&=\mathbb {P} \left(z_{\alpha /2}{\sqrt {{\frac {\sigma _{X}^{2}}{n}}+{\frac {\sigma _{Y}^{2}}{m}}}}\geq (\mu _{X}-\mu _{Y})-({\overline {X}}-{\overline {Y}})\geq -z_{\alpha /2}{\sqrt {{\frac {\sigma _{X}^{2}}{n}}+{\frac {\sigma _{Y}^{2}}{m}}}}\right)\\&=\mathbb {P} \left(({\overline {X}}-{\overline {Y}})-z_{\alpha /2}{\sqrt {{\frac {\sigma _{X}^{2}}{n}}+{\frac {\sigma _{Y}^{2}}{m}}}}\leq \mu _{X}-\mu _{Y}\leq ({\overline {X}}-{\overline {Y}})+z_{\alpha /2}{\sqrt {{\frac {\sigma _{X}^{2}}{n}}+{\frac {\sigma _{Y}^{2}}{m}}}}\right).\\\end{aligned}}}

${\displaystyle \Box }$
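The probability statement in this proof can also be checked empirically: if we repeat the sampling many times, the interval should cover ${\displaystyle \mu _{X}-\mu _{Y}}$ in roughly a fraction ${\displaystyle 1-\alpha }$ of the repetitions. A simulation sketch, with arbitrarily chosen means, standard deviations, and sample sizes:

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(0)
mu_x, mu_y = 10.0, 7.0    # true means (arbitrary choices for the simulation)
sd_x, sd_y = 2.0, 3.0     # known population standard deviations (arbitrary)
n, m, alpha = 25, 30, 0.05

z = NormalDist().inv_cdf(1 - alpha / 2)          # z_{alpha/2}
half = z * math.sqrt(sd_x**2 / n + sd_y**2 / m)  # half-width of the interval

trials = 2000
covered = 0
for _ in range(trials):
    xbar = fmean(random.gauss(mu_x, sd_x) for _ in range(n))
    ybar = fmean(random.gauss(mu_y, sd_y) for _ in range(m))
    d = xbar - ybar
    if d - half <= mu_x - mu_y <= d + half:
        covered += 1
coverage = covered / trials   # should be close to 1 - alpha = 0.95
```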

Example. A statistician wants to compare two kinds of light bulbs (brand A vs. brand B) by their lifetime (the amount of time until the bulb burns out). He takes a random sample of 10 light bulbs from each of the brands and measures their lifetimes. The following is a summary of the results: ${\displaystyle {\begin{array}{cc}{\text{Brand}}&{\text{Sample mean (in hours)}}\\\hline A&4000\\B&4200\\\end{array}}}$ Based on past studies, the statistician knows that the variance of the lifetime is 600 hours² for brand A light bulbs and 150 hours² for brand B light bulbs. Assume the distribution of the lifetime is normal.

(a) Construct a 95% confidence interval for the mean lifetime of brand A light bulbs (${\displaystyle \mu _{A}}$) and brand B light bulbs (${\displaystyle \mu _{B}}$) respectively.

(b) Construct a 95% confidence interval for ${\displaystyle \mu _{B}-\mu _{A}}$.

(c) Can the statistician conclude with 95% confidence that brand B light bulbs have a longer lifetime than brand A light bulbs on average?

Solution.

(a) Since ${\displaystyle z_{0.025}\approx 1.96}$ and the sample size for each of the random samples is 10, a 95% confidence interval for ${\displaystyle \mu _{A}}$ is ${\displaystyle \left[4000-1.96{\sqrt {\frac {600}{10}}},4000+1.96{\sqrt {\frac {600}{10}}}\right]\approx [3984.818,4015.182],}$ and a 95% confidence interval for ${\displaystyle \mu _{B}}$ is ${\displaystyle \left[4200-1.96{\sqrt {\frac {150}{10}}},4200+1.96{\sqrt {\frac {150}{10}}}\right]\approx [4192.409,4207.591].}$

(b) A 95% confidence interval for ${\displaystyle \mu _{B}-\mu _{A}}$ is ${\displaystyle \left[(4200-4000)-1.96{\sqrt {{\frac {600}{10}}+{\frac {150}{10}}}},(4200-4000)+1.96{\sqrt {{\frac {600}{10}}+{\frac {150}{10}}}}\right]\approx [183.026,216.974].}$

(c) Since all values in the 95% confidence interval in (b) are positive, the statistician can be 95% confident that the mean lifetime of brand B light bulbs is longer than that of brand A light bulbs.

Remark.

• Notice that naively combining the two confidence intervals in (a), as in the flawed approach discussed before the above theorem, would give the interval ${\displaystyle [4192.409-4015.182,4207.591-3984.818]=[177.227,222.773]}$, which is wider than the interval in (b) and is not an exact 95% confidence interval for ${\displaystyle \mu _{B}-\mu _{A}}$.
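The computation in (b) can be verified numerically; in this sketch, following the solution's computation, the values 600 and 150 enter the formula as the known variances:

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)            # z_{0.025}, about 1.96
centre = 4200 - 4000                       # difference of the sample means
half = z * math.sqrt(600 / 10 + 150 / 10)  # variances 600 and 150, n = m = 10
lo, hi = centre - half, centre + half      # approximately [183.0, 217.0]
```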

Exercise. Suppose there is a brand C light bulb, and the statistician also takes a random sample of 10 light bulbs from brand C light bulbs. It is observed that the sample mean of this random sample is 4210 hours, and the variance of the lifetime of brand C light bulbs is known to be ${\displaystyle \sigma _{C}}$ hours². Assume the distribution of the lifetime is normal.

After constructing 95% confidence intervals using the above theorem, the statistician is 95% confident that brand C light bulbs have a mean lifetime at least as long as that of both brand A and brand B light bulbs. Show that the maximum value of ${\displaystyle \sigma _{C}}$ is (approximately) 110.31.

Solution

Proof. Let ${\displaystyle \mu _{C}}$ be the mean lifetime of brand C light bulb.

A 95% confidence interval for ${\displaystyle \mu _{C}-\mu _{A}}$ is ${\displaystyle \left[(4210-4000)-1.96{\sqrt {{\frac {600}{10}}+{\frac {\sigma _{C}}{10}}}},(4210-4000)+1.96{\sqrt {{\frac {600}{10}}+{\frac {\sigma _{C}}{10}}}}\right],}$ and a 95% confidence interval for ${\displaystyle \mu _{C}-\mu _{B}}$ is ${\displaystyle \left[(4210-4200)-1.96{\sqrt {{\frac {150}{10}}+{\frac {\sigma _{C}}{10}}}},(4210-4200)+1.96{\sqrt {{\frac {150}{10}}+{\frac {\sigma _{C}}{10}}}}\right].}$ In order for the statistician to be 95% confident that the brand C light bulb has a longer or same lifetime than both brand A and B light bulbs, the lower bound of both of these confidence intervals should be at least 0, i.e. ${\displaystyle {\begin{cases}(4210-4000)-1.96{\sqrt {{\frac {600}{10}}+{\frac {\sigma _{C}}{10}}}}\geq 0\\(4210-4200)-1.96{\sqrt {{\frac {150}{10}}+{\frac {\sigma _{C}}{10}}}}\geq 0\\\end{cases}}\implies {\begin{cases}210\geq 1.96{\sqrt {{\frac {600}{10}}+{\frac {\sigma _{C}}{10}}}}\\10\geq 1.96{\sqrt {{\frac {150}{10}}+{\frac {\sigma _{C}}{10}}}}\\\end{cases}}\implies {\begin{cases}\sigma _{C}\leq 114195.92\\\sigma _{C}\leq 110.31\\\end{cases}}.}$ Hence, the maximum value of ${\displaystyle \sigma _{C}}$ is 110.31.

${\displaystyle \Box }$
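The bound in this exercise can be reproduced by solving each inequality for ${\displaystyle \sigma _{C}}$; a short sketch of the arithmetic:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)  # z_{0.025}, about 1.96
# second (binding) inequality: 10 >= z * sqrt((150 + sigma_C) / 10)
sigma_c_max = (10 / z) ** 2 * 10 - 150
# first inequality gives a much weaker bound: 210 >= z * sqrt((600 + sigma_C) / 10)
weak_bound = (210 / z) ** 2 * 10 - 600
```

Since `weak_bound` far exceeds `sigma_c_max`, only the second inequality is binding, matching the proof above.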

Now, we will consider the case where the variances are unknown. In this case, the construction of the confidence interval for the difference in means is more complicated, and even more so when ${\displaystyle \sigma _{X}^{2}\neq \sigma _{Y}^{2}}$. Thus, we will only discuss the case where the two variances are equal to a common unknown value ${\displaystyle \sigma _{X}^{2}=\sigma _{Y}^{2}}$. As you may expect, we will also use some results mentioned previously for constructing a confidence interval for ${\displaystyle \mu }$ when ${\displaystyle \sigma ^{2}}$ is unknown.

Theorem. (Confidence interval of ${\displaystyle \mu _{X}-\mu _{Y}}$ when ${\displaystyle \sigma _{X}^{2}=\sigma _{Y}^{2}=\sigma ^{2}}$ is unknown) Let ${\displaystyle X_{1},\dotsc ,X_{\color {darkgreen}n}}$ and ${\displaystyle Y_{1},\dotsc ,Y_{\color {darkgreen}m}}$ be random samples from two independent distributions ${\displaystyle {\mathcal {N}}(\mu _{X},{\color {darkgreen}\sigma ^{2}})}$ and ${\displaystyle {\mathcal {N}}(\mu _{Y},{\color {darkgreen}\sigma ^{2}})}$ respectively. Then, a ${\displaystyle 1-\alpha }$ confidence interval for ${\displaystyle \mu _{X}-\mu _{Y}}$ is ${\displaystyle \left[({\overline {X}}-{\overline {Y}})-t_{\alpha /2,n+m-2}{\sqrt {{\frac {nS_{X}^{2}+mS_{Y}^{2}}{n+m-2}}\left({\frac {1}{n}}+{\frac {1}{m}}\right)}},({\overline {X}}-{\overline {Y}})+t_{\alpha /2,n+m-2}{\sqrt {{\frac {nS_{X}^{2}+mS_{Y}^{2}}{n+m-2}}\left({\frac {1}{n}}+{\frac {1}{m}}\right)}}\right]}$ where ${\displaystyle S_{X}^{2}}$ and ${\displaystyle S_{Y}^{2}}$ are the sample variances of the random samples ${\displaystyle X_{1},\dotsc ,X_{\color {darkgreen}n}}$ and ${\displaystyle Y_{1},\dotsc ,Y_{\color {darkgreen}m}}$ respectively.

Remark.

• The corresponding interval estimate is ${\displaystyle \left[({\overline {x}}-{\overline {y}})-t_{\alpha /2,n+m-2}{\sqrt {{\frac {ns_{X}^{2}+ms_{Y}^{2}}{n+m-2}}\left({\frac {1}{n}}+{\frac {1}{m}}\right)}},({\overline {x}}-{\overline {y}})+t_{\alpha /2,n+m-2}{\sqrt {{\frac {ns_{X}^{2}+ms_{Y}^{2}}{n+m-2}}\left({\frac {1}{n}}+{\frac {1}{m}}\right)}}\right]}$, with observed values ${\displaystyle {\overline {X}}={\overline {x}},{\overline {Y}}={\overline {y}},S_{X}^{2}=s_{X}^{2},{\text{ and }}S_{Y}^{2}=s_{Y}^{2}}$.
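A sketch of this computation in Python, with hypothetical data. The quantile ${\displaystyle t_{\alpha /2,n+m-2}}$ is passed in as a number (here ${\displaystyle t_{0.025,18}\approx 2.101}$, from a ${\displaystyle t}$-table), the helper name `ci_diff_means_pooled` is made up, and the sample variances are computed with divisor ${\displaystyle n}$ (resp. ${\displaystyle m}$), matching the convention ${\displaystyle nS_{X}^{2}/\sigma ^{2}\sim \chi _{n-1}^{2}}$ used here:

```python
import math

def ci_diff_means_pooled(xbar, ybar, sx2, sy2, n, m, t_crit):
    """1 - alpha CI for mu_X - mu_Y with equal unknown variances.

    sx2, sy2 are sample variances with divisor n (resp. m);
    t_crit is t_{alpha/2, n+m-2} from a table or software."""
    pooled = (n * sx2 + m * sy2) / (n + m - 2)   # pooled variance estimate
    half = t_crit * math.sqrt(pooled * (1 / n + 1 / m))
    centre = xbar - ybar
    return centre - half, centre + half

# hypothetical data: n = m = 10, so n + m - 2 = 18 and t_{0.025,18} ≈ 2.101
lo, hi = ci_diff_means_pooled(xbar=52.0, ybar=50.0, sx2=9.0, sy2=16.0,
                              n=10, m=10, t_crit=2.101)
```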

Proof. Let ${\displaystyle Z={\frac {({\overline {X}}-{\overline {Y}})-(\mu _{X}-\mu _{Y})}{\sqrt {\sigma ^{2}/n+\sigma ^{2}/m}}}\sim {\mathcal {N}}(0,1)}$ (the reason for this to follow ${\displaystyle {\mathcal {N}}(0,1)}$ is shown in a previous exercise). From a previous result, we know that ${\displaystyle V={\frac {nS_{X}^{2}}{\sigma ^{2}}}\sim \chi _{n-1}^{2}}$ and ${\displaystyle W={\frac {mS_{Y}^{2}}{\sigma ^{2}}}\sim \chi _{m-1}^{2}}$. Then, we know that the mgf of ${\displaystyle V}$ is ${\displaystyle M_{V}(t)=(1-2t)^{-(n-1)/2}}$ and the mgf of ${\displaystyle W}$ is ${\displaystyle M_{W}(t)=(1-2t)^{-(m-1)/2}}$. Since the distributions ${\displaystyle {\mathcal {N}}(\mu _{X},\sigma ^{2})}$ and ${\displaystyle {\mathcal {N}}(\mu _{Y},\sigma ^{2})}$ are independent, the mgf of ${\displaystyle U=V+W}$ is ${\displaystyle M_{U}(t)=M_{V+W}(t)=M_{V}(t)M_{W}(t)=(1-2t)^{-(n-1)/2-(m-1)/2}=(1-2t)^{-(n+m-2)/2}.}$ Hence, ${\displaystyle U\sim \chi _{n+m-2}^{2}}$.

By the independence of sample mean and sample variance (${\displaystyle {\overline {X}}}$ and ${\displaystyle S_{X}^{2}}$ are independent, ${\displaystyle {\overline {Y}}}$ and ${\displaystyle S_{Y}^{2}}$ are independent), we can deduce that ${\displaystyle Z}$ and ${\displaystyle U}$ are independent. Thus, by the definition of ${\displaystyle t}$-distribution,