Confidence Intervals for Normal Data

Introduction

On its own, a point estimate like $\bar{x}=2.2$ carries no information about its accuracy; it's just a single number, regardless of whether it's based on ten data points or one million data points.

For this reason, statisticians augment point estimates with confidence intervals. For example, to estimate an unknown mean $\mu$ we might be able to say that our best estimate of the mean is $\bar{x}=2.2$ with a $95\%$ confidence interval $[1.2, 3.2]$.

Interval Statistics

Recall that our working definition of a statistic is anything that can be computed from data. In particular, the formula for a statistic cannot include unknown quantities.

Technically an interval statistic is nothing more than a pair of point statistics giving the lower and upper bounds of the interval. Our reason for emphasizing that the interval is a statistic is to highlight the following:

The interval is random - new random data will produce a new interval.
As frequentists we are perfectly happy using it because it doesn't depend on the value of an unknown parameter or hypothesis.
Be careful in your thinking about these probabilities. Confidence intervals are a frequentist notion. Since frequentists do not compute probabilities of hypotheses, the confidence level is never the probability that the unknown parameter is in the confidence interval.

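Example) Apple harvest at an apple orchard

Q1) An apple orchard harvested 200,000 apples this year. To learn the mean weight of these 200,000 apples, 36 of them were drawn at random and weighed. The sample of 36 apples has a mean weight of 112 g and a standard deviation of 40 g. (Assume that apple weights follow a normal distribution.) What is the 95% confidence interval for the population mean weight?

Q2) If we draw another 36 apples at random, will we get the same sample mean as with the first 36?

No! Just as the sample we happen to draw is not fixed, the confidence interval is not a fixed quantity either.

So the precise meaning of a 95% confidence level is: "if we drew samples 100 times by the same method and computed a confidence interval each time, about 95 of the 100 intervals would contain the population mean."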
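The repeated-sampling reading of the confidence level can be checked with a small simulation. The sketch below makes assumptions that are not part of the example: it pretends the true mean is 110 g, treats the standard deviation of 40 g as known, and uses the known-$\sigma$ interval $\bar{x} \pm z_{0.025}\,\sigma/\sqrt{n}$ (the $z$ interval covered in the next section). The function name `coverage_rate` and the seed are illustrative only.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage_rate(true_mu, sigma, n, confidence=0.95, trials=2000, seed=0):
    """Fraction of simulated z intervals that contain true_mu."""
    rng = random.Random(seed)
    # Right critical value z_{alpha/2} of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # With sigma known, every interval has the same half-width.
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        # The interval moves with xbar; true_mu stays fixed.
        if xbar - margin <= true_mu <= xbar + margin:
            hits += 1
    return hits / trials


# Assumed (hypothetical) truth: mean 110 g, sd 40 g, samples of 36 apples.
rate = coverage_rate(true_mu=110.0, sigma=40.0, n=36)
print(f"fraction of intervals containing the true mean: {rate:.3f}")
```

The printed fraction should land close to 0.95. Each individual interval either contains the true mean or it does not; the 95% describes the procedure over many repetitions.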

$z$ Confidence Interval for the Mean

Throughout this page, we will assume that we have normally distributed data:

$$x_1,\ x_2,\ \ldots,\ x_n \sim N(\mu, \sigma^2)$$

Ref) Critical Value, Student-t critical values

Definition of $z$ confidence intervals for the mean

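Suppose the data $x_1, \ldots, x_n$ follow a normal distribution with unknown mean $\mu$ and known variance $\sigma^2$. The $(1-\alpha)$ confidence interval for $\mu$ is

$$\left[\bar{x} - \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}},\ \bar{x} + \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}}\right],$$

where $z_{\alpha/2}$ is the right critical value satisfying $P(Z > z_{\alpha/2}) = \alpha/2$ for $Z \sim N(0,1)$.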

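For example, if $\alpha=0.05$ then $z_{\alpha/2}=1.96$, so the 0.95 (or 95%) confidence interval is

$$\left[\bar{x} - \frac{1.96\,\sigma}{\sqrt{n}},\ \bar{x} + \frac{1.96\,\sigma}{\sqrt{n}}\right].$$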

Example) Suppose we collect 100 data points from an $N(\mu, 3^2)$ distribution and the sample mean is $\bar{x}=12$. Give the 95% confidence interval for $\mu$.
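One way to work this example is to compute $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ directly. The following is a minimal sketch using only the Python standard library; the helper name `z_confidence_interval` is an illustrative choice, not something defined in these notes.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Return (lower, upper) of the z confidence interval for a known sigma."""
    alpha = 1 - confidence
    # Right critical value z_{alpha/2}: P(Z > z) = alpha / 2 for standard normal Z.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Values from the example: n = 100, sigma = 3, xbar = 12.
lower, upper = z_confidence_interval(xbar=12, sigma=3, n=100)
print(f"95% CI for mu: [{lower:.3f}, {upper:.3f}]")
```

Here $\sigma/\sqrt{n} = 3/10 = 0.3$, so the interval is $12 \pm 1.96 \cdot 0.3 \approx 12 \pm 0.588$.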

Rejection regions

Example) Suppose that $n=12$ data points are drawn from $N(\mu, 5^2)$ where $\mu$ is unknown. Set up a two-sided significance test of $H_0 :\mu=2.71$ using the statistic $\bar{x}$ at significance level $\alpha=0.05$ . Describe the rejection and non-rejection regions.
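Under $H_0$ the statistic $\bar{x}$ follows $N(2.71,\ 5^2/12)$, so the non-rejection region is $2.71 \pm z_{0.025}\cdot 5/\sqrt{12}$ and the rejection region is everything outside it. A short sketch of that computation, with `non_rejection_region` as a hypothetical helper name:

```python
from math import sqrt
from statistics import NormalDist


def non_rejection_region(mu0, sigma, n, alpha=0.05):
    """Two-sided non-rejection region for xbar under H0: mu = mu0 (known sigma)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)
    return mu0 - half_width, mu0 + half_width


# Values from the example: mu0 = 2.71, sigma = 5, n = 12, alpha = 0.05.
low, high = non_rejection_region(mu0=2.71, sigma=5, n=12)
print(f"do not reject H0 when {low:.3f} <= xbar <= {high:.3f}")
print(f"reject H0 when xbar < {low:.3f} or xbar > {high:.3f}")
```

The half-width is about $1.96 \cdot 1.443 \approx 2.829$, giving a non-rejection region of roughly $[-0.119,\ 5.539]$.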

Manipulating intervals: pivoting

Here is a quick summary of intervals around $\bar{x}$ and $\mu_0$ and what is called pivoting. Pivoting is the idea that "$\bar{x}$ is in $\mu_0 \pm a$" says exactly the same thing as "$\mu_0$ is in $\bar{x} \pm a$".
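The equivalence behind pivoting follows from rewriting both statements as a bound on the distance between $\bar{x}$ and $\mu_0$. A short derivation, for any half-width $a \ge 0$:

```latex
\begin{align*}
\bar{x} \in [\mu_0 - a,\ \mu_0 + a]
  &\iff \mu_0 - a \le \bar{x} \le \mu_0 + a \\
  &\iff |\bar{x} - \mu_0| \le a \\
  &\iff \bar{x} - a \le \mu_0 \le \bar{x} + a \\
  &\iff \mu_0 \in [\bar{x} - a,\ \bar{x} + a].
\end{align*}
```

With $a = z_{\alpha/2}\,\sigma/\sqrt{n}$, the first line says that $\bar{x}$ is in the non-rejection region for $H_0: \mu = \mu_0$, and the last line says that $\mu_0$ is in the $(1-\alpha)$ confidence interval.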

We make a few observations about this confidence interval.

It only depends on $\bar{x}$, so it is a statistic.
The significance level $\alpha=0.05$ means that, assuming the null hypothesis that $\mu=2.71$ is true, random data will lead us to reject the null hypothesis 5% of the time (a Type 1 error).
Again assuming that $\mu=2.71$ , then 5% of the time the confidence interval will not contain 2.71, and conversely, 95% of the time it will contain 2.71.

Example) Suppose the data 2.5, 5.5, 8.5, 11.5 were drawn from an $N(\mu, 10^2)$ distribution with unknown mean $\mu$.

(a) Compute the point estimate $\bar{x}$ for $\mu$ and the corresponding 50%, 80%, and 95% confidence intervals.

(b) Consider the null hypothesis $\mu=1$. Would you reject $H_0$ at $\alpha=0.05$? $\alpha=0.20$? $\alpha=0.50$? Do this in two ways: first by checking whether the hypothesized value of $\mu$ is in the relevant confidence interval, and second by constructing a rejection region.
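Both parts of this example can be worked in a few lines. The sketch below uses only the Python standard library, and its variable names are illustrative. It builds each $(1-\alpha)$ interval around $\bar{x}$, builds the matching non-rejection region around $\mu_0 = 1$, and records the decision reached each way.

```python
from math import sqrt
from statistics import NormalDist, fmean

data = [2.5, 5.5, 8.5, 11.5]
sigma = 10.0  # known standard deviation, given in the example
mu0 = 1.0     # hypothesized mean from part (b)

xbar = fmean(data)            # point estimate of mu
se = sigma / sqrt(len(data))  # standard deviation of xbar

results = {}
for alpha in (0.50, 0.20, 0.05):
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * se
    ci = (xbar - half_width, xbar + half_width)
    # Way 1: reject H0 when mu0 falls outside the (1 - alpha) confidence interval.
    reject_by_ci = not (ci[0] <= mu0 <= ci[1])
    # Way 2: reject H0 when xbar falls outside mu0 +/- half_width.
    reject_by_region = not (mu0 - half_width <= xbar <= mu0 + half_width)
    results[alpha] = (ci, reject_by_ci, reject_by_region)
    print(f"alpha={alpha:.2f}: {1 - alpha:.0%} CI = [{ci[0]:.2f}, {ci[1]:.2f}], "
          f"reject by CI: {reject_by_ci}, reject by region: {reject_by_region}")
```

Here $\bar{x} = 7$ and $\sigma_{\bar{x}} = 5$. The value $\mu = 1$ lies inside the 80% and 95% intervals but outside the 50% interval, so $H_0$ is rejected only at $\alpha = 0.50$. The two ways always agree, which is the pivoting idea in action.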

$t$ Confidence Interval for the Mean

This will be nearly identical to the normal ($z$) confidence interval. In this setting $\sigma$ is not known, so we have to make the following replacements.

Use $s_{\bar x}= {s \over \sqrt n}$ instead of $\sigma_{\bar x}= {\sigma \over \sqrt n}$. Here $s^2$ is the sample variance we used before in t-tests, so $s$ is the sample standard deviation.
Use t-critical values instead of z-critical values.
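Making these two replacements in the $z$ interval gives the $t$ interval. It applies to the apple example, where only the sample standard deviation is known. The worked numbers below are a sketch: the critical value $t_{0.025} \approx 2.030$ for 35 degrees of freedom is a table value quoted to three decimals.

```latex
\begin{align*}
\text{$(1-\alpha)$ $t$ interval:}\quad
  & \left[\bar{x} - \frac{t_{\alpha/2}\, s}{\sqrt{n}},\ \bar{x} + \frac{t_{\alpha/2}\, s}{\sqrt{n}}\right],
  \qquad P(T > t_{\alpha/2}) = \alpha/2,\quad T \sim t(n-1) \\
\text{Apples:}\quad
  & n = 36,\quad \bar{x} = 112,\quad s = 40,\quad t_{0.025} \approx 2.030 \\
  & 112 \pm 2.030 \cdot \frac{40}{\sqrt{36}} \approx 112 \pm 13.5
  \quad\Longrightarrow\quad [98.5,\ 125.5]
\end{align*}
```

Using $z_{0.025} = 1.96$ instead would give the slightly narrower $112 \pm 13.1$. The $t$ critical value is larger because it accounts for the extra uncertainty from estimating $\sigma$ by $s$.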

Chi-square Confidence Interval for the Variance

Ref) Chi-square confidence interval

We now turn to an interval estimate for the unknown variance.

Definition: Suppose the data $x_1,\ x_2,\ \ldots,\ x_n \sim N(\mu, \sigma^2)$ with $\mu$ and $\sigma$ both unknown. The $(1-\alpha)$ confidence interval for the variance $\sigma^2$ is

$$\left[\frac{(n-1)s^2}{c_{\alpha/2}},\ \frac{(n-1)s^2}{c_{1-\alpha/2}}\right].$$

Here $c_{\alpha/2}$ is the right critical value satisfying $P(X^2>c_{\alpha/2})=\alpha/2$ for $X^2 \sim \chi^2(n-1)$, and $s^2$ is the sample variance of the data.
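The interval comes from pivoting the standard fact that, for normal data, $(n-1)s^2/\sigma^2$ follows a $\chi^2(n-1)$ distribution. A short derivation, writing $c_{1-\alpha/2}$ for the critical value with right-tail probability $1-\alpha/2$:

```latex
\begin{align*}
\frac{(n-1)s^2}{\sigma^2} \sim \chi^2(n-1)
  &\implies P\!\left(c_{1-\alpha/2} \le \frac{(n-1)s^2}{\sigma^2} \le c_{\alpha/2}\right) = 1-\alpha \\
  &\implies P\!\left(\frac{(n-1)s^2}{c_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{c_{1-\alpha/2}}\right) = 1-\alpha .
\end{align*}
```

As an illustration that goes beyond what the apple example asks, take $n = 36$ and $s = 40$, so $(n-1)s^2 = 56000$. With the approximate table values $c_{0.025} \approx 53.20$ and $c_{0.975} \approx 20.57$ for 35 degrees of freedom, the 95% interval for $\sigma^2$ is roughly $[1053,\ 2723]$, or about $[32.4,\ 52.2]$ g for $\sigma$. The interval is not symmetric around $s^2 = 1600$ because the chi-square distribution is skewed.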
