Confidence Intervals for Normal Data
On its own, a point estimate like $\bar{x}$ carries no information about its accuracy; it's just a single number, regardless of whether it's based on ten data points or one million data points.
For this reason, statisticians augment point estimates with confidence intervals. For example, to estimate an unknown mean $\mu$ we might be able to say that our best estimate of the mean is $\bar{x}$, with a confidence interval around it.
Recall that our working definition of a statistic is anything that can be computed from data. In particular, the formula for a statistic cannot include unknown quantities.
Technically an interval statistic is nothing more than a pair of point statistics giving the lower and upper bounds of the interval. Our reason for emphasizing that the interval is a statistic is to highlight the following:
The interval is random - new random data will produce a new interval.
As frequentists we are perfectly happy using it because it doesn't depend on the value of an unknown parameter or hypothesis.
Be careful in your thinking about these probabilities. Confidence intervals are a frequentist notion. Since frequentists do not compute probabilities of hypotheses, the confidence level is never a probability that the unknown parameter is in the confidence interval.
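Example) Apple harvest at an apple orchard. Q1) An apple orchard harvested 200,000 apples in one year. To learn the mean weight of this year's 200,000 apples, 36 of them were drawn at random and weighed. The 36 sampled apples have a mean weight of 112 g and a standard deviation of 40 g. (Assume that apple weights follow a normal distribution.) What is the 95% confidence interval for the population mean apple weight?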
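A minimal sketch of the Q1 calculation, using only the Python standard library. It makes one simplifying assumption that the example does not state: the sample standard deviation of 40 g is treated as if it were the known population standard deviation, so a z critical value is used. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the apple example.
n = 36          # sample size
xbar = 112.0    # sample mean in grams
s = 40.0        # sample standard deviation in grams

# Assumption: s is treated as the known population sigma,
# so a z critical value is used rather than a t critical value.
confidence = 0.95
alpha = 1 - confidence
z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96

margin = z * s / sqrt(n)
lower, upper = xbar - margin, xbar + margin
print(f"95% CI for the mean weight: [{lower:.2f}, {upper:.2f}] g")
```

Under this assumption the interval runs from roughly 98.9 g to 125.1 g.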
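Q2) If we draw another 36 apples at random, will we get the same sample mean as with the first 36?
No. Just as the sample that gets drawn is not fixed, the confidence interval is not a fixed quantity either.
Therefore, the precise meaning of a 95% confidence level is: 'if we draw samples 100 times in the same way, about 95 of the 100 confidence intervals computed from them will contain the population mean.'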
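The repeated-sampling reading of the confidence level can be checked with a small simulation. This is a sketch under stated assumptions: the "true" mean of 110 g is a hypothetical value chosen only so that coverage can be counted, the population standard deviation is taken as known (40 g), and 1000 repetitions are used instead of 100 to reduce noise.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical "true" population, needed only to check coverage.
true_mean = 110.0
sigma = 40.0      # assumed known population standard deviation
n = 36            # apples per sample
trials = 1000     # number of repeated samples

z = NormalDist().inv_cdf(0.975)       # 95% two-sided critical value
half_width = z * sigma / sqrt(n)

covered = 0
for _ in range(trials):
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    xbar = fmean(sample)
    # Each new sample gives a new interval; count those containing the mean.
    if xbar - half_width <= true_mean <= xbar + half_width:
        covered += 1

coverage = covered / trials
print(f"{covered} of {trials} intervals contain the true mean ({coverage:.1%})")
```

The printed fraction should land near 95%. Each individual interval either contains the true mean or does not; the 95% describes the procedure over many repetitions.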
Throughout this page, we will assume that we have normally distributed data: $x_1, x_2, \ldots, x_n \sim N(\mu, \sigma^2)$.
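Suppose the data $x_1, \ldots, x_n$ follow a normal distribution $N(\mu, \sigma^2)$, with unknown mean $\mu$ and known variance $\sigma^2$. The $(1-\alpha)$ confidence interval for $\mu$ is $\left[\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}\right]$, where $z_{\alpha/2}$ is the right critical value of the standard normal distribution, that is, $P(Z > z_{\alpha/2}) = \alpha/2$.
For example, if $\alpha = 0.05$ then $z_{\alpha/2} = 1.96$, so the 0.95 (or 95%) confidence interval is $\left[\bar{x} - 1.96\,\sigma/\sqrt{n},\ \bar{x} + 1.96\,\sigma/\sqrt{n}\right]$.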
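One way to see where this interval comes from is to standardize the sample mean and rearrange. The following is a sketch of that standard argument; the symbol $Z$ for the standardized mean is notation introduced here, and $z_{\alpha/2}$ denotes the right critical value of the standard normal distribution.

```latex
Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)
\quad\Longrightarrow\quad
P\left(-z_{\alpha/2} \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) = 1 - \alpha

\Longleftrightarrow\quad
P\left(\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha
```

The probability here is computed for a fixed value of $\mu$; what is random is $\bar{x}$, and therefore the interval.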
Example) Suppose we collect 100 data points from a $N(\mu, \sigma^2)$ distribution with known $\sigma$, and the sample mean is $\bar{x}$. Give the 95% confidence interval for $\mu$.
Example) Suppose that $n$ data points are drawn from $N(\mu, \sigma^2)$, where $\mu$ is unknown and $\sigma$ is known. Set up a two-sided significance test of $H_0: \mu = 2.71$ using the statistic $\bar{x}$ at significance level $\alpha = 0.05$. Describe the rejection and non-rejection regions.
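Here is a quick summary of intervals around $\bar{x}$ and $\mu_0$ and what is called pivoting. Pivoting is the idea that "$\bar{x}$ is in $\mu_0 \pm w$" says exactly the same thing as "$\mu_0$ is in $\bar{x} \pm w$", for any half-width $w$.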
We make a few observations about this confidence interval.
It only depends on $\bar{x}$, so it is a statistic.
The significance level $\alpha = 0.05$ means that, assuming the null hypothesis that $\mu = 2.71$ is true, random data will lead us to reject the null hypothesis 5% of the time (a Type I error).
Again assuming that $\mu = 2.71$, then 5% of the time the confidence interval will not contain 2.71, and conversely, 95% of the time it will contain 2.71.
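The link between the non-rejection region and the confidence interval can be checked numerically. The sketch below uses the hypothesized mean 2.71 and the 5% significance level from the text; the standard deviation, the sample size, and the list of sample means are hypothetical values chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def half_width(sigma, n, alpha):
    """Half-width z_{alpha/2} * sigma / sqrt(n) shared by both intervals."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sigma / sqrt(n)

mu0 = 2.71      # hypothesized mean from the text
alpha = 0.05    # significance level from the text
sigma = 5.0     # hypothetical known standard deviation
n = 12          # hypothetical sample size

w = half_width(sigma, n, alpha)

results = []
for xbar in [-1.0, 0.0, 2.71, 5.0, 6.0]:          # hypothetical sample means
    in_non_rejection = mu0 - w <= xbar <= mu0 + w   # xbar is in mu0 +/- w
    in_confidence = xbar - w <= mu0 <= xbar + w     # mu0 is in xbar +/- w
    results.append((xbar, in_non_rejection, in_confidence))
    print(f"xbar={xbar:5.2f}  keep H0: {in_non_rejection}  "
          f"CI contains mu0: {in_confidence}")
```

For every sample mean the two columns agree: the test keeps the null hypothesis exactly when the confidence interval built from that sample mean contains 2.71.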
Example) Suppose the data 2.5, 5.5, 8.5, 11.5 was drawn from a $N(\mu, \sigma^2)$ distribution with known variance $\sigma^2$ and unknown mean $\mu$.
(a) Compute the point estimate $\bar{x}$ for $\mu$ and the corresponding 50%, 80%, 95% confidence intervals.
(b) Consider the null hypothesis $H_0: \mu = \mu_0$. Would you reject $H_0$ at $\alpha = 0.05$? $\alpha = 0.20$? $\alpha = 0.50$? Do these two ways: first by checking if the hypothesized value of $\mu$ is in the relevant confidence intervals and second by constructing a rejection region.
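A sketch of how parts (a) and (b) could be worked through in code, using the confidence-interval route for (b). The four data values come from the example; the known standard deviation `sigma = 10.0` and the null value `mu0 = 1.0` are hypothetical stand-ins chosen for illustration, since their values do not appear in the text here.

```python
from math import sqrt
from statistics import NormalDist, fmean

data = [2.5, 5.5, 8.5, 11.5]   # data from the example
sigma = 10.0                    # hypothetical known standard deviation
mu0 = 1.0                       # hypothetical null value for part (b)

n = len(data)
xbar = fmean(data)              # point estimate of mu
se = sigma / sqrt(n)            # standard deviation of the sample mean

intervals = {}
for pct in (50, 80, 95):
    alpha = 1 - pct / 100
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = xbar - z * se, xbar + z * se
    # Reject H0 at level alpha exactly when mu0 falls outside the interval.
    reject = not (lower <= mu0 <= upper)
    intervals[pct] = (lower, upper, reject)
    print(f"{pct}% CI: [{lower:.2f}, {upper:.2f}]  "
          f"reject H0 at alpha={alpha:.2f}: {reject}")
```

The intervals widen as the confidence level increases, so a hypothesized value can be rejected at a large $\alpha$ and still be retained at a small one.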
The t confidence interval for the mean is nearly identical to the normal (z) confidence interval. In this setting $\sigma$ is not known, so we have to make the following replacements.
Use $s/\sqrt{n}$ instead of $\sigma/\sqrt{n}$. Here $s^2$ is the sample variance we used before in t-tests.
Use t-critical values instead of z-critical values.
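Applying these two replacements to the z interval gives the following form for the $(1-\alpha)$ t confidence interval. This is a sketch of the standard result, with $t_{\alpha/2}$ denoting the right critical value of the t distribution with $n-1$ degrees of freedom.

```latex
\left[\, \bar{x} - t_{\alpha/2}\,\frac{s}{\sqrt{n}},\;\; \bar{x} + t_{\alpha/2}\,\frac{s}{\sqrt{n}} \,\right],
\qquad P(T > t_{\alpha/2}) = \alpha/2 \ \text{ for } T \sim t(n-1).
```

For the apple example ($n = 36$, $\bar{x} = 112$, $s = 40$), the 95% critical value with 35 degrees of freedom is roughly 2.03, which gives about $112 \pm 13.5$ g. That is slightly wider than the interval obtained with the z critical value 1.96.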
We now turn to an interval estimate for the unknown variance.
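Definition: Suppose the data $x_1, \ldots, x_n \sim N(\mu, \sigma^2)$ with $\mu$ and $\sigma$ both unknown. The $(1-\alpha)$ confidence interval for the variance $\sigma^2$ is $\left[\dfrac{(n-1)s^2}{c_{\alpha/2}},\ \dfrac{(n-1)s^2}{c_{1-\alpha/2}}\right]$.
Here $c_{\alpha/2}$ is the right critical value, $P(X > c_{\alpha/2}) = \alpha/2$ for $X \sim \chi^2(n-1)$, and $s^2$ is the sample variance of the data.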
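The interval follows from the same pivoting idea used for the mean. The sketch below relies on the standard fact that, for normal data, $(n-1)s^2/\sigma^2$ follows a chi-square distribution with $n-1$ degrees of freedom.

```latex
\frac{(n-1)s^2}{\sigma^2} \sim \chi^2(n-1)
\quad\Longrightarrow\quad
P\left(c_{1-\alpha/2} \le \frac{(n-1)s^2}{\sigma^2} \le c_{\alpha/2}\right) = 1 - \alpha

\Longleftrightarrow\quad
P\left(\frac{(n-1)s^2}{c_{\alpha/2}} \;\le\; \sigma^2 \;\le\; \frac{(n-1)s^2}{c_{1-\alpha/2}}\right) = 1 - \alpha
```

Because the chi-square distribution is not symmetric, the interval is not centered on $s^2$, and the larger critical value $c_{\alpha/2}$ ends up in the lower bound.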