Confidence Intervals for Normal Data

Last updated 3 years ago

Introduction

On its own, a point estimate like $\bar{x}=2.2$ carries no information about its accuracy; it's just a single number, regardless of whether it's based on ten data points or one million data points.

For this reason, statisticians augment point estimates with confidence intervals. For example, to estimate an unknown mean $\mu$ we might be able to say that our best estimate of the mean is $\bar{x}=2.2$ with a $95\%$ confidence interval $[1.2, 3.2]$.

Interval Statistics

Recall that our working definition of a statistic is anything that can be computed from data. In particular, the formula for a statistic cannot include unknown quantities.

Technically an interval statistic is nothing more than a pair of point statistics giving the lower and upper bounds of the interval. Our reason for emphasizing that the interval is a statistic is to highlight the following:

  1. The interval is random - new random data will produce a new interval.

  2. As frequentists we are perfectly happy using it because it doesn't depend on the value of an unknown parameter or hypothesis.

  3. Be careful in your thinking about these probabilities. Confidence intervals are a frequentist notion. Since frequentists do not compute probabilities of hypotheses, the confidence level is never a probability that the unknown parameter is in the confidence interval.

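Example) Apple harvest at an orchard

Q1) An apple orchard harvested 200,000 apples in one year. To find the mean weight of this year's 200,000 apples, 36 of them were drawn at random and weighed. The 36 sampled apples have a mean weight of 112 g and a standard deviation of 40 g. (Assume that apple weights are normally distributed.) What is the 95% confidence interval for the population mean weight?

Q2) If we draw another 36 apples at random, will we get the same sample mean as for the first 36?

No. Just as the drawn sample is not a fixed thing, the confidence interval is not a fixed thing either.

Therefore, the precise meaning of a 95% confidence level is: "if we draw samples 100 times in the same way, about 95 of the 100 confidence intervals computed from them will contain the population mean."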
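The apple example can be checked numerically. The sketch below is only an illustration under stated assumptions: for Q1 it plugs the sample standard deviation (40 g) in for $\sigma$ and uses the normal critical value, which is a large-sample approximation rather than the exact small-sample answer; for Q2 it invents a true mean of 110 g (a hypothetical value, not from the example) so that repeated sampling can be simulated. The helper name `z_interval` is also made up for this sketch.

```python
import random
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence=0.95):
    """z confidence interval for the mean, treating sigma as known."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # right critical value z_{alpha/2}
    half_width = z_crit * sigma / n ** 0.5
    return xbar - half_width, xbar + half_width


# Q1: sample mean 112 g, sample sd 40 g used in place of sigma (approximation).
q1_low, q1_high = z_interval(xbar=112, sigma=40, n=36)
print(f"Q1 interval: [{q1_low:.2f}, {q1_high:.2f}]")  # about [98.93, 125.07]

# Q2: repeat the sampling many times. TRUE_MEAN is a made-up value.
TRUE_MEAN, TRUE_SD, N, TRIALS = 110.0, 40.0, 36, 5000
rng = random.Random(0)  # fixed seed so the run is repeatable
hits = 0
for _ in range(TRIALS):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    low, high = z_interval(sum(sample) / N, TRUE_SD, N)
    hits += low <= TRUE_MEAN <= high  # does this interval cover the true mean?

coverage = hits / TRIALS
print(f"fraction of intervals containing the true mean: {coverage:.3f}")
```

Each simulated sample gives a different interval, and the fraction that cover the true mean should come out close to 0.95, which is the frequentist reading of "95% confidence" described above.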

$z$ Confidence Interval for the Mean

Throughout this page, we will assume that we have normally distributed data:

$$x_1,\ x_2,\ \ldots,\ x_n \sim N(\mu, \sigma^2)$$

Definition of $z$ confidence intervals for the mean

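Suppose the data follow a normal distribution with unknown mean $\mu$ and known variance $\sigma^2$. The $(1-\alpha)$ confidence interval for $\mu$ is

$$\left[\bar{x} - \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}},\ \bar{x} + \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}}\right]$$

where $z_{\alpha/2}$ is the right critical value, $P(Z > z_{\alpha/2}) = \alpha/2$ for $Z \sim N(0,1)$.

For example, if $\alpha=0.05$ then $z_{\alpha/2}=1.96$, so the 0.95 (or 95%) confidence interval is

$$\left[\bar{x} - \frac{1.96\,\sigma}{\sqrt{n}},\ \bar{x} + \frac{1.96\,\sigma}{\sqrt{n}}\right]$$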

Example) Suppose we collect 100 data points from a $N(\mu, 3^2)$ distribution and the sample mean is $\bar{x}=12$. Give the 95% confidence interval for $\mu$.
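One way to work this example is to compute $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ directly. The snippet below is a minimal sketch using Python's standard library; the variable names are chosen here for illustration.

```python
from statistics import NormalDist

n, sigma, xbar, confidence = 100, 3.0, 12.0, 0.95

# Right critical value z_{alpha/2} of the standard normal (about 1.96 for 95%).
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Standard error of the mean is sigma / sqrt(n) = 3 / 10 = 0.3.
half_width = z_crit * sigma / n ** 0.5
ci = (xbar - half_width, xbar + half_width)

print(f"95% CI for mu: [{ci[0]:.3f}, {ci[1]:.3f}]")  # about [11.412, 12.588]
```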

Rejection regions

Example) Suppose that $n=12$ data points are drawn from $N(\mu, 5^2)$ where $\mu$ is unknown. Set up a two-sided significance test of $H_0: \mu=2.71$ using the statistic $\bar{x}$ at significance level $\alpha=0.05$. Describe the rejection and non-rejection regions.
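A possible way to work this example: under $H_0$ the sample mean is distributed as $N(2.71,\ 5^2/12)$, so the non-rejection region is the set of $\bar{x}$ values within $z_{\alpha/2}\cdot 5/\sqrt{12}$ of 2.71, and the rejection region is everything outside it. The sketch below computes the endpoints; `reject_h0` is a hypothetical helper name used only here.

```python
from statistics import NormalDist

MU0, SIGMA, N, ALPHA = 2.71, 5.0, 12, 0.05

# Under H0 the sample mean has standard deviation SIGMA / sqrt(N).
se = SIGMA / N ** 0.5
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)

# Non-rejection region: xbar within z_crit * se of MU0.
non_rejection = (MU0 - z_crit * se, MU0 + z_crit * se)


def reject_h0(xbar):
    """Return True when xbar falls in the two-sided rejection region."""
    low, high = non_rejection
    return xbar < low or xbar > high


print(f"non-rejection region: [{non_rejection[0]:.2f}, {non_rejection[1]:.2f}]")
# about [-0.12, 5.54]; the rejection region is xbar < -0.12 or xbar > 5.54
```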

Manipulating intervals: pivoting

Here is a quick summary of intervals around $\bar{x}$ and $\mu_0$ and what is called pivoting. Pivoting is the idea that "$\bar{x}$ is in $\mu_0 \pm a$" says exactly the same thing as "$\mu_0$ is in $\bar{x} \pm a$".

Pivoting the non-rejection region from the example with $n=12$, $\sigma=5$ and $\alpha=0.05$ gives the 95% confidence interval $\bar{x} \pm 1.96 \cdot \frac{5}{\sqrt{12}}$ for $\mu$. We make a few observations about this confidence interval.

  1. It only depends on $\bar{x}$, so it is a statistic.

  2. The significance level $\alpha=0.05$ means that, assuming the null hypothesis that $\mu=2.71$ is true, random data will lead us to reject the null hypothesis 5% of the time (a Type I error).

  3. Again assuming that $\mu=2.71$, then 5% of the time the confidence interval will not contain 2.71, and conversely, 95% of the time it will contain 2.71.
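The observations above can be illustrated with a small simulation. This is a sketch under the null hypothesis $\mu=2.71$ with $n=12$ and $\sigma=5$; the seed and the number of trials are arbitrary choices made for this illustration. It draws many sample means, checks that "$\bar{x}$ is in $\mu_0 \pm a$" and "$\mu_0$ is in $\bar{x} \pm a$" always agree, and counts how often the null hypothesis is rejected.

```python
import random
from statistics import NormalDist

MU0, SIGMA, N, ALPHA = 2.71, 5.0, 12, 0.05
se = SIGMA / N ** 0.5
a = NormalDist().inv_cdf(1 - ALPHA / 2) * se  # half-width, about 2.83

rng = random.Random(1)  # arbitrary fixed seed for repeatability
TRIALS = 20000
rejections = 0
agree = True
for _ in range(TRIALS):
    # Under H0 the sample mean is distributed N(MU0, SIGMA^2 / N).
    xbar = rng.gauss(MU0, se)
    in_non_rejection = MU0 - a <= xbar <= MU0 + a  # xbar is in mu0 +/- a
    ci_contains_mu0 = xbar - a <= MU0 <= xbar + a  # mu0 is in xbar +/- a
    agree = agree and (in_non_rejection == ci_contains_mu0)
    rejections += not in_non_rejection

rejection_rate = rejections / TRIALS
print(f"both statements always agree: {agree}")
print(f"fraction of samples rejecting H0: {rejection_rate:.3f}")
```

The rejection fraction should land near 0.05, the Type I error rate, and the confidence interval misses 2.71 in exactly those same trials.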

Example) Suppose the data 2.5, 5.5, 8.5, 11.5 was drawn from a $N(\mu, 10^2)$ distribution with unknown mean $\mu$.

(a) Compute the point estimate $\bar{x}$ for $\mu$ and the corresponding 50%, 80%, 95% confidence intervals.

(b) Consider the null hypothesis $\mu=1$. Would you reject $H_0$ at $\alpha=0.05$? $\alpha=0.20$? $\alpha=0.50$? Do this in two ways: first by checking whether the hypothesized value of $\mu$ is in the relevant confidence interval, and second by constructing a rejection region.
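A sketch of one way to work both parts with Python's standard library follows. The variable names are illustrative. For each confidence level it builds the $z$ interval around $\bar{x}$ and then tests $H_0: \mu=1$ both by checking the interval and by checking a rejection region centered on the hypothesized value.

```python
from statistics import NormalDist, mean

data = [2.5, 5.5, 8.5, 11.5]
SIGMA = 10.0  # known standard deviation
MU0 = 1.0     # hypothesized mean for part (b)

xbar = mean(data)              # point estimate: 7.0
se = SIGMA / len(data) ** 0.5  # standard error: 10 / 2 = 5.0

results = {}
for confidence in (0.50, 0.80, 0.95):
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (xbar - z_crit * se, xbar + z_crit * se)
    # Way 1: reject H0 when MU0 lies outside the confidence interval.
    reject_by_ci = not (ci[0] <= MU0 <= ci[1])
    # Way 2: reject H0 when xbar lies outside MU0 +/- z_crit * se.
    reject_by_region = abs(xbar - MU0) > z_crit * se
    results[confidence] = (ci, reject_by_ci, reject_by_region)
    print(f"{confidence:.0%} CI: [{ci[0]:.2f}, {ci[1]:.2f}]  "
          f"reject at alpha={alpha:.2f}: {reject_by_ci} / {reject_by_region}")
```

Under these calculations the intervals are roughly $[3.63, 10.37]$, $[0.59, 13.41]$ and $[-2.80, 16.80]$, so $\mu=1$ is rejected only at $\alpha=0.50$, and the two ways of deciding agree at every level, as pivoting says they must.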

$t$-confidence Interval for the Mean

This will be nearly identical to the normal ($z$) confidence intervals. In this setting $\sigma$ is not known, so we have to make the following replacements.

  1. Use $s_{\bar x}= \frac{s}{\sqrt n}$ instead of $\sigma_{\bar x}= \frac{\sigma}{\sqrt n}$. Here $s^2$ is the sample variance we used before in t-tests, so $s$ is the sample standard deviation.

  2. Use t-critical values instead of z-critical values.
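Applying the two replacements above to the $z$ interval gives the usual textbook form of the $t$ interval, written out below as a sketch. The symbol $t_{\alpha/2}$ is notation assumed here by analogy with $z_{\alpha/2}$: the right critical value with $P(T > t_{\alpha/2}) = \alpha/2$ for $T \sim t(n-1)$. The second line applies it to the apple example ($n=36$, $\bar{x}=112$, $s=40$), using the table value $t_{0.025} \approx 2.030$ for 35 degrees of freedom.

```latex
% (1 - alpha) t confidence interval for the mean, sigma unknown
\left[\ \bar{x} - \frac{t_{\alpha/2}\, s}{\sqrt{n}},\quad
        \bar{x} + \frac{t_{\alpha/2}\, s}{\sqrt{n}}\ \right],
\qquad P(T > t_{\alpha/2}) = \alpha/2,\quad T \sim t(n-1)

% Apple example: n = 36, xbar = 112, s = 40, t_{0.025}(35) \approx 2.030
112 \pm 2.030 \cdot \frac{40}{\sqrt{36}} \approx 112 \pm 13.53
\quad\Longrightarrow\quad [\,98.47,\ 125.53\,]
```

The $t$ interval is slightly wider than the one obtained with the normal critical value 1.96, reflecting the extra uncertainty from estimating $\sigma$ by $s$.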

Chi-square Confidence Interval for the Variance

We now turn to an interval estimate for the unknown variance.

Definition: Suppose the data $x_1,\ x_2,\ \ldots,\ x_n \sim N(\mu, \sigma^2)$ with $\mu$ and $\sigma$ both unknown. The $(1-\alpha)$ confidence interval for the variance $\sigma^2$ is

$$\left[\frac{(n-1)s^2}{c_{\alpha/2}},\ \frac{(n-1)s^2}{c_{1-\alpha/2}}\right]$$

Here $c_{\alpha/2}$ is the right critical value $P(X^2>c_{\alpha/2})=\alpha/2$ for $X^2 \sim \chi^2(n-1)$, $c_{1-\alpha/2}$ is defined the same way with $\alpha/2$ replaced by $1-\alpha/2$, and $s^2$ is the sample variance of the data.
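The shape of the variance interval follows from the same pivoting idea used for the mean. The derivation below is a sketch of the standard argument; it relies on the fact that for normal data $(n-1)s^2/\sigma^2$ follows a $\chi^2(n-1)$ distribution, and it writes $c_{1-\alpha/2}$ for the critical value with $P(X^2 > c_{1-\alpha/2}) = 1-\alpha/2$.

```latex
% Pivotal quantity for normal data
\frac{(n-1)\,s^2}{\sigma^2} \sim \chi^2(n-1)

% Probability (1 - alpha) lies between the two critical values
P\!\left( c_{1-\alpha/2} \;<\; \frac{(n-1)\,s^2}{\sigma^2} \;<\; c_{\alpha/2} \right) = 1-\alpha

% Pivot: solve both inequalities for sigma^2
\frac{(n-1)\,s^2}{c_{\alpha/2}} \;<\; \sigma^2 \;<\; \frac{(n-1)\,s^2}{c_{1-\alpha/2}}
```

Because the chi-square distribution is not symmetric, the resulting interval is not centered on $s^2$, and the larger critical value $c_{\alpha/2}$ ends up in the lower endpoint.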

Ref) Critical Value, Student-t critical values

Ref) Chi-square confidence interval