Bayesian Updating with Continuous Priors

Ref) MIT OCW Lecture Note

Up to now, we have only done Bayesian updating when we had a finite number of hypotheses, e.g., having a disease (1) or not (0). Now we will study Bayesian updating when there is a continuous range of hypotheses.

Examples with continuous ranges of hypotheses

Ex1) Suppose you have a system that can succeed or fail, with unknown probability p of success. Then we can hypothesize that p is anywhere in the range [0, 1]. That is, we have a continuous range of hypotheses. We will often model this example with a ‘bent’ coin with unknown probability p of heads.

Ex2) We model gestational length for single births by a normal distribution. The parameters $\mu$ and $\sigma$ of the normal distribution can be any values in $(-\infty, \infty)$ and $(0, \infty)$, respectively.

We model the random process giving rise to the data by a parameterized distribution, i.e., a distribution whose form is fixed by one or more parameters. Every possible choice of the parameter(s) is a hypothesis.
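
As a small illustration (a sketch, not part of the original notes; the parameter values below are arbitrary), each choice of parameters picks out one member of a parameterized family, and therefore one hypothesis:

```python
# Sketch: every parameter choice is one hypothesis (illustrative values only).
from scipy.stats import bernoulli, norm

# Ex1: bent coin -- each value of p in [0, 1] is a separate hypothesis.
coin_hypotheses = {p: bernoulli(p) for p in (0.125, 0.375, 0.625, 0.875)}
print({p: m.pmf(1) for p, m in coin_hypotheses.items()})   # P(heads) under each hypothesis

# Ex2: gestational length -- each (mu, sigma) pair is a separate hypothesis.
length_hypotheses = {(mu, s): norm(loc=mu, scale=s)
                     for mu in (266.0, 270.0) for s in (10.0, 16.0)}
print({k: round(m.pdf(280), 4) for k, m in length_hypotheses.items()})  # density at 280 days
```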

The law of total probability

The law of total probability for continuous probability distributions is essentially the same as for discrete distributions. The prior predictive probability of the data can be calculated as follows.

Discrete Hypothesis

$$p(x) = \sum_{i=1}^{n} p(x|\theta_i)\, p(\theta_i)$$

Continuous Hypothesis

$$p(x) = \int_a^b p(x|\theta)\, f(\theta)\, d\theta$$
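
As a quick numerical check (a sketch using the bent-coin setup from Ex1, not code from the notes), the integral can be evaluated directly; assuming a flat prior $f(\theta) = 1$ on $[0, 1]$ and a single toss, the prior predictive probability of heads is 1/2:

```python
# Sketch: prior predictive probability of heads for the bent coin, assuming a
# flat prior f(theta) = 1 on [0, 1] (illustrative values, not from the notes).
import numpy as np
from scipy.integrate import quad

def likelihood_heads(theta):
    return theta                                  # p(x = heads | theta)

def prior(theta):
    return 1.0                                    # flat prior on [0, 1]

# Continuous law of total probability: p(x) = integral of p(x|theta) f(theta) dtheta
p_heads, _ = quad(lambda t: likelihood_heads(t) * prior(t), 0.0, 1.0)
print(p_heads)                                    # 0.5

# Discrete analogue with 4 equally weighted hypotheses at the slice centers
thetas = np.array([1/8, 3/8, 5/8, 7/8])
print(np.sum(likelihood_heads(thetas) * (1 / 4)))  # also 0.5
```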

Bayes' theorem for continuous probability densities

The theorem can be proved by starting from Bayes' theorem with discrete priors and passing to the limit; the next section sketches this limiting argument, and the resulting formula for continuous densities is stated after it.

Bayesian updating with continuous priors

From discrete to continuous Bayesian updating

  1. Approximate the continuous range of hypotheses by a finite number.

  2. Create the discrete updating table for the finite number of hypotheses.

  3. Consider how the table changes as the number of hypotheses goes to infinity.

Ex) To keep things concrete, we will work with the ‘bent’ coin and a flat prior f(θ) = 1 on [0, 1]. Our goal is to go from discrete to continuous by increasing the number of hypotheses.

4 hypotheses. We slice [0, 1] into 4 equal intervals: [0, 1/4], [1/4, 1/2], [1/2, 3/4], [3/4, 1]. Each slice has width Δθ = 1/4. We put our 4 hypotheses θi at the centers of the four slices: θ1: ‘θ = 1/8’, θ2: ‘θ = 3/8’, θ3: ‘θ = 5/8’, θ4: ‘θ = 7/8’. The flat prior gives each hypothesis a probability of 1/4 = 1 · Δθ. We then form the updating table for these four hypotheses.
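
Since the table itself appeared as a figure, here is a small sketch of how it can be computed. The notes do not restate the observed data at this point; as an assumption we take it to be a single toss coming up tails (matching the example at the end of this section), so the likelihood is p(x | θi) = 1 − θi:

```python
# Sketch: discrete updating table for 4 hypotheses (assumed data: one toss, tails).
import numpy as np

n = 4
d_theta = 1 / n
thetas = np.arange(d_theta / 2, 1, d_theta)    # centers: 1/8, 3/8, 5/8, 7/8

prior = np.full(n, 1.0) * d_theta              # flat prior: each hypothesis gets 1 * dtheta
likelihood = 1 - thetas                        # p(tails | theta) = 1 - theta  (assumption)
unnormalized = likelihood * prior
posterior = unnormalized / unnormalized.sum()

print("theta   prior   likelihood   posterior")
for row in zip(thetas, prior, likelihood, posterior):
    print("  ".join(f"{v:.4f}" for v in row))
```

The same code with n = 8 or n = 20 reproduces the finer tables mentioned below.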

Here are the density histograms of the prior and posterior pmf. The limiting prior and posterior pdfs are superimposed on the histograms in red.

8 hypotheses. Next we slice [0,1] into 8 intervals each of width Δθ = 1/8 and use the center of each slice for our 8 hypotheses. The flat prior gives each hypothesis the probability 1/8 = 1 · Δθ. Here are the table and density histograms.

20 hypotheses. Finally we slice [0,1] into 20 pieces. This is essentially identical to the previous two cases. Let’s skip right to the density histograms.

Looking at the sequence of plots we see how the prior and posterior density histograms converge to the prior and posterior probability density functions.
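
The convergence can also be checked numerically. Below is a sketch under the same assumption as above (a single observed tails with a flat prior, for which the limiting posterior pdf is 2(1 − θ)); the L1 distance between the posterior density histogram and the limiting pdf shrinks roughly like 1/(2n):

```python
# Sketch: L1 distance between the posterior density histogram and the limiting
# posterior pdf 2*(1 - theta), assuming a single observed tails and a flat prior.
import numpy as np

def posterior_histogram(theta, n):
    """Piecewise-constant posterior density built from n equal slices of [0, 1]."""
    d_theta = 1 / n
    centers = np.arange(d_theta / 2, 1, d_theta)
    pmf = (1 - centers) / (1 - centers).sum()          # discrete posterior (flat prior cancels)
    idx = np.minimum((theta / d_theta).astype(int), n - 1)
    return pmf[idx] / d_theta                          # divide by bin width -> density

grid = np.linspace(0, 1, 200_001)                      # fine grid for numerical integration
for n in (4, 8, 20):
    err = np.abs(posterior_histogram(grid, n) - 2 * (1 - grid)).mean()  # ~ integral of |diff|
    print(f"n={n:3d}  L1 distance to the limiting pdf = {err:.4f}")
```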

In the limit we obtain Bayes' theorem for continuous probability densities. Suppose:

$\theta$ is a continuous parameter with pdf $f(\theta)$ and range $[a, b]$.

$x$ is random discrete data.

Together they have likelihood $p(x|\theta)$.

Then

$$f(\theta|x)\,d\theta = \frac{p(x|\theta)\,f(\theta)\,d\theta}{p(x)} = \frac{p(x|\theta)\,f(\theta)\,d\theta}{\int_a^b p(x|\theta)\,f(\theta)\,d\theta}$$

which corresponds term by term to the familiar discrete form of Bayes' theorem:

$$f(\theta|x)\,d\theta = P(H|D) = \frac{P(D|H)\,P(H)}{P(D)} = \frac{p(x|\theta)\,f(\theta)\,d\theta}{p(x)}$$

Here the hypothesis $H$ is ‘$\theta$ lies in a small interval of width $d\theta$ around the given value’ and the data $D$ is $x$.

Ex) We have a bent coin with unknown probability $\theta$ of heads. Suppose we toss it once and get tails. Assume a flat prior ($\theta$ has range $[0, 1]$ and $f(\theta) = 1$) and find the posterior pdf for $\theta$.
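
Working it out from Bayes' theorem above (the likelihood of tails is $p(x = \text{tails} \mid \theta) = 1 - \theta$):

$$f(\theta \mid x) = \frac{p(x|\theta)\,f(\theta)}{\int_0^1 p(x|\theta)\,f(\theta)\,d\theta} = \frac{(1 - \theta)\cdot 1}{\int_0^1 (1 - \theta)\,d\theta} = \frac{1 - \theta}{1/2} = 2(1 - \theta).$$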
