Bayesian Updating with Continuous Priors

Ref) MIT OCW Lecture Note

Up to now, we have only done Bayesian updating when we had a finite number of hypotheses, e.g., having a disease (1) or not (0). Now we will study Bayesian updating when there is a continuous range of hypotheses.

Examples with continuous ranges of hypotheses

Ex1) Suppose you have a system that can succeed or fail with probability p. Then we can hypothesize that p is anywhere in the range [0, 1]. That is, we have a continuous range of hypotheses. We will often model this example with a ‘bent’ coin with unknown probability p of heads.

Ex2) We model gestational length for single births by a normal distribution. The parameters $\mu$ and $\sigma$ of a normal distribution can take any values in $(-\infty, \infty)$ and $(0, \infty)$, respectively.

We model the random process giving rise to the data by a distribution that depends on parameters, called a parameterized distribution. Every possible choice of the parameter(s) is a hypothesis.

The law of total probability

The law of total probability for continuous probability distributions is essentially the same as for discrete distributions. Prior predictive probability can be calculated as follows.

Discrete Hypotheses

$$p(x) = \sum_{i=1}^{n} p(x|\theta_i)\, p(\theta_i)$$

Continuous Hypotheses

$$p(x) = \int_a^b p(x|\theta)\, f(\theta)\, d\theta$$
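For instance, for the ‘bent’ coin of Ex1 with a flat prior f(θ) = 1 on [0, 1] and likelihood $p(\text{heads}|\theta) = \theta$, the prior predictive probability of heads is

$$p(x = \text{heads}) = \int_0^1 \theta \cdot 1\, d\theta = \frac{1}{2}$$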

Bayes' theorem for continuous probability densities

  • $\theta$ is a continuous parameter with pdf $f(\theta)$ and range $[a, b]$.

  • $x$ is random discrete data.

  • Together they have likelihood $p(x|\theta)$.

$$f(\theta|x)\, d\theta = \frac{p(x|\theta)\, f(\theta)\, d\theta}{p(x)} = \frac{p(x|\theta)\, f(\theta)\, d\theta}{\int_a^b p(x|\theta)\, f(\theta)\, d\theta}$$

The proof follows from Bayes' theorem for discrete hypotheses: take the hypothesis H to be "the parameter lies in a small interval of width $d\theta$ around $\theta$" and the data D to be "x was observed", so that $P(H) \approx f(\theta)\, d\theta$ and $P(D|H) \approx p(x|\theta)$.

$$f(\theta|x)\, d\theta = P(H|D) = \frac{P(D|H)\, P(H)}{P(D)} = \frac{p(x|\theta)\, f(\theta)\, d\theta}{p(x)}$$

Bayesian updating with continuous priors

Ex) We have a bent coin with unknown probability $\theta$ of heads. Suppose we toss it once and get tails. Assume a flat prior ($\theta$ has range $[0, 1]$, so $f(\theta) = 1$) and find the posterior pdf for $\theta$.
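Carrying out the update: the likelihood of tails is $p(x = \text{tails}|\theta) = 1 - \theta$, so

$$f(\theta|x = \text{tails}) = \frac{(1-\theta)\cdot 1}{\int_0^1 (1-\theta)\cdot 1\, d\theta} = \frac{1-\theta}{1/2} = 2(1-\theta), \qquad 0 \le \theta \le 1$$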

From discrete to continuous Bayesian updating

  1. Approximate the continuous range of hypotheses by a finite number.

  2. Create the discrete updating table for the finite number of hypotheses.

  3. Consider how the table changes as the number of hypotheses goes to infinity.

Ex) To keep things concrete, we will work with the ‘bent’ coin with flat prior f(θ) = 1 from the example above. Our goal is to go from discrete to continuous by increasing the number of hypotheses.

4 hypotheses. We slice [0, 1] into 4 equal intervals: [0, 1/4], [1/4, 1/2], [1/2, 3/4], [3/4, 1]. Each slice has width Δθ = 1/4. We put our 4 hypotheses θi at the centers of the four slices: θ1: ‘θ = 1/8’, θ2: ‘θ = 3/8’, θ3: ‘θ = 5/8’, θ4: ‘θ = 7/8’. The flat prior gives each hypothesis a probability of 1/4 = 1 · Δθ. We have the table:
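Since the observed data is one toss landing tails, the likelihood of each hypothesis is p(x = tails | θᵢ) = 1 − θᵢ, and the table works out to:

| hypothesis | prior | likelihood | Bayes numerator | posterior |
| --- | --- | --- | --- | --- |
| θ = 1/8 | 1/4 | 7/8 | 7/32 | 7/16 |
| θ = 3/8 | 1/4 | 5/8 | 5/32 | 5/16 |
| θ = 5/8 | 1/4 | 3/8 | 3/32 | 3/16 |
| θ = 7/8 | 1/4 | 1/8 | 1/32 | 1/16 |
| total | 1 |  | 1/2 | 1 |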

Here are the density histograms of the prior and posterior pmfs. The prior and posterior pdfs from the example above are superimposed on the histograms in red.

8 hypotheses. Next we slice [0,1] into 8 intervals each of width Δθ = 1/8 and use the center of each slice for our 8 hypotheses. The flat prior gives each hypothesis the probability 1/8 = 1 · Δθ. Here are the table and density histograms.

20 hypotheses. Finally we slice [0,1] into 20 pieces. This is essentially identical to the previous two cases. Let’s skip right to the density histograms.
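A minimal NumPy sketch of the three steps above (an illustration, assuming the same flat prior and a single observed tails): it builds the discrete updating table for n slices and prints the posterior density histogram heights, which can be compared with the continuous posterior 2(1 − θ) found earlier.

```python
import numpy as np

def update_table(n):
    """Slice [0, 1] into n pieces, put a hypothesis at each slice center,
    and update a flat prior after observing one toss of tails."""
    dtheta = 1.0 / n
    theta = (np.arange(n) + 0.5) * dtheta     # step 1: hypotheses at slice centers
    prior = np.full(n, 1.0) * dtheta          # flat prior: f(theta) = 1, mass 1 * dtheta
    likelihood = 1.0 - theta                  # P(tails | theta)
    numerator = likelihood * prior            # step 2: Bayes numerator
    posterior = numerator / numerator.sum()   # normalized posterior pmf
    return theta, prior, likelihood, posterior

# Step 3: as n grows, the posterior density histogram (posterior / dtheta)
# traces out the continuous posterior f(theta | tails) = 2(1 - theta).
for n in (4, 8, 20):
    theta, prior, likelihood, posterior = update_table(n)
    print(f"\nn = {n}")
    for t, pr, lk, po in zip(theta, prior, likelihood, posterior):
        print(f"  theta = {t:.3f}  prior = {pr:.4f}  likelihood = {lk:.4f}  "
              f"posterior = {po:.4f}  density = {po * n:.4f}")
```

For n = 4 this reproduces the posterior column 7/16, 5/16, 3/16, 1/16 from the table above, and the density column matches 2(1 − θᵢ) at each slice center, which is what the density histograms converge to.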

Looking at the sequence of plots, we see how the prior and posterior density histograms converge to the prior and posterior probability density functions.
