Bayesian Updating with Continuous Priors
Ref) MIT OCW Lecture Note
Up to now, we have only done Bayesian updating when we had a finite number of hypotheses, e.g. has the disease (1) or does not (0). Now we will study Bayesian updating when there is a continuous range of hypotheses.
Ex1) Suppose you have a system that can succeed or fail with probability p. Then we can hypothesize that p is anywhere in the range [0, 1]. That is, we have a continuous range of hypotheses. We will often model this example with a ‘bent’ coin with unknown probability p of heads.
Ex2) We model gestational length for single births by a normal distribution. The parameters μ and σ of a normal distribution can be any real numbers with μ in (−∞, ∞) and σ in (0, ∞).
The law of total probability for continuous probability distributions is essentially the same as for discrete distributions. Prior predictive probability can be calculated as follows.
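In symbols: if the hypothesis θ has prior pdf f(θ) on a range [a, b] and the data x has likelihood p(x | θ), the prior predictive probability of x replaces the discrete sum with an integral:

```latex
p(x) = \int_a^b p(x \mid \theta)\, f(\theta)\, d\theta
```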
Continuous Hypotheses
The proof can be done by applying Bayes' theorem to discrete priors and passing to a limit:
Approximate the continuous range of hypotheses by a finite number.
Create the discrete updating table for the finite number of hypotheses.
Consider how the table changes as the number of hypotheses goes to infinity.
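These steps can be run numerically. Below is a minimal sketch: it slices [0, 1] into n pieces, places a hypothesis at each slice center, and does an ordinary discrete Bayesian update. The data is assumed here to be a single toss landing heads, so the likelihood of hypothesis θ is θ (the original tables are not shown, so this choice of data is an assumption for illustration).

```python
import numpy as np

def discrete_update(n, likelihood):
    """Approximate a continuous flat prior on [0, 1] by n hypotheses
    at slice centers, then perform a discrete Bayesian update."""
    dtheta = 1.0 / n
    theta = (np.arange(n) + 0.5) * dtheta   # slice centers
    prior = np.full(n, 1.0) * dtheta        # flat prior f(theta) = 1 times slice width
    unnorm = prior * likelihood(theta)      # prior x likelihood
    posterior = unnorm / unnorm.sum()       # normalize to a pmf
    return theta, prior, posterior

# Refine the slicing and watch the posterior density histogram heights.
for n in (4, 8, 20):
    theta, prior, post = discrete_update(n, lambda t: t)  # likelihood of heads = theta
    print(n, np.round(post * n, 3))  # posterior probability / slice width = density height
```

For n = 4 the posterior probabilities come out to θi/2, i.e. (1/16, 3/16, 5/16, 7/16), and the density heights approach the continuous posterior pdf as n grows.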
Ex) To keep things concrete, we will work with the ‘bent’ coin with flat prior f(θ) = 1 from the example above. Our goal is to go from discrete to continuous by increasing the number of hypotheses.
4 hypotheses. We slice [0, 1] into 4 equal intervals: [0, 1/4], [1/4, 1/2], [1/2, 3/4], [3/4, 1]. Each slice has width Δθ = 1/4. We put our 4 hypotheses θi at the centers of the four slices: θ1: ‘θ = 1/8’, θ2: ‘θ = 3/8’, θ3: ‘θ = 5/8’, θ4: ‘θ = 7/8’. The flat prior gives each hypothesis a probability of 1/4 = 1 · Δθ. We have the table:
Here are the density histograms of the prior and posterior pmfs. The prior and posterior pdfs from the example above are superimposed on the histograms in red.
8 hypotheses. Next we slice [0,1] into 8 intervals each of width Δθ = 1/8 and use the center of each slice for our 8 hypotheses. The flat prior gives each hypothesis the probability 1/8 = 1 · Δθ. Here are the table and density histograms.
20 hypotheses. Finally we slice [0,1] into 20 pieces. This is essentially identical to the previous two cases. Let’s skip right to the density histograms.
Looking at the sequence of plots we see how the prior and posterior density histograms converge to the prior and posterior probability density functions.
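This convergence can be quantified. The sketch below measures the largest gap between the posterior density histogram heights and the true posterior pdf as n grows. It assumes a flat prior and data consisting of two heads in two tosses, so the likelihood is θ² and the exact posterior pdf is 3θ² (this choice of data is an assumption; with a single toss the histogram heights happen to match the pdf exactly at the slice centers, which hides the convergence).

```python
import numpy as np

def max_histogram_error(n):
    """Largest gap between the n-slice posterior density histogram
    and the exact posterior pdf 3*theta**2 (flat prior, two heads)."""
    dtheta = 1.0 / n
    theta = (np.arange(n) + 0.5) * dtheta   # slice centers
    post = theta**2 * dtheta                # flat prior x likelihood theta^2
    post /= post.sum()                      # normalize to a pmf
    density = post / dtheta                 # histogram heights
    return float(np.max(np.abs(density - 3 * theta**2)))

for n in (4, 8, 20):
    print(n, round(max_histogram_error(n), 5))
```

The error shrinks roughly like 1/n², consistent with the histograms converging to the pdf in the plots.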
θ is a continuous parameter with pdf f(θ) and range [a, b].
x is random discrete data.
Together they have likelihood p(x | θ).
Ex) We have a bent coin with unknown probability θ of heads. Suppose we toss it once and get tails. Assume a flat prior (θ has range [0, 1], so f(θ) = 1) and find the posterior probability for θ.
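Working this out: the likelihood of tails is 1 − θ, so Bayes' theorem gives f(θ | x) = (1 − θ) · 1 / ∫₀¹ (1 − θ) dθ = (1 − θ)/(1/2) = 2(1 − θ). A quick numerical sketch confirms this by discretizing [0, 1] finely and comparing the normalized density to 2(1 − θ):

```python
import numpy as np

# Numerical check of the posterior for a flat prior and one observed tails.
# Likelihood of tails is (1 - theta); by Bayes' theorem the posterior is
#   f(theta | x) = (1 - theta) / integral_0^1 (1 - theta) dtheta = 2*(1 - theta).
n = 10_000
dtheta = 1.0 / n
theta = (np.arange(n) + 0.5) * dtheta
unnorm = (1 - theta) * 1.0                      # likelihood x flat prior
posterior = unnorm / (unnorm.sum() * dtheta)    # normalize as a density
exact = 2 * (1 - theta)
print(np.max(np.abs(posterior - exact)))        # close to 0
```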