Bayesian Updating with Continuous Priors
Ref) MIT OCW Lecture Note
Up to now, we have only done Bayesian updating when we had a finite number of hypotheses, e.g., has the disease (1) or does not (0). Now we will study Bayesian updating when there is a continuous range of hypotheses.
Examples with continuous ranges of hypotheses
Ex1) Suppose you have a system that can succeed or fail with probability p. Then we can hypothesize that p is anywhere in the range [0, 1]. That is, we have a continuous range of hypotheses. We will often model this example with a ‘bent’ coin with unknown probability p of heads.
Ex2) We model gestational length for single births by a normal distribution. The parameters μ and σ of a normal distribution can take any values in (−∞, ∞) and (0, ∞), respectively.
We model the random process giving rise to the data by a distribution with parameters, called a parameterized distribution. Every possible choice of the parameter(s) is a hypothesis.
The law of total probability
The law of total probability for continuous probability distributions is essentially the same as for discrete distributions. Prior predictive probability can be calculated as follows.
Discrete hypotheses:

$$p(x) = \sum_i p(x \mid \theta_i)\, p(\theta_i)$$

Continuous hypotheses:

$$p(x) = \int_a^b p(x \mid \theta)\, f(\theta)\, d\theta$$
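As a minimal sketch of this calculation in Python, assuming the bent coin of Ex1 with a flat prior f(θ) = 1 on [0, 1] and the data x = ‘one toss comes up tails’, so that the likelihood is p(x | θ) = 1 − θ (these particular choices are illustrative, not part of the general formulas):

```python
# Prior predictive probability p(x) for the bent-coin sketch described above.
from scipy.integrate import quad

def prior(theta):
    return 1.0  # flat prior f(theta) = 1 on [0, 1]

def likelihood(theta):
    return 1.0 - theta  # p(tails | theta) for a single toss

# Continuous law of total probability: p(x) = integral of p(x|theta) f(theta) dtheta
p_x, _ = quad(lambda t: likelihood(t) * prior(t), 0.0, 1.0)
print(p_x)  # 0.5

# Discrete analogue: a handful of point hypotheses theta_i with priors p(theta_i)
thetas = [0.125, 0.375, 0.625, 0.875]
priors = [0.25] * 4
print(sum(likelihood(t) * p for t, p in zip(thetas, priors)))  # also 0.5
```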
Bayes' theorem for continuous probability densities
θ is a continuous parameter with pdf f(θ) and range [a, b].
x is random discrete data.
Together they have likelihood p(x | θ).
Bayes' theorem then gives the posterior pdf:

$$f(\theta \mid x)\, d\theta = \frac{p(x \mid \theta)\, f(\theta)\, d\theta}{p(x)} = \frac{p(x \mid \theta)\, f(\theta)\, d\theta}{\int_a^b p(x \mid \theta)\, f(\theta)\, d\theta}$$

The proof can be done using Bayes' theorem with discrete priors.
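A sketch of that limiting argument: write θi for the hypothesis ‘θ lies in the i-th slice of width Δθ’, whose prior probability is approximately f(θi) Δθ. The discrete Bayes' theorem then gives

$$P(\theta_i \mid x) = \frac{p(x \mid \theta_i)\, f(\theta_i)\, \Delta\theta}{\sum_j p(x \mid \theta_j)\, f(\theta_j)\, \Delta\theta} \;\longrightarrow\; \frac{p(x \mid \theta)\, f(\theta)\, d\theta}{\int_a^b p(x \mid \theta)\, f(\theta)\, d\theta} \quad \text{as } \Delta\theta \to 0.$$

The same discretization is carried out numerically in the section below.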
Bayesian updating with continuous priors
Ex) We have a bent coin with unknown probability θ of heads. Suppose we toss it once and get tails. Assume a flat prior (θ has range [0, 1], so f(θ) = 1) and find the posterior pdf for θ.
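A sketch of the computation: the likelihood of tails on a single toss is p(x | θ) = 1 − θ, so

$$f(\theta \mid x) = \frac{p(x \mid \theta)\, f(\theta)}{p(x)} = \frac{(1 - \theta) \cdot 1}{\int_0^1 (1 - \theta)\, d\theta} = \frac{1 - \theta}{1/2} = 2(1 - \theta), \qquad 0 \le \theta \le 1.$$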
From discrete to continuous Bayesian updating
Approximate the continuous range of hypotheses by a finite number.
Create the discrete updating table for the finite number of hypotheses.
Consider how the table changes as the number of hypotheses goes to infinity.
Ex) To keep things concrete, we will work with the ‘bent’ coin with a flat prior f(θ) = 1 from the example above. Our goal is to go from discrete to continuous by increasing the number of hypotheses.
4 hypotheses. We slice [0, 1] into 4 equal intervals: [0, 1/4], [1/4, 1/2], [1/2, 3/4], [3/4, 1]. Each slice has width Δθ = 1/4. We put our 4 hypotheses θi at the centers of the four slices: θ1: ‘θ = 1/8’, θ2: ‘θ = 3/8’, θ3: ‘θ = 5/8’, θ4: ‘θ = 7/8’. The flat prior gives each hypothesis a probability of 1/4 = 1 · Δθ. With the data ‘one toss, tails’ (so the likelihood of θi is 1 − θi), we have the table:

| hypothesis | prior | likelihood | Bayes numerator | posterior |
| --- | --- | --- | --- | --- |
| θ = 1/8 | 1/4 | 7/8 | 7/32 | 7/16 |
| θ = 3/8 | 1/4 | 5/8 | 5/32 | 5/16 |
| θ = 5/8 | 1/4 | 3/8 | 3/32 | 3/16 |
| θ = 7/8 | 1/4 | 1/8 | 1/32 | 1/16 |
| total | 1 |  | 1/2 | 1 |
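The same update can be done programmatically. Below is a minimal sketch, assuming the flat prior and the single ‘tails’ observation used throughout; the helper name discrete_update and the slice count n are hypothetical, and changing n to 8 or 20 reproduces the later cases:

```python
import numpy as np

def discrete_update(n, likelihood=lambda t: 1.0 - t):
    """Discrete Bayesian update on n equal slices of [0, 1].

    Hypotheses sit at the slice centers; a flat prior gives each one
    probability 1/n = 1 * dtheta.  The default likelihood 1 - theta
    corresponds to observing tails on a single toss.
    """
    dtheta = 1.0 / n
    thetas = dtheta * (np.arange(n) + 0.5)   # slice centers 1/(2n), 3/(2n), ...
    prior = np.full(n, dtheta)               # flat prior: f(theta_i) * dtheta
    numerator = likelihood(thetas) * prior   # Bayes numerators
    posterior = numerator / numerator.sum()  # normalize to get the posterior pmf
    return thetas, prior, posterior

# The 4-hypothesis table above: posteriors 7/16, 5/16, 3/16, 1/16
for theta, pr, po in zip(*discrete_update(4)):
    print(f"theta = {theta:.3f}   prior = {pr:.2f}   posterior = {po:.4f}")
```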
Here are the density histograms of the prior and posterior pmfs. The prior and posterior pdfs from the example above are superimposed on the histograms in red.
8 hypotheses. Next we slice [0,1] into 8 intervals each of width Δθ = 1/8 and use the center of each slice for our 8 hypotheses. The flat prior gives each hypothesis the probability 1/8 = 1 · Δθ. Here are the table and density histograms.
20 hypotheses. Finally we slice [0,1] into 20 pieces. This is essentially identical to the previous two cases. Let’s skip right to the density histograms.
Looking at the sequence of plots we see how the prior and posterior density histograms converge to the prior and posterior probability density functions.
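A plotting sketch along these lines, reusing the hypothetical discrete_update helper from the sketch above (matplotlib is assumed to be available), superimposes the prior and posterior pdfs on the density histograms:

```python
import matplotlib.pyplot as plt
import numpy as np

# Reuses discrete_update() from the earlier sketch.
theta_grid = np.linspace(0, 1, 200)
fig, axes = plt.subplots(3, 2, figsize=(8, 9), sharex=True)

for row, n in zip(axes, (4, 8, 20)):
    thetas, prior, posterior = discrete_update(n)
    width = 1.0 / n
    # Density histograms: bar height is pmf / dtheta, so the bar areas sum to 1.
    row[0].bar(thetas, prior / width, width=width, edgecolor="k")
    row[0].plot(theta_grid, np.ones_like(theta_grid), "r")    # prior pdf f(theta) = 1
    row[0].set_title(f"prior, {n} hypotheses")
    row[1].bar(thetas, posterior / width, width=width, edgecolor="k")
    row[1].plot(theta_grid, 2 * (1 - theta_grid), "r")        # posterior pdf 2(1 - theta)
    row[1].set_title(f"posterior, {n} hypotheses")

plt.tight_layout()
plt.show()
```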