Comparison of Bayesian and frequentist inference
Last updated
If the prior and the likelihood are known for all hypotheses, then Bayes' formula computes the posterior exactly. But in most experiments the prior probabilities on hypotheses are not known. In this case, our recourse is the art of statistical inference:
We either make up a prior (Bayesian), or do our best using only the likelihood (frequentist).
Frequentists say that probability measures the frequency of various outcomes of an experiment.
Bayesians say that probability is an abstract concept that measures a degree of belief (certainty) in a given proposition. In practice, Bayesians do not assign a single value for the probability of a coin coming up heads. Rather, they consider a range of values, each with its own probability of being true.
Bayesian inference:
- uses probabilities for both hypotheses and data.
- depends on the prior and likelihood of observed data.
- requires one to know or construct a 'subjective prior'.
- dominated statistical practice before the 20th century.
- may be computationally intensive due to integration over many parameters.
- is logically impeccable.
- yields probabilities that can be interpreted directly.

Frequentist inference:
- never uses or gives the probability of a hypothesis (no prior or posterior).
- depends on the likelihood for both observed and unobserved data.
- does not require a prior.
- dominated statistical practice during the 20th century.
- tends to be less computationally intensive.
- is objective: everyone gets the same answer.
- is logically complex.
- requires a complete description of the experimental protocol and data-analysis protocol before starting the experiment. (This is both good and bad.)
The main critique of Bayesian inference is that a subjective prior is, well, subjective. There is no single method for choosing a prior, so different people will produce different priors and may therefore arrive at different posteriors and conclusions.
There are philosophical objections to assigning probabilities to hypotheses, as hypotheses do not constitute outcomes of repeatable experiments in which one can measure long-term frequency. Rather, a hypothesis is either true or false, regardless of whether one knows which is the case. A coin is either fair or unfair; treatment 1 is either better or worse than treatment 2; the sun will or will not come up tomorrow.
- The probabilities of hypotheses are exactly what we need to make decisions.
- Using Bayes' theorem is logically rigorous.
- By trying different priors we can see how sensitive our results are to the choice of prior.
- It is easy to communicate a result framed in terms of probabilities of hypotheses.
- Even though the prior may be subjective, one can specify the assumptions used to arrive at it, which allows other people to challenge it or try other priors.
- The evidence derived from the data is independent of notions about 'data more extreme' that depend on the exact experimental setup.
- Data can be used as it comes in. There is no requirement that every contingency be planned for ahead of time.
- It is ad hoc and does not carry the force of deductive logic. Notions like 'data more extreme' are not well defined; the p-value depends on the exact experimental setup.
- Experiments must be fully specified ahead of time.
- The p-value and significance level are notoriously prone to misinterpretation. A significance level of 0.05 means the probability of a type I error is 5%. That is, if the null hypothesis is true, then 5% of the time it will be rejected due to randomness. Many (most) people erroneously think a p-value of 0.05 means that the probability of the null hypothesis is 5%.
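The distinction can be illustrated with a small simulation (a hypothetical example of my own, not from the notes): generate test statistics under a true null hypothesis and count how often a level-0.05 test rejects.

```python
import numpy as np
from scipy.stats import norm

# Simulate 200,000 experiments in which the null hypothesis is TRUE,
# each producing a standard normal test statistic z.
rng = np.random.default_rng(0)
z = rng.standard_normal(200_000)

# Two-sided p-value for each simulated experiment.
pvals = 2 * norm.sf(np.abs(z))

# About 5% of the true nulls are (wrongly) rejected: that is the type I
# error rate. It is NOT the probability that the null hypothesis is false.
rejection_rate = np.mean(pvals < 0.05)
print(rejection_rate)   # close to 0.05
```

The rejection rate hovers near 0.05 even though the null hypothesis is true in every simulated experiment, which is exactly the point: the significance level says nothing about the probability that the hypothesis itself is true.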
- It is objective: all statisticians will agree on the p-value.
- Hypothesis testing using frequentist significance testing is applied in the statistical analysis of scientific investigations, evaluating the strength of evidence against a null hypothesis with data. The interpretation of the result is left to the user of the tests.
- Frequentist experimental design demands a careful description of the experiment and methods of analysis before starting. This helps control for experimenter bias.
- The frequentist approach has been used for over 100 years, and we have seen tremendous scientific progress.
When running a series of trials we need a rule for when to stop. In this example we'll consider two coin-tossing experiments.
Exp1: Toss the coin exactly 6 times and report the number of heads.
Exp2: Toss the coin until the first tails and report the number of heads.
Jon is worried that his coin is biased toward heads, so before using it in class he tests it for fairness. He runs an experiment and reports to Jerry that his sequence of tosses was HHHHHT. But Jerry is only half-listening, and he forgets which experiment Jon ran to produce the data.
Since he's forgotten which experiment Jon ran, Jerry the frequentist decides to compute the p-values for both experiments given Jon's data.
Let θ be the probability of heads. We have the null and one-sided alternative hypotheses H0: θ = 0.5 and HA: θ > 0.5.
Exp1: The null distribution is binomial(6, 0.5), so the one-sided p-value is the probability of 5 or 6 heads in 6 tosses: p = 7/64 ≈ 0.109.
Exp2: The null distribution is geometric(0.5), so the one-sided p-value is the probability of 5 or more heads before the first tails, i.e. the probability that the first five tosses are all heads: p = (1/2)^5 = 1/32 ≈ 0.031.
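These two p-values are easy to check numerically. A sketch using Python's scipy.stats (the notes themselves use MATLAB; this translation is mine):

```python
from scipy.stats import binom

# Experiment 1: null distribution binomial(6, 0.5).
# One-sided p-value = P(5 or 6 heads in 6 tosses) = 7/64.
p1 = binom.sf(4, 6, 0.5)   # sf(4) = P(X > 4) = P(X >= 5)

# Experiment 2: null distribution geometric(0.5).
# One-sided p-value = P(5 or more heads before the first tails)
#                   = P(first five tosses are all heads) = (1/2)^5 = 1/32.
p2 = 0.5 ** 5

print(p1)   # 0.109375
print(p2)   # 0.03125
```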
Using the typical significance level of 0.05, we would reject the null hypothesis in experiment 2, but not in experiment 1.
The frequentist is fine with this. The set of possible outcomes is different for the different experiments, so the notion of extreme data, and therefore the p-value, is different. For example, in experiment 1 we would consider THHHHH to be as extreme as HHHHHT. In experiment 2, we would never see THHHHH, since the experiment would end after the first tails.
Jerry the Bayesian knows it doesn't matter which of the two experiments Jon ran, since the binomial and geometric likelihood functions for the data HHHHHT are proportional.
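This proportionality can be verified numerically. In the sketch below (Python with scipy.stats, my addition), the two likelihoods of HHHHHT differ by the constant factor C(6,5) = 6 for every value of θ:

```python
from scipy.stats import binom

# Likelihood of the data HHHHHT as a function of theta = P(heads):
#   Exp 1 (binomial):  C(6,5) * theta^5 * (1 - theta)
#   Exp 2 (geometric): theta^5 * (1 - theta)
for theta in [0.1, 0.3, 0.5, 0.7, 0.9]:
    lik_binomial = binom.pmf(5, 6, theta)
    lik_geometric = theta**5 * (1 - theta)
    # The ratio is the constant 6, independent of theta, so the two
    # likelihoods carry the same information about theta.
    print(theta, lik_binomial / lik_geometric)
```

Since Bayes' theorem only uses the likelihood up to a multiplicative constant, the two experiments lead to identical posteriors.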
In either case, he must make up a prior, and he chooses Beta(3,3). Data of 5 heads and 1 tail gives a posterior distribution Beta(8,4). Here is a graph of the prior and the posterior. The blue lines at the bottom are 50% and 90% probability intervals for the posterior.
Here are the relevant computations in MATLAB:
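The MATLAB code itself does not appear in this copy of the notes. The following is a Python sketch of the same computation using scipy.stats (my translation, not the original code):

```python
from scipy.stats import beta

# Prior Beta(3,3) updated with 5 heads and 1 tail gives posterior Beta(8,4).
posterior = beta(8, 4)

# Posterior probability that the coin is biased toward heads: P(theta > 0.5).
p_biased = posterior.sf(0.5)
print(round(p_biased, 2))          # 0.89

# 50% and 90% central probability intervals for the posterior
# (the blue lines in the graph described above).
print(posterior.interval(0.5))
print(posterior.interval(0.9))
```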
Starting from the prior Beta(3,3), the posterior probability that the coin is biased toward heads is 0.89.