Bayesian Classifier
Review on Probability
See Terminology Review before reading further
Concept
Assume we are to classify an object, based on the evidence provided by a feature vector x, as class w1 or class w2.


Bayesian Classification
Decision Rule (binary class problem)
Let w1: class 1, w2: class 2.
If P(w1|X) > P(w2|X), then X belongs to w1; otherwise, to w2.
Applying Bayes' rule turns this into the minimum error Bayesian classifier.
Minimum Error Bayesian Classifier
If p(X|w1)P(w1) > p(X|w2)P(w2), then X belongs to w1; otherwise, to w2.
Since X can be either discrete or continuous, lowercase p is used for p(X|w1). For the discrete class label w, uppercase P is used, as in P(w) and P(w|X).
Likelihood Ratio Test (LRT)
Since p(x) does not affect the decision rule, rearrange using the likelihood ratio Λ(x) = p(x|w1)/p(x|w2): choose w1 if Λ(x) > P(w2)/P(w1), else w2.

Probability of Error
For binary classification,
P(e) = P(e|w1)P(w1) + P(e|w2)P(w2)
If P(w1) = P(w2) = 0.5, then P(e) = 0.5(e1 + e2), where e1 = P(e|w1) and e2 = P(e|w2).
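The formula above can be checked numerically. The sketch below assumes a hypothetical pair of Gaussian class-conditional densities, p(x|w1) ~ N(0, 1) and p(x|w2) ~ N(2, 1), with equal priors; for that choice the decision boundary sits at x = 1, and e1, e2 are the density mass each class leaks into the other class's region.

```python
import math

# Hypothetical binary problem: p(x|w1) ~ N(0, 1), p(x|w2) ~ N(2, 1),
# equal priors P(w1) = P(w2) = 0.5, so the decision boundary is x = 1.

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# e1 = P(e|w1): mass of the class-1 density falling in the w2 region (x > 1)
e1 = 1.0 - normal_cdf(1.0, 0.0, 1.0)
# e2 = P(e|w2): mass of the class-2 density falling in the w1 region (x < 1)
e2 = normal_cdf(1.0, 2.0, 1.0)

p_error = 0.5 * (e1 + e2)
print(round(p_error, 4))  # 0.1587
```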

How good is the LRT decision rule?
The optimal decision rule minimizes P(e|x) at every value of x, so that the integral
P(e) = ∫ P(e|x) p(x) dx (taken over all x)
is minimized. For any given problem, the minimum probability of error is achieved by the LRT decision rule: it is the best possible classifier.
Example: LRT
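As a concrete sketch (the Gaussian densities and parameters here are hypothetical, not from the notes): with p(x|w1) ~ N(0, 1), p(x|w2) ~ N(2, 1), and priors P(w1) = P(w2) = 0.5, the LRT compares the likelihood ratio against P(w2)/P(w1) = 1.

```python
import math

# Hypothetical 1-D example: two Gaussian class-conditional densities.
# p(x|w1) ~ N(0, 1), p(x|w2) ~ N(2, 1), with priors P(w1) = P(w2) = 0.5.

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def lrt_classify(x, prior1=0.5, prior2=0.5):
    """LRT: choose w1 if p(x|w1)/p(x|w2) > P(w2)/P(w1), else w2."""
    ratio = gaussian_pdf(x, 0.0, 1.0) / gaussian_pdf(x, 2.0, 1.0)
    threshold = prior2 / prior1
    return "w1" if ratio > threshold else "w2"

print(lrt_classify(0.5))  # w1 (closer to the mean of class 1)
print(lrt_classify(1.8))  # w2 (closer to the mean of class 2)
```

With equal priors and equal variances the decision boundary falls at the midpoint x = 1 between the two means.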

Bayes Risk
The penalty for misclassification can carry a different weight for each class.
For example, misclassifying a cancer sufferer as a healthy patient is a much more serious error than the other way around.
Minimum risk Bayesian Classifier
Let C_ij be the cost of choosing class w_i when w_j is the true class.
e.g. C21 is the cost of wrongly classifying as w2 when the true class is w1.
Bayes Risk R
Expected value of the cost:
R = E[C] = ∫_{R1} { C11 p(x|w1)P(w1) + C12 p(x|w2)P(w2) } dx + ∫_{R2} { C21 p(x|w1)P(w1) + C22 p(x|w2)P(w2) } dx
where R1 and R2 are the decision regions for w1 and w2.
After some rearrangement, it again takes the form of a likelihood ratio test: choose w1 if p(x|w1)/p(x|w2) > (C12 − C22)P(w2) / ((C21 − C11)P(w1)), else w2.


Example
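A minimal sketch of the minimum-risk rule, reusing hypothetical Gaussian densities p(x|w1) ~ N(0, 1) ("cancer") and p(x|w2) ~ N(2, 1) ("healthy") with equal priors; the cost values below are illustrative assumptions, chosen so that calling a cancer patient healthy (C21) is ten times worse than the reverse (C12).

```python
import math

# Hypothetical costs (C_ij = cost of choosing w_i when w_j is true):
# correct decisions are free; C21 (missing class w1) is penalized 10x C12.
C11, C12, C21, C22 = 0.0, 1.0, 10.0, 0.0

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def min_risk_classify(x, prior1=0.5, prior2=0.5):
    """Choose w1 if p(x|w1)/p(x|w2) > (C12 - C22) P(w2) / ((C21 - C11) P(w1))."""
    ratio = gaussian_pdf(x, 0.0, 1.0) / gaussian_pdf(x, 2.0, 1.0)
    threshold = (C12 - C22) * prior2 / ((C21 - C11) * prior1)
    return "w1" if ratio > threshold else "w2"

# At x = 1.8 the minimum-error rule would pick w2, but the heavy penalty
# on missing w1 shifts the decision boundary toward the mean of w2:
print(min_risk_classify(1.8))  # w1
```

Compared with the minimum-error LRT, the only change is the threshold: the asymmetric costs move the boundary from x = 1 to about x ≈ 2.15, so the classifier errs on the side of declaring w1.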
