Parameter, Density Estimation
Lecture Note
Lecture slides (PPT): TAMU
Introduction
For a Bayesian classifier, the task reduces to estimating the probability density function (pdf) of the likelihood p(x|w_i), which is a continuous distribution.

How do we estimate it from samples of x?
Given the sample set X_i, the subset of X that belongs to class w_i, estimate p(x|w_i).

Density estimation models the probability density function (pdf) of the unknown probability distribution from which the dataset was drawn.
The methods are classified as follows.
Parametric models: Gaussian Mixture Model, multivariate normal distribution (MND)
Nonparametric models: Kernel Density Estimation (Parzen window), histogram, k-nearest neighbor
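To make the distinction concrete, here is a minimal Python sketch that estimates p(x|w_i) for each class, once with a parametric model (a single Gaussian fitted by maximum likelihood) and once with a nonparametric model (a normalized histogram). The synthetic two-class data and the helper names `gaussian_mle` and `histogram_density` are assumptions made for illustration and are not part of the lecture material.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical labelled 1D data: class 0 ~ N(0, 1), class 1 ~ N(3, 0.5^2).
x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(3.0, 0.5, 500)])
y = np.concatenate([np.zeros(500, dtype=int), np.ones(500, dtype=int)])

def gaussian_mle(samples):
    """Parametric estimate: fit mean and standard deviation by maximum likelihood."""
    mu = samples.mean()
    sigma = samples.std()  # ddof=0 gives the maximum-likelihood estimate
    return lambda t: np.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def histogram_density(samples, bins=20):
    """Nonparametric estimate: normalized histogram (piecewise-constant pdf)."""
    heights, edges = np.histogram(samples, bins=bins, density=True)

    def pdf(t):
        t = np.asarray(t, dtype=float)
        idx = np.searchsorted(edges, t, side="right") - 1
        inside = (t >= edges[0]) & (t <= edges[-1])
        idx = np.clip(idx, 0, len(heights) - 1)
        return np.where(inside, heights[idx], 0.0)  # zero outside the data range

    return pdf

# Estimate p(x | w_i) separately from each class subset X_i.
likelihoods = {}
for label in np.unique(y):
    X_i = x[y == label]
    likelihoods[int(label)] = (gaussian_mle(X_i), histogram_density(X_i))

grid = np.linspace(-5.0, 6.0, 2001)
```

The parametric fit stores only two numbers per class, while the histogram stores one height per bin and makes no assumption about the shape of the distribution.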
Kernel Density Estimation Method
The kernel model has a parameter b, the bandwidth.
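The formula for the model does not appear at this point in the notes. In the standard formulation of kernel density estimation, and assuming N samples x_1, ..., x_N and a kernel function k, the model would read:

```latex
\hat{f}_b(x) = \frac{1}{N b} \sum_{i=1}^{N} k\!\left( \frac{x - x_i}{b} \right)
```

A small b gives a spiky estimate that follows individual samples closely, while a large b gives a smooth estimate that can hide real structure.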

Example: Gaussian kernel (1D) for k
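The Gaussian kernel is the standard normal density, k(z) = exp(-z^2 / 2) / sqrt(2 * pi). The following is a minimal NumPy sketch of a 1D kernel density estimate built on it. The function names `gaussian_kernel` and `kde`, the synthetic samples, and the bandwidth value 0.4 are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(z):
    """Standard normal density, k(z) = exp(-z^2 / 2) / sqrt(2 * pi)."""
    return np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)

def kde(x, samples, b):
    """Evaluate f_b(x) = 1 / (N * b) * sum_i k((x - x_i) / b) at each point of x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    samples = np.asarray(samples, dtype=float)
    z = (x[:, None] - samples[None, :]) / b  # shape (len(x), N)
    return gaussian_kernel(z).sum(axis=1) / (len(samples) * b)

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=200)  # illustrative data
grid = np.linspace(-6.0, 6.0, 1201)
density = kde(grid, samples, b=0.4)
```

Every evaluation touches all N stored samples, which is why the method is described as memory-based.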

Finding the optimal b: we look for the value of b that minimizes the difference between the true shape of f(x) and the shape of our model f_b(x).
Example
Let {x_i}, i = 1, ..., N, be a one-dimensional dataset (the multi-dimensional case is similar) whose examples were drawn from a distribution with an unknown pdf f, with x_i ∈ R for all i = 1, ..., N.

How do we measure the goodness of the estimate?
A reasonable measure of this difference is the mean integrated squared error (MISE).
It squares the difference between the true pdf f and our model of it, f̂_b, integrates that squared difference over x, and takes the expectation over training sets.
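The formula for MISE is not shown in the notes. Under the standard definition, and with the symbols used in this section (f for the true pdf, f̂_b for the kernel model with bandwidth b), it can be written and expanded as:

```latex
\begin{aligned}
\mathrm{MISE}(b)
  &= \mathbb{E}\left[ \int_{\mathbb{R}} \left( \hat{f}_b(x) - f(x) \right)^2 \, dx \right] \\
  &= \mathbb{E}\left[ \int_{\mathbb{R}} \hat{f}_b(x)^2 \, dx \right]
   - 2\, \mathbb{E}\left[ \int_{\mathbb{R}} \hat{f}_b(x)\, f(x) \, dx \right]
   + \int_{\mathbb{R}} f(x)^2 \, dx
\end{aligned}
```

The last term does not depend on b, so it can be ignored when searching for the best bandwidth.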


Now, to find the optimal value b* of b, we minimize a cost derived from the MISE.
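The cost formula itself is missing from the notes. A common choice, and the one assumed here, is the leave-one-out cross-validation estimate of the b-dependent part of MISE: cost(b) = ∫ f̂_b(x)^2 dx − (2/N) Σ_i f̂_b^(i)(x_i), where f̂_b^(i) is the kernel model fitted without the sample x_i. The sketch below evaluates this cost for a Gaussian kernel and picks b* by grid search. The names `loo_cost` and `select_bandwidth`, the synthetic data, and the candidate grid are hypothetical.

```python
import numpy as np

def loo_cost(samples, b):
    """Leave-one-out estimate of the b-dependent part of MISE (Gaussian kernel)."""
    x = np.asarray(samples, dtype=float)
    n = len(x)
    d = (x[:, None] - x[None, :]) / b  # pairwise scaled differences

    # Integral of f_b(x)^2: the product of two Gaussian kernels integrates in
    # closed form to b times an N(0, 2) density evaluated at d.
    integral_sq = np.exp(-0.25 * d ** 2).sum() / (2.0 * np.sqrt(np.pi) * n ** 2 * b)

    # Leave-one-out densities f_b^(i)(x_i): drop the i == j terms (the diagonal).
    k = np.exp(-0.5 * d ** 2) / np.sqrt(2.0 * np.pi)
    np.fill_diagonal(k, 0.0)
    loo = k.sum(axis=1) / ((n - 1) * b)

    return integral_sq - 2.0 * loo.mean()

def select_bandwidth(samples, candidates):
    """Grid search: return the candidate b with the smallest cost, plus all costs."""
    costs = np.array([loo_cost(samples, b) for b in candidates])
    return candidates[int(np.argmin(costs))], costs

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=300)  # illustrative data
candidates = np.logspace(-2, 1, 40)       # hypothetical search grid
b_star, costs = select_bandwidth(samples, candidates)
```

The cost grows without bound as b approaches zero and flattens toward zero for very large b, so on continuous data the minimum typically falls inside the grid. The pairwise matrix makes each evaluation O(N^2), which is acceptable only for small datasets.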

Limitations
Memory-based method: all training samples must be stored.
When a new sample is added, the whole estimate has to be recalculated.
Curse of dimensionality: use only for low-dimensional problems.
Gaussian Mixture Model
Reference
The Hundred-Page Machine Learning Book http://themlbook.com/wiki/doku.php
Machine Learning / Pattern Recognition, Il-Seok Oh (오일석)