Kernel Density Estimation, Histogram, Parzen Window
k-Nearest neighbor
Kernel Density Estimation Method
The kernel model of the pdf is

\[
\hat{f}_b(x) = \frac{1}{Nb} \sum_{i=1}^{N} k\!\left(\frac{x - x_i}{b}\right),
\]

and it has one parameter, b, called the bandwidth.
Example: the Gaussian kernel (1D) for k,

\[
k(z) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{z^2}{2}\right).
\]
Finding the optimum b: we look for the value of b that minimizes the difference between the real shape of f(x) and the shape of our model \hat{f}_b(x).
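As a concrete illustration, here is a minimal NumPy sketch of the estimator above with a Gaussian kernel; the function names, the synthetic data, and the bandwidth value are illustrative choices, not part of the original.

import numpy as np

def gaussian_kernel(z):
    # Standard 1-D Gaussian kernel k(z) = exp(-z^2/2) / sqrt(2*pi).
    return np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)

def kde(x, samples, b):
    # Kernel density estimate f_hat_b(x); samples are x_1..x_N, b is the bandwidth.
    x = np.atleast_1d(np.asarray(x, dtype=float))
    z = (x[:, None] - samples[None, :]) / b   # one row per query point
    return gaussian_kernel(z).sum(axis=1) / (len(samples) * b)

# Hypothetical usage on synthetic data:
samples = np.random.default_rng(0).normal(size=200)
print(kde([0.0, 1.0, 2.0], samples, b=0.3))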
Example
Let \{x_i\}_{i=1}^{N} be a one-dimensional dataset (the multi-dimensional case is similar) whose examples were drawn from a distribution with an unknown pdf f, with x_i \in \mathbb{R} for all i = 1, \ldots, N.
How do we measure the goodness of the estimate?
A reasonable choice of measure of this difference is the mean integrated squared error (MISE):

\[
\mathrm{MISE}(b) = \mathbb{E}\left[ \int_{\mathbb{R}} \left( \hat{f}_b(x) - f(x) \right)^2 dx \right]
\]

As the name suggests, we square the difference between the real pdf f and our model of it, \hat{f}_b, integrate over x, and take the expectation over random samples of size N.
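To see why the cost that follows is a sensible surrogate (a standard expansion, not spelled out in the original), expand the square:

\[
\mathrm{MISE}(b) = \mathbb{E}\left[\int \hat{f}_b(x)^2\,dx\right] - 2\,\mathbb{E}\left[\int \hat{f}_b(x) f(x)\,dx\right] + \int f(x)^2\,dx .
\]

The last term does not depend on b, and the middle term, which involves the unknown f, can be estimated from the data by leave-one-out cross-validation; dropping the constant term yields the cost below.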
Now, to find the optimal value b^* for b, we minimize the cost defined as

\[
\int_{\mathbb{R}} \hat{f}_b(x)^2\,dx \;-\; \frac{2}{N}\sum_{i=1}^{N} \hat{f}_b^{(i)}(x_i),
\]

where \hat{f}_b^{(i)} is the kernel model computed with example x_i excluded:

\[
\hat{f}_b^{(i)}(x) = \frac{1}{(N-1)\,b}\sum_{k \neq i} k\!\left(\frac{x - x_k}{b}\right).
\]
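A minimal sketch of this selection procedure, assuming a Gaussian kernel, a grid-based numerical approximation of the integral, and a simple grid search over candidate bandwidths (all of these choices, and the values used, are illustrative rather than prescribed by the original):

import numpy as np

def gaussian_kernel(z):
    # Repeated here so the snippet is self-contained.
    return np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)

def loo_estimates(samples, b):
    # Leave-one-out estimates f_hat_b^{(i)}(x_i) for every training point.
    n = len(samples)
    k = gaussian_kernel((samples[:, None] - samples[None, :]) / b)
    np.fill_diagonal(k, 0.0)            # exclude the x_i term itself
    return k.sum(axis=1) / ((n - 1) * b)

def cv_cost(samples, b, grid):
    # Cost: integral of f_hat_b(x)^2 minus 2/N times the sum of LOO estimates.
    n = len(samples)
    f_hat = gaussian_kernel((grid[:, None] - samples[None, :]) / b).sum(axis=1) / (n * b)
    integral = np.sum(f_hat ** 2) * (grid[1] - grid[0])   # Riemann-sum approximation
    return integral - 2.0 * loo_estimates(samples, b).mean()

# Hypothetical usage: pick b* by grid search on synthetic data.
rng = np.random.default_rng(0)
samples = rng.normal(size=300)
grid = np.linspace(samples.min() - 3.0, samples.max() + 3.0, 2000)
candidates = np.linspace(0.05, 1.0, 40)
b_star = min(candidates, key=lambda b: cv_cost(samples, b, grid))
print("b* =", b_star)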
Limitations
Memory-based method: all training samples must be stored.
When a new sample is added, the whole estimation process has to be re-run.
The curse of dimensionality applies: use it only for low-dimensional problems.