Parameter, Density Estimation

Lecture Note

Read class PPT for detail

Click here for Lecture PPT: TAMU

Introduction

For a Bayesian classifier, the task reduces to estimating the probability density function (pdf) of the likelihood, which is a continuous distribution.

How to estimate p(x|w_i) with samples of x?

Given X_i, the subset of samples in X that belong to class w_i, estimate p(x|w_i).

Density estimation models the probability density function (pdf) of the unknown probability distribution from which the dataset was drawn.

The methods are classified as follows:

Parametric model

  • Gaussian Mixture Model

  • multivariate normal distribution (MND)

Nonparametric model

  • Kernel Density Estimation, Histogram, Parzen Window

  • k-Nearest neighbor
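As a rough illustration of the nonparametric idea, the sketch below estimates a 1D density with the k-nearest-neighbor rule p(x) ≈ k / (N V), where V is the length of the smallest interval centered at x that contains k samples. The function name `knn_density` and the toy data are assumptions made for this example, not part of the lecture material.

```python
def knn_density(x, samples, k):
    """k-NN density estimate at x for 1D samples: k / (N * V)."""
    n = len(samples)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of samples")
    distances = sorted(abs(x - xi) for xi in samples)
    radius = distances[k - 1]        # distance to the k-th nearest sample
    if radius == 0.0:
        return float("inf")          # degenerate case: k samples coincide with x
    volume = 2.0 * radius            # length of the interval [x - r, x + r]
    return k / (n * volume)


samples = [0.0, 1.0, 2.0, 3.0]       # made-up data for illustration
print(knn_density(1.5, samples, k=2))
```

Unlike a kernel estimate, the window size here adapts to the data: it is small where samples are dense and large where they are sparse.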

Kernel Density Estimation Method

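The kernel model f_b(x) = (1/(N b)) Σ_{i=1..N} k((x − x_i)/b) has one hyperparameter, the bandwidth b, which controls how smooth the estimate is.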

  • Example: Gaussian kernel (1D) for k: k(z) = (1/sqrt(2π)) exp(−z²/2)
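A minimal sketch of a 1D kernel density estimate with a Gaussian kernel, assuming the usual form f_b(x) = (1/(N b)) Σ k((x − x_i)/b) with k the standard normal density. The names `gaussian_kernel` and `kde` and the sample values are illustrative assumptions.

```python
import math


def gaussian_kernel(z):
    """Standard normal density, used as the kernel k."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def kde(x, samples, b):
    """Kernel density estimate f_b(x) from 1D samples with bandwidth b > 0."""
    if b <= 0:
        raise ValueError("bandwidth b must be positive")
    n = len(samples)
    return sum(gaussian_kernel((x - xi) / b) for xi in samples) / (n * b)


samples = [-1.2, -0.4, 0.0, 0.3, 1.1]    # made-up data for illustration
for b in (0.1, 0.5, 2.0):
    print(b, kde(0.0, samples, b))       # the estimate at x = 0 changes with b
```

A small b gives a spiky estimate that follows individual samples, while a large b smooths the samples into a single broad bump.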

Find the optimum b: we look for the value of b that minimizes the difference between the real shape of f(x) and the shape of our model f_b(x).

Example

Let {x_i}, i = 1, ..., N, be a one-dimensional dataset (the multi-dimensional case is similar) whose examples were drawn from a distribution with an unknown pdf f, with x_i ∈ R for all i = 1, ..., N.

How to measure the goodness of estimation?

A reasonable choice of measure of this difference is called the mean integrated squared error (MISE):

We square the difference between the real pdf f and our model of it, f_b, integrate it over x, and take the expectation over training sets.
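Written out, the MISE is commonly defined as below. The formula itself is not shown in these notes, so the notation is an assumption: \hat{f}_b denotes the kernel model (written f_b in the text) and the expectation is taken over training sets drawn from f.

```latex
\mathrm{MISE}(b) = \mathbb{E}\left[ \int_{\mathbb{R}} \left( \hat{f}_b(x) - f(x) \right)^2 \, dx \right]
```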

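Now, to find the optimal value b* for b, we minimize a cost that can be computed from the data alone, since f itself is unknown. A common choice is the leave-one-out cross-validation estimate: the integral of f_b(x)² over x, minus (2/N) times the sum over i of f_b^(i)(x_i), where f_b^(i) is the kernel model fitted with sample x_i left out.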
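One way to carry out this minimization is a grid search over candidate bandwidths. The sketch below assumes the leave-one-out cost ∫ f_b(x)² dx − (2/N) Σ f_b^(i)(x_i), where f_b^(i) is the Gaussian kernel model fitted without sample x_i. The integral is approximated by a midpoint sum, and the function names, candidate grid, and data are illustrative assumptions.

```python
import math


def kde(x, samples, b):
    """Gaussian kernel density estimate f_b(x) for 1D samples."""
    n = len(samples)
    total = sum(math.exp(-0.5 * ((x - xi) / b) ** 2) for xi in samples)
    return total / (n * b * math.sqrt(2.0 * math.pi))


def loo_cost(samples, b, grid_size=2000):
    """Leave-one-out cost: integral of f_b^2 minus (2/N) * sum_i f_b^(i)(x_i)."""
    n = len(samples)
    # Integral of f_b(x)^2, approximated by a midpoint sum on a padded range.
    lo = min(samples) - 5.0 * b
    hi = max(samples) + 5.0 * b
    dx = (hi - lo) / grid_size
    integral = sum(
        kde(lo + (j + 0.5) * dx, samples, b) ** 2 for j in range(grid_size)
    ) * dx
    # Evaluate each sample under the model fitted without that sample.
    loo_sum = sum(
        kde(samples[i], samples[:i] + samples[i + 1:], b) for i in range(n)
    )
    return integral - 2.0 * loo_sum / n


def best_bandwidth(samples, candidates):
    """Return the candidate bandwidth with the lowest leave-one-out cost."""
    return min(candidates, key=lambda b: loo_cost(samples, b))


samples = [-1.2, -0.4, 0.0, 0.3, 1.1, 1.4, 2.0, 2.3]   # made-up data
candidates = [0.1, 0.2, 0.4, 0.8, 1.6]                 # hypothetical grid
print(best_bandwidth(samples, candidates))
```

The grid search is the simplest option; any one-dimensional optimizer could replace it, since the cost depends on b alone.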

Limitations

  • Memory-based method: all training samples must be stored.

  • When a new sample is added, the whole computation must be repeated.

  • Curse of dimensionality: use only for low-dimensional problems.

Gaussian Mixture Model

Read GMM

Reference

The Hundred-Page Machine Learning Book http://themlbook.com/wiki/doku.php

Machine Learning / Pattern Recognition (머신러닝/패턴인식), Il-Seok Oh (오일석)
