Linear Discriminant Analysis
Concept
Introduction
LDA is a classification method that maximizes the separation between classes.
The LDA concept covered in StatQuest is Fisher's linear discriminant:
maximize the ratio of the variance between the classes to the variance within the classes.
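This ratio is usually written as the Fisher criterion. The scatter-matrix symbols $S_B$ (between-class) and $S_W$ (within-class) are standard notation, not defined in the original notes:

```latex
J(w) = \frac{w^{T} S_B\, w}{w^{T} S_W\, w},
\qquad
S_B = \sum_{k} N_k (\mu_k - \mu)(\mu_k - \mu)^{T},
\qquad
S_W = \sum_{k} \sum_{i \in C_k} (x_i - \mu_k)(x_i - \mu_k)^{T}
```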
Here, LDA is explained under the following assumptions:
Normally distributed data
Equal class covariances
Dimension Reduction Problem
LDA is a supervised method of dimension reduction.
It extracts a basis ($w$) for projecting the data that
maximizes separability between classes
while minimizing scatter within the same class.
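The idea above can be sketched for two classes with NumPy. This is a minimal illustration, assuming the Fisher solution $w \propto S_W^{-1}(\mu_1 - \mu_2)$; the data and variable names are illustrative, not from the original notes:

```python
import numpy as np

# Toy 2-class data: rows are samples, columns are features.
X1 = np.array([[1.0, 3.0], [2.0, 3.0], [2.0, 4.0]])  # class 1
X2 = np.array([[3.0, 1.0], [3.0, 2.0], [4.0, 2.0]])  # class 2

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)

# Within-class scatter: sum of the per-class scatter matrices.
S_W = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)

# Fisher's direction: w proportional to S_W^{-1} (mu1 - mu2).
w = np.linalg.solve(S_W, mu1 - mu2)

# Projecting onto w separates the classes: every class-1 score
# exceeds every class-2 score for this toy dataset.
z1, z2 = X1 @ w, X2 @ w
```

Projecting 2-D points onto the single direction $w$ reduces the data to one dimension while keeping the two classes apart.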
Mathematics
Find the basis ($w$) that maximizes the ratio of between-class variance to within-class variance (the Fisher criterion).
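The solution can be stated as a generalized eigenvalue problem, with a closed form in the two-class case. These formulas are not spelled out in the original notes and use the common scatter-matrix notation $S_B$, $S_W$:

```latex
S_B\, w = \lambda\, S_W\, w
\quad\Longrightarrow\quad
w \propto S_W^{-1}(\mu_1 - \mu_2) \quad \text{(two-class case)}
```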
Classification Problem
Posterior Probability Function
For a fixed $x$, choose the class $k$ that gives the maximum posterior probability $P(Y=k \mid X=x)$.
Using Bayes' rule, the posterior can be written in terms of the prior $\pi_k$ and the likelihood $f_k(x)$ as $P(Y=k \mid X=x) = \pi_k f_k(x) / \sum_{l} \pi_l f_l(x)$.
As a Multivariate Gaussian Distribution
If the likelihood $f_k(x)$ is assumed to be a multivariate Gaussian distribution, then $f_k(x) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu_k)^{T}\Sigma^{-1}(x-\mu_k)\right)$.
Here, the covariance matrices of all classes are assumed to be equal ($\Sigma_k = \Sigma$).
If the covariances are not equal, use Quadratic Discriminant Analysis (QDA) instead.
What is covariance? Read here
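Bayes' rule with Gaussian likelihoods can be sketched directly in NumPy. This is a minimal illustration with made-up priors, means, and a shared covariance; none of these numbers come from the original notes:

```python
import numpy as np

def gaussian_density(x, mu, cov):
    """Multivariate Gaussian density f_k(x) with mean mu and covariance cov."""
    p = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** p * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm

def posterior(x, priors, means, cov):
    """Bayes' rule: P(Y=k | X=x) = pi_k f_k(x) / sum_l pi_l f_l(x)."""
    lik = np.array([pi * gaussian_density(x, mu, cov)
                    for pi, mu in zip(priors, means)])
    return lik / lik.sum()

# Two classes sharing one covariance matrix (illustrative numbers).
priors = [0.5, 0.5]
means = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
cov = np.eye(2)

# x sits exactly on class 1's mean, so class 1 gets the higher posterior.
probs = posterior(np.array([0.0, 0.0]), priors, means, cov)
```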
Derivation
Linear Discriminant Functions
How do we find the class $k$ that gives the maximum posterior probability?
Take the log of the posterior $P(Y=k \mid X=x)$; the denominator does not depend on $k$, so it can be dropped.
Maximizing the posterior is then equivalent to maximizing the linear discriminant function $\delta_k(x)$.
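Spelling out this step under the shared-covariance Gaussian assumption: the terms that do not depend on $k$ drop out, and what remains is linear in $x$:

```latex
\log\left(\pi_k f_k(x)\right)
= \log \pi_k - \frac{1}{2}(x-\mu_k)^{T}\Sigma^{-1}(x-\mu_k) + \text{const}
\quad\Longrightarrow\quad
\delta_k(x) = x^{T}\Sigma^{-1}\mu_k - \frac{1}{2}\mu_k^{T}\Sigma^{-1}\mu_k + \log \pi_k
```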
Estimating Linear discriminant function
From the training dataset, estimate the prior $\pi_k$, the class mean $\mu_k$, and the pooled covariance $\Sigma$
($i$: $i$th sample, $j$: $j$th dimension).
Then estimate the linear discriminant function $\delta_k(x)$ from these estimates.
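The standard plug-in estimates, assuming the usual pooled-covariance formulation ($N_k$ is the number of samples in class $k$):

```latex
\hat{\pi}_k = \frac{N_k}{N},
\qquad
\hat{\mu}_k = \frac{1}{N_k}\sum_{i \in C_k} x_i,
\qquad
\hat{\Sigma} = \frac{1}{N-K}\sum_{k=1}^{K}\sum_{i \in C_k}(x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^{T}
```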
Classification with LDA
Example
In MATLAB
LDA example
Assumption
Normally distributed
Equal class covariances
Example 1: 2-class classification
Dimension (number of features): p = 2
Number of classes: K = 2
Total number of samples: N = 6
X = 2×6
     1     2     2     3     3     4
     3     3     4     1     2     2
Y = 1×6
     1     1     1     2     2     2
Estimate Linear Discriminant functions
mu1 = 2×1
mu2 = 2×1
cov = 2×2
icov = 2×2
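The MATLAB output above lists only the shapes of mu1, mu2, cov, and icov. As an illustrative sketch, these estimates can be recomputed for this dataset with NumPy, using the pooled covariance normalized by N − K (one common convention; the original output does not show which normalization was used):

```python
import numpy as np

# Example 1 data: columns of X are samples, Y holds the class labels.
X = np.array([[1, 2, 2, 3, 3, 4],
              [3, 3, 4, 1, 2, 2]], dtype=float)
Y = np.array([1, 1, 1, 2, 2, 2])

X1, X2 = X[:, Y == 1], X[:, Y == 2]
mu1, mu2 = X1.mean(axis=1), X2.mean(axis=1)

# Pooled covariance over both classes, normalized by N - K.
N, K = X.shape[1], 2
S = (X1 - mu1[:, None]) @ (X1 - mu1[:, None]).T \
  + (X2 - mu2[:, None]) @ (X2 - mu2[:, None]).T
cov = S / (N - K)
icov = np.linalg.inv(cov)

# Linear discriminant score for a point x (equal priors cancel out).
def delta(x, mu):
    return x @ icov @ mu - 0.5 * mu @ icov @ mu

# Classify a new point by the larger discriminant score.
x = np.array([1.5, 3.0])
predicted = 1 if delta(x, mu1) > delta(x, mu2) else 2
```

For this dataset the class means come out to (5/3, 10/3) and (10/3, 5/3), and the test point (1.5, 3.0) lands in class 1.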
Example 2: 3-class classification
Dimension (number of features): p = 2
Number of classes: K = 3
Total number of samples: N = 6
To set the decision boundary between classes $k$ and $l$, find the points $x$ where $\delta_k(x) = \delta_l(x)$; with a shared covariance, this boundary is linear in $x$.
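Setting $\delta_k(x) = \delta_l(x)$ with the shared-covariance discriminant function and cancelling the common terms gives a linear equation in $x$:

```latex
x^{T}\Sigma^{-1}(\mu_k - \mu_l)
= \frac{1}{2}(\mu_k + \mu_l)^{T}\Sigma^{-1}(\mu_k - \mu_l) - \log\frac{\pi_k}{\pi_l}
```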