Linear Discriminant Analysis
Concept
Introduction
LDA is a classification method that maximizes the separation between classes.
The LDA concept explained in StatQuest is Fisher's linear discriminant, which maximizes the ratio of the variance between the classes to the variance within the classes:

$$ J(w) = \frac{w^T S_B w}{w^T S_W w} $$
Here, LDA is explained under the following assumptions:
Normally distributed
Equal class covariances
Dimension Reduction Problem
Supervised method of dimension reduction.
Extract a basis $w$ for projecting the data that
maximizes the separability between classes
while minimizing the scatter within each class.


Mathematics
Find the basis $w$ that maximizes the cost function (the Fisher criterion)

$$ J(w) = \frac{w^T S_B w}{w^T S_W w} $$

where $S_B$ is the between-class scatter matrix and $S_W$ is the within-class scatter matrix. For two classes, the maximizer is $w \propto S_W^{-1}(\mu_1 - \mu_2)$.
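A minimal MATLAB sketch of the two-class case (using the class-1/class-2 data from Example 1 below; variable names are my own):

% Fisher's discriminant direction for two classes
X1 = [1 2 2; 3 3 4];                     % class-1 samples, one per column
X2 = [3 3 4; 1 2 2];                     % class-2 samples, one per column
mu1 = mean(X1, 2); mu2 = mean(X2, 2);    % class means
Sw = (X1-mu1)*(X1-mu1)' + (X2-mu2)*(X2-mu2)';   % within-class scatter
w = Sw \ (mu1 - mu2);                    % direction maximizing J(w)
w = w / norm(w)                          % projections: z = w'*x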
Classification Problem
Posterior Probability Function
For a fixed $x$, choose the class $k$ that gives the maximum posterior probability $\Pr(G = k \mid X = x)$.
Using Bayes' rule, the posterior can be written in terms of the class-conditional density (likelihood) $f_k(x)$ and the prior $\pi_k$ as

$$ \Pr(G = k \mid X = x) = \frac{f_k(x)\,\pi_k}{\sum_{l=1}^{K} f_l(x)\,\pi_l} $$
As a Multivariate Gaussian Distribution
If the likelihood $f_k(x)$ is assumed to be a multivariate Gaussian distribution, then

$$ f_k(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma_k|^{1/2}} \exp\!\left( -\frac{1}{2}(x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) \right) $$

Here, the covariance matrices of all classes are assumed to be equal: $\Sigma_k = \Sigma$ for all $k$.
If the covariances are not equal, use Quadratic Discriminant Analysis (QDA) instead.
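As a sketch, these quantities can be evaluated directly in MATLAB with mvnpdf (Statistics and Machine Learning Toolbox); the numbers below are the Example 1 estimates, with equal priors:

% Posterior probabilities under the shared-covariance Gaussian model
mu1 = [1.6667; 3.3333]; mu2 = [3.3333; 1.6667];  % class means
Sigma = [0.3333 0.1667; 0.1667 0.3333];          % shared covariance
pi1 = 0.5; pi2 = 0.5;                            % priors
x = [1; 4];                                      % query point
f1 = mvnpdf(x', mu1', Sigma);                    % likelihood f_1(x)
f2 = mvnpdf(x', mu2', Sigma);                    % likelihood f_2(x)
post1 = f1*pi1 / (f1*pi1 + f2*pi2)               % Bayes' rule
post2 = f2*pi2 / (f1*pi1 + f2*pi2)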
Derivation
Comparing two classes $k$ and $l$ and taking the log of the posterior ratio:

$$ \log \frac{\Pr(G = k \mid X = x)}{\Pr(G = l \mid X = x)} = \log \frac{f_k(x)}{f_l(x)} + \log \frac{\pi_k}{\pi_l} = \log \frac{\pi_k}{\pi_l} - \frac{1}{2} (\mu_k + \mu_l)^T \Sigma^{-1} (\mu_k - \mu_l) + x^T \Sigma^{-1} (\mu_k - \mu_l) $$

The quadratic terms $x^T \Sigma^{-1} x$ cancel because the covariance is shared across classes, so the log ratio is linear in $x$ and the decision boundaries are linear.
Linear Discriminant Functions
How do we find the class $k$ that gives the maximum posterior probability?
Take the log of the posterior. Since the denominator in Bayes' rule does not depend on $k$, maximizing the posterior is equivalent to maximizing the linear discriminant function

$$ \delta_k(x) = x^T \Sigma^{-1} \mu_k - \frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + \log \pi_k $$
Estimating Linear discriminant function
From the training dataset, estimate

$$ \hat{\pi}_k = \frac{N_k}{N}, \qquad \hat{\mu}_k = \frac{1}{N_k} \sum_{g_i = k} x_i, \qquad \hat{\Sigma} = \frac{1}{N - K} \sum_{k=1}^{K} \sum_{g_i = k} (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^T $$

where $i$ indexes the data points, $j$ the feature dimensions, and $N_k$ is the number of samples in class $k$.
Then, estimate the linear discriminant function by plugging these estimates in:

$$ \hat{\delta}_k(x) = x^T \hat{\Sigma}^{-1} \hat{\mu}_k - \frac{1}{2} \hat{\mu}_k^T \hat{\Sigma}^{-1} \hat{\mu}_k + \log \hat{\pi}_k $$
Classification with LDA
Assign $x$ to the class with the largest discriminant value:

$$ \hat{G}(x) = \arg\max_k \hat{\delta}_k(x) $$
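In MATLAB, this rule is just an argmax over the $K$ discriminant values. A minimal sketch using the Example 1 estimates (variable names are my own):

% Classify a point by the largest discriminant value
mu = [1.6667 3.3333; 3.3333 1.6667];   % p x K, one class mean per column
icov = [4 -2; -2 4];                   % inverse of the shared covariance
prior = [0.5 0.5];                     % priors
x = [1; 4];                            % query point
K = size(mu, 2);
delta = zeros(K, 1);
for k = 1:K
    delta(k) = x'*icov*mu(:,k) - 0.5*mu(:,k)'*icov*mu(:,k) + log(prior(k));
end
[~, k_hat] = max(delta)                % predicted class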
Example
In MATLAB
LDA example
Assumptions
Normally distributed
Equal class covariances
Example 1: 2-class classification
Dimension (number of features): p = 2
Number of classes: K = 2
Total number of data points: N = 6
N1=3; N2=3; N=N1+N2; K=2;
% Dataset
x1=[1;3]; x2=[2;3]; x3=[2;4]; x4=[3;1]; x5=[3;2]; x6=[4;2];
% Label class 1
y1=1; y2=1; y3=1;
% Label class 2
y4=2; y5=2; y6=2;
X=[x1 x2 x3 x4 x5 x6]
Y=[y1 y2 y3 y4 y5 y6]
X = 2×6
     1     2     2     3     3     4
     3     3     4     1     2     2
Y = 1×6
     1     1     1     2     2     2
Estimate Linear Discriminant functions
% Prior
pi1=N1/N
pi2=N2/N
% mu
mu1=sum(X(:,1:3),2)/ N1
mu2=sum(X(:,4:6),2)/ N2
%covariance
sum_temp1=0;
for i=1:N1
sum_temp1=sum_temp1+(X(:,i)-mu1)*(X(:,i)-mu1)';
end
sum_temp2=sum_temp1;
for i=N1+1:N
sum_temp2=sum_temp2+(X(:,i)-mu2)*(X(:,i)-mu2)';
end
% cov1 = cov2 = Sigma (pooled covariance; named Sigma to avoid shadowing MATLAB's built-in cov)
Sigma=1/(N-K)*sum_temp2
icov=inv(Sigma)
% Delta
LD1=@(x) x'*(icov)*(mu1)-0.5*(mu1)'*(icov)*(mu1)+log(pi1)
LD2=@(x) x'*(icov)*(mu2)-0.5*(mu2)'*(icov)*(mu2)+log(pi2)
mu1 = 2×1
    1.6667
    3.3333
mu2 = 2×1
    3.3333
    1.6667
Sigma = 2×2
    0.3333    0.1667
    0.1667    0.3333
icov = 2×2
    4.0000   -2.0000
   -2.0000    4.0000
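For example, classifying a hypothetical test point with the fitted discriminant functions (here LD1 > LD2, so the point is assigned to class 1):

% Classify a new point (hypothetical test input)
x_new = [1; 4];
d1 = LD1(x_new)    % discriminant value for class 1
d2 = LD2(x_new)    % discriminant value for class 2
% assign to class 1 if d1 > d2, otherwise class 2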
Example 2: 3-class classification
Dimension (number of features): p = 2
Number of classes: K = 3
Total number of data points: N = 6
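The original worked numbers for this example are not shown here; the following is a minimal sketch under the stated setup, using hypothetical data (two samples per class) and the same pooled-covariance estimate as in Example 1:

% Hypothetical 3-class dataset (p=2, K=3, N=6; two samples per class)
N1=2; N2=2; N3=2; N=N1+N2+N3; K=3;
X = [1 2 3 4 1 2; 3 4 1 2 1 1];   % columns 1-2: class 1, 3-4: class 2, 5-6: class 3
mu1 = mean(X(:,1:2), 2); mu2 = mean(X(:,3:4), 2); mu3 = mean(X(:,5:6), 2);
pi1 = N1/N; pi2 = N2/N; pi3 = N3/N;
% Pooled within-class scatter, normalized by 1/(N-K) as above
S = (X(:,1:2)-mu1)*(X(:,1:2)-mu1)' + (X(:,3:4)-mu2)*(X(:,3:4)-mu2)' ...
    + (X(:,5:6)-mu3)*(X(:,5:6)-mu3)';
Sigma = S/(N-K);
icov = inv(Sigma);
% One linear discriminant function per class
LD1=@(x) x'*icov*mu1 - 0.5*mu1'*icov*mu1 + log(pi1);
LD2=@(x) x'*icov*mu2 - 0.5*mu2'*icov*mu2 + log(pi2);
LD3=@(x) x'*icov*mu3 - 0.5*mu3'*icov*mu3 + log(pi3);
% Classify by the largest of LD1, LD2, LD3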

To find the decision boundary between two classes $k$ and $l$, set $\delta_k(x) = \delta_l(x)$; with a shared covariance, the boundary $x^T \Sigma^{-1}(\mu_k - \mu_l) = \frac{1}{2}(\mu_k + \mu_l)^T \Sigma^{-1}(\mu_k - \mu_l) - \log(\pi_k/\pi_l)$ is linear in $x$.
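A sketch of computing the boundary coefficients for Example 1 (values restated from above):

% Decision boundary: the set of x where LD1(x) = LD2(x)
mu1 = [1.6667; 3.3333]; mu2 = [3.3333; 1.6667];
icov = [4 -2; -2 4]; pi1 = 0.5; pi2 = 0.5;
a = icov*(mu1 - mu2)                                    % normal vector: [-10; 10]
b = 0.5*(mu1 + mu2)'*icov*(mu1 - mu2) - log(pi1/pi2)    % offset: 0
% Boundary: a(1)*x1 + a(2)*x2 = b, here the line x2 = x1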
