Linear Discriminant Analysis
Concept
Introduction
LDA is a classification method that maximizes the separation between classes.
The LDA concept explained in StatQuest is Fisher's linear discriminant, which maximizes the ratio of the variance between the classes to the variance within the classes:

$$ J(w) = \frac{w^T S_B w}{w^T S_W w} $$
Here, LDA is explained under the following assumptions:
Normally distributed
Equal class covariances
Dimension Reduction Problem
Supervised method of dimension reduction.
Extract a basis $w$ for projecting the data that
maximizes the separability between classes
while minimizing the scatter within each class.


Mathematics
Find the basis $w$ that maximizes the cost function (the Fisher criterion)

$$ J(w) = \frac{w^T S_B w}{w^T S_W w} $$

where $S_B$ is the between-class scatter matrix and $S_W$ is the within-class scatter matrix. For two classes, the maximizer is $w \propto S_W^{-1}(\mu_1 - \mu_2)$.
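A minimal MATLAB sketch of the two-class case (using the class-1/class-2 data from Example 1 below; variable names are my own):

% Fisher's discriminant direction for two classes
X1 = [1 2 2; 3 3 4];                     % class-1 samples, one per column
X2 = [3 3 4; 1 2 2];                     % class-2 samples, one per column
mu1 = mean(X1, 2); mu2 = mean(X2, 2);    % class means
Sw = (X1-mu1)*(X1-mu1)' + (X2-mu2)*(X2-mu2)';   % within-class scatter
w = Sw \ (mu1 - mu2);                    % direction maximizing J(w)
w = w / norm(w)                          % projections: z = w'*x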
Classification Problem
Posterior Probability Function
For a fixed $x$, choose the class $k$ that gives the maximum posterior probability $\Pr(G = k \mid X = x)$.
Using Bayes' rule, the posterior can be written in terms of the class-conditional density (likelihood) $f_k(x)$ and the prior $\pi_k$ as

$$ \Pr(G = k \mid X = x) = \frac{f_k(x)\,\pi_k}{\sum_{l=1}^{K} f_l(x)\,\pi_l} $$
As a Multivariate Gaussian Distribution
If the likelihood $f_k(x)$ is assumed to be a multivariate Gaussian distribution, then

$$ f_k(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma_k|^{1/2}} \exp\!\left( -\frac{1}{2}(x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) \right) $$

Here, the covariance matrices of all classes are assumed to be equal: $\Sigma_k = \Sigma$ for all $k$.
If the covariances are not equal, use Quadratic Discriminant Analysis (QDA) instead.
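As a sketch, these quantities can be evaluated directly in MATLAB with mvnpdf (Statistics and Machine Learning Toolbox); the numbers below are the Example 1 estimates, with equal priors:

% Posterior probabilities under the shared-covariance Gaussian model
mu1 = [1.6667; 3.3333]; mu2 = [3.3333; 1.6667];  % class means
Sigma = [0.3333 0.1667; 0.1667 0.3333];          % shared covariance
pi1 = 0.5; pi2 = 0.5;                            % priors
x = [1; 4];                                      % query point
f1 = mvnpdf(x', mu1', Sigma);                    % likelihood f_1(x)
f2 = mvnpdf(x', mu2', Sigma);                    % likelihood f_2(x)
post1 = f1*pi1 / (f1*pi1 + f2*pi2)               % Bayes' rule
post2 = f2*pi2 / (f1*pi1 + f2*pi2)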
Derivation
Comparing two classes $k$ and $l$ and taking the log of the posterior ratio:

$$ \log \frac{\Pr(G = k \mid X = x)}{\Pr(G = l \mid X = x)} = \log \frac{f_k(x)}{f_l(x)} + \log \frac{\pi_k}{\pi_l} = \log \frac{\pi_k}{\pi_l} - \frac{1}{2} (\mu_k + \mu_l)^T \Sigma^{-1} (\mu_k - \mu_l) + x^T \Sigma^{-1} (\mu_k - \mu_l) $$

The quadratic terms $x^T \Sigma^{-1} x$ cancel because the covariance is shared across classes, so the log ratio is linear in $x$ and the decision boundaries are linear.
Linear Discriminant Functions
How do we find the class $k$ that gives the maximum posterior probability?
Take the log of the posterior. Since the denominator in Bayes' rule does not depend on $k$, maximizing the posterior is equivalent to maximizing the linear discriminant function

$$ \delta_k(x) = x^T \Sigma^{-1} \mu_k - \frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + \log \pi_k $$
Estimating Linear discriminant function
From the training dataset, estimate

$$ \hat{\pi}_k = \frac{N_k}{N}, \qquad \hat{\mu}_k = \frac{1}{N_k} \sum_{g_i = k} x_i, \qquad \hat{\Sigma} = \frac{1}{N - K} \sum_{k=1}^{K} \sum_{g_i = k} (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^T $$

where $i$ indexes the data points, $j$ the feature dimensions, and $N_k$ is the number of samples in class $k$.
Then, estimate the linear discriminant function by plugging these estimates in:

$$ \hat{\delta}_k(x) = x^T \hat{\Sigma}^{-1} \hat{\mu}_k - \frac{1}{2} \hat{\mu}_k^T \hat{\Sigma}^{-1} \hat{\mu}_k + \log \hat{\pi}_k $$
Classification with LDA
Assign $x$ to the class with the largest discriminant value:

$$ \hat{G}(x) = \arg\max_k \hat{\delta}_k(x) $$
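In MATLAB, this rule is just an argmax over the $K$ discriminant values. A minimal sketch using the Example 1 estimates (variable names are my own):

% Classify a point by the largest discriminant value
mu = [1.6667 3.3333; 3.3333 1.6667];   % p x K, one class mean per column
icov = [4 -2; -2 4];                   % inverse of the shared covariance
prior = [0.5 0.5];                     % priors
x = [1; 4];                            % query point
K = size(mu, 2);
delta = zeros(K, 1);
for k = 1:K
    delta(k) = x'*icov*mu(:,k) - 0.5*mu(:,k)'*icov*mu(:,k) + log(prior(k));
end
[~, k_hat] = max(delta)                % predicted class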
Example
In MATLAB
LDA example
Assumptions
Normally distributed
Equal class covariances
Example 1: 2-class classification
Dimension (number of features): p = 2
Number of classes: K = 2
Total number of data points: N = 6
N1=3; N2=3; N=N1+N2; K=2;
% Dataset
x1=[1;3]; x2=[2;3]; x3=[2;4]; x4=[3;1]; x5=[3;2]; x6=[4;2];
% Label class 1
y1=1; y2=1; y3=1;
% Label class 2
y4=2; y5=2; y6=2;
X=[x1 x2 x3 x4 x5 x6]
Y=[y1 y2 y3 y4 y5 y6]
X = 2×6
     1     2     2     3     3     4
     3     3     4     1     2     2
Y = 1×6
     1     1     1     2     2     2
Estimate Linear Discriminant functions
% Prior
pi1=N1/N
pi2=N2/N
% mu
mu1=sum(X(:,1:3),2)/ N1
mu2=sum(X(:,4:6),2)/ N2
%covariance
sum_temp1=0;
for i=1:N1
sum_temp1=sum_temp1+(X(:,i)-mu1)*(X(:,i)-mu1)';
end
sum_temp2=sum_temp1;
for i=N1+1:N
sum_temp2=sum_temp2+(X(:,i)-mu2)*(X(:,i)-mu2)';
end
% cov1 = cov2 = Sigma (pooled covariance; named Sigma to avoid shadowing MATLAB's built-in cov)
Sigma=1/(N-K)*sum_temp2
icov=inv(Sigma)
% Delta
LD1=@(x) x'*(icov)*(mu1)-0.5*(mu1)'*(icov)*(mu1)+log(pi1)
LD2=@(x) x'*(icov)*(mu2)-0.5*(mu2)'*(icov)*(mu2)+log(pi2)
mu1 = 2×1
    1.6667
    3.3333
mu2 = 2×1
    3.3333
    1.6667
Sigma = 2×2
    0.3333    0.1667
    0.1667    0.3333
icov = 2×2
    4.0000   -2.0000
   -2.0000    4.0000
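For example, classifying a hypothetical test point with the fitted discriminant functions (here LD1 > LD2, so the point is assigned to class 1):

% Classify a new point (hypothetical test input)
x_new = [1; 4];
d1 = LD1(x_new)    % discriminant value for class 1
d2 = LD2(x_new)    % discriminant value for class 2
% assign to class 1 if d1 > d2, otherwise class 2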
Example 2: 3-class classification
Dimension (number of features): p = 2
Number of classes: K = 3
Total number of data points: N = 6
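The original worked numbers for this example are not shown here; the following is a minimal sketch under the stated setup, using hypothetical data (two samples per class) and the same pooled-covariance estimate as in Example 1:

% Hypothetical 3-class dataset (p=2, K=3, N=6; two samples per class)
N1=2; N2=2; N3=2; N=N1+N2+N3; K=3;
X = [1 2 3 4 1 2; 3 4 1 2 1 1];   % columns 1-2: class 1, 3-4: class 2, 5-6: class 3
mu1 = mean(X(:,1:2), 2); mu2 = mean(X(:,3:4), 2); mu3 = mean(X(:,5:6), 2);
pi1 = N1/N; pi2 = N2/N; pi3 = N3/N;
% Pooled within-class scatter, normalized by 1/(N-K) as above
S = (X(:,1:2)-mu1)*(X(:,1:2)-mu1)' + (X(:,3:4)-mu2)*(X(:,3:4)-mu2)' ...
    + (X(:,5:6)-mu3)*(X(:,5:6)-mu3)';
Sigma = S/(N-K);
icov = inv(Sigma);
% One linear discriminant function per class
LD1=@(x) x'*icov*mu1 - 0.5*mu1'*icov*mu1 + log(pi1);
LD2=@(x) x'*icov*mu2 - 0.5*mu2'*icov*mu2 + log(pi2);
LD3=@(x) x'*icov*mu3 - 0.5*mu3'*icov*mu3 + log(pi3);
% Classify by the largest of LD1, LD2, LD3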

To find the decision boundary between two classes $k$ and $l$, set $\delta_k(x) = \delta_l(x)$; with a shared covariance, the boundary $x^T \Sigma^{-1}(\mu_k - \mu_l) = \frac{1}{2}(\mu_k + \mu_l)^T \Sigma^{-1}(\mu_k - \mu_l) - \log(\pi_k/\pi_l)$ is linear in $x$.
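A sketch of computing the boundary coefficients for Example 1 (values restated from above):

% Decision boundary: the set of x where LD1(x) = LD2(x)
mu1 = [1.6667; 3.3333]; mu2 = [3.3333; 1.6667];
icov = [4 -2; -2 4]; pi1 = 0.5; pi2 = 0.5;
a = icov*(mu1 - mu2)                                    % normal vector: [-10; 10]
b = 0.5*(mu1 + mu2)'*icov*(mu1 - mu2) - log(pi1/pi2)    % offset: 0
% Boundary: a(1)*x1 + a(2)*x2 = b, here the line x2 = x1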
