Linear Discriminant Analysis

Concept

StatQuest YouTube

Introduction

LDA is a classification method that maximizes the separation between classes.

The LDA concept presented in StatQuest is Fisher's linear discriminant:

  • The ratio of the variance between the classes to the variance within the classes:
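For two classes, this ratio can be written in terms of the projected class means and scatters (standard notation supplied here, not shown in the original):

$$\frac{(\mu_1 - \mu_2)^2}{s_1^2 + s_2^2}$$

where $\mu_1, \mu_2$ are the means and $s_1^2, s_2^2$ the scatters of the two classes after projection onto a line.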

Here, LDA is explained under the following assumptions:

  1. Normally distributed data

  2. Equal class covariances

Dimension Reduction Problem

LDA is a supervised method of dimension reduction.

Extract a basis $w$ for the data projection that

  • maximizes separability between classes

  • while minimizing scatter within the same class

Mathematics

Find the basis $w$ that optimizes the cost function: maximize the ratio of between-class scatter to within-class scatter of the projected data,

$$J(w) = \frac{w^T S_B\, w}{w^T S_W\, w}$$

where $S_B$ is the between-class scatter matrix and $S_W$ is the within-class scatter matrix.
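A minimal MATLAB sketch of this idea for two classes, using the Example 1 data from below (variable names are illustrative); the closed-form optimum is $w \propto S_W^{-1}(\mu_1 - \mu_2)$:

XA=[1 2 2; 3 3 4];   % class 1 samples (columns), from Example 1
XB=[3 3 4; 1 2 2];   % class 2 samples (columns)
muA=mean(XA,2); muB=mean(XB,2);

% Within-class scatter S_W: sum of centered outer products over both classes
SW=(XA-muA)*(XA-muA)' + (XB-muB)*(XB-muB)';

% Optimal projection direction (up to scale)
w=SW\(muA-muB);
w=w/norm(w);

% 1-D projections; the classes separate along w
zA=w'*XA
zB=w'*XB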

Classification Problem

Posterior Probability Function

For fixed $x$, choose the class $k$ which gives the maximum posterior probability $P(Y=k \mid X=x)$.

Using Bayes' rule, the classification can be expressed in terms of the posterior probability function as

$$p_k(x) = P(Y=k \mid X=x) = \frac{\pi_k f_k(x)}{\sum_{l=1}^{K} \pi_l f_l(x)}$$

where $\pi_k = P(Y=k)$ is the prior probability of class $k$ and $f_k(x)$ is the likelihood of $x$ given class $k$.

As a Multivariate Gaussian Distribution

If the likelihood function $f_k(x)$ is assumed to be a multivariate Gaussian distribution, then

$$f_k(x) = \frac{1}{(2\pi)^{p/2} |\Sigma_k|^{1/2}} \exp\!\left(-\frac{1}{2}(x-\mu_k)^T \Sigma_k^{-1} (x-\mu_k)\right)$$

Here, the covariance matrices of all classes are assumed to be equal: $\Sigma_k = \Sigma$ for all $k$.

If the covariances are not equal, use Quadratic Discriminant Analysis (QDA) instead.
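For reference, the standard QDA discriminant function keeps a class-specific covariance $\Sigma_k$, which makes the decision boundary quadratic in $x$ (this formula is supplied here, not derived in the original):

$$\delta_k(x) = -\frac{1}{2}\log|\Sigma_k| - \frac{1}{2}(x-\mu_k)^T \Sigma_k^{-1} (x-\mu_k) + \log \pi_k$$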

What is covariance? Read here

Derivation

Linear Discriminant Functions

How do we find the class $k$ that gives the maximum posterior probability $p_k(x)$?

Take the log of $p_k(x)$ and drop every term that does not depend on $k$ (see the expansion below).

Maximizing $\log(p_k(x))$ is equivalent to maximizing the linear discriminant function $\delta_k(x)$:

$$\delta_k(x) = x^T \Sigma^{-1} \mu_k - \frac{1}{2}\mu_k^T \Sigma^{-1} \mu_k + \log \pi_k$$
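Sketching the dropped steps: the denominator of $p_k(x)$, the Gaussian normalizing constant, and the quadratic term $x^T\Sigma^{-1}x$ are all constant in $k$ and get absorbed into $C$, $C'$, $C''$:

$$\begin{aligned} \log p_k(x) &= \log f_k(x) + \log \pi_k + C \\ &= -\frac{1}{2}(x-\mu_k)^T \Sigma^{-1} (x-\mu_k) + \log \pi_k + C' \\ &= x^T \Sigma^{-1} \mu_k - \frac{1}{2}\mu_k^T \Sigma^{-1} \mu_k + \log \pi_k + C'' \end{aligned}$$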

Estimating the Linear Discriminant Function

From the training dataset, estimate the priors, the class means, and the pooled covariance:

$$\hat{\pi}_k = \frac{N_k}{N}, \qquad \hat{\mu}_k = \frac{1}{N_k}\sum_{i:\,y_i=k} x_i, \qquad \hat{\Sigma} = \frac{1}{N-K}\sum_{k=1}^{K}\sum_{i:\,y_i=k}(x_i-\hat{\mu}_k)(x_i-\hat{\mu}_k)^T$$

where $i$ indexes the data points ($N_k$ of them in class $k$) and $j$ indexes the dimensions of $x_i$.

Then, estimate the linear discriminant function by plugging in these estimates:

$$\hat{\delta}_k(x) = x^T \hat{\Sigma}^{-1} \hat{\mu}_k - \frac{1}{2}\hat{\mu}_k^T \hat{\Sigma}^{-1} \hat{\mu}_k + \log \hat{\pi}_k$$

Classification with LDA

Assign a point $x$ to the class with the largest discriminant value, $\hat{k} = \arg\max_k \hat{\delta}_k(x)$. The boundary between classes $k$ and $l$ is the set of points where $\hat{\delta}_k(x) = \hat{\delta}_l(x)$.

Example

In MATLAB

LDA example

Assumptions

  • Normally distributed data

  • Equal class covariances

Example 1: 2-class classification

  • Dimension (feature number) p=2

  • Number of classes K=2

  • Total dataset size N=6

N1=3; N2=3; N=N1+N2; K=2;
% Dataset
x1=[1;3]; x2=[2;3]; x3=[2;4]; x4=[3;1]; x5=[3;2]; x6=[4;2];
% Label  class 1
y1=1; y2=1; y3=1;
% Label class 2
y4=2; y5=2; y6=2;

X=[x1 x2 x3 x4 x5 x6]
Y=[y1 y2 y3 y4 y5 y6]

X = 2×6

     1     2     2     3     3     4
     3     3     4     1     2     2

Y = 1×6

     1     1     1     2     2     2
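As an optional visualization (not in the original), plot the two classes:

% Scatter plot of the dataset (class 1: blue circles, class 2: red crosses)
figure; hold on;
plot(X(1,Y==1), X(2,Y==1), 'bo', 'MarkerFaceColor','b');
plot(X(1,Y==2), X(2,Y==2), 'rx', 'LineWidth',1.5);
xlabel('x_1'); ylabel('x_2'); legend('class 1','class 2'); grid on;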

Estimate the Linear Discriminant Functions

% Prior probabilities
pi1=N1/N
pi2=N2/N

% Class means (mu)
mu1=sum(X(:,1:3),2)/N1
mu2=sum(X(:,4:6),2)/N2

% Pooled within-class scatter
sum_temp1=0;
for i=1:N1
    sum_temp1=sum_temp1+(X(:,i)-mu1)*(X(:,i)-mu1)';
end
% Continue accumulating the class-2 scatter into the same sum
sum_temp2=sum_temp1;
for i=N1+1:N
    sum_temp2=sum_temp2+(X(:,i)-mu2)*(X(:,i)-mu2)';
end

% Pooled covariance estimate (cov1 = cov2 = cov); note that this
% variable shadows MATLAB's built-in cov() function
cov=1/(N-K)*sum_temp2
icov=inv(cov)

% Linear discriminant functions delta_k(x)
LD1=@(x) x'*(icov)*(mu1)-0.5*(mu1)'*(icov)*(mu1)+log(pi1)
LD2=@(x) x'*(icov)*(mu2)-0.5*(mu2)'*(icov)*(mu2)+log(pi2)

mu1 = 2×1

    1.6667
    3.3333

mu2 = 2×1

    3.3333
    1.6667

cov = 2×2

    0.3333    0.1667
    0.1667    0.3333

icov = 2×2

    4.0000   -2.0000
   -2.0000    4.0000
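To classify a new point, evaluate both discriminants and assign the class with the larger value; the test point below is a hypothetical example:

% Classify a hypothetical test point
xtest=[1.5; 3.5];
if LD1(xtest) > LD2(xtest)
    disp('xtest is assigned to class 1')
else
    disp('xtest is assigned to class 2')
end

With the estimates above, $\hat{\delta}_1(x) - \hat{\delta}_2(x) = 10(x_2 - x_1)$, so the decision boundary in this example is the line $x_2 = x_1$.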


Example 2: 3-class classification

  • Dimension (feature number) p=2

  • Number of classes K=3

  • Total dataset size N=6

To set the boundary between classes $k$ and $l$, solve $\hat{\delta}_k(x) = \hat{\delta}_l(x)$; a sketch of the full 3-class estimation follows.
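A minimal sketch of the 3-class case; the dataset below is hypothetical (the original does not list the Example 2 data), and the loop generalizes Example 1 to arbitrary K:

% Hypothetical 3-class dataset (2 samples per class); columns are samples
X=[1 2 4 5 3 3; 1 2 1 2 4 5];
Y=[1 1 2 2 3 3];
K=3; [p,N]=size(X);

% Estimate priors, class means, and the pooled covariance
mu=zeros(p,K); Sig=zeros(p,p); pri=zeros(1,K);
for k=1:K
    Xk=X(:,Y==k); Nk=size(Xk,2);
    pri(k)=Nk/N;
    mu(:,k)=mean(Xk,2);
    Sig=Sig+(Xk-mu(:,k))*(Xk-mu(:,k))';
end
Sig=Sig/(N-K); iSig=inv(Sig);

% Evaluate all three discriminant functions at a test point
xtest=[3;3];
delta=zeros(1,K);
for k=1:K
    delta(k)=xtest'*iSig*mu(:,k)-0.5*mu(:,k)'*iSig*mu(:,k)+log(pri(k));
end
[~,khat]=max(delta)   % predicted class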
