
Logistic Regression - Maximum Likelihood

Concept

https://youtu.be/vN5cNN2-HWE

https://youtu.be/BfKanl1aSG0

Introduction


Logistic regression is a method for classification: it predicts the class k with the highest Pr(Y=k | X=x).

Q) How do we estimate Pr(Y=k | X=x)?

  • Option 1: Estimate it indirectly using Bayes' rule, e.g., LDA

  • Option 2: Estimate it directly using logistic regression

Logistic Regression Model of Posterior Probability Function

From now on, we assume binary classification: Y ∈ {0, 1}, so the number of classes is K = 2.

Define the function p(x) as

p(x) = Pr(Y = 1 | X = x)

The odds and log-odds are defined as

  • Odds: p(x) / (1 - p(x))

  • Log-odds: log[ p(x) / (1 - p(x)) ]
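For example, if p(x) = 0.8, the odds are 0.8 / 0.2 = 4 and the log-odds are log 4 ≈ 1.386.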

Assume the log-odds is a linear function of the data X:

\log \dfrac{p(x)}{1 - p(x)} = \beta_0 + \beta_1 x_1 + \cdots + \beta_d x_d = {\boldsymbol{\beta}}^T \mathbf{x}
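Exponentiating both sides and solving for p(x) gives the sigmoid form that reappears in the IRLS update below:

p(x) = \dfrac{e^{{\boldsymbol{\beta}}^T \mathbf{x}}}{1 + e^{{\boldsymbol{\beta}}^T \mathbf{x}}}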

Q) How do we estimate the parameters β_i? Use maximum likelihood.

Estimate Posterior Probability Function with Likelihood Function

Given: a training dataset (z_1, y_1), …, (z_N, y_N)

The probability of the observed data is the product, over the samples with y_i = 1, of the probability that Y = 1, multiplied by the product, over the samples with y_i = 0, of the probability that Y = 0:

l(\beta) = \prod\limits_{i:y_i = 1} \Pr(Y = 1 \mid X = z_i) \prod\limits_{i:y_i = 0} \Pr(Y = 0 \mid X = z_i)

This is known as the likelihood function.

Define the function p(z_i) as

p(z_i) = Pr(Y = 1 | X = z_i)

Then the likelihood function can be expressed as

l(\beta) = \prod\limits_{i:y_i = 1} p(z_i) \prod\limits_{i:y_i = 0} \left(1 - p(z_i)\right)
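Since each y_i is either 0 or 1, the two products can be combined into a single product over all samples:

l(\beta) = \prod\limits_{i=1}^{N} p(z_i)^{y_i} \left(1 - p(z_i)\right)^{1 - y_i}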

Maximizing Likelihood

Goal: estimate the parameter β that maximizes the likelihood function.

Maximizing the likelihood is equivalent to maximizing the log-likelihood:

L(\beta) = \log l(\beta)
Substituting the combined product form gives

L(\beta) = \sum\limits_{i=1}^{N} \left[ y_i \log p(z_i) + (1 - y_i) \log\left(1 - p(z_i)\right) \right] = \sum\limits_{i=1}^{N} \left[ y_i {\boldsymbol{\beta}}^T z_i - \log\left(1 + e^{{\boldsymbol{\beta}}^T z_i}\right) \right]
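As a quick check, this log-likelihood can be computed directly. A minimal NumPy sketch (the function name and interface are illustrative, not from the original note):

```python
import numpy as np

def log_likelihood(beta, Z, y):
    """L(beta) = sum_i [ y_i * beta^T z_i - log(1 + exp(beta^T z_i)) ].

    Z : (N, d) matrix whose rows are the samples z_i (with intercept column)
    y : (N,) labels in {0, 1}
    """
    eta = Z @ beta                                    # beta^T z_i for every sample
    return np.sum(y * eta - np.logaddexp(0.0, eta))   # log(1 + e^eta), computed stably
```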

Estimating Parameter β for Maximizing Log-Likelihood

The maximizer has no closed-form solution, so β is estimated iteratively with the Newton-Raphson method:

{\boldsymbol{\beta}}_{t+1} = {\boldsymbol{\beta}}_t - \mathbf{H}^{-1} \nabla L({\boldsymbol{\beta}}_t)
The gradient and Hessian of the log-likelihood are

\nabla L({\boldsymbol{\beta}}) = \sum\limits_{i=1}^{N} z_i \left( y_i - p(z_i) \right) = \mathbf{Z}^T (\mathbf{y} - \mathbf{p})

\mathbf{H} = \nabla^2 L({\boldsymbol{\beta}}) = -\sum\limits_{i=1}^{N} z_i z_i^T \, p(z_i)\left(1 - p(z_i)\right) = -\mathbf{Z}^T \mathbf{W} \mathbf{Z}

Expressing with Matrices

\begin{array}{c} {\boldsymbol{\beta}}_{t+1} = {\boldsymbol{\beta}}_t - \mathbf{H}^{-1} \nabla L({\boldsymbol{\beta}}_t) \\ {\boldsymbol{\beta}}_{t+1} = {\boldsymbol{\beta}}_t + (\mathbf{Z}^T \mathbf{W} \mathbf{Z})^{-1} \mathbf{Z}^T (\mathbf{y} - \mathbf{p}) \\ = (\mathbf{Z}^T \mathbf{W} \mathbf{Z})^{-1} (\mathbf{Z}^T \mathbf{W}) \mathbf{v} \\ \mathbf{v} = \mathbf{Z} {\boldsymbol{\beta}}_t + \mathbf{W}^{-1} (\mathbf{y} - \mathbf{p}) \end{array}
where Z is the matrix whose rows are z_i^T, y and p are the vectors with entries y_i and p(z_i), and W is the diagonal matrix with W_ii = p(z_i)(1 - p(z_i)).

Iterative Reweighted Least Squares (IRLS)

Solve iteratively, recomputing W and p from the current β_t at each step:

\begin{array}{c} {\boldsymbol{\beta}}_{t+1} = (\mathbf{Z}^T \mathbf{W} \mathbf{Z})^{-1} (\mathbf{Z}^T \mathbf{W}) \mathbf{v} \\ \mathbf{v} = \mathbf{Z} {\boldsymbol{\beta}}_t + \mathbf{W}^{-1} (\mathbf{y} - \mathbf{p}) \\ p(z_i) = \dfrac{e^{{\boldsymbol{\beta}}^T z_i}}{1 + e^{{\boldsymbol{\beta}}^T z_i}} \end{array}
Each iteration is a weighted least-squares fit with response v and weights W, which is why the method is called iteratively reweighted least squares.
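A minimal NumPy sketch of this IRLS loop, assuming a design matrix Z with an intercept column and labels y in {0, 1} (the function name and defaults are illustrative):

```python
import numpy as np

def irls_logistic(Z, y, n_iter=25, tol=1e-8):
    """Fit logistic regression by IRLS / Newton-Raphson.

    Z : (N, d) design matrix with an intercept column of ones
    y : (N,) labels in {0, 1}
    """
    beta = np.zeros(Z.shape[1])            # start from beta = 0
    for _ in range(n_iter):
        eta = Z @ beta                     # linear predictor beta^T z_i
        p = 1.0 / (1.0 + np.exp(-eta))     # p(z_i), sigmoid form
        w = p * (1.0 - p)                  # diagonal entries of W
        grad = Z.T @ (y - p)               # gradient Z^T (y - p)
        H = Z.T @ (Z * w[:, None])         # Z^T W Z, without forming W explicitly
        step = np.linalg.solve(H, grad)    # Newton step (Z^T W Z)^{-1} Z^T (y - p)
        beta = beta + step                 # same update as the v formulation above
        if np.max(np.abs(step)) < tol:     # stop once beta has converged
            break
    return beta
```

The code applies the equivalent update beta + (Z^T W Z)^{-1} Z^T (y - p) rather than forming v explicitly; both yield the same iterates.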

Further Reading

Q) How does this compare with logistic regression formulated via the sigmoid function (see Andrew Ng's lecture)?

Note:

  • The two sigmoid forms are the same function: multiplying the numerator and denominator of 1/(1+e^{-x}) by e^x gives e^x/(1+e^x).

  • One formulation maximizes the likelihood, while the other minimizes a loss function; the two are equivalent, as shown below.
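Concretely, the loss minimized in the sigmoid formulation is the negative average log-likelihood, i.e., the cross-entropy:

J({\boldsymbol{\beta}}) = -\frac{1}{N} L({\boldsymbol{\beta}}) = -\frac{1}{N} \sum\limits_{i=1}^{N} \left[ y_i \log p(z_i) + (1 - y_i) \log\left(1 - p(z_i)\right) \right]

Maximizing L(β) and minimizing J(β) therefore give the same estimate of β.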


Example

The original note works through the iteration numerically in three steps (Step 1 through Step 3, shown in figures).
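In place of the step-by-step figures, here is a small synthetic run of the irls_logistic sketch above (the data and true coefficients are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: generate toy 1-D data from a known model, beta_true = [0.5, 2.0]
x = rng.normal(size=200)
p_true = 1.0 / (1.0 + np.exp(-(0.5 + 2.0 * x)))
y = (rng.random(200) < p_true).astype(float)
Z = np.column_stack([np.ones_like(x), x])  # design matrix with intercept column

# Step 2: run the IRLS iteration
beta_hat = irls_logistic(Z, y)

# Step 3: inspect the estimate; it should be close to [0.5, 2.0]
print(beta_hat)
```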

Next

  1. Multinomial Logistic Regression:

    MATLAB Example

  2. Fitting Data with Logistic Regression:

    MATLAB Example
