
Logistic Regression - Maximum Likelihood

Concept

https://youtu.be/vN5cNN2-HWE

https://youtu.be/BfKanl1aSG0

Introduction


Logistic regression is a method for classification problems.

It chooses the class $k$ with the highest posterior probability $\Pr(Y=k \mid X=x)$.

Q. How do we estimate $\Pr(Y=k \mid X=x)$?

  • Option 1: Indirectly estimate it using Bayes' rule (e.g., LDA)

  • Option 2: Directly estimate it using logistic regression

Logistic Regression Model of Posterior Probability Function

From now on, we assume binary classification: $Y \in \{0, 1\}$, so the number of classes is $K = 2$.

Define the function $p(x)$ as

$$p(x) = \Pr(Y=1 \mid X=x)$$

The odds and log-odds are defined as

  • Odds: $p(x)/(1-p(x))$

  • Log-odds: $\log\left[p(x)/(1-p(x))\right]$
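
A quick numeric illustration of these definitions (the value of $p$ is an arbitrary assumption):

```python
import math

p = 0.8                    # suppose p(x) = Pr(Y=1 | X=x) = 0.8
odds = p / (1 - p)         # 4.0, i.e., "4 to 1" in favor of class 1
log_odds = math.log(odds)  # ~1.386
```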

Assume the log-odds is a linear function of the data $x$:

$$\log\frac{p(x)}{1 - p(x)} = \beta^T x$$

Solving for $p(x)$ gives

$$p(x) = \frac{e^{\beta^T x}}{1 + e^{\beta^T x}}, \quad \text{where} \quad x = \begin{bmatrix} 1 \\ x_1 \\ \vdots \\ x_p \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix}$$
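
A minimal NumPy sketch of this posterior model; the helper name `posterior` and the convention of prepending a 1 for the intercept are illustrative assumptions:

```python
import numpy as np

def posterior(x, beta):
    """Logistic model of p(x) = Pr(Y=1 | X=x).

    x    : length-p feature vector (without the leading 1)
    beta : coefficients [beta_0, beta_1, ..., beta_p]
    """
    x = np.concatenate(([1.0], x))        # prepend 1 so beta_0 acts as the intercept
    t = beta @ x                          # log-odds: beta^T x
    return np.exp(t) / (1.0 + np.exp(t))  # p(x) = e^t / (1 + e^t)

print(posterior(np.array([2.0]), np.array([-1.0, 0.5])))  # 0.5, since the log-odds is 0
```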

Q) How do we estimate the parameters $\beta_i$? Use maximum likelihood.

Estimating the Posterior Probability Function with the Likelihood Function

Given: training dataset $(z_1, y_1), \ldots, (z_N, y_N)$

The probability of the observed data is the product of $\Pr(Y=1)$ over the samples with $y_i = 1$ and $\Pr(Y=0)$ over the samples with $y_i = 0$:

$$l(\beta) = \prod_{i:\, y_i = 1} \Pr(Y=1 \mid X=z_i) \prod_{i:\, y_i = 0} \Pr(Y=0 \mid X=z_i)$$

This is known as the likelihood function.

Define the function $p(z_i)$ as

$$p(z_i) = \Pr(Y=1 \mid X=z_i)$$

Then the likelihood function can be written as

$$l(\beta) = \prod_{i:\, y_i = 1} p(z_i) \prod_{i:\, y_i = 0} \left(1 - p(z_i)\right)$$
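
A sketch of this product, reusing the `posterior` helper from the sketch above; `Z` is an $N \times p$ array of inputs and `y` a 0/1 label vector (the names are assumptions):

```python
import numpy as np

def likelihood(beta, Z, y):
    """l(beta): product of p(z_i) over samples with y_i = 1,
    times product of (1 - p(z_i)) over samples with y_i = 0."""
    p = np.array([posterior(z, beta) for z in Z])  # posterior() from the sketch above
    return np.prod(p[y == 1]) * np.prod(1.0 - p[y == 0])
```

For large $N$ this product underflows, which is one practical reason to work with the log-likelihood below.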

Maximizing Likelihood

Goal: estimate the parameter $\beta$ that maximizes the likelihood function.

Maximizing the likelihood is equivalent to maximizing the log-likelihood

$$L(\beta) = \log l(\beta)$$

Substituting $p(z_i) = \frac{e^{\beta^T z_i}}{1 + e^{\beta^T z_i}}$ and simplifying gives

$$L(\beta) = \sum_{i=1}^{N} \left[ y_i \beta^T z_i - \log\left(1 + e^{\beta^T z_i}\right) \right], \quad \text{where} \quad z_i = \begin{bmatrix} 1 \\ z_{i1} \\ \vdots \\ z_{ip} \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix}$$
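
A sketch of the simplified form (the column-of-ones convention matches the definition of $z_i$ above):

```python
import numpy as np

def log_likelihood(beta, Z, y):
    """L(beta) = sum_i [ y_i * beta^T z_i - log(1 + e^{beta^T z_i}) ]."""
    Z1 = np.column_stack([np.ones(len(Z)), Z])  # design matrix with intercept column
    t = Z1 @ beta                               # beta^T z_i for every sample
    return np.sum(y * t - np.log(1.0 + np.exp(t)))
```

As a sanity check, `np.log(likelihood(beta, Z, y))` and `log_likelihood(beta, Z, y)` should agree up to floating-point error.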

Estimating the Parameter β to Maximize the Log-Likelihood

There is no closed-form solution, so we update $\beta$ iteratively with the Newton-Raphson method:

$$\beta_{t+1} = \beta_t - H^{-1} \nabla L(\beta_t)$$

where $\nabla L(\beta_t)$ is the gradient and $H$ is the Hessian of the log-likelihood.

Expressing with Matrices

For the logistic log-likelihood, $\nabla L(\beta) = Z^T (y - p)$ and $H = -Z^T W Z$, where $Z$ is the $N \times (p+1)$ matrix with rows $z_i^T$, $p$ is the vector of $p(z_i)$, and $W = \mathrm{diag}\left(p(z_i)(1 - p(z_i))\right)$. The update becomes

$$\begin{aligned} \beta_{t+1} &= \beta_t - H^{-1} \nabla L(\beta_t) \\ &= \beta_t + (Z^T W Z)^{-1} Z^T (y - p) \\ &= (Z^T W Z)^{-1} Z^T W v \\ \text{where} \quad v &= Z \beta_t + W^{-1} (y - p) \end{aligned}$$

Iterative Reweighted Least Squares (IRLS)

Solve iteratively, recomputing $W$, $p$, and $v$ from the current $\beta_t$ at each step:

$$\begin{aligned} \beta_{t+1} &= (Z^T W Z)^{-1} Z^T W v \\ v &= Z \beta_t + W^{-1} (y - p) \\ p(z_i) &= \frac{e^{\beta_t^T z_i}}{1 + e^{\beta_t^T z_i}} \end{aligned}$$

Each update is a weighted least-squares solve with response $v$, which is where the name comes from.
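
A minimal IRLS sketch, assuming NumPy; it omits the convergence check and numerical safeguards (e.g., clipping $p$ away from 0 and 1) that a practical implementation would add:

```python
import numpy as np

def irls(Z, y, n_iter=20):
    """Fit logistic regression by IRLS (Newton-Raphson on L(beta))."""
    Z1 = np.column_stack([np.ones(len(Z)), Z])       # design matrix with intercept column
    beta = np.zeros(Z1.shape[1])                     # start from beta = 0
    for _ in range(n_iter):
        t = Z1 @ beta
        p = np.exp(t) / (1.0 + np.exp(t))            # current p(z_i)
        w = p * (1.0 - p)                            # diagonal of W
        v = t + (y - p) / w                          # adjusted response: Z beta + W^{-1}(y - p)
        WZ = Z1 * w[:, None]                         # W Z without forming the diagonal matrix
        beta = np.linalg.solve(Z1.T @ WZ, WZ.T @ v)  # beta = (Z^T W Z)^{-1} Z^T W v
    return beta
```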

Further Reading

Q) How does this compare with logistic regression written with the sigmoid function (see Andrew Ng's lecture)?

Note:

$$\frac{1}{1 + e^{-x}} = \frac{e^x}{1 + e^x}$$

so the sigmoid parameterization defines the same function as the form used in this note (a numeric check follows below).

  • Maximizing the likelihood vs. minimizing the loss function: maximizing $L(\beta)$ is equivalent to minimizing the negative log-likelihood, i.e., the cross-entropy loss.
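
A quick numeric check of the identity (the grid of values is arbitrary):

```python
import numpy as np

x = np.linspace(-5.0, 5.0, 11)
sigmoid = 1.0 / (1.0 + np.exp(-x))        # sigmoid form: 1 / (1 + e^{-x})
logistic = np.exp(x) / (1.0 + np.exp(x))  # form used in this note: e^x / (1 + e^x)
print(np.allclose(sigmoid, logistic))     # True: the two forms coincide
```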

Example

Step 1

Step 2

Step 3
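
A toy end-to-end run of the `irls` sketch above; the simulated data, seed, and true coefficients are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 1))                          # step 1: simulate 1-D inputs
p_true = 1.0 / (1.0 + np.exp(-(0.5 + 2.0 * Z[:, 0])))  # true model: beta = [0.5, 2.0]
y = (rng.random(100) < p_true).astype(float)           # step 2: sample 0/1 labels
print(irls(Z, y))                                      # step 3: estimate, near [0.5, 2.0]
```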

Next

  1. Multinomial Logistic Regression:

    MATLAB Example

  2. Fitting Data with Logistic Regression:

    MATLAB Example
