Logistic Regression - Maximum Likelihood
Concept
https://youtu.be/vN5cNN2-HWE
https://youtu.be/BfKanl1aSG0
Introduction
Logistic regression is a classification method.
It chooses the class k with the highest posterior probability Pr(Y=k∣X=x).
Q) How can we estimate Pr(Y=k∣X=x)?
Option 1: estimate it indirectly using Bayes' rule, e.g. LDA.
Option 2: estimate it directly; this is logistic regression.
Logistic Regression Model of Posterior Probability Function
From now on, we assume binary classification: the classes are {0, 1}, so K = 2.
Let the function p(x) be
p(x) = \Pr(Y = 1 \mid X = x)
The odds are then defined as
Odds: p(x)/(1−p(x))
log-odds: log[p(x)/(1−p(x))]
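As a quick numeric illustration of these two quantities (the probability 0.8 is a made-up value):

```python
import numpy as np

# Hypothetical value: p(x) = 0.8 for some sample x.
p = 0.8
odds = p / (1 - p)       # 4.0 -- "4 to 1" in favor of class 1
log_odds = np.log(odds)  # ~1.386
print(odds, log_odds)
```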
Assume the log-odds is a linear function of the data x:
\log \frac{p(x)}{1 - p(x)} = \beta^T x
Solving this for p(x) gives the logistic model
p(x) = \frac{e^{\beta^T x}}{1 + e^{\beta^T x}},
\quad \text{where } \; x = \begin{bmatrix} 1 \\ x_1 \\ \vdots \\ x_p \end{bmatrix},
\; \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix}
(the leading 1 in x makes \beta_0 act as the intercept)
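A minimal Python sketch of this posterior, assuming the intercept convention above; the function name is mine, not from the note:

```python
import numpy as np

def posterior(beta, x):
    """p(x) = Pr(Y = 1 | X = x) for beta = [b0, ..., bp] and raw features x = [x1, ..., xp]."""
    z = np.concatenate(([1.0], x))        # augment x with 1 so beta_0 is the intercept
    t = beta @ z                          # the linear log-odds beta^T x
    return np.exp(t) / (1.0 + np.exp(t))  # invert the log-odds back to a probability
```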
Q) How do we estimate the parameters β? Use maximum likelihood.
Estimating the Posterior Probability Function with the Likelihood Function
Given: a training dataset (z_1, y_1), \ldots, (z_N, y_N)
The probability of the observed data is the product of Pr(Y = 1) over the samples with y_i = 1, times the product of Pr(Y = 0) over the samples with y_i = 0:
l(\beta) = \prod_{i:\, y_i = 1} \Pr(Y = 1 \mid X = z_i) \prod_{i:\, y_i = 0} \Pr(Y = 0 \mid X = z_i)
This is known as the likelihood function.
Let the function p(z_i) be
p(z_i) = \Pr(Y = 1 \mid X = z_i)
Then the likelihood function can be written as
l(\beta) = \prod_{i:\, y_i = 1} p(z_i) \prod_{i:\, y_i = 0} \big(1 - p(z_i)\big)
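A direct (if numerically naive) transcription of this product into Python, with hypothetical names; underflow of the raw product for large N is one reason the log-likelihood below is preferred:

```python
import numpy as np

def likelihood(beta, Z, y):
    """l(beta) for a design matrix Z (N x (p+1), leading ones column) and labels y in {0, 1}."""
    t = Z @ beta
    p = np.exp(t) / (1.0 + np.exp(t))                # p(z_i) for every sample
    return np.prod(p[y == 1]) * np.prod(1 - p[y == 0])
```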
Maximizing Likelihood
Goal: estimate the parameter β that maximizes the likelihood function.
Maximizing the likelihood is the same as maximizing the log-likelihood L(\beta) = \log l(\beta). Taking the log turns the products into sums:
L(\beta) = \sum_{i:\, y_i = 1} \log p(z_i) + \sum_{i:\, y_i = 0} \log\big(1 - p(z_i)\big) = \sum_{i = 1}^{N} \Big[ y_i \log p(z_i) + (1 - y_i) \log\big(1 - p(z_i)\big) \Big]
Substituting p(z_i) = \frac{e^{\beta^T z_i}}{1 + e^{\beta^T z_i}} gives
L(\beta) = \sum_{i = 1}^{N} \left[ y_i \beta^T z_i - \log\big(1 + e^{\beta^T z_i}\big) \right],
\quad \text{where } \; z_i = \begin{bmatrix} 1 \\ z_{i1} \\ \vdots \\ z_{ip} \end{bmatrix},
\; \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix}
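The same quantity in log space, as a small sketch (names are mine):

```python
import numpy as np

def log_likelihood(beta, Z, y):
    """L(beta) = sum_i [ y_i * beta^T z_i - log(1 + exp(beta^T z_i)) ]."""
    t = Z @ beta
    return np.sum(y * t - np.log1p(np.exp(t)))  # log1p for a little extra stability
```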
Estimating the Parameter β by Maximizing the Log-Likelihood
There is no closed-form maximizer, so apply the Newton-Raphson update:
\beta_{t+1} = \beta_t - H^{-1} \nabla L(\beta_t)
Expressing with Matrices
Let Z be the N \times (p+1) matrix whose i-th row is z_i^T, let y and p be the vectors with entries y_i and p(z_i), and let W = \mathrm{diag}\big(p(z_i)(1 - p(z_i))\big). Then
\nabla L(\beta_t) = Z^T (y - p), \qquad H = -Z^T W Z
and the Newton step becomes
\beta_{t+1} = \beta_t + (Z^T W Z)^{-1} Z^T (y - p) = (Z^T W Z)^{-1} Z^T W v, \qquad v = Z \beta_t + W^{-1} (y - p)
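A sketch of these matrix quantities in Python, under the same assumptions as before:

```python
import numpy as np

def gradient_and_hessian(beta, Z, y):
    """grad L = Z^T (y - p) and H = -Z^T W Z, with W = diag(p_i * (1 - p_i))."""
    t = Z @ beta
    p = np.exp(t) / (1.0 + np.exp(t))
    W = np.diag(p * (1.0 - p))        # the diagonal weight matrix
    return Z.T @ (y - p), -Z.T @ W @ Z
```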
Iterative Reweighted Least Squares (IRLS)
The update above is a weighted least-squares solve with working response v, so we iterate it, recomputing W and p from the current β at each step:
\beta_{t+1} = (Z^T W Z)^{-1} Z^T W v, \qquad v = Z \beta_t + W^{-1} (y - p), \qquad p(z_i) = \frac{e^{\beta_t^T z_i}}{1 + e^{\beta_t^T z_i}}
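A minimal IRLS sketch implementing this update; it is my own code, not the note's, and it assumes the ones-augmented design matrix Z from above:

```python
import numpy as np

def irls(Z, y, n_iter=20):
    """Fit beta by the IRLS update; Z is N x (p+1) with a leading ones column, y is in {0, 1}."""
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        t = Z @ beta
        p = np.exp(t) / (1.0 + np.exp(t))
        W = np.diag(p * (1.0 - p))                        # weights recomputed every iteration
        v = Z @ beta + (y - p) / (p * (1.0 - p))          # working response v = Z beta + W^{-1}(y - p)
        beta = np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ v)  # weighted least-squares solve
    return beta
```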
Further Reading
Q) How does this compare with logistic regression written in terms of the sigmoid function (see Andrew Ng's lecture)?
Note:
\frac{1}{1 + e^{-x}} = \frac{e^x}{1 + e^x}
so the sigmoid form and the form used in this note are the same function. Likewise, maximizing the likelihood is equivalent to minimizing the negative log-likelihood (the cross-entropy loss), so "maximizes the likelihood" and "minimizes the loss function" describe the same optimization.
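A quick numeric check of this identity:

```python
import numpy as np

x = np.linspace(-5.0, 5.0, 11)
lhs = 1.0 / (1.0 + np.exp(-x))       # sigmoid form (Andrew Ng's notation)
rhs = np.exp(x) / (1.0 + np.exp(x))  # the form used in this note
print(np.allclose(lhs, rhs))         # True: they are the same function
```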
Example
Step 1
Step 2
Step 3
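The content of Steps 1-3 is not reproduced here; as a stand-in, here is a hypothetical end-to-end run using the irls sketch from above. The data and the "true" parameters [0.5, 2.0] are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D data: Pr(Y = 1 | z) follows a logistic curve with
# parameters beta = [0.5, 2.0], chosen only for this demonstration.
z = rng.uniform(-3.0, 3.0, size=200)
y = (rng.uniform(size=200) < 1.0 / (1.0 + np.exp(-(0.5 + 2.0 * z)))).astype(float)

Z = np.column_stack([np.ones_like(z), z])  # Step 1: design matrix with a ones column
beta_hat = irls(Z, y, n_iter=10)           # Steps 2-3: iterate the IRLS update
print(beta_hat)                            # estimates should land near [0.5, 2.0]
```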