Logistic Regression - Maximum Likelihood

Concept

https://youtu.be/vN5cNN2-HWE

https://youtu.be/BfKanl1aSG0

Introduction

Click here for the full note

Logistic regression solves a classification problem: choose the class $k$ with the highest posterior probability $\Pr(Y=k \mid X=x)$.

Q. How do we estimate $\Pr(Y=k \mid X=x)$?

  • Option 1: Estimate it indirectly via Bayes' rule, e.g., LDA

  • Option 2: Estimate it directly with logistic regression

Logistic Regression Model of Posterior Probability Function

From now on, assume binary classification: the classes are $\{0, 1\}$, so $K = 2$.

Define the function $p(x)$ as

$$p(x) = \Pr(Y=1 \mid X=x)$$

The odds and log-odds are defined as

  • Odds: $p(x) / (1-p(x))$

  • Log-odds: $\log[p(x) / (1-p(x))]$

Assume the log-odds is a linear function of the data, with $z = [1, x_1, \ldots, x_p]^T$ so that $\beta_0$ acts as the intercept:

$$\log\frac{p(x)}{1-p(x)} = \beta^T z$$

Equivalently, $p(x) = \dfrac{e^{\beta^T z}}{1 + e^{\beta^T z}}$ (the logistic function).
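As a quick numeric illustration, here is a minimal Python/NumPy sketch (the coefficient and data values are made up) showing how a linear log-odds value is mapped to a probability:

```python
import numpy as np

def sigmoid(t):
    """Inverse of the log-odds: maps beta^T z into a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

# Hypothetical coefficients [beta_0, beta_1] and one sample x = 2.0,
# with z = [1, x] so that beta_0 acts as the intercept.
beta = np.array([-1.0, 0.5])
z = np.array([1.0, 2.0])

log_odds = beta @ z       # beta^T z = 0.0
print(sigmoid(log_odds))  # p(x) = 0.5
```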

Q) How do we estimate the parameters $\beta_i$? Use maximum likelihood.

Estimate Posterior Probability Function with Likelihood Function

Given: a training dataset $(z_1, y_1), \ldots, (z_N, y_N)$

The probability of the observed data is the product of $\Pr(Y=1 \mid X=z_i)$ over the samples with $y_i = 1$ and $\Pr(Y=0 \mid X=z_i)$ over the samples with $y_i = 0$:

$$l(\beta) = \prod_{i:\,y_i = 1} \Pr(Y=1 \mid X=z_i) \prod_{i:\,y_i = 0} \Pr(Y=0 \mid X=z_i)$$

This is known as the likelihood function.

With $p(z_i) = \Pr(Y=1 \mid X=z_i)$ as defined above, the likelihood function becomes

$$l(\beta) = \prod_{i:\,y_i = 1} p(z_i) \prod_{i:\,y_i = 0} \bigl(1 - p(z_i)\bigr)$$
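In code, the log of this product is usually evaluated directly to avoid underflow. A minimal sketch (the function name and toy data are illustrative):

```python
import numpy as np

def log_likelihood(beta, Z, y):
    """L(beta) = sum_i [ y_i * beta^T z_i - log(1 + exp(beta^T z_i)) ].

    This is the log of prod p(z_i) over y_i = 1 times prod (1 - p(z_i))
    over y_i = 0, rewritten so no tiny probabilities are multiplied.
    """
    t = Z @ beta
    return np.sum(y * t - np.log1p(np.exp(t)))

# Toy data: 4 samples, z_i = [1, x_i] (intercept plus one feature).
Z = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(log_likelihood(np.array([-3.0, 2.0]), Z, y))  # ≈ -0.724
```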

Maximizing Likelihood

Goal: estimate the parameter $\beta$ that maximizes the likelihood function.

Maximizing the likelihood is equivalent to maximizing the log-likelihood

$$L(\beta) = \log l(\beta) = \sum_{i=1}^{N} \Bigl[ y_i\,\beta^T z_i - \log\bigl(1 + e^{\beta^T z_i}\bigr) \Bigr]$$

Estimating the Parameter $\beta$ by Maximizing the Log-Likelihood

Setting $\nabla L(\beta) = 0$ has no closed-form solution, so solve it iteratively with the Newton-Raphson update, where $\mathbf{H}$ is the Hessian of $L$:

$$\beta_{t+1} = \beta_t - \mathbf{H}^{-1} \nabla L(\beta_t)$$

Expressing with Matrices

Let $\mathbf{Z}$ be the design matrix with rows $z_i^T$, $\mathbf{y}$ the vector of labels, $\mathbf{p}$ the vector of probabilities $p(z_i)$, and $\mathbf{W} = \operatorname{diag}\bigl(p(z_i)(1 - p(z_i))\bigr)$. Then $\nabla L(\beta) = \mathbf{Z}^T(\mathbf{y} - \mathbf{p})$ and $\mathbf{H} = -\mathbf{Z}^T \mathbf{W} \mathbf{Z}$, so the update becomes

$$\begin{aligned} \beta_{t+1} &= \beta_t - \mathbf{H}^{-1} \nabla L(\beta_t) \\ &= \beta_t + (\mathbf{Z}^T \mathbf{W} \mathbf{Z})^{-1} \mathbf{Z}^T (\mathbf{y} - \mathbf{p}) \\ &= (\mathbf{Z}^T \mathbf{W} \mathbf{Z})^{-1} \mathbf{Z}^T \mathbf{W}\, \mathbf{v}, \qquad \mathbf{v} = \mathbf{Z} \beta_t + \mathbf{W}^{-1} (\mathbf{y} - \mathbf{p}) \end{aligned}$$

Iterative Reweighted Least Squares (IRLS)

Solve iteratively, recomputing $\mathbf{W}$, $\mathbf{p}$, and $\mathbf{v}$ from the current $\beta_t$ at each step:

$$\begin{aligned} \beta_{t+1} &= (\mathbf{Z}^T \mathbf{W} \mathbf{Z})^{-1} \mathbf{Z}^T \mathbf{W}\, \mathbf{v} \\ \mathbf{v} &= \mathbf{Z} \beta_t + \mathbf{W}^{-1} (\mathbf{y} - \mathbf{p}) \\ p(z_i) &= \frac{e^{\beta_t^T z_i}}{1 + e^{\beta_t^T z_i}} \end{aligned}$$

Each update is a weighted least-squares fit of the working response $\mathbf{v}$ on $\mathbf{Z}$, which is where the name comes from.
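Below is a minimal IRLS sketch in Python/NumPy following the update above; the function name, toy data, and stopping rule are illustrative assumptions, not a reference implementation:

```python
import numpy as np

def fit_logistic_irls(Z, y, n_iter=25, tol=1e-8):
    """Fit logistic regression by IRLS (Newton-Raphson on the log-likelihood).

    Z : (N, d) design matrix whose first column is all ones (intercept).
    y : (N,) labels in {0, 1}.
    """
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ beta)))    # p(z_i) under the current beta
        w = np.clip(p * (1.0 - p), 1e-10, None)  # diagonal of W, clipped for stability
        v = Z @ beta + (y - p) / w               # working response v
        # beta_{t+1} = (Z^T W Z)^{-1} Z^T W v, via a linear solve
        beta_new = np.linalg.solve(Z.T @ (w[:, None] * Z), Z.T @ (w * v))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Toy usage: z_i = [1, x_i]; the labels are deliberately not separable
# so that the maximum-likelihood estimate stays finite.
Z = np.array([[1.0, 0.2], [1.0, 0.5], [1.0, 0.9], [1.0, 1.8], [1.0, 2.5], [1.0, 3.1]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
print(fit_logistic_irls(Z, y))
```

The linear solve replaces the explicit inverse $(\mathbf{Z}^T \mathbf{W} \mathbf{Z})^{-1}$, which is the usual numerically safer way to implement this update.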

Further Reading

Q) How does this compare with the sigmoid-function formulation of logistic regression (see Andrew Ng's lecture)?

Note that the two parameterizations are identical, since

$$\frac{1}{1 + e^{-x}} = \frac{e^x}{1 + e^x}$$

  • The difference is one of framing: this note maximizes the likelihood, while the other formulation minimizes a loss function (the negative log-likelihood, i.e., cross-entropy); the two are equivalent.
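The identity is easy to check numerically:

```python
import numpy as np

x = np.linspace(-5.0, 5.0, 11)
lhs = 1.0 / (1.0 + np.exp(-x))       # sigmoid form
rhs = np.exp(x) / (1.0 + np.exp(x))  # odds form
print(np.allclose(lhs, rhs))         # True
```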

Example

Step 1

Step 2

Step 3
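The three steps above refer to a worked example; as a hedged, runnable stand-in, here is a minimal Python sketch using scikit-learn that mirrors them (the true coefficients, sample size, and seed are made-up illustration values):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Step 1: generate synthetic data from a known model, log-odds = -1 + 2x.
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 1))
p = 1.0 / (1.0 + np.exp(-(-1.0 + 2.0 * x[:, 0])))
y = (rng.random(500) < p).astype(int)

# Step 2: fit by maximum likelihood; the very large C effectively disables
# scikit-learn's default L2 regularization.
clf = LogisticRegression(C=1e10).fit(x, y)

# Step 3: compare the recovered coefficients with the true ones.
print(clf.intercept_, clf.coef_)  # should land near [-1.] and [[2.]]
```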

Next

  1. Multinomial Logistic Regression:

  2. Fitting Data with Logistic Regression:

