Entropy, Cross-Entropy, KL Divergence


Last updated 2 years ago


Information Content

์ •๋ณด์ด๋ก ์—์„œ๋Š” ์ž์ฃผ ์ผ์–ด๋‚˜์ง€ ์•Š๋Š” ์‚ฌ๊ฑด์˜ ์ •๋ณด๋Ÿ‰์€ ์ž์ฃผ ๋ฐœ์ƒํ•˜๋Š” ์‚ฌ๊ฑด๋ณด๋‹ค ์ •๋ณด๋Ÿ‰์ด ๋งŽ๋‹ค๊ณ  ๊ฐ„์ฃผํ•จ

์ •๋ณด๋Ÿ‰์„ ํ™•๋ฅ ์— ๋Œ€ํ•œ ํ•จ์ˆ˜ (0~1) ๋กœ ์ •์˜ํ•œ๋‹ค๋ฉด

์‚ฌ๊ฑดA์ด ์ผ์–ด๋‚  ํ™•๋ฅ  P(A)๋กœ ์‚ฌ๊ฑด A์˜ ์ •๋ณด๋Ÿ‰ h(A)์„ ์ •์˜ํ•˜๋ฉด

h(A):=โˆ’logP(A)

[Figure: the information content curve −log(x), plotted for x from 0 to 1]

Example

  • P(A) = 0.99 → information content h(A) = −log P(A) = −log 0.99 ≈ 0.01

  • P(B) = 0.01 → information content h(B) = −log P(B) = −log 0.01 ≈ 4.61
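The two examples above can be reproduced directly; a minimal sketch in plain Python, using the natural log as in the examples:

```python
import math

def information(p: float) -> float:
    """Information content h(A) = -log P(A), natural logarithm (nats)."""
    return -math.log(p)

print(round(information(0.99), 2))  # frequent event -> 0.01
print(round(information(0.01), 2))  # rare event     -> 4.61
```

The rarer event (P = 0.01) carries far more information than the near-certain one, matching the intuition stated above.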

Entropy

Entropy is the average information content of a discrete random variable; it measures the degree of uncertainty.

The average information content H[X] of a discrete random variable X with outcome probabilities p_i is

H[X] := −∑ p_i log p_i,  i = 1 to N

Example

  • P(X=0) = 0.5, P(X=1) = 0.5: H[X] = −(0.5 log 0.5 + 0.5 log 0.5) ≈ 0.69 (maximum entropy for two outcomes)

  • P(X=0) = 0.8, P(X=1) = 0.2: H[X] = −(0.8 log 0.8 + 0.2 log 0.2) ≈ 0.50

  • P(X=0) = 1, P(X=1) = 0: H[X] = −(1 log 1 + 0 log 0) = 0, taking 0 log 0 = 0
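The three cases above can be checked with a small helper; a sketch in plain Python (natural log), handling the 0 log 0 = 0 convention explicitly:

```python
import math

def entropy(probs):
    """Average information content H[X] = -sum(p_i * log(p_i)), natural log.
    Terms with p_i = 0 are taken as 0 (the limit of p*log(p) as p -> 0)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

print(round(entropy([0.5, 0.5]), 2))   # -> 0.69, the maximum for two outcomes
print(round(entropy([0.8, 0.2]), 2))   # -> 0.5
print(entropy([1.0, 0.0]) == 0.0)      # -> True: no uncertainty at all
```

The uniform distribution maximizes the uncertainty, while a degenerate distribution has none.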

KL Divergence

KL divergence measures how different two probability distributions are. It is also called relative entropy; its formal name is the Kullback–Leibler divergence.

KL(p∣q) := −∑ p_i log q_i − (−∑ p_i log p_i) = −∑ p_i log(q_i / p_i)

Here p = (p_i) is the true probability distribution and q = (q_i) is a distribution approximating p.

Cross-entropy

Consider the problem of finding the probability distribution p of a given random variable X. Since the exact form of p is unknown, we work with an approximate distribution q, updating the parameters of q so that it approaches p.

In other words, the problem becomes finding the q that minimizes KL(p∣q), the measure of the difference between the two distributions.

The second term of KL(p∣q), −(−∑ p_i log p_i), does not depend on the approximate distribution q, so minimizing the KL divergence amounts to finding the q that minimizes the first term alone. That first term is the cross-entropy:

H(p, q) := −∑ p_i log q_i

Reference

  • Kullback–Leibler divergence
  • Entropy, Cross-entropy, KL Divergence | 알기 쉬운 산업수학 (Easy Industrial Mathematics) | Industrial Mathematics Innovation Center
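The identity above — KL(p∣q) equals the cross-entropy H(p, q) minus the q-independent entropy of p — can be checked numerically. A minimal sketch in plain Python (natural log), with the two distributions chosen purely for illustration:

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum(p_i * log(q_i)), natural log."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

def kl_divergence(p, q):
    """KL(p|q) = -sum(p_i * log(q_i / p_i)); terms with p_i = 0 contribute 0."""
    return -sum(pi * math.log(qi / pi) for pi, qi in zip(p, q) if pi > 0)

p = [0.8, 0.2]  # "true" distribution (illustrative)
q = [0.5, 0.5]  # approximate distribution
entropy_p = -sum(pi * math.log(pi) for pi in p)

# KL(p|q) and (cross-entropy minus entropy of p) agree, as the identity states
print(round(kl_divergence(p, q), 4))                  # -> 0.1927
print(round(cross_entropy(p, q) - entropy_p, 4))      # -> 0.1927
```

Since entropy_p is fixed, any q that lowers the cross-entropy lowers the KL divergence by exactly the same amount, which is why minimizing one is equivalent to minimizing the other.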