Decision Tree

  • 데이터λ₯Ό λΆ„μ„ν•˜μ—¬ 이듀 사이에 μ‘΄μž¬ν•˜λŠ” νŒ¨ν„΄μ„ 예츑 κ°€λŠ₯ν•œ κ·œμΉ™λ“€μ˜ μ‘°ν•©μœΌλ‘œ λ‚˜νƒ€λƒ„.

  • λΆ„λ₯˜(Classification)와 νšŒκ·€(Regression) λ¬Έμ œμ— 적용 κ°€λŠ₯.

  • κ³„μ‚°λ³΅μž‘μ„± λŒ€λΉ„ 높은 예츑 μ„±λŠ₯을 μ§€λ‹˜.
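
A minimal sketch of the two use cases above, assuming scikit-learn is available; the dataset and settings are illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X, y = load_iris(return_X_y=True)

# Classification: the tree predicts a class label by following learned if/else rules.
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.predict(X[:3]))

# Regression: the same recursive-splitting idea, but each leaf stores a numeric value.
reg = DecisionTreeRegressor(random_state=0).fit(X, y.astype(float))
print(reg.predict(X[:3]))
```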

Example (Process of Decision Tree)

  • Find the feature and the splitting criterion that best differentiate the data.

  • Q) How do we find the criterion that best differentiates the data?

    • -> Maximize the information gain, i.e., minimize the impurity (entropy) of the resulting subsets (see the sketch below).
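
A rough sketch (not from the source) of this search: for one feature, try every candidate threshold and keep the one with the largest information gain, defined as the parent entropy minus the weighted entropy of the two children.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_threshold(x, y):
    """Return the threshold on feature x that maximizes information gain on labels y."""
    parent = entropy(y)
    best_gain, best_t = 0.0, None
    for t in np.unique(x):
        left, right = y[x <= t], y[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        gain = parent - weighted
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

x = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(best_threshold(x, y))  # (3.0, 1.0): splitting at x <= 3.0 separates the classes perfectly
```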

Entropy, Impurity

  • λΆˆμˆœλ„ (Impruty)

    • ν•΄λ‹Ή λ²”μ£Ό μ•ˆμ— μ„œλ‘œ λ‹€λ₯Έ 데이터가 μ–Όλ§ˆλ‚˜ μ„žμ—¬ μžˆλŠ”κ°€?

  • Entropy

    • λΆˆμˆœλ„λ₯Ό 수치적으둜 λ‚˜νƒ€λ‚Έ 척도

$Entropy = -\sum_i p_i \log_2(p_i)$

Entropy Example
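
A minimal worked illustration (the numbers are assumed for this example): a node holding 8 samples of class A and 8 samples of class B has $Entropy = -(0.5\log_2 0.5 + 0.5\log_2 0.5) = 1$, the maximum impurity for two classes, whereas a node holding samples of a single class has $Entropy = -(1 \cdot \log_2 1) = 0$, i.e., it is perfectly pure.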

Impurity Indicator

  • Entropy: $Entropy(A) = \sum_{i=1}^{d} R_i \left( -\sum_{k=1}^{m} p_{k} \log_2(p_{k}) \right)$

  • Gini-Index: $G.I(A) = \sum_{i=1}^{d} \left( R_i \left( 1 - \sum_{k=1}^{m} p_{ik}^{2} \right) \right)$

    • Here split $A$ produces $d$ child nodes, $R_i$ is the fraction of records that fall into child $i$, $m$ is the number of classes, and $p_k$ / $p_{ik}$ is the proportion of class $k$ within that child. Both indicators are sketched in code below.
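
A rough sketch (not from the source) of both impurity indicators, evaluated for a candidate split that partitions the labels into several child nodes; the example data is made up.

```python
import numpy as np

def split_entropy(children):
    """Weighted entropy of a split: sum of R_i * entropy(child_i)."""
    n = sum(len(c) for c in children)
    total = 0.0
    for c in children:
        _, counts = np.unique(c, return_counts=True)
        p = counts / counts.sum()
        total += (len(c) / n) * -np.sum(p * np.log2(p))
    return total

def split_gini(children):
    """Weighted Gini index of a split: sum of R_i * (1 - sum_k p_ik^2)."""
    n = sum(len(c) for c in children)
    total = 0.0
    for c in children:
        _, counts = np.unique(c, return_counts=True)
        p = counts / counts.sum()
        total += (len(c) / n) * (1.0 - np.sum(p ** 2))
    return total

# Two child nodes produced by a candidate split, each somewhat mixed.
children = [np.array([0, 0, 0, 1]), np.array([1, 1, 0, 1])]
print(split_entropy(children), split_gini(children))  # ~0.811 and 0.375
```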

Risk of Overfitting

  • Decision trees tend to overfit on data with a large number of features. Getting the right ratio of samples to number of features is important, since a tree with few samples in high dimensional space is very likely to overfit.

  • We have to limit the maximum depth, the maximum number of nodes, and the minimum number of samples a node must contain before it can split (see the sketch below).
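
A minimal sketch, assuming scikit-learn, of constraining tree growth with these limits; the parameter values and dataset are illustrative only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

unconstrained = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
constrained = DecisionTreeClassifier(
    max_depth=4,           # limit the maximum depth
    max_leaf_nodes=16,     # limit the number of (leaf) nodes
    min_samples_split=10,  # a node needs at least 10 samples to be split
    random_state=0,
).fit(X_tr, y_tr)

# The constrained tree is far simpler and often generalizes better.
print(unconstrained.get_depth(), constrained.get_depth())
print(unconstrained.score(X_te, y_te), constrained.score(X_te, y_te))
```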

Pruning

  • Prevent overfitting using pruning.

Cost Function of Pruning

  • $CC(T) = Err(T) + \alpha \times L(T)$

    • $CC(T)$: cost complexity of the decision tree (smaller for simpler models that make few errors with few terminal nodes)

    • $Err(T)$: misclassification rate on the validation data

    • $L(T)$: number of terminal nodes (structural complexity)

    • $\alpha$: weight that combines $Err(T)$ and $L(T)$ (chosen by the user, typically a value between 0.01 and 0.1); a related sketch follows below

Summary

  • μ˜μ‚¬κ²°μ •λ‚˜λ¬΄λŠ” ν•œλ²ˆμ— ν•˜λ‚˜μ”©μ˜ μ„€λͺ…λ³€μˆ˜λ₯Ό μ‚¬μš©ν•˜μ—¬ 예츑 κ°€λŠ₯ν•œ κ·œμΉ™λ“€μ˜ 집합을 μƒμ„±ν•˜λŠ” μ•Œκ³ λ¦¬μ¦˜.

    • ν•œλ²ˆ λΆ„κΈ° λ•Œλ§ˆλ‹€ λ³€μˆ˜ μ˜μ—­μ„ λ‘κ°œλ‘œ κ΅¬λΆ„ν•œ λ’€ 각 μ˜μ—­μ˜ μˆœλ„κ°€ 증가(μ—”νŠΈλ‘œν”Όκ°€ κ°μ†Œ)ν•˜λŠ” λ°©ν–₯으둜 ν•™μŠ΅μ„ 진행함.

    • μž…λ ₯ λ³€μˆ˜ μ˜μ—­μ„ 두 개둜 κ΅¬λΆ„ν•˜λŠ” **μž¬κ·€μ  λΆ„κΈ°(recursive partioning)**와 κ΅¬λΆ„λœ μ˜μ—­μ„ ν†΅ν•©ν•˜λŠ” **κ°€μ§€μΉ˜κΈ°(pruning)**의 두가지 κ³Όμ •μœΌλ‘œ λ‚˜λ‰¨.

  • μ˜μ‚¬κ²°μ •λ‚˜λ¬΄λŠ” κ³„μ‚°λ³΅μž‘μ„± λŒ€λΉ„ 높은 예츑 μ„±λŠ₯을 λ‚΄λŠ” μž₯점을 μ§€λ‹˜.

  • ν•˜μ§€λ§Œ, 결정경계(decision boundary)κ°€ 데이터 좕에 μˆ˜μ§μ΄μ–΄μ„œ νŠΉμ • λ°μ΄ν„°μ—λ§Œ 잘 μž‘λ™ν•˜λŠ” ν•œκ³„κ°€ 있음.

    • μ΄λŸ¬ν•œ 문제λ₯Ό κ·Ήλ³΅ν•˜κΈ° μœ„ν•΄ λžœλ€ν¬λ ˆμŠ€νŠΈμ™€ 같은 기법을 μ‚¬μš©.

    • λžœλ€ν¬λ ˆμŠ€νŠΈλž€?

      • 같은 데이터에 λŒ€ν•΄ μ˜μ‚¬κ²°μ •λ‚˜λ¬΄λ₯Ό μ—¬λŸ¬ 개 λ§Œλ“€μ–΄μ„œ κ·Έ κ²°κ³Όλ₯Ό μ‘°ν•©ν•΄μ„œ 예츑 μ„±λŠ₯을 λ†’μ΄λŠ” 기법

      • λ™μΌν•œ λ°μ΄ν„°λ‘œλΆ€ν„° λ³΅μ›μΆ”μΆœμ„ 톡해 30개 μ΄μƒμ˜ 데이터셋을 λ§Œλ“€μ–΄μ„œ 각각에 μ˜μ‚¬κ²°μ •λ‚˜λ¬΄λ₯Ό μ μš―μ•ˆ λ’€ ν•™μŠ΅ κ²°κ³Όλ₯Ό μ·¨ν•©ν•˜λŠ” λ°©μ‹μœΌλ‘œ λ™μž‘

References

  • Theoretical Explanation

    • https://bkshin.tistory.com/entry/%EB%A8%B8%EC%8B%A0%EB%9F%AC%EB%8B%9D-4-%EA%B2%B0%EC%A0%95-%ED%8A%B8%EB%A6%ACDecision-Tree

    • https://ratsgo.github.io/machine%20learning/2017/03/26/tree/

  • Python Implementation of Decision Tree

    • https://towardsdatascience.com/construct-a-decision-tree-and-how-to-deal-with-overfitting-f907efc1492d

  • Scikit-learn documentation

    • https://scikit-learn.org/stable/modules/tree.html

  • MATLAB Examples

    • fitctree & fitrtree (instruction & algorithm)

      • https://kr.mathworks.com/help/stats/fitctree.html

      • https://kr.mathworks.com/help/stats/fitrtree.html

    • View Decision Tree

      • https://kr.mathworks.com/help/stats/view-decision-tree.html

    • Prediction Using Classification and Regression Trees

      • https://kr.mathworks.com/help/stats/prediction-using-classification-and-regression-trees.html
