Decision Tree
A model that analyzes data and expresses the patterns between variables as a combination of predictable rules.
Applicable to both classification and regression problems.
Achieves high predictive performance relative to its computational complexity.
Example (Process of Decision Tree)
Find the feature and criteria that best differentiate the data.
Q) How to find the criteria that best differentiate the data?
-> Maximize the information gain (i.e., minimize the impurity/entropy).
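The criterion above can be sketched in plain Python: information gain is the parent node's entropy minus the weighted entropy of the child nodes after a candidate split (function names here are illustrative, not from any particular library).

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy of a label list: -sum(p * log2(p))."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy reduction achieved by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

# A perfectly separating split recovers the full 1 bit of parent entropy.
parent = ['A', 'A', 'B', 'B']
print(information_gain(parent, ['A', 'A'], ['B', 'B']))  # -> 1.0
```

The tree-growing step simply evaluates this quantity for every candidate feature/threshold pair and keeps the split with the largest gain.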
Entropy, Impurity
Impurity
How much data from other classes is mixed into a given category?
Entropy
A measure that expresses impurity numerically.

Entropy Example

Impurity Indicator
Entropy
Gini-Index
Risk of Overfitting
Decision trees tend to overfit on data with a large number of features. Getting the right ratio of samples to number of features is important, since a tree with few samples in high dimensional space is very likely to overfit.
We have to limit the maximum depth, the maximum number of nodes, and the minimum number of samples a node needs before it may split.
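Assuming scikit-learn (which the references below link to), these limits map to the `max_depth`, `max_leaf_nodes`, and `min_samples_split` parameters of `DecisionTreeClassifier`. A minimal sketch comparing an unconstrained tree with a constrained one:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic high-dimensional data, where an unconstrained tree overfits easily.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Unconstrained: grows until every leaf is pure, memorizing the training set.
deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Constrained: trades training accuracy for better generalization.
shallow = DecisionTreeClassifier(max_depth=3, min_samples_split=20,
                                 random_state=0).fit(X_tr, y_tr)

print("deep    train/test:", deep.score(X_tr, y_tr), deep.score(X_te, y_te))
print("shallow train/test:", shallow.score(X_tr, y_tr), shallow.score(X_te, y_te))
```

The unconstrained tree reaches 100% training accuracy, while the constrained one typically generalizes better on the held-out split.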
Pruning
Prevent overfitting using pruning.
Cost Function of Pruning
CC(T) = Err(T) + α × L(T)
CC(T): cost complexity of the decision tree (lower for simpler models with low error and few terminal nodes)
Err(T): misclassification rate on validation data
L(T): number of terminal nodes (structural complexity)
α (alpha): weight balancing Err(T) and L(T) (set by the user, typically between 0.01 and 0.1)
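In scikit-learn this criterion corresponds to the `ccp_alpha` parameter (minimal cost-complexity pruning). Note one difference from the formula above: sklearn penalizes training impurity rather than validation error, but the functional form, error plus α times the number of terminal nodes, is the same. A minimal sketch:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# alpha = 0: no penalty on terminal nodes, so the tree grows fully.
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# Larger alpha prunes subtrees whose error reduction does not justify their size.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X, y)

print("leaves (full):  ", full.get_n_leaves())
print("leaves (pruned):", pruned.get_n_leaves())
```

`DecisionTreeClassifier.cost_complexity_pruning_path` can enumerate the effective alpha values for a fitted tree, which is useful for choosing α by cross-validation instead of picking a value in the 0.01–0.1 range by hand.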
Summary
A decision tree is an algorithm that builds a set of predictable rules by splitting on one explanatory variable at a time.
At each split, a variable's domain is divided into two regions in the direction that increases the purity (decreases the entropy) of each region.
The process consists of two stages: **recursive partitioning**, which splits the input-variable space in two at each step, and **pruning**, which merges partitioned regions back together.
Decision trees have the advantage of high predictive performance relative to their computational complexity.
However, because their decision boundaries are perpendicular to the data axes (axis-aligned), they have the limitation of working well only on certain datasets.
Techniques such as random forests are used to overcome this problem.
What is a random forest?
A technique that builds multiple decision trees on the same data and combines their results to improve predictive performance.
It works by creating 30 or more datasets from the same data via bootstrap sampling (sampling with replacement), fitting a decision tree to each, and aggregating the learned results.
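Assuming scikit-learn, the bootstrap-and-aggregate scheme above is exactly what `RandomForestClassifier` does (with the extra twist of random feature subsets at each split). A minimal comparison against a single tree:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Baseline: one decision tree, evaluated by cross-validation.
tree_acc = cross_val_score(DecisionTreeClassifier(random_state=0), X, y).mean()

# 30 trees, each fit on a bootstrap resample; prediction is a majority vote.
forest = RandomForestClassifier(n_estimators=30, random_state=0)
forest_acc = cross_val_score(forest, X, y).mean()

print(f"single tree: {tree_acc:.3f}, random forest: {forest_acc:.3f}")
```

Averaging many bootstrap-trained trees smooths out the jagged axis-aligned boundaries of any single tree, which is why the ensemble usually generalizes better.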
References
Theoretical Explanation
https://bkshin.tistory.com/entry/%EB%A8%B8%EC%8B%A0%EB%9F%AC%EB%8B%9D-4-%EA%B2%B0%EC%A0%95-%ED%8A%B8%EB%A6%ACDecision-Tree
https://ratsgo.github.io/machine%20learning/2017/03/26/tree/
Python Implementation of Decision Tree
https://towardsdatascience.com/construct-a-decision-tree-and-how-to-deal-with-overfitting-f907efc1492d
Scikit-learn documentation
https://scikit-learn.org/stable/modules/tree.html
MATLAB Example
fitctree & fitrtree (instruction & algorithm)
https://kr.mathworks.com/help/stats/fitctree.html
https://kr.mathworks.com/help/stats/fitrtree.html
View Decision Tree
https://kr.mathworks.com/help/stats/view-decision-tree.html
Prediction Using Classification and Regression Trees
https://kr.mathworks.com/help/stats/prediction-using-classification-and-regression-trees.html