How Do Tree-Based Algorithms Decide Where To Split? The Gini Approach
A decision tree is a type of supervised learning algorithm that supports decision making. Decision trees are mostly used in classification problems, where there is a predefined target variable. In a decision tree, the sample is split into two or more homogeneous sets based on the most significant splitter (differentiator) among the input variables.
Now, let’s see how a tree-based algorithm decides where to split.
Decision tree algorithms are widely used in decision making because they often deliver accurate results. However, a tree’s accuracy depends heavily on where it makes its splits. The splitting criteria also differ between classification trees and regression trees.
Several algorithms can be used to decide how a node is split into two or more sub-nodes. A decision tree evaluates splits on all available variables and then selects the split that results in the most homogeneous sub-nodes.
The algorithm used to split a node is selected based on the type of target variable. Learn more about these algorithms in our Data Science Training In Hyderabad program. Now, let’s look at one of the most commonly used algorithms for node splitting in decision trees: the Gini index.
In the Gini approach, if we select two items from a population at random, they should belong to the same class, and the probability of this is 1 if the population is pure.
This approach works for a categorical target variable such as “Success” or “Failure” and performs only binary splits. The higher the Gini value, the higher the homogeneity of the node.
Steps To Calculate Gini For A Split
First, calculate the Gini score for each sub-node using the sum of the squared probabilities of success and failure (p^2 + q^2). Note that this score is one minus the Gini impurity reported by many libraries, so here a higher value means a purer node.
Next, calculate the Gini score for the split as the weighted average of the Gini scores of its sub-nodes, weighting each sub-node by its share of the samples.
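The two steps above can be sketched in Python. This is only a minimal illustration under the article's p^2 + q^2 convention, not a library implementation; the function names gini_score and weighted_gini and the (size, probability) input format are assumptions made for this example.

```python
def gini_score(p_success):
    """Gini score of a binary node: p^2 + q^2, where q = 1 - p."""
    q_failure = 1.0 - p_success
    return p_success ** 2 + q_failure ** 2


def weighted_gini(nodes):
    """Weighted Gini score of a split.

    `nodes` is a list of (node_size, p_success) pairs, one per sub-node
    (a hypothetical input format chosen for this sketch).
    """
    total = sum(size for size, _ in nodes)
    # Each sub-node contributes its score, weighted by its share of samples.
    return sum((size / total) * gini_score(p) for size, p in nodes)
```

With this convention, a pure node (p = 0 or p = 1) scores 1, and an evenly mixed binary node (p = 0.5) scores 0.5.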
Example: In this example, we want to segregate 30 students based on the target variable. The population can be split on two input variables, Gender and Class. Using the Gini technique, we will determine which split produces more homogeneous sub-nodes.
Split on Gender:
Gini for sub-node Female = (0.2)*(0.2)+(0.8)*(0.8) = 0.68
Gini for sub-node Male = (0.65)*(0.65)+(0.35)*(0.35) = 0.55
Weighted Gini for split on Gender = (10/30)*0.68+(20/30)*0.55 = 0.59
Similarly, for the split on Class:
Gini for sub-node Class IX = (0.43)*(0.43)+(0.57)*(0.57) = 0.51
Gini for sub-node Class X = (0.56)*(0.56)+(0.44)*(0.44) = 0.51
Weighted Gini for split on Class = (14/30)*0.51+(16/30)*0.51 = 0.51
So, from the above values, it is clear that the Gini score for the split on Gender is higher than the score for the split on Class. This means that the node will be split on Gender.
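The comparison above can be reproduced with a short Python sketch. The node sizes and success probabilities are taken from the calculations in this example; the variable names and the dictionary layout are assumptions made for illustration, and picking the highest score follows the article's convention that a higher Gini score means more homogeneous sub-nodes.

```python
def gini_score(p_success):
    """Gini score of a binary node: p^2 + q^2, where q = 1 - p."""
    return p_success ** 2 + (1.0 - p_success) ** 2


def weighted_gini(nodes):
    """Weighted Gini score for a list of (node_size, p_success) pairs."""
    total = sum(size for size, _ in nodes)
    return sum((size / total) * gini_score(p) for size, p in nodes)


# Candidate splits from the example: (number of students, P(success)).
splits = {
    "Gender": [(10, 0.20), (20, 0.65)],  # Female, Male
    "Class": [(14, 0.43), (16, 0.56)],   # Class IX, Class X
}

scores = {name: weighted_gini(nodes) for name, nodes in splits.items()}

# Under this convention, the split with the highest weighted score wins.
best_split = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print("Best split:", best_split)
```

Rounded to two decimals, the scores match the hand calculation (0.59 for Gender and 0.51 for Class), so Gender is selected.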
Apart from this, there are several other approaches, such as Chi-Square. Build real-world expertise in handling decision trees by working on practical, real-time projects with our Kelly Technologies Data Science training.