
**How Do Tree-Based Algorithms Decide Where To Split? The Gini Approach**

A decision tree is a type of supervised learning algorithm used to support decision making. Decision trees are most often applied to classification problems, where there is a predefined target variable. The sample is split into two or more homogeneous sets, based on the most significant splitter (differentiator) among the input variables.

*Now, let’s see how a tree-based algorithm decides where to split.*

Decision tree algorithms are widely used in decision making because they can deliver accurate results. However, a tree's accuracy depends heavily on where it makes its splits. The splitting criteria also differ between classification trees and regression trees.

Several algorithms can be used to decide how a node is split into two or more sub-nodes. A decision tree evaluates splits on all available variables and then selects the split that results in the most homogeneous sub-nodes.

The algorithm used to split a node is chosen based on the type of target variable. Learn more about these algorithms in our **Data Science Training In Hyderabad** program. Now, let’s look at one of the most commonly used algorithms for node splitting in decision trees.

**Gini**

Gini says that if we select two items from a population at random, the probability that they belong to the same class is 1 when the population is pure. The less pure the population, the lower this probability.

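This approach works with a categorical target variable such as “Success” or “Failure” and performs only binary splits. The higher the Gini value, the higher the homogeneity. Note that the score used here rises with purity; the Gini impurity reported by many libraries is 1 minus this value, so there a lower value means a purer node.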

**Steps To Calculate Gini For A Split**

In the first step, we find the Gini value for each sub-node, using the sum of the squares of the probabilities of success and failure: **(p^2+q^2).**

In the next step, we calculate the Gini for the split as the weighted Gini score of the sub-nodes of that split, where each sub-node is weighted by its share of the records.
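The two steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names `gini_score` and `weighted_gini` and the sample counts are assumptions made for the example.

```python
def gini_score(successes, failures):
    """Gini score of one sub-node: p^2 + q^2 (higher means purer)."""
    total = successes + failures
    if total == 0:
        raise ValueError("sub-node is empty")
    p = successes / total  # probability of success
    q = failures / total   # probability of failure
    return p * p + q * q


def weighted_gini(sub_nodes):
    """Weighted Gini of a split.

    sub_nodes is a list of (successes, failures) counts, one pair per
    sub-node; each sub-node is weighted by its share of all records.
    """
    total = sum(s + f for s, f in sub_nodes)
    return sum((s + f) / total * gini_score(s, f) for s, f in sub_nodes)


# Hypothetical counts, chosen only to illustrate the two extremes.
print(gini_score(5, 0))  # pure node -> 1.0
print(gini_score(5, 5))  # evenly mixed node -> 0.5
print(weighted_gini([(5, 0), (5, 5)]))
```

A pure sub-node scores 1.0 and an evenly mixed one scores 0.5, which matches the statement that the Gini value rises with homogeneity.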

**Example:** In this example, we want to segregate 30 students based on the target variable. The population can be split on two input variables, Gender and Class. Using the Gini technique, we will determine which split results in more homogeneous sub-nodes.

**Split on Gender:**

Gini for sub-node Female = (0.2)*(0.2)+(0.8)*(0.8)=0.68

Gini for sub-node Male = (0.65)*(0.65)+(0.35)*(0.35)=0.55

**Calculate weighted Gini for Split Gender = (10/30)*0.68+(20/30)*0.55 = 0.59**

**Similarly, for Split on Class:**

Gini for sub-node Class IX = (0.43)*(0.43)+(0.57)*(0.57)=0.51

Gini for sub-node Class X = (0.56)*(0.56)+(0.44)*(0.44)=0.51

**Calculate weighted Gini for Split Class = (14/30)*0.51+(16/30)*0.51 = 0.51**

So, from the above values, it’s clear that the Gini score for the split on Gender (0.59) is higher than for the split on Class (0.51). Since a higher score means more homogeneous sub-nodes, the node will be split on Gender.
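The arithmetic above can be checked with a few lines of Python. This is only a sketch: the sub-node sizes and success probabilities are the ones quoted in the calculations, while the names `splits`, `weighted_gini` and `best_split` are introduced here purely for illustration.

```python
# Each sub-node is (share of the 30 students, probability of success p),
# using the figures from the worked example.
splits = {
    "Gender": [(10 / 30, 0.2), (20 / 30, 0.65)],   # Female, Male
    "Class": [(14 / 30, 0.43), (16 / 30, 0.56)],   # Class IX, Class X
}


def weighted_gini(sub_nodes):
    """Sum of weight * (p^2 + q^2) over the sub-nodes, with q = 1 - p."""
    return sum(w * (p * p + (1 - p) * (1 - p)) for w, p in sub_nodes)


scores = {name: weighted_gini(nodes) for name, nodes in splits.items()}

# The split with the highest weighted Gini score is the most homogeneous.
best_split = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print("Split on:", best_split)
```

Run as written, this reports 0.59 for Gender and 0.51 for Class, and selects Gender, in line with the hand calculation.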

Apart from Gini, there are several other approaches, such as Chi-Square. Build real-world expertise in handling decision trees by working on practical, real-time projects with our Kelly Technologies Data Science training.