concept Gini impurity index in category machine learning

appears as: Gini impurity index, Gini impurity indices, The Gini impurity index
Grokking Machine Learning MEAP V09

This is an excerpt from Manning's book Grokking Machine Learning MEAP V09.

In a nutshell, the Gini impurity index measures the diversity in a set. Let’s say, for example, that we have a bag full of balls of several colors. A bag where all the balls have the same color has a very low Gini impurity index (in fact, it is zero). A bag where all the balls have different colors has a very high Gini impurity index.

Figure 7.14. A bag full of balls of the same color has a low Gini impurity index. A bag full of balls of different colors has a very high Gini impurity index.

As with everything in math, we need to attach a number to the Gini impurity index, and for this we turn to our good old friend, probability. With a bag full of balls of different colors, we play the following game. We pick a ball from the bag at random and look at its color. Then we put the ball back. We randomly pick another ball from the bag (it could be the same ball or a different one; we don’t know) and look at its color. We record the two colors and check whether they are the same or different. Here is the main observation behind the Gini impurity index:
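The two-draw game can be turned into a small calculation. The sketch below assumes the standard definition of the Gini impurity index: the probability that two balls drawn with replacement have different colors, which equals 1 minus the sum of the squared color proportions. The function name `gini_impurity` and the example bags are illustrative and do not come from the book.

```python
from collections import Counter

def gini_impurity(items):
    """Probability that two draws with replacement give different labels."""
    n = len(items)
    if n == 0:
        raise ValueError("cannot compute the Gini impurity of an empty set")
    counts = Counter(items)
    # P(both draws have the same label) is the sum of squared proportions.
    p_same = sum((count / n) ** 2 for count in counts.values())
    return 1.0 - p_same

# Hypothetical bags, chosen only to illustrate the two extremes.
same_color = ["red", "red", "red", "red"]
all_different = ["red", "blue", "green", "yellow"]

print(gini_impurity(same_color))     # 0.0
print(gini_impurity(all_different))  # 0.75
```

The bag with a single color scores exactly zero, while the bag of four distinct colors scores 0.75, the largest value possible with four balls.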

Notice that Set 4 is just the mirror image of Set 1, and Set 5 is the mirror image of Set 2. Therefore, their Gini impurity indices should be the same as those of the original sets. In other words, the Gini impurity index of Set 4 is 0.375, and that of Set 5 is 0.
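The mirror-image argument can be checked numerically. The contents of Sets 1 through 5 are not shown in this excerpt, so the sketch below uses hypothetical stand-ins: a mixed set with three items of one label and one of the other (which happens to give 0.375) and a pure set (which gives 0). The helper `mirror` and the labels "A" and "B" are assumptions made for illustration.

```python
from collections import Counter

def gini_impurity(labels):
    """1 minus the sum of squared label proportions."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())

def mirror(labels, a, b):
    """Return a copy of labels with every a swapped for b and vice versa."""
    return [b if x == a else a if x == b else x for x in labels]

# Hypothetical stand-ins; the real sets are not given in the excerpt.
mixed_set = ["A", "A", "A", "B"]
pure_set = ["A", "A", "A", "A"]

print(gini_impurity(mixed_set), gini_impurity(mirror(mixed_set, "A", "B")))  # 0.375 0.375
print(gini_impurity(pure_set), gini_impurity(mirror(pure_set, "A", "B")))    # 0.0 0.0
```

Swapping the labels leaves the proportions unchanged as a collection, so the sum of squares, and therefore the index, is the same.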

Going back to App 1 and App 2 and summarizing, this is what we have calculated for the Gini impurity indices of our sets.
