concept node in category machine learning

This is an excerpt from Manning's book Grokking Machine Learning MEAP V09.
As you can see, our decision process looks like a tree, except upside down. At the very top you can see the tree stump (which we call the root), from which two branches emanate. We call this a binary tree. Each branch leads to a new tree stump (which we call a node), from which the tree again splits in two. At every node there is a yes/no question. The two branches coming out of the node correspond to the two possible answers (yes or no) to this question. As you can see, the tree doesn't go on forever; there are nodes where it simply stops branching out. We call these leaves. This arrangement of nodes, leaves, and edges is what we call a decision tree. Trees are very natural objects in computer science, since computers can be broken down into a huge number of on/off switches, which is why everything in a computer is binary.
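To make the structure concrete, here is a minimal sketch of such a tree in Python; the class names, the example question, and the leaf labels are illustrative choices, not code from the book.

```python
# A minimal sketch of a binary decision tree: each internal node holds a
# yes/no question, and each branch leads to another node or to a leaf.
# Names and the example question are illustrative, not taken from the book.

class Leaf:
    def __init__(self, prediction):
        self.prediction = prediction  # the answer given when the tree stops here

class Node:
    def __init__(self, question, yes_branch, no_branch):
        self.question = question      # function returning True ("yes") or False ("no")
        self.yes_branch = yes_branch  # subtree followed when the answer is yes
        self.no_branch = no_branch    # subtree followed when the answer is no

def decide(tree, example):
    """Walk from the root down to a leaf, answering one question per node."""
    while isinstance(tree, Node):
        tree = tree.yes_branch if tree.question(example) else tree.no_branch
    return tree.prediction

# Tiny example tree: the root asks one question, and each branch ends in a leaf.
root = Node(
    question=lambda x: x["age"] < 30,
    yes_branch=Leaf("answer A"),
    no_branch=Leaf("answer B"),
)
print(decide(root, {"age": 25}))  # -> "answer A"
```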
Figure 7.2. A regular decision tree with a root, nodes, and leaves. Note that each node contains a yes/no question. From each possible answer, one branch emanates, which leads to another node or a leaf.

This is an excerpt from Manning's book Machine Learning with R, the tidyverse, and mlr.
Tree construction is a greedy process and can be limited by setting stopping criteria (such as the minimum number of cases required in a node before it can be split).
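The book implements this in R with mlr; purely as a language-agnostic illustration, here is what the same kind of stopping criterion looks like in Python with scikit-learn, whose `min_samples_split` parameter plays the role of a minimum number of cases required in a node before it can be split.

```python
# Illustration of a stopping criterion for greedy tree construction:
# require at least 20 cases in a node before it may be split further.
# This uses scikit-learn for illustration; the book itself works in R with mlr.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(
    min_samples_split=20,  # nodes with fewer than 20 cases become leaves
    random_state=42,
)
tree.fit(X, y)
print(tree.get_depth(), tree.get_n_leaves())
```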
Both the SOM and LLE algorithms reduce a large dataset into a smaller, more manageable number of variables, but they work in very different ways. The SOM algorithm creates a two-dimensional grid of nodes, like grid references on a map. Each case in the data is placed into a node and then shuffled around the nodes so that cases that are more similar to each other in the original data are put close together on the map.
Imagine that we have a dataset with three variables, and we want to distribute the cases of this dataset across the nodes of our map. Eventually, we hope the algorithm will place the cases in the nodes such that similar cases are in the same node or a nearby node, and dissimilar cases are placed in nodes far away from each other.
After the creation of the map, the next thing the algorithm does is randomly assign each node a set of weights: one weight for each variable in the dataset. So for our example, each node has three weights, because we have three variables. These weights are just random numbers, and you can think of them as guesses for the value of each of the variables. If this is hard to visualize, take a look at figure 15.4. We have a dataset containing three variables, and we are looking at three nodes from a map. Each node has three numbers written under it: one corresponding to each variable in the dataset. For example, the weights for node 1 are 3 (for var 1), 9 (for var 2), and 1 (for var 3). Remember, at this point these are just random guesses for the value of each variable.
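As a rough sketch of this initialization step (the grid size, value range, and random seed below are arbitrary choices for illustration, not values from the book):

```python
# Sketch of SOM initialization: a 2D grid of nodes, each given one random
# weight per variable in the dataset.
import numpy as np

rng = np.random.default_rng(0)

n_rows, n_cols = 4, 4   # a small 4x4 map of nodes (arbitrary size)
n_variables = 3         # our example dataset has three variables

# weights[i, j] holds the three weights of the node at grid position (i, j)
weights = rng.uniform(low=0, high=10, size=(n_rows, n_cols, n_variables))

print(weights[0, 0])  # random "guesses" for var 1, var 2, var 3 at node (0, 0)
```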
Next, the algorithm chooses a case at random from the dataset and calculates which node's weights are the closest match to this case's values for each of the variables. For example, if there were a case in the dataset whose values for var 1, var 2, and var 3 were 3, 9, and 1, respectively, this case would perfectly match the weights of node 1. To find which node's weights are most similar to the case in question, the distance is calculated between the case and the weights of each node in the map. This distance is usually the squared Euclidean distance. Remember that Euclidean distance is just the straight-line distance between two points, so the squared Euclidean distance simply omits the square root step to make the computation faster.
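In code, the squared Euclidean distance is just the per-variable differences squared and summed; here is a tiny sketch using the case (3, 9, 1) from the example above.

```python
# Squared Euclidean distance between one case and one node's weights:
# difference per variable, squared, then summed (no square root).
import numpy as np

case = np.array([3, 9, 1])          # values of var 1, var 2, var 3 for one case
node_weights = np.array([3, 9, 1])  # the weights of node 1 in the example

squared_distance = np.sum((case - node_weights) ** 2)
print(squared_distance)  # 0: a perfect match to node 1's weights
```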
In figure 15.4, you can see the distances calculated between the first case and each of the node’s weights. This case is most similar to the weights of node 1, because it has the smallest squared Euclidean distance to them (93.09).
Once the distances between a particular case and all of the nodes have been calculated, the node with the smallest distance (most similar to the case) is selected as that case’s best matching unit (BMU). This is illustrated in figure 15.5. Just like when we put beads into bowls, the algorithm takes that case and places it inside its BMU.
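Continuing the sketch, the BMU is simply the node whose weights have the smallest squared distance to the chosen case; the map size and weight values below are made up for illustration.

```python
# Find the best matching unit (BMU): the node whose weights have the smallest
# squared Euclidean distance to the chosen case.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.uniform(0, 10, size=(4, 4, 3))  # 4x4 map, 3 weights per node
case = np.array([3.0, 9.0, 1.0])              # a randomly chosen case

# Squared distance from the case to every node's weights, shape (4, 4).
sq_dist = np.sum((weights - case) ** 2, axis=-1)

# Grid position (row, col) of the node with the smallest distance: the BMU.
bmu = np.unravel_index(np.argmin(sq_dist), sq_dist.shape)
print(bmu, sq_dist[bmu])
```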
Figure 15.4. How the distance between each case and each node is calculated. The arrows pointing from each variable to each node represent the weight for that variable on that particular node (for example, the weights of node 1 are 3, 9, and 1). Distance is calculated by finding the difference between a node's weights and a case's value for each variable, squaring these differences, and summing them.