Suppose we have a few points on a 2D plane, each with x-y coordinates.
Initially, each data point is a cluster of its own. We need a way to compute the distance between every pair of points, so that we can find the two points with the shortest distance between them and merge them into a cluster.
Once we find those with the least distance between them, we start grouping them together and forming clusters of multiple points.
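A minimal sketch of this first step, assuming plain Euclidean distance. The coordinates below are hypothetical, since the article does not give numeric values for its points, and `closest_pair` is an illustrative helper name:

```python
import math
from itertools import combinations

# Hypothetical coordinates, chosen only for illustration.
points = {
    "P1": (1.0, 1.0), "P2": (1.5, 1.2),
    "P3": (5.0, 5.0), "P4": (5.4, 5.3),
    "P5": (7.0, 6.0), "P6": (7.3, 6.5),
}

def closest_pair(pts):
    """Return the names of the two points with the smallest Euclidean distance."""
    return min(
        combinations(pts, 2),
        key=lambda pair: math.dist(pts[pair[0]], pts[pair[1]]),
    )

a, b = closest_pair(points)
print(a, b, round(math.dist(points[a], points[b]), 3))  # P3 P4 0.5
```

With these coordinates, P3 and P4 are the closest pair, so they would be the first two points grouped together.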
This is represented in a tree-like structure called a dendrogram.
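One way to picture a dendrogram in code is as a binary tree in which every internal node records the distance at which its two children were merged. The sketch below is only an illustration: the `Node` class, the `render` helper, and the merge distance of 0.5 are all assumptions, not something the article defines.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    label: str = ""                  # set only for leaves (single points)
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    height: float = 0.0              # distance at which the children merged

    def leaves(self):
        """List the point labels under this node, left to right."""
        if self.left is None and self.right is None:
            return [self.label]
        return self.left.leaves() + self.right.leaves()

def render(node, depth=0):
    """Return the tree as indented text lines, one level per merge."""
    pad = "  " * depth
    if node.left is None:
        return [pad + node.label]
    lines = [f"{pad}merge at distance {node.height}"]
    return lines + render(node.left, depth + 1) + render(node.right, depth + 1)

# Two points joined into one cluster (0.5 is an illustrative value).
pair = Node(left=Node("P3"), right=Node("P4"), height=0.5)
print("\n".join(render(pair)))
```

Each later merge simply wraps two existing nodes in a new parent, which is why the finished structure looks like a tree.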
As a result, we have three groups: P1-P2, P3-P4, and P5-P6. Each group has its own small dendrogram, so we have three dendrograms, as shown below:
In the next step, we bring two groups together. The groups P3-P4 and P5-P6 are now under one dendrogram because they are closer to each other than either is to the P1-P2 group. This is as shown below:
Finally, we bring everything together. We finish when we’re left with a single cluster.
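The whole procedure can be sketched as a loop that keeps merging the two closest clusters until one remains. This is a rough illustration, not a definitive implementation: the coordinates are hypothetical, and the distance between two clusters is assumed here to be the distance between their closest members (single linkage), which is only one of several possible choices.

```python
import math
from itertools import combinations

# Hypothetical coordinates, chosen only for illustration.
points = {
    "P1": (1.0, 1.0), "P2": (1.5, 1.2),
    "P3": (5.0, 5.0), "P4": (5.4, 5.3),
    "P5": (7.0, 6.0), "P6": (7.3, 6.5),
}

def single_linkage(c1, c2):
    """Assumed cluster distance: the closest pair of members across c1 and c2."""
    return min(math.dist(points[a], points[b]) for a in c1 for b in c2)

def agglomerate(names):
    """Merge the two closest clusters repeatedly; return the merges in order."""
    clusters = [(name,) for name in names]   # every point starts as its own cluster
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: single_linkage(*pair))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)             # the merged cluster replaces both
        merges.append((c1, c2))
    return merges

merges = agglomerate(points)
for left, right in merges:
    print("-".join(left), "+", "-".join(right))
```

With these coordinates, the first three merges form the pairs P3-P4, P1-P2, and P5-P6, the fourth joins P3-P4 with P5-P6, and the fifth brings in P1-P2, leaving one cluster.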
You can see how the cluster on the right went to the top with the gray hierarchical box connecting them.
The next question is: How do we measure the distance between the data points? The next section of the Hierarchical clustering article answers this question.