Agglomerative clustering begins with each element as its own cluster and repeatedly merges the closest clusters into larger ones.
Three key questions need to be answered:
- How do we represent a cluster that has more than one point?
- How do we determine the nearness of clusters?
- When do we stop combining clusters?
Let’s assume that we have six data points in a Euclidean space. In that case, we’re dealing with X-Y coordinates.
How do we represent a cluster of more than one point?
Here, we will make use of centroids; a cluster’s centroid is the average of its points.
Let’s first take the points (1, 2) and (2, 1), and we’ll group them together because they’re close. For these points, we compute the point midway between them and mark it as (1.5, 1.5). It’s the centroid of those two points.
Next, we take the other nearby pair of points, (4, 1) and (5, 0). We set up a centroid of those two points at (4.5, 0.5).
Once we have the centroids of the two groups, we see that the next closest point to the centroid (1.5, 1.5) is (0, 0), so we add it to that group and compute a new centroid based on those three points. The new centroid is (1, 1).
We do the same with the last point, (5, 3), which joins the second group. You can see that the dendrogram on the right keeps growing as each of these points is connected. We group them and compute the centroid of that group, too, at (4.7, 1.3).
Finally, we combine the two groups by their centroids and end up with one large cluster. Usually, we don’t compute this last centroid; we just merge everything together.
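The whole walkthrough can be sketched in a few lines of Python. This is a minimal centroid-linkage sketch, not a production implementation; the exact coordinates are assumed from the centroids quoted above.

```python
import math

# The six points from the walkthrough (coordinates assumed from the centroids given).
points = [(0, 0), (1, 2), (2, 1), (4, 1), (5, 0), (5, 3)]

def centroid(cluster):
    """The average of the cluster's points, coordinate by coordinate."""
    xs, ys = zip(*cluster)
    return (sum(xs) / len(cluster), sum(ys) / len(cluster))

# Start with every point in its own cluster, then repeatedly merge the
# pair of clusters whose centroids are closest, until one cluster remains.
clusters = [[p] for p in points]
while len(clusters) > 1:
    pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
    i, j = min(pairs, key=lambda ij: math.dist(centroid(clusters[ij[0]]),
                                               centroid(clusters[ij[1]])))
    clusters[i] += clusters.pop(j)
    print(clusters[i], "-> centroid", centroid(clusters[i]))
```

Running this reproduces the merge order described above: (1, 2) with (2, 1), then (4, 1) with (5, 0), then (0, 0) joining the first group, then (5, 3) joining the second, and finally one big cluster.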
Now that we’ve resolved the matter of representing clusters and determining their nearness, when do we stop combining clusters?
There are many approaches you can take:
Approach 1: Pick a number of clusters (k) upfront
When we don’t want to end up looking at, say, 200 clusters, we pick a value of k. We decide the number of clusters required (say, six or seven) at the beginning, and we stop merging once only k clusters remain. This is done to limit the incoming information.
That can be very important, especially if you’re feeding the result into another algorithm that expects a fixed number of values.
Possible challenges: This approach only makes sense when you know the data well. When you’re clustering with K clusters, you probably already know that domain. But if you’re exploring brand new data, you may not know how many clusters you need.
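Approach 1 can be sketched by running the same centroid-linkage loop until exactly k clusters remain. The function name and the six sample points are illustrative, not from a library:

```python
import math

def centroid(cluster):
    """The average of the cluster's points, coordinate by coordinate."""
    xs, ys = zip(*cluster)
    return (sum(xs) / len(cluster), sum(ys) / len(cluster))

def agglomerate(points, k):
    """Merge the centroid-closest pair of clusters until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: math.dist(centroid(clusters[ij[0]]),
                                                   centroid(clusters[ij[1]])))
        clusters[i] += clusters.pop(j)
    return clusters

points = [(0, 0), (1, 2), (2, 1), (4, 1), (5, 0), (5, 3)]
print(agglomerate(points, 2))  # stops at exactly two clusters
```

With k = 2, the six walkthrough points split into the two natural groups of three.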
Approach 2: Stop when the next merge would create a cluster with low cohesion.
We keep merging until the next merge of clusters would create a bad, low-cohesion cluster: the points are almost equally close to both clusters, so it doesn’t make sense to bring them together.
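One way to sketch Approach 2. The text doesn’t fix a cohesion measure, so this assumes negative mean distance to the centroid as the measure, and the threshold value is purely illustrative:

```python
import math

def centroid(cluster):
    xs, ys = zip(*cluster)
    return (sum(xs) / len(cluster), sum(ys) / len(cluster))

def cohesion(cluster):
    """One possible cohesion measure (an assumption, not from the text):
    negative mean distance to the centroid. Higher is better; a single
    point scores a perfect 0."""
    c = centroid(cluster)
    return -sum(math.dist(p, c) for p in cluster) / len(cluster)

MIN_COHESION = -1.5  # illustrative threshold

points = [(0, 0), (1, 2), (2, 1), (4, 1), (5, 0), (5, 3)]
clusters = [[p] for p in points]
while len(clusters) > 1:
    pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
    i, j = min(pairs, key=lambda ij: math.dist(centroid(clusters[ij[0]]),
                                               centroid(clusters[ij[1]])))
    if cohesion(clusters[i] + clusters[j]) < MIN_COHESION:
        break  # the next merge would create a low-cohesion cluster
    clusters[i] += clusters.pop(j)
```

With these points and this threshold, merging stops at the two natural groups of three; only the final all-in-one merge would fall below the threshold.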
Approach 3.1: Cluster’s diameter
The diameter is the maximum distance between any pair of points in the cluster. We stop when the diameter of a newly merged cluster would exceed a threshold, because clusters that grow that wide start to overlap.
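The diameter check is a one-liner over all point pairs. A minimal sketch:

```python
import math
from itertools import combinations

def diameter(cluster):
    """Maximum pairwise distance within a cluster; 0 for a single point."""
    if len(cluster) < 2:
        return 0.0
    return max(math.dist(a, b) for a, b in combinations(cluster, 2))

# Stopping rule sketch: if diameter(a + b) > threshold, don't merge a and b.
print(diameter([(0, 0), (1, 2), (2, 1)]))  # ≈ 2.236, the √5 distance from (0, 0) to (1, 2)
```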
Approach 3.2: Cluster’s radius
The radius is the maximum distance of a point from the centroid. We stop when the radius of a newly merged cluster would exceed a threshold.
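The radius check looks almost the same, but measures from the centroid instead of between point pairs. A minimal sketch:

```python
import math

def radius(cluster):
    """Maximum distance of any point from the cluster's centroid."""
    xs, ys = zip(*cluster)
    c = (sum(xs) / len(cluster), sum(ys) / len(cluster))
    return max(math.dist(p, c) for p in cluster)

# Stopping rule sketch: if radius(a + b) > threshold, don't merge a and b.
print(radius([(0, 0), (1, 2), (2, 1)]))  # ≈ 1.414, the √2 distance from (0, 0) to centroid (1, 1)
```

Note that the radius is always at most the diameter, so the two thresholds trade off in the same direction but at different scales.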
Let us now discuss the other type of hierarchical clustering, i.e., divisive clustering.