Hierarchical clustering, also known as hierarchical cluster analysis, is an algorithm that groups similar objects into clusters. The endpoint is a set of clusters in which each cluster is distinct from the others, and the objects within each cluster are broadly similar to one another.
Chameleon is a hierarchical clustering algorithm that uses dynamic modeling to decide the similarity among pairs of clusters. It was developed to address the observed weaknesses of two hierarchical clustering algorithms, ROCK and CURE.
ROCK and related designs emphasize cluster interconnectivity while neglecting information about cluster proximity. CURE and related designs consider cluster proximity yet neglect cluster interconnectivity. In Chameleon, cluster similarity is assessed based both on how well-connected objects are within a cluster and on the proximity of clusters. Specifically, two clusters are merged if their interconnectivity is high and they are close together.
It does not rely on a static, user-supplied model and can automatically adapt to the internal characteristics of the clusters being merged. The merge process facilitates the discovery of natural and homogeneous clusters and applies to all types of data, provided a similarity function can be defined.
Chameleon uses a k-nearest-neighbor graph technique to construct a sparse graph, where each vertex of the graph represents a data object, and an edge exists between two vertices (objects) if one object is among the k most similar objects of the other. The edges are weighted to reflect the similarity between objects.
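As a rough sketch of this construction (not Chameleon's reference implementation), a k-nearest-neighbor graph can be built from pairwise distances; the function name `knn_graph` and the inverse-distance similarity weight are illustrative choices:

```python
import numpy as np

def knn_graph(points, k):
    """Build a k-nearest-neighbor graph from a list of points.

    Returns an undirected adjacency map {vertex: {neighbor: weight}}.
    An edge (i, j) exists if j is among the k most similar objects of i,
    and its weight reflects similarity (here: an inverse-distance weight,
    an illustrative choice).
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances between all objects.
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        # Indices of the k nearest neighbors of i (excluding i itself).
        order = np.argsort(dists[i])
        neighbors = [j for j in order if j != i][:k]
        for j in neighbors:
            w = 1.0 / (1.0 + dists[i, j])  # similarity weight
            graph[i][j] = w
            graph[j][i] = w                # keep the graph undirected
    return graph
```

For instance, with two well-separated pairs of points and k = 1, each point is linked only to its twin, so the graph stays sparse even before partitioning.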
Chameleon uses a graph-partitioning algorithm to partition the k-nearest-neighbor graph into a large number of relatively small subclusters. It then uses an agglomerative hierarchical clustering algorithm that repeatedly merges subclusters based on their similarity. To determine the pairs of most similar subclusters, it takes into account both the interconnectivity and the closeness of the clusters.
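The merge loop can be sketched as follows. Note that this is a simplified stand-in, not full Chameleon: the score multiplies the total weight of crossing edges (interconnectivity) by their average weight (closeness), whereas Chameleon additionally normalizes both terms by each subcluster's internal connectivity. The graph is assumed to be an undirected adjacency map `{vertex: {neighbor: weight}}`:

```python
def merge_score(graph, Ci, Cj):
    """Score a candidate merge of subclusters Ci and Cj.

    Simplified criterion: total crossing-edge weight (interconnectivity)
    times average crossing-edge weight (closeness), so a merge needs
    both to be high.  Full Chameleon also normalizes by each cluster's
    internal connectivity; that step is omitted here for brevity.
    """
    Cj = set(Cj)
    crossing = [w for u in Ci for v, w in graph[u].items() if v in Cj]
    if not crossing:
        return 0.0
    return sum(crossing) * (sum(crossing) / len(crossing))

def agglomerate(graph, subclusters, target):
    """Repeatedly merge the best-scoring pair until `target` clusters remain."""
    clusters = [set(c) for c in subclusters]
    while len(clusters) > target:
        i, j = max(
            ((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
            key=lambda p: merge_score(graph, clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] |= clusters.pop(j)  # absorb the second subcluster
    return clusters
```

On a toy graph with two tightly connected pairs joined by one weak edge, this loop merges each tight pair first and stops at two clusters, illustrating why both high interconnectivity and high closeness are required before a merge.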
The k-nearest-neighbor graph captures the concept of neighborhood dynamically: the neighborhood radius of an object is determined by the density of the region in which the object resides. In a dense region, the neighborhood is defined narrowly; in a sparse region, it is defined more widely.
This tends to yield more natural clusters than density-based methods such as DBSCAN, which instead use a global neighborhood. Furthermore, the density of a region is recorded in the weights of the edges: the edges of a dense region tend to weigh more than those of a sparse region.
The graph-partitioning algorithm partitions the k-nearest-neighbor graph so as to minimize the edge cut. That is, a cluster C is subdivided into subclusters Ci and Cj so as to minimize the weight of the edges that would be cut if C were bisected into Ci and Cj. The edge cut is denoted EC(Ci, Cj) and measures the absolute interconnectivity between clusters Ci and Cj.
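The edge cut EC(Ci, Cj) is straightforward to compute. A minimal sketch, assuming the k-nearest-neighbor graph is stored as an undirected adjacency map `{vertex: {neighbor: weight}}`:

```python
def edge_cut(graph, Ci, Cj):
    """EC(Ci, Cj): the sum of the weights of the edges that would be cut
    if the vertices were bisected into the subclusters Ci and Cj.

    `graph` is assumed to be an undirected adjacency map of the form
    {vertex: {neighbor: weight}}.
    """
    Ci, Cj = set(Ci), set(Cj)
    # Each crossing edge (u, v) is counted exactly once: u in Ci, v in Cj.
    return sum(w for u in Ci for v, w in graph[u].items() if v in Cj)
```

A graph-partitioning step would evaluate candidate bisections and keep the one with the smallest EC value, so the weakest edges end up on the cut.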
| CURE | DBSCAN |
| --- | --- |
| CURE stands for Clustering Using REpresentatives. | DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. |
| It is a hierarchical clustering technique. | It is a density-based clustering technique. |
| Noise handling in CURE clustering is not efficient. | Noise handling in DBSCAN clustering is efficient. |
| It can handle high-dimensional datasets. | It does not work well for high-dimensional datasets. |
| Varying densities of the data points do not matter in the CURE clustering algorithm. | It does not work well when the data points have varying densities. |