Hierarchical clustering, also known as hierarchical cluster analysis, is an algorithm that groups similar objects into clusters. The endpoint is a set of clusters in which each cluster is distinct from the others, and the objects within each cluster are broadly similar to one another.
Chameleon is a hierarchical clustering algorithm that uses dynamic modeling to decide the similarity among pairs of clusters. It was developed to address the observed weaknesses of two hierarchical clustering algorithms, ROCK and CURE.
ROCK and related designs emphasize cluster interconnectivity while neglecting information about cluster proximity. CURE and related designs consider cluster proximity yet neglect cluster interconnectivity. In Chameleon, cluster similarity is assessed based both on how well-connected objects are within a cluster and on the proximity of clusters. Specifically, two clusters are merged if their interconnectivity is high and they are close together.
It does not rely on a static, user-supplied model and can automatically adapt to the internal characteristics of the clusters being merged. The merge process facilitates the discovery of natural and homogeneous clusters and applies to all types of data, provided a similarity function can be defined.
Chameleon uses a k-nearest-neighbor graph technique to construct a sparse graph, where each vertex of the graph represents a data object, and an edge exists between two vertices (objects) if one object is among the k most similar objects of the other. The edges are weighted to reflect the similarity between objects.
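As a rough sketch of this construction (not Chameleon's reference implementation), a k-nearest-neighbor graph can be built from pairwise distances; the function name `knn_graph` and the inverse-distance similarity weight are illustrative choices:

```python
import numpy as np

def knn_graph(points, k):
    """Build a k-nearest-neighbor graph from a list of points.

    Returns an undirected adjacency map {vertex: {neighbor: weight}}.
    An edge (i, j) exists if j is among the k most similar objects of i,
    and its weight reflects similarity (here: an inverse-distance weight,
    an illustrative choice).
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances between all objects.
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        # Indices of the k nearest neighbors of i (excluding i itself).
        order = np.argsort(dists[i])
        neighbors = [j for j in order if j != i][:k]
        for j in neighbors:
            w = 1.0 / (1.0 + dists[i, j])  # similarity weight
            graph[i][j] = w
            graph[j][i] = w                # keep the graph undirected
    return graph
```

For instance, with two well-separated pairs of points and k = 1, each point is linked only to its twin, so the graph stays sparse even before partitioning.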
Chameleon uses a graph-partitioning algorithm to partition the k-nearest-neighbor graph into a large number of relatively small subclusters. It then uses an agglomerative hierarchical clustering algorithm that repeatedly merges subclusters based on their similarity. To determine the pairs of most similar subclusters, it takes into account both the interconnectivity and the closeness of the clusters.
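The merge loop can be sketched as follows. Note that this is a simplified stand-in, not full Chameleon: the score multiplies the total weight of crossing edges (interconnectivity) by their average weight (closeness), whereas Chameleon additionally normalizes both terms by each subcluster's internal connectivity. The graph is assumed to be an undirected adjacency map `{vertex: {neighbor: weight}}`:

```python
def merge_score(graph, Ci, Cj):
    """Score a candidate merge of subclusters Ci and Cj.

    Simplified criterion: total crossing-edge weight (interconnectivity)
    times average crossing-edge weight (closeness), so a merge needs
    both to be high.  Full Chameleon also normalizes by each cluster's
    internal connectivity; that step is omitted here for brevity.
    """
    Cj = set(Cj)
    crossing = [w for u in Ci for v, w in graph[u].items() if v in Cj]
    if not crossing:
        return 0.0
    return sum(crossing) * (sum(crossing) / len(crossing))

def agglomerate(graph, subclusters, target):
    """Repeatedly merge the best-scoring pair until `target` clusters remain."""
    clusters = [set(c) for c in subclusters]
    while len(clusters) > target:
        i, j = max(
            ((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
            key=lambda p: merge_score(graph, clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] |= clusters.pop(j)  # absorb the second subcluster
    return clusters
```

On a toy graph with two tightly connected pairs joined by one weak edge, this loop merges each tight pair first and stops at two clusters, illustrating why both high interconnectivity and high closeness are required before a merge.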
The k-nearest-neighbor graph captures the concept of neighborhood dynamically: the neighborhood radius of an object is determined by the density of the region in which the object resides. In a dense region, the neighborhood is defined narrowly; in a sparse region, it is defined more widely.
This tends to yield more natural clusters than density-based methods such as DBSCAN, which instead use a global neighborhood. Furthermore, the density of a region is recorded in the weights of the edges: the edges of a dense region tend to weigh more than those of a sparse region.
The graph-partitioning algorithm partitions the k-nearest-neighbor graph so as to minimize the edge cut. That is, a cluster C is subdivided into subclusters Ci and Cj so as to minimize the weight of the edges that would be cut if C were bisected into Ci and Cj. The edge cut is denoted EC(Ci, Cj) and measures the absolute interconnectivity between clusters Ci and Cj.
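The edge cut EC(Ci, Cj) is straightforward to compute. A minimal sketch, assuming the k-nearest-neighbor graph is stored as an undirected adjacency map `{vertex: {neighbor: weight}}`:

```python
def edge_cut(graph, Ci, Cj):
    """EC(Ci, Cj): the sum of the weights of the edges that would be cut
    if the vertices were bisected into the subclusters Ci and Cj.

    `graph` is assumed to be an undirected adjacency map of the form
    {vertex: {neighbor: weight}}.
    """
    Ci, Cj = set(Ci), set(Cj)
    # Each crossing edge (u, v) is counted exactly once: u in Ci, v in Cj.
    return sum(w for u in Ci for v, w in graph[u].items() if v in Cj)
```

A graph-partitioning step would evaluate candidate bisections and keep the one with the smallest EC value, so the weakest edges end up on the cut.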
| CURE | DBSCAN |
| --- | --- |
| CURE stands for Clustering Using REpresentatives. | DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. |
| It is a hierarchical clustering technique. | It is a density-based clustering technique. |
| Noise handling in CURE clustering is not efficient. | Noise handling in DBSCAN clustering is efficient. |
| It can handle high-dimensional datasets. | It does not work well for high-dimensional datasets. |
| Varying densities of the data points do not matter in the CURE clustering algorithm. | It does not work well when the data points have varying densities. |