Density-Based Clustering method is one of the clustering methods based on density (local cluster criterion), such as density-connected points. The basic ideas of density-based clustering involve a number of new definitions. We intuitively present these definitions and then follow up with an example.
The neighborhood within a radius ε of a given object is called the ε-neighborhood of the object.
If the ε-neighborhood of an object contains at least a minimum number, MinPts, of objects, then the object is called a core object.
Density Reachable:
A point p is density-reachable from a point q wrt. Eps, MinPts if there is a chain of points p1, …, pn, p1 = q, pn = p such that pi+1 is directly density-reachable from pi
Density Connected
A point p is density-connected to a point q wrt. Eps, MinPts if there is a point o such that both, p and q are density-reachable from o wrt. Eps and MinPts.
Working Of Density-Based Clustering
Given a set of objects, D’ we say that an object p is directly density-reachable from object q if p is within the ε-neighborhood of q, and q is a core object.
An object p is density-reachable from object q with respect to ε and MinPts in a set of objects, D’ if there is a chain of objects p1,.,.,.pn, where p1 = q and pn = p such that pi+1 is directly density-reachable from pi with respect to e and MinPts, for 1/n, pi € D.
An object p is density-connected to object q with respect to ε and MinPts in a set of objects, D’, if there is an object o, belongs D such that both p and q are density-reachable from o with respect to ε and MinPts.
Major features:
- It is used to discover clusters of arbitrary shape.
- It is also used to handle noise in the data clusters.
- It is a one scan method.
- It needs density parameters as a termination condition.
DBSCAN
It relies on a density-based notion of cluster: A cluster is defined as a maximal set of density-connected points.
It discovers clusters of arbitrary shape in spatial databases with noise.
DBSCAN Algorithm
Arbitrary select a point p.
Retrieve all points density-reachable from p wrt Eps and MinPts
If p is a core point, a cluster is formed.
If p is a border point, no points are density-reachable from p and DBSCAN visits the next point of the database.
Continue the process until all of the points have been processed.
say, let MinPts = 3.
Of the labeled points, m, p, o, and r are core objects because each is in an ε-neighborhood containing at least three points.
q is directly density-reachable from m. m is directly density-reachable from p and vice versa.
q is (indirectly) density-reachable from p because q is directly density-reachable from m and m is directly density-reachable from p.
However, p is not density-reachable from q because q is not a core object.
Similarly, r and s are density-reachable from o, and o is density-reachable from o, and o is density-reachable from R.
OPTICS: Ordering Points To Identify the Clustering Structure
An algorithm for finding density-based clusters in spatial data. It was presented by Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel and Jörg Sander. Its basic idea is similar to DBSCAN, but it addresses one of DBSCAN’s major weaknesses: the problem of detecting meaningful clusters in data of varying density. To do so, the points of the database are (linearly) ordered such that spatially closest points become neighbors in the ordering. Additionally, a special distance is stored for each point that represents the density that must be accepted for a cluster so that both points belong to the same cluster. This is represented as a dendrogram.
- It produces a special order of the database with respect to its density-based clustering structure.
- This cluster-ordering contains info equivalent to the density-based clusterings corresponding to a broad range of parameter settings.
- It is good for both automatic and interactive cluster analysis, including finding an intrinsic clustering structure.
- It can be represented graphically or using visualization techniques.
Core-distance and reachability-distance: The figure illustrates the concepts of core-distance and reachability-distance.
Suppose that e=6 mm and MinPts=5
The core distance of p is the distance, e0, between p and the fourth closest data object.
The reachability-distance of q1 with respect to p is the core-distance of p (i.e., e0 =3 mm) because this is greater than the Euclidean distance from p to q1.
The reachability distance of q2 with respect to p is the Euclidean distance from p to q2 because this is greater than the core-distance of p.