dbscan

Density-based clustering (DBSCAN)

Syntax

labels = dbscan(X, eps, min_samples)
[labels, core_idx] = dbscan(X, eps, min_samples)

Arguments

X: an N x D real matrix (N samples, D features).
eps: a positive scalar (default=0.5), the neighborhood radius. Points within distance eps of each other are considered neighbors.
min_samples: an integer (default=5), the minimum number of points required within the eps-neighborhood of a point for that point to be considered a core point.
labels: a column vector of integers (N x 1) giving the cluster assignment of each point. The value -1 indicates a noise point.
core_idx: a vector containing the indices of the core points.
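The second output can be combined with labels to tell the three kinds of points apart. The following sketch uses the same data as the first example in the Examples section and relies only on what this page states: core points belong to a cluster, noise points carry the label -1, and every remaining point is therefore a border point. The variable names is_core, is_border and is_noise are illustrative.

```scilab
// Split the points into core, border and noise points
// using both outputs of dbscan.
rand("seed", 0)
X = [rand(50,2); rand(50,2) + 3; 6*rand(10,2)];
[labels, core_idx] = dbscan(X, 0.5, 5);

N = size(X, 1);
is_core = zeros(N, 1) <> 0;        // boolean column, all %f
is_core(core_idx) = %t;            // flag the core points
is_noise = labels == -1;           // noise points are labeled -1
is_border = ~is_core & ~is_noise;  // in a cluster, but not a core point

mprintf("core: %d, border: %d, noise: %d\n", ..
    size(find(is_core), "*"), size(find(is_border), "*"), size(find(is_noise), "*"));
```

Since a core point is never noise, the three counts always add up to N.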

Description

The DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm groups into clusters the points that lie close to each other in regions of high local density. Points that lie in low-density regions, too far from any cluster, are labeled as noise.

It relies on two parameters:

eps: the maximum distance between two points for one of them to be considered in the neighborhood of the other.
min_samples: the minimum number of points a neighborhood must contain to be considered dense.
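The two parameters can be made concrete by counting, for every point, how many points lie within distance eps. The sketch below does this by brute force with a hypothetical helper, core_points, that is not part of Scilab. It assumes the Euclidean distance, treats "within eps" as a distance less than or equal to eps, and counts the point itself, as scikit-learn does; this page does not state which conventions dbscan uses.

```scilab
// Hypothetical helper: count the eps-neighbors of each point and flag
// the core points. Assumptions: Euclidean distance, distance <= eps,
// and the point itself is included in its own count.
function [counts, is_core] = core_points(X, eps, min_samples)
    N = size(X, 1);
    counts = zeros(N, 1);
    for i = 1:N
        // Distances from point i to every point (itself included)
        d = sqrt(sum((X - ones(N, 1) * X(i, :)).^2, "c"));
        counts(i) = size(find(d <= eps), "*");
    end
    is_core = counts >= min_samples;
endfunction

// A small square of four points and one isolated point
X = [0 0; 0.1 0; 0 0.1; 0.1 0.1; 2 2];
[counts, is_core] = core_points(X, 0.2, 4);
disp(counts')
disp(is_core')
```

Each point of the small square has 4 points (itself included) within 0.2, so with min_samples = 4 all four are core points, while the isolated point (2, 2) only counts itself.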

Each cluster is built around core points, which are points that have at least min_samples neighbors within distance eps. Points that lie within distance eps of a core point without being core points themselves are called border points; they are assigned to the cluster of that core point. All remaining points are labeled as noise (label -1).
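The procedure described above can be sketched from scratch in Scilab: mark the core points, then grow each cluster with a breadth-first search that only expands through core points, so border points join a cluster without extending it. This is an illustration of the classic algorithm, not the implementation behind dbscan; my_dbscan is a hypothetical name. It assumes the Euclidean distance, a distance less than or equal to eps, and a neighbor count that includes the point itself. A border point reachable from two clusters goes to whichever cluster reaches it first.

```scilab
// Hypothetical from-scratch DBSCAN, for illustration only.
// Labels: 1, 2, ... for clusters, -1 for noise.
function labels = my_dbscan(X, eps, min_samples)
    N = size(X, 1);
    D = zeros(N, N);
    counts = zeros(N, 1);
    for i = 1:N
        // Column i holds the Euclidean distances to point i
        D(:, i) = sqrt(sum((X - ones(N, 1) * X(i, :)).^2, "c"));
        counts(i) = size(find(D(:, i) <= eps), "*");
    end
    is_core = counts >= min_samples;

    labels = zeros(N, 1);   // 0 means "not assigned yet"
    cluster = 0;
    for i = 1:N
        // Only an unassigned core point can start a new cluster
        if labels(i) <> 0 | ~is_core(i) then
            continue
        end
        cluster = cluster + 1;
        labels(i) = cluster;
        queue = i;
        while ~isempty(queue)
            p = queue(1);
            queue(1) = [];
            if ~is_core(p) then
                continue    // border points do not expand the cluster
            end
            nbrs = find(D(:, p) <= eps);
            for q = nbrs(:)'
                if labels(q) == 0 then
                    labels(q) = cluster;
                    queue($+1) = q;
                end
            end
        end
    end
    labels(labels == 0) = -1;   // never reached from a core point: noise
endfunction

X = [0 0; 0.1 0; 0 0.1; 0.1 0.1; 0.2 0.2; ..
     5 5; 5.1 5; 5 5.1; 5.1 5.1; ..
     10 0];
labels = my_dbscan(X, 0.2, 4);
disp(labels')
```

Here the first five points form cluster 1, with (0.2, 0.2) as a border point (only (0.1, 0.1) lies within 0.2 of it), the next four points form cluster 2, and (10, 0) is noise.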

Unlike kmeans, the DBSCAN algorithm does not require the number of clusters to be specified in advance, and it can detect clusters of arbitrary shapes.
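Because the number of clusters is an outcome of the algorithm rather than an input, it is read from labels after the call. A minimal sketch, assuming only that every label other than -1 identifies a cluster; the numbering of the clusters is not specified on this page, so the sketch counts distinct values.

```scilab
// Count the clusters found by dbscan; noise (-1) is excluded.
rand("seed", 0)
X = [rand(50,2); rand(50,2) + 3];   // two well-separated unit squares
labels = dbscan(X, 0.5, 5);

cluster_ids = unique(labels(labels <> -1));
n_clusters = size(cluster_ids, "*");
n_noise = size(find(labels == -1), "*");
mprintf("%d clusters, %d noise points\n", n_clusters, n_noise);
```

With two dense squares separated by a gap much wider than eps, two clusters are expected.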

Examples

Two compact clusters with noise

rand("seed", 0)
n = 50;
X = [rand(n,2); rand(n,2)+3];   // two dense clusters in [0,1]^2 and [3,4]^2
X = [X; 6*rand(10,2)];          // 10 noise points spread over [0,6]^2
labels = dbscan(X, 0.5, 5);
scf();
gca().isoview = "on";
scatter(X(:,1), X(:,2), [], labels, "fill");
xtitle("Two compact clusters with noise");

Three clusters of different densities

rand("seed", 0)
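X1 = 0.3*rand(100,2);      // 100 points in [0,0.3]^2: very dense
X2 = rand(50,2) + 3;       // 50 points in [3,4]^2: medium density
X3 = 1.8*rand(100,2) - 2;  // 100 points in [-2,-0.2]^2: sparse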

X = [X1; X2; X3];
labels = dbscan(X, 0.4, 5);

scf();
scatter(X(:,1), X(:,2), [], labels, "fill");
xtitle("Three clusters of different densities");
gca().isoview = "on";

Half-moon shaped data

rand("seed", 0)
n = 100;
t = linspace(0, %pi, n)';
X1 = [cos(t), sin(t)] + 0.05*rand(n,2);
X2 = [1-cos(t), -sin(t)-0.5] + 0.05*rand(n,2);

X = [X1; X2];
labels = dbscan(X, 0.2, 5);

scf();
gca().isoview = "on";
scatter(X(:,1), X(:,2), [], labels, "fill");
xtitle("Half-moon shaped data");

Circular cluster with noise

rand("seed", 0)
theta = 2*%pi*rand(200,1);
r = 1 + 0.1*rand(200,1);
X1 = [r.*cos(theta), r.*sin(theta)]; // circle

X2 = 3*(rand(30,2)-0.5); // noise

X = [X1; X2];
labels = dbscan(X, 0.2, 5);

scf();
gca().isoview = "on";
scatter(X(:,1), X(:,2), [], labels, "fill");
xtitle("Circular cluster with noise");

Nested spirals

rand("seed", 0)
t = linspace(0, 4*%pi, 200)';
r = linspace(0.1, 1, 200)';

X1 = [r.*cos(t), r.*sin(t)] + 0.02*rand(200,2);
X2 = [r.*cos(t+%pi), r.*sin(t+%pi)] + 0.02*rand(200,2);

X = [X1; X2];
labels = dbscan(X, 0.15, 5);

scf();
gca().isoview = "on";
scatter(X(:,1), X(:,2), [], labels, "fill");
xtitle("Nested spirals");

History

Version	Description
2026.0.0	Function added.


Copyright (c) 2022-2025 (Dassault Systèmes S.E.)
Copyright (c) 2017-2022 (ESI Group)
Copyright (c) 2011-2017 (Scilab Enterprises)
Copyright (c) 1989-2012 (INRIA)
Copyright (c) 1989-2007 (ENPC)
with contributors

Last updated: Thu Oct 16 09:15:31 CEST 2025