kmeans

K-means clustering

Syntax

idx = kmeans(X, k)
[idx, c] = kmeans(X, k)

Arguments

X: an n-by-p real matrix (n observations, p variables).
k: a positive integer, the number of clusters.
idx: an integer column vector containing the cluster index of each observation.
c: a k-by-p real matrix containing the cluster centroid locations.

Description

kmeans is an unsupervised learning method for clustering data points. The algorithm iteratively partitions the points of X into k clusters by minimizing the sum of the distances between each data point and the centroid of the cluster it is assigned to.
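
The iterative procedure is often implemented as Lloyd's algorithm, which alternates an assignment step and a centroid update step. This page does not say which variant or initialization kmeans uses, so the following is only a minimal sketch of the textbook iteration, not the actual implementation. The function name lloyd_sketch and its arguments c0 (initial centroids, one per row) and maxiter are hypothetical.

```scilab
// Hypothetical sketch of Lloyd's iteration; NOT the kmeans implementation.
// X: n-by-p data, c0: k-by-p initial centroids, maxiter: iteration cap.
function [idx, c] = lloyd_sketch(X, c0, maxiter)
    c = c0;
    k = size(c, 1);
    n = size(X, 1);
    idx = zeros(n, 1);
    for it = 1:maxiter
        // Assignment step: squared Euclidean distance to every centroid
        d = zeros(n, k);
        for j = 1:k
            d(:, j) = sum((X - ones(n, 1) * c(j, :)) .^ 2, "c");
        end
        [m, newidx] = min(d, "c");   // nearest centroid for each row
        if and(newidx == idx) then
            break;                   // assignments stable: converged
        end
        idx = newidx;
        // Update step: each centroid becomes the mean of its members
        for j = 1:k
            members = find(idx == j);
            if ~isempty(members) then
                c(j, :) = mean(X(members, :), "r");
            end
        end
    end
endfunction
```

The result depends on the starting centroids c0; an empty cluster simply keeps its previous centroid in this sketch.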

kmeans uses the squared Euclidean distance metric.
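
Assuming the usual k-means formulation, the quantity being minimized with this metric can be written as follows, where x_i denotes row i of X, c_j denotes row j of c, and idx_i is the cluster index assigned to x_i:

$$
J = \sum_{j=1}^{k} \; \sum_{i \,:\, \mathrm{idx}_i = j} \lVert x_i - c_j \rVert^2
$$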

idx = kmeans(X, k) returns a column vector containing the cluster index of each point.

[idx, c] = kmeans(X, k) also returns c, the k-by-p matrix containing the k cluster centroid locations.
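
As an illustration of how the two outputs relate, the sketch below computes the total squared Euclidean distance between each point and its assigned centroid. The helper name wcss is hypothetical and is not part of Scilab; it only assumes that idx holds row indices into c.

```scilab
// Hypothetical helper: within-cluster sum of squared distances.
// X: n-by-p data, idx: n-by-1 cluster indices, c: k-by-p centroids.
function J = wcss(X, idx, c)
    d = X - c(idx, :);   // each row minus the centroid of its cluster
    J = sum(d .^ 2);     // sum over all points and all variables
endfunction
```

A typical call would be J = wcss(x, index, c) after running kmeans; comparing J across several values of k is one common way to choose the number of clusters.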

Examples

rand("seed", 0)
n = 200;
x1 = rand(n, 2, "normal") + 3 * ones(n, 2);
x2 = rand(n, 2, "normal") - 3 * ones(n, 2);
x3 = rand(n, 2, "normal") + [3 -3].*.ones(n, 1);
x4 = rand(n, 2, "normal") + [-3 3].*.ones(n, 1);
x5 = rand(n, 2, "normal") + [1 -1].*.ones(n, 1);
x6 = rand(n, 2, "normal") + [-1 1].*.ones(n, 1);
x = [x1; x2; x3; x4; x5; x6];

nbcluster = 6;
[index, c] = kmeans(x, nbcluster); 

colors = round(linspace(1, 32, nbcluster));
for i = 1:nbcluster
    mask = index == i;
    cc = colors(i);
    p = plot(x(mask, 1), x(mask, 2), ".", "marksize", 10);
    p.mark_foreground = cc;
    p.mark_background = cc;
end
plot(c(:,1), c(:,2), "*r", "markersize", 10) // centroid of each cluster
gca().box = "off";

l = legend(["cluster " + string(1:nbcluster), "centroids"]);
l.marks_count = 1;
l.line_width = 0.02;

History

Version	Description
2025.0.0	Function added.


Copyright (c) 2022-2026 (Dassault Systèmes S.E.)
Copyright (c) 2017-2022 (ESI Group)
Copyright (c) 2011-2017 (Scilab Enterprises)
Copyright (c) 1989-2012 (INRIA)
Copyright (c) 1989-2007 (ENPC)
with contributors

Last updated:
Thu Oct 16 09:02:33 CEST 2025