kmeans

K-means clustering

Syntax

idx = kmeans(X, k)
[idx, c] = kmeans(X, k)

Arguments

X: an n-by-p real matrix (n observations, p variables).
k: a positive integer, the number of clusters.
idx: an n-by-1 integer column vector; idx(i) is the index of the cluster assigned to the i-th row of X.
c: a k-by-p real matrix containing the cluster centroid locations.

Description

kmeans is an unsupervised learning method for clustering data points. The algorithm iteratively divides the points of X into k clusters so as to minimize the sum of the distances between each data point and the centroid of the cluster it is assigned to.

kmeans uses the squared Euclidean distance metric.
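
This page does not describe the iteration performed internally. Purely as an illustration, one pass of the classic Lloyd scheme (assign each point to its nearest centroid, then move each centroid to the mean of its points) might be sketched as follows. The function lloyd_step is a hypothetical helper written for this sketch, not part of Scilab, and it is not claimed to match the actual implementation of kmeans.

```scilab
// Hypothetical helper: one assignment/update pass of Lloyd's scheme.
// X: n-by-p data, c: k-by-p current centroids.
function [idx, c] = lloyd_step(X, c)
    n = size(X, 1);
    k = size(c, 1);
    d2 = zeros(n, k);
    for j = 1:k
        // squared Euclidean distance of every point to centroid j
        delta = X - ones(n, 1) * c(j, :);
        d2(:, j) = sum(delta .^ 2, "c");
    end
    // assignment step: index of the nearest centroid for each row of X
    [m, idx] = min(d2, "c");
    // update step: each centroid becomes the mean of its assigned points
    for j = 1:k
        members = find(idx == j);
        if ~isempty(members) then
            c(j, :) = mean(X(members, :), "r");
        end
    end
endfunction
```

Repeating such a pass until idx stops changing gives a local minimum of the summed squared distances; the result generally depends on the initial centroids.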

idx = kmeans(X, k) returns the column vector idx containing the cluster index of each point (each row of X).

[idx, c] = kmeans(X, k) also returns the k-by-p matrix c containing the k cluster centroid locations.
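
To show how the two outputs relate to the quantity described above, the following sketch computes the sum of squared Euclidean distances between each point and its assigned centroid. It assumes that the values in idx are row indices into c. The function wcss is a hypothetical helper, not part of Scilab.

```scilab
// Hypothetical helper: within-cluster sum of squared Euclidean distances.
// X: n-by-p data, idx: n-by-1 cluster indices, c: k-by-p centroids.
function s = wcss(X, idx, c)
    d = X - c(idx, :);   // offset of each point from its assigned centroid
    s = sum(d .^ 2);     // total of the squared distances
endfunction

// Small hand-made clustering: two clusters of two points each
X = [0 0; 0 1; 10 10; 10 11];
idx = [1; 1; 2; 2];
c = [0 0.5; 10 10.5];
disp(wcss(X, idx, c))    // 1
```

With the outputs of [idx, c] = kmeans(X, k), the same call wcss(X, idx, c) gives a single number that can be used to compare clusterings obtained for different values of k.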

Examples

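// Fix the seed of rand so that the generated data are reproducible
rand("seed", 0)
n = 200;
// Six Gaussian clouds of n points each, centered at
// (3,3), (-3,-3), (3,-3), (-3,3), (1,-1) and (-1,1)
x1 = rand(n, 2, "normal") + 3 * ones(n, 2);
x2 = rand(n, 2, "normal") - 3 * ones(n, 2);
x3 = rand(n, 2, "normal") + [3 -3].*.ones(n, 1);
x4 = rand(n, 2, "normal") + [-3 3].*.ones(n, 1);
x5 = rand(n, 2, "normal") + [1 -1].*.ones(n, 1);
x6 = rand(n, 2, "normal") + [-1 1].*.ones(n, 1);
x = [x1; x2; x3; x4; x5; x6];
scatter(x(:,1), x(:,2), "fill") // raw data

// Partition the data into 6 clusters
[index, c] = kmeans(x, 6);

// In a second figure, color each point by its cluster index
scf(1);
scatter(x(:,1), x(:,2), [], index, "fill")
plot(c(:,1), c(:,2), "*r") // centroid of each cluster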

History

Version	Description
2025.0.0	Introduction in Scilab.


Copyright (c) 2022-2025 (Dassault Systèmes S.E.)
Copyright (c) 2017-2022 (ESI Group)
Copyright (c) 2011-2017 (Scilab Enterprises)
Copyright (c) 1989-2012 (INRIA)
Copyright (c) 1989-2007 (ENPC)
with contributors

Last updated:
Thu May 22 12:50:59 CEST 2025