pdist

Pairwise distances between observations

Syntax

d = pdist(X)
d = pdist(X, metric)
d = pdist(X, metric, param)

Arguments

X: n x p real matrix with one observation per row and one feature per column.
metric: optional string selecting the distance metric to use. If omitted, the Euclidean distance is applied.
param: optional real argument that is only used with the "seuclidean", "mahalanobis" and "minkowski" metrics (see Description).
d: a row vector containing the pairwise distances between all rows of X. The vector has length n * (n - 1) / 2.

Description

pdist computes the distance between every pair of observations (rows) of X and returns the result as a condensed row vector. This format is convenient for linkage algorithms and any workflow that manipulates inter-point distances without building the full symmetric matrix.
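
To make the condensed layout concrete, the following sketch expands such a vector into the full symmetric matrix. It assumes the pairs are ordered (1,2), (1,3), ..., (1,n), (2,3), ..., (n-1,n), which this page does not state explicitly. The values of d are copied from the first example in the Examples section; D and k are names chosen for illustration.

```scilab
// Condensed distances for n = 4 observations (length n*(n-1)/2 = 6).
n = 4;
d = [1 2 sqrt(5) sqrt(5) 2 1];

// Fill the upper and lower triangles of the full matrix.
// Assumed pair order: (1,2), (1,3), (1,4), (2,3), (2,4), (3,4).
D = zeros(n, n);
k = 1;
for i = 1:n-1
    for j = i+1:n
        D(i, j) = d(k);
        D(j, i) = d(k);
        k = k + 1;
    end
end
disp(D);
```

The diagonal stays at zero because the distance from an observation to itself is not stored in the condensed vector.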

Possible values for metric are listed below (aliases are given in parentheses):

"euclidean" ("euclid", "eu", "e"): Euclidean (L2) distance.
"squaredeuclidean" ("sqeuclidean", "sqe", "sqeuclid"): squared L2 distance (no square root).
"seuclidean" ("se", "s"): standardized Euclidean distance (each column is scaled).
"mahalanobis" ("mahal", "mah"): Mahalanobis distance that accounts for covariance.
"cityblock" ("city", "city block", "cblock", "cb", "c"): city-block or Manhattan distance.
"minkowski" ("mi", "m"): Minkowski distance of order param.
"chebychev" ("chebyshev", "cheby", "cheb", "ch"): Chebyshev (L∞) distance.
"cosine" ("cos"): cosine distance, defined as 1 minus the cosine similarity.
"correlation" ("co"): correlation distance, defined as 1 minus the sample correlation between the rows.
"hamming" ("hamm", "ha", "h"): Hamming distance, i.e. the fraction of positions with different values.
"jaccard" ("jacc", "ja", "j"): Jaccard distance for binary/bool-like data.
"canberra": Canberra distance, sensitive to differences near zero.
"braycurtis": Bray-Curtis dissimilarity.

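As a rough illustration of a few of these metrics, the sketch below evaluates their textbook definitions on a single pair of observations without calling pdist. The vectors u and v and the variable names are chosen for illustration; the formulas are the usual definitions and are assumed, not confirmed, to match what pdist computes internally.

```scilab
// Two observations (rows), chosen for illustration.
u = [1 0 2];
v = [2 4 6];

d_euclid = sqrt(sum((u - v) .^ 2));             // sqrt(33), about 5.7446
d_sqeucl = sum((u - v) .^ 2);                   // 33
d_city   = sum(abs(u - v));                     // 9
d_cheb   = max(abs(u - v));                     // 4
d_cos    = 1 - (u * v') / (norm(u) * norm(v));  // 1 - sqrt(0.7), about 0.1633

disp([d_euclid d_sqeucl d_city d_cheb d_cos]);
```
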
When metric is "seuclidean", param must be a 1 x p row vector of scaling factors, one per column (typically the column-wise standard deviations). If param is omitted, pdist uses stdev(X, "r") and replaces zero entries by 1.
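
The scaling can be checked by hand. The sketch below applies the usual definition of the standardized Euclidean distance (each coordinate difference divided by its scaling factor) to the data of the second example in the Examples section. The formula is an assumption, since this page does not spell it out, but it reproduces the first value listed there. The names s, d12 and s_default are illustrative.

```scilab
X = [1 0 2;
     2 4 6;
     3 7 1];
s = [0.5 2 1.5];  // one scaling factor per column

// Standardized Euclidean distance between rows 1 and 2.
d12 = sqrt(sum(((X(1, :) - X(2, :)) ./ s) .^ 2));
disp(d12);  // about 3.8873

// Default scaling described above: column standard deviations,
// with zero entries replaced by 1.
s_default = stdev(X, "r");
s_default(s_default == 0) = 1;
```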

For "mahalanobis", param must be a p x p covariance matrix. When it is omitted, pdist computes cov(X) and uses its inverse.
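
A minimal sketch of the default case for one pair of rows, assuming the usual definition sqrt((u - v) * inv(C) * (u - v)'). The data and the names C, v and d12 are chosen for illustration, and the result is not guaranteed to match pdist bit for bit.

```scilab
X = [0 0;
     1 0;
     0 2;
     1 2];

C = cov(X);                // p x p sample covariance, columns are variables
v = X(1, :) - X(2, :);     // difference between observations 1 and 2

// v / C solves y * C = v, which avoids forming inv(C) explicitly.
d12 = sqrt((v / C) * v');
disp(d12);
```

The covariance matrix must be invertible, so this sketch uses more observations than features.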

For "minkowski", param must be a positive scalar that specifies the order of the Minkowski norm (defaults to 2).
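
The role of the order can be seen on a single pair of observations. The sketch below uses the standard Minkowski formula, (sum |u - v|^p)^(1/p); the vectors and variable names are illustrative. Orders 1 and 2 reduce to the city-block and Euclidean distances respectively.

```scilab
u = [0 0];
v = [1 2];

p = 3;
d_mink = sum(abs(u - v) .^ p) ^ (1 / p);  // 9^(1/3), about 2.0801

d_p1 = sum(abs(u - v) .^ 1) ^ (1 / 1);    // 3, same as city-block
d_p2 = sum(abs(u - v) .^ 2) ^ (1 / 2);    // sqrt(5), same as Euclidean

disp([d_p1 d_p2 d_mink]);
```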

For all other metrics, any value supplied in param is ignored and a warning is emitted.

Examples

Euclidean distances between four 2-D points

X = [0 0;
     1 0;
     0 2;
     1 2];

d = pdist(X)
// 1.000000 2.000000 2.236068 2.236068 2.000000 1.000000

Standardized Euclidean distance with custom scales

X = [1 0 2;
     2 4 6;
     3 7 1];

scale = [0.5 2 1.5]; // one scaling factor per column
d = pdist(X, "seuclidean", scale)
// 3.8873013 5.3567196 4.1666667

History

Version	Description
2026.1.0	Function added.


Copyright (c) 2022-2026 (Dassault Systèmes S.E.)
Copyright (c) 2017-2022 (ESI Group)
Copyright (c) 2011-2017 (Scilab Enterprises)
Copyright (c) 1989-2012 (INRIA)
Copyright (c) 1989-2007 (ENPC)
with contributors

Last updated:
Tue May 19 13:56:06 CEST 2026