pdist
Pairwise distances between observations
Syntax
d = pdist(X)
d = pdist(X, metric)
d = pdist(X, metric, param)
Arguments
- X
n x p real matrix containing n observations (one per row) and p features (one per column).
- metric
optional string selecting the distance metric to use. If omitted, the Euclidean distance is applied.
- param
optional real argument that is only used with the
"seuclidean", "mahalanobis" and "minkowski" metrics (see Description).
- d
a row vector containing the pairwise distances between all rows of
X. The vector has length n * (n - 1) / 2.
Description
pdist computes the distance between every pair of observations (rows) of X and returns the result as a
condensed row vector that holds each of the n * (n - 1) / 2 distances exactly once. This format is convenient
for linkage algorithms and any workflow that manipulates inter-point distances without building the full
symmetric n x n matrix, whose diagonal is zero and whose two triangles mirror each other.
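The condensed layout can be sketched by hand for the Euclidean metric. This is a minimal illustration, not the pdist implementation: `condensed_euclid` and `condensed_to_square` are hypothetical helpers, and the pair order (1,2), (1,3), ..., (1,n), (2,3), ..., (n-1,n) is an assumption (the convention used by MATLAB and SciPy, and consistent with the first example's output).

```scilab
// Hypothetical helper (not a Scilab function): Euclidean distances between all
// row pairs of X, stored in the assumed order (1,2), (1,3), ..., (n-1,n).
function d = condensed_euclid(X)
    n = size(X, 1);
    d = zeros(1, n * (n - 1) / 2);
    k = 0;
    for i = 1:n-1
        for j = i+1:n
            k = k + 1;
            d(k) = sqrt(sum((X(i, :) - X(j, :)).^2));
        end
    end
endfunction

// Hypothetical helper: expand a condensed vector (same assumed order) into the
// full symmetric n x n distance matrix, with zeros on the diagonal.
function D = condensed_to_square(d, n)
    D = zeros(n, n);
    k = 0;
    for i = 1:n-1
        for j = i+1:n
            k = k + 1;
            D(i, j) = d(k);
            D(j, i) = d(k);
        end
    end
endfunction

X = [0 0; 1 0; 0 2; 1 2];
d = condensed_euclid(X)               // 1 2 2.236068 2.236068 2 1
D = condensed_to_square(d, size(X, 1))
```

The double loop visits each unordered pair once, which is why the vector length is n * (n - 1) / 2.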
Possible values for metric names are (aliases are listed in parentheses):
- "euclidean" ("euclid", "eu", "e"): Euclidean (L2) distance.
- "squaredeuclidean" ("sqeuclidean", "sqe", "sqeuclid"): squared L2 distance (no square root).
- "seuclidean" ("se", "s"): standardized Euclidean distance (each column is scaled).
- "mahalanobis" ("mahal", "mah"): Mahalanobis distance, which accounts for covariance.
- "cityblock" ("city", "city block", "cblock", "cb", "c"): city-block or Manhattan distance.
- "minkowski" ("mi", "m"): Minkowski distance of order param.
- "chebychev" ("chebyshev", "cheby", "cheb", "ch"): Chebyshev (L∞) distance.
- "cosine" ("cos"): cosine distance, defined as 1 minus the cosine similarity.
- "correlation" ("co"): correlation distance, defined as 1 minus the sample correlation between the rows.
- "hamming" ("hamm", "ha", "h"): Hamming distance, i.e. the fraction of positions with different values.
- "jaccard" ("jacc", "ja", "j"): Jaccard distance for binary (boolean-like) data.
- "canberra": Canberra distance, sensitive to differences near zero.
- "braycurtis": Bray-Curtis dissimilarity.
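Several of these metrics can be computed by hand for a single pair of observations. The sketch below assumes pdist follows the textbook definitions listed above; the vectors u and v are illustrative, and with X = [u; v] each value is what pdist(X, metric) would be expected to return under that assumption.

```scilab
// Two observations (the rows of X = [u; v]) with p = 3 features.
u = [1 0 0];
v = [1 3 4];
dx = u - v;                                      // [0 -3 -4]

d_euc  = sqrt(sum(dx.^2));                       // "euclidean": 5
d_sqe  = sum(dx.^2);                             // "squaredeuclidean": 25
d_city = sum(abs(dx));                           // "cityblock": 7
d_cheb = max(abs(dx));                           // "chebychev": 4
r = 3;                                           // Minkowski order (the param argument)
d_mink = sum(abs(dx).^r)^(1/r);                  // "minkowski": 91^(1/3)
d_cos  = 1 - (u * v') / (norm(u) * norm(v));     // "cosine": 1 - 1/sqrt(26)
uc = u - mean(u);                                // centre each row before correlating
vc = v - mean(v);
d_corr = 1 - (uc * vc') / (norm(uc) * norm(vc)); // "correlation": 1 + 15/sqrt(252)
d_ham  = length(find(u <> v)) / length(u);       // "hamming": 2 of 3 positions differ
```

Canberra, Bray-Curtis and Jaccard are left out because their handling of zero entries varies between implementations.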
When metric is "seuclidean", the param argument must be a 1 x p row vector that contains the scaling factors
(typically column-wise standard deviations); each coordinate difference is divided by the corresponding factor before the
Euclidean norm is taken. If param is omitted, stdev(X, "r") is computed automatically and zero entries are replaced by 1.
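The scaling rule can be checked by hand. This sketch reproduces the default scale described above (with an illustrative matrix Xc whose constant column has a zero standard deviation) and then recomputes the "seuclidean" example from the Examples section with an explicit loop; it illustrates the definition and is not the pdist source.

```scilab
// Default scaling when param is omitted: column-wise standard deviations,
// with zero entries replaced by 1 so that no difference is divided by zero.
Xc = [1 5; 2 5; 3 5];            // second column is constant
s = stdev(Xc, "r");              // 1 x p row vector; s(2) is 0
s(s == 0) = 1;

// Explicit scaling, same data as the seuclidean example in the Examples section
X = [1 0 2; 2 4 6; 3 7 1];
scale = [0.5 2 1.5];
n = size(X, 1);
d = [];
for i = 1:n-1
    for j = i+1:n
        z = (X(i, :) - X(j, :)) ./ scale;   // divide each feature difference by its scale
        d = [d, sqrt(sum(z.^2))];
    end
end
disp(d);   // approximately 3.8873013 5.3567196 4.1666667
```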
For "mahalanobis", param must be a p x p covariance matrix C; the distance between two rows x and y is
sqrt((x - y) * inv(C) * (x - y)'). When param is omitted, pdist computes C = cov(X) and uses its inverse.
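A minimal sketch of the Mahalanobis computation, assuming the quadratic form above with C = cov(X) (the default case). The data are illustrative; the linear solve C \ dx' stands in for inv(C) * dx', which gives the same result without forming the inverse explicitly.

```scilab
// Illustrative data: n = 4 observations, p = 2 features.
X = [2 1; 4 3; 5 7; 8 6];
C = cov(X);          // p x p sample covariance, the default when param is omitted
n = size(X, 1);
d = [];
for i = 1:n-1
    for j = i+1:n
        dx = X(i, :) - X(j, :);
        // sqrt(dx * inv(C) * dx'), solving C*y = dx' instead of inverting C
        d = [d, sqrt(dx * (C \ dx'))];
    end
end
disp(d);
```

C must be invertible, so this only works when the columns of X are not linearly dependent.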
For "minkowski", param must be a positive scalar that specifies the order of the Minkowski norm (defaults to 2).
Any value supplied in param for the other metrics is ignored (a warning is emitted).
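The Minkowski order controls which familiar distance is obtained: order 1 gives the city-block distance, order 2 (the default) the Euclidean distance, and large orders approach the Chebyshev distance. The sketch below shows this with a hypothetical helper `minkowski_sketch`, which is not a Scilab function.

```scilab
// Hypothetical helper: Minkowski distance of order r between two row vectors.
function d = minkowski_sketch(u, v, r)
    if r <= 0 then
        error("minkowski_sketch: the order r must be positive");
    end
    d = sum(abs(u - v).^r)^(1/r);
endfunction

u = [0 3 1];
v = [4 0 1];                      // |u - v| = [4 3 0]
d1  = minkowski_sketch(u, v, 1)   // 7: the city-block distance
d2  = minkowski_sketch(u, v, 2)   // 5: the Euclidean distance
d50 = minkowski_sketch(u, v, 50)  // very close to 4, the Chebyshev distance
```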
Examples
Euclidean distances between four 2-D points
X = [0 0; 1 0; 0 2; 1 2];
d = pdist(X)
// 1.000000 2.000000 2.236068 2.236068 2.000000 1.000000
Standardized Euclidean distance with custom scales
X = [1 0 2; 2 4 6; 3 7 1];
scale = [0.5 2 1.5];   // one scaling factor per column
d = pdist(X, "seuclidean", scale)
// 3.8873013 5.3567196 4.1666667
History
| Version | Description |
| --- | --- |
| 2026.1.0 | Function added. |