Scilab Online Help
2025.0.0


pca

Computes the principal component analysis of the data matrix X

Syntax

comprinc = pca(X)
[comprinc, score, lambda] = pca(X)
[comprinc, score, lambda, tsquare] = pca(X)
[comprinc, score, lambda, tsquare, explained, mu] = pca(X)

comprinc = pca(X, Name, Value)
[comprinc, score, lambda] = pca(X, Name, Value)
[comprinc, score, lambda, tsquare] = pca(X, Name, Value)
[comprinc, score, lambda, tsquare, explained, mu] = pca(X, Name, Value)

Arguments

X

is an n-by-p real matrix (n individuals, p variables).

Name, Value

'Centered': boolean indicator for centering the columns. Default value: %t.

'Economy': boolean indicator used to enable the economy-size singular value decomposition. Default value: %t.

'NumComponents': integer value, number of components returned. Default value: size(X, 2).

'Weights': row vector of doubles of length size(X, 1) containing observation weights. Default value: ones(1, size(X, 1)).

'VariableWeights': either the string "variance" or a row vector of doubles of length size(X, 2) containing variable weights. Default value: [].

comprinc

are the principal component coefficients: a p-by-p matrix, where p is equal to size(X, 2).

score

are the principal component scores: an n-by-p matrix, or an n-by-NumComponents matrix if 'NumComponents' is specified.

lambda

is a p-by-1 vector, or a NumComponents-by-1 vector if 'NumComponents' is specified.

contains the eigenvalues of the covariance matrix of X.

tsquare

an n-by-1 column vector. It contains Hotelling's T^2 statistic for each data point.

explained

a column vector whose length is the number of returned components. It contains the percentage of variance explained by each principal component.

mu

a row vector of length p. The estimated mean of each variable of X. A shape-check sketch for all of these outputs follows this list.
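
As a quick orientation, here is a minimal shape-check sketch on random data (the variable names are illustrative); it exercises the documented dimensions of each output and the 'NumComponents' option:

X = rand(20, 4, "normal");
[comprinc, score, lambda, tsquare, explained, mu] = pca(X);
size(comprinc)  // 4 x 4: one column per principal component
size(score)     // 20 x 4: one row per observation
size(lambda)    // 4 x 1: eigenvalues of the covariance matrix of X
size(tsquare)   // 20 x 1: Hotelling's T^2 statistic per observation
sum(explained)  // 100 (all components kept, so percentages sum to 100)
size(mu)        // 1 x 4: estimated mean of each variable
// Restricting the number of returned components:
[comprinc2, score2, lambda2] = pca(X, "NumComponents", 2);
size(score2)    // 20 x 2 (n-by-NumComponents)
size(lambda2)   // 2 x 1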

Description

This function performs the set of computations known as "principal component analysis".

The idea behind this method is to represent, in an approximate manner, a cluster of n individuals in a smaller-dimensional subspace. To do so, it projects the cluster onto a subspace. The k-dimensional projection subspace is chosen so that the distances are deformed as little as possible by the projection: we look for a k-dimensional subspace such that the squared distances between the projected points are as large as possible (indeed, in a projection, distances can only shrink). In other words, the inertia of the projection onto the k-dimensional subspace must be maximal.
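
Concretely, each eigenvalue lambda(j) is the inertia carried by the j-th principal axis: the variance of the j-th score column equals lambda(j), and the axes are sorted by decreasing inertia. A minimal check, assuming the default centering and the sample-covariance (1/(n-1)) normalization:

X = rand(50, 3, "normal");
[comprinc, score, lambda] = pca(X);
variance(score, "r")' // equals lambda, up to rounding
lambda                // eigenvalues in decreasing order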

To obtain the pca graph, use the show_pca function.

Examples

a = rand(100, 10, "normal");
[comprinc, scores, lambda] = pca(a);
show_pca(lambda, comprinc)
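// A 23-by-6 data set: 23 observations of 6 variables.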
x = [
395     224     35.1     79.1     6.0     14.9
410     232     31.9     73.4     8.7     16.4
405     233     30.7     76.5     7.0     16.5
405     240     30.4     75.3     8.7     16.0
390     217     31.9     76.5     7.8     15.7
415     243     32.1     77.4     7.1     18.5
390     229     32.1     78.4     4.6     17.0
405     240     31.1     76.5     8.2     15.3
420     234     32.4     76.0     7.2     16.8
390     223     33.8     77.0     6.2     16.8
415     247     30.7     75.5     8.4     16.1
400     234     31.7     77.6     5.7     18.7
400     224     28.2     73.5     11.0    15.5
395     229     29.4     74.5     9.3     16.1
395     219     29.7     72.8     8.7     18.5
395     224     28.5     73.7     8.7     17.3
400     223     28.5     73.1     9.1     17.7
400     224     27.8     73.2     12.2    14.6
400     221     26.5     72.3     13.2    14.5
410     233     25.9     72.3     11.1    16.6
402     234     27.1     72.1     10.4    17.5
400     223     26.8     70.3     13.5    16.2
400     213     25.8     70.4     12.1    17.5
];
[comprinc, scores, lambda, tsquare, explained] = pca(wcenter(x, 1));
scf();
show_pca(lambda, comprinc)
//
// Keep only the first two columns.
x = x(:,1:2);
[comprinc, scores, lambda, tsquare, explained] = pca(wcenter(x, 1));
scf();
// See how the points are perfectly on the circle.
show_pca(lambda, comprinc)
x = [1 2 1; 2 1 3; 3 2 3]
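// With the default centering, x is exactly recovered from the scores and the mean: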
[comprinc, scores, lambda, tsquare, explained, mu] = pca(x, "Economy", %t);
scores * comprinc' + ones(3, 1) * mu // == x
x = [1 2 1; 2 3 1; 0 2 1; 2 3 9;5 -2 7; -1 2 1];
[comprinc, scores, lambda, tsquare, explained, mu] = pca(x, "VariableWeights", "variance")
// comprinc is not orthonormal
newcomprinc = diag(stdev(x, 1))\comprinc
scf();
show_pca(lambda, newcomprinc)
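
The 'Weights' option follows the same Name, Value pattern. A minimal sketch, reusing the x above and giving the first observation twice the weight (the weights here are purely illustrative):

w = ones(1, size(x, 1));
w(1) = 2; // first observation counts twice
[comprinc, scores, lambda] = pca(x, "Weights", w);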

See also

  • show_pca — Visualization of principal components analysis results

Bibliography

Saporta, Gilbert, Probabilités, Analyse des Données et Statistique, Éditions Technip, Paris, 1990.

History

Version     Description
2025.0.0 Improvements of the function:
  • "Economy", "Centered", "NumComponents", "Weights" and "VariableWeights" options are added.

  • Now returns the percentage of the variance explained by each principal component and the estimated mean of each variable of X.

  • Warning, there is an incompatibility with previous versions: the order of the first three output arguments has changed.

    The old [lambda, comprinc, scores] = pca(x) syntax has been replaced by [comprinc, scores, lambda] = pca(x). In previous versions, pca only computed the principal components with standardized variables. To obtain the same results, use the "VariableWeights" option with the "variance" value: [comprinc, scores, lambda] = pca(x, "VariableWeights", "variance"). In this case, comprinc is not orthonormal; you can compute orthonormal coefficients with diag(stdev(x, 1))\comprinc. Another possibility is to use the wcenter function, i.e., [comprinc, scores, lambda] = pca(wcenter(x, 1)). A warning is displayed when pca is called with 1, 2 or 3 output arguments. To prevent the warning from being displayed, call pca with at least four output arguments (e.g. [comprinc, scores, lambda, _] = pca(x)).

    Note that lambda is a column vector containing the eigenvalues of the covariance matrix of x. To obtain the ratio of each eigenvalue over the sum of eigenvalues, use lambda / sum(lambda), as in the migration sketch below.
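
    Putting these pieces together, a minimal migration sketch (the random x stands in for existing data):

    x = rand(10, 3, "normal"); // stand-in for your existing data
    // Old interface (standardized variables, old output order):
    //   [lambda, comprinc, scores] = pca(x);
    // Same results with the 2025.0.0 interface; the fourth output avoids the warning:
    [comprinc, scores, lambda, _] = pca(x, "VariableWeights", "variance");
    // Equivalent alternative with explicit standardization:
    [comprinc2, scores2, lambda2, _] = pca(wcenter(x, 1));
    // Ratio of each eigenvalue over the sum of eigenvalues:
    lambda / sum(lambda)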
