scanpy.pp.pca

scanpy.pp.pca(data, n_comps=50, zero_center=True, svd_solver='arpack', random_state=0, return_info=False, use_highly_variable=None, dtype='float32', copy=False, chunked=False, chunk_size=None)

Principal component analysis [Pedregosa11].

Computes PCA coordinates, loadings and variance decomposition. Uses the implementation of scikit-learn [Pedregosa11].
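The three outputs named above can be illustrated with a minimal NumPy sketch. This is not scanpy's code path (scanpy delegates to scikit-learn); it only shows, on made-up data and under the usual SVD formulation of PCA, what "coordinates", "loadings" and "variance decomposition" refer to. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up data: 200 observations x 5 variables with decreasing spread
X = rng.normal(size=(200, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])

# Zero-center each variable (column)
Xc = X - X.mean(axis=0)

# Thin SVD of the centered matrix: Xc = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

n_comps = 2
all_variance = S ** 2 / (X.shape[0] - 1)      # variance along every component

coords = Xc @ Vt[:n_comps].T                  # PCA coordinates, shape (200, 2)
loadings = Vt[:n_comps].T                     # loadings, shape (5, 2)
variance = all_variance[:n_comps]             # explained variance per kept component
variance_ratio = variance / all_variance.sum()  # fraction of total variance
```

The explained variances computed this way coincide with the largest eigenvalues of the sample covariance matrix of `X`.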

Parameters
data : Union[AnnData, ndarray, spmatrix]

The (annotated) data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.

n_comps : int (default: 50)

Number of principal components to compute.

zero_center : Optional[bool] (default: True)

If True, compute standard PCA from the covariance matrix. If False, omit zero-centering of the variables (uses TruncatedSVD), which allows sparse input to be handled efficiently. Passing None decides automatically based on the sparseness of the data.

svd_solver : str (default: 'arpack')

SVD solver to use:

'arpack' (the default)

for the ARPACK wrapper in SciPy (svds())

'randomized'

for the randomized algorithm due to Halko (2009).

'auto'

chooses automatically depending on the size of the problem.

Changed in version 1.4.5: Default value changed from 'auto' to 'arpack'.

random_state : Union[int, RandomState, None] (default: 0)

Change to use different initial states for the optimization.

return_info : bool (default: False)

Only relevant when not passing an AnnData: see “Returns”.

use_highly_variable : Optional[bool] (default: None)

Whether to use highly variable genes only, stored in .var['highly_variable']. By default uses them if they have been determined beforehand.

dtype : str (default: 'float32')

NumPy data type string to which to convert the result.

copy : bool (default: False)

If an AnnData is passed, determines whether a copy is returned. Is ignored otherwise.

chunked : bool (default: False)

If True, perform an incremental PCA on segments of chunk_size. The incremental PCA automatically zero-centers the data and ignores the settings of random_state and svd_solver. If False, perform a full PCA.

chunk_size : Optional[int] (default: None)

Number of observations to include in each chunk. Required if chunked=True was passed.
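The chunked mode described above can be pictured with the following sketch. It assumes that the incremental PCA behaves like scikit-learn's `IncrementalPCA` fitted one segment at a time; the page itself does not show the implementation, so treat the loop, the data and the variable names as illustrative only.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
# Made-up data: 300 observations x 20 variables
X = rng.normal(size=(300, 20))

n_comps = 5
chunk_size = 100  # observations per chunk; must be at least n_comps

ipca = IncrementalPCA(n_components=n_comps)

# First pass: update the model one chunk at a time
for start in range(0, X.shape[0], chunk_size):
    ipca.partial_fit(X[start:start + chunk_size])

# Second pass: project each chunk and stack the results
X_pca = np.vstack([
    ipca.transform(X[start:start + chunk_size])
    for start in range(0, X.shape[0], chunk_size)
])
```

Only one chunk needs to be held in memory at a time, which is the reason to prefer this mode for data that is too large for a full PCA. The model tracks the running mean of the data, which matches the statement that the incremental PCA always zero-centers.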

Return type

Union[AnnData, ndarray, spmatrix]

Returns

X_pca : spmatrix, ndarray

If data is array-like and return_info=False was passed, this function only returns X_pca.

adata : AnnData

…otherwise, returns a copy of adata if copy=True, or else updates adata in place, with the following fields:

.obsm['X_pca']

PCA representation of data.

.varm['PCs']

The principal components containing the loadings.

.uns['pca']['variance_ratio']

Ratio of explained variance.

.uns['pca']['variance']

Explained variance, equivalent to the eigenvalues of the covariance matrix.