scanpy.api.pp.pca

scanpy.api.pp.pca(data, n_comps=50, zero_center=True, svd_solver='auto', random_state=0, return_info=False, use_highly_variable=None, dtype='float32', copy=False, chunked=False, chunk_size=None)

Principal component analysis [Pedregosa11].

Computes PCA coordinates, loadings and variance decomposition. Uses the implementation of scikit-learn [Pedregosa11].

Parameters
data : Union[AnnData, ndarray, spmatrix]

The (annotated) data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.

n_comps : int (default: 50)

Number of principal components to compute.

zero_center : Optional[bool] (default: True)

If True, compute standard PCA from the covariance matrix. If False, omit zero-centering of the variables (uses TruncatedSVD), which allows sparse input to be handled efficiently. Passing None decides automatically based on the sparseness of the data.
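The effect of zero_center can be illustrated with a small NumPy-only sketch. This is not the scikit-learn code path that the function uses; the helper name pca_scores and the toy Poisson matrix are assumptions made for illustration. With centering, every score column has zero mean; without it, the first component mostly captures the mean expression level, but the input never has to be densified.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy count matrix: 100 cells x 20 genes (illustrative data only).
X = rng.poisson(1.0, size=(100, 20)).astype(np.float64)


def pca_scores(X, n_comps, zero_center=True):
    """Hypothetical helper: project X onto its top right-singular vectors."""
    if zero_center:
        # Subtracting column means turns a sparse matrix into a dense one,
        # which is why skipping this step helps with sparse input.
        X = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_comps] * s[:n_comps]


centered = pca_scores(X, 5, zero_center=True)     # standard PCA scores
uncentered = pca_scores(X, 5, zero_center=False)  # truncated SVD of the raw matrix
```

In the uncentered case the scores are generally not mean-free, so the leading component should be interpreted with care.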

svd_solver : str (default: 'auto')

SVD solver to use:

'arpack'

for the ARPACK wrapper in SciPy (svds())

'randomized'

for the randomized algorithm due to Halko (2009).

'auto' (the default)

chooses automatically depending on the size of the problem.

random_state : int (default: 0)

Change to use different initial states for the optimization.

return_info : bool (default: False)

Only relevant when not passing an AnnData: see “Returns”.

use_highly_variable : Optional[bool] (default: None)

Whether to use highly variable genes only, stored in .var['highly_variable']. By default uses them if they have been determined beforehand.

dtype : str (default: 'float32')

Numpy data type string to which to convert the result.

copy : bool (default: False)

If an AnnData is passed, determines whether a copy is returned. Is ignored otherwise.

chunked : bool (default: False)

If True, perform an incremental PCA on segments of chunk_size. The incremental PCA automatically zero-centers and ignores the settings of random_state and svd_solver. If False, perform a full PCA.

chunk_size : Optional[int] (default: None)

Number of observations to include in each chunk. Required if chunked=True was passed.
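The idea behind processing the data in chunks can be sketched in plain NumPy. Note the swap: the sketch below accumulates column sums and a Gram matrix chunk by chunk and then eigen-decomposes the resulting covariance matrix, which is a simpler technique than the incremental PCA this function performs. The function name chunked_pca is hypothetical.

```python
import numpy as np


def chunked_pca(X, n_comps, chunk_size):
    """Hypothetical sketch: PCA from statistics accumulated over row chunks."""
    n_obs, n_vars = X.shape
    total = np.zeros(n_vars)
    gram = np.zeros((n_vars, n_vars))
    # First pass: only one chunk of observations is touched at a time.
    for start in range(0, n_obs, chunk_size):
        chunk = X[start:start + chunk_size]
        total += chunk.sum(axis=0)
        gram += chunk.T @ chunk
    mean = total / n_obs
    # Covariance from the accumulated sums (zero-centering is implicit).
    cov = (gram - n_obs * np.outer(mean, mean)) / (n_obs - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_comps]
    components = eigvecs[:, order]
    # Second pass: project each chunk onto the leading components.
    scores = np.vstack([
        (X[start:start + chunk_size] - mean) @ components
        for start in range(0, n_obs, chunk_size)
    ])
    return scores, eigvals[order]


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8)) * np.arange(1, 9)
scores, variance = chunked_pca(X, n_comps=3, chunk_size=64)
```

Because the mean is accumulated across all chunks, the result is zero-centered regardless of any centering option, which mirrors the behaviour described for chunked=True.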

Return type

Union[AnnData, ndarray, spmatrix]

Returns

X_pca : scipy.sparse.spmatrix or numpy.ndarray

If data is array-like and return_info=False was passed, this function only returns X_pca…

adata : AnnData

…otherwise, if copy=True, it returns a copy of adata with the following fields; if copy=False, it adds them to adata:

.obsm['X_pca']

PCA representation of data.

.varm['PCs']

The principal components containing the loadings.

.uns['pca']['variance_ratio']

Ratio of explained variance.

.uns['pca']['variance']

Explained variance, equivalent to the eigenvalues of the covariance matrix.
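How the four returned quantities relate to one another can be checked with a short NumPy sketch. It uses a full SVD of the centered matrix rather than the scikit-learn solvers, and the variable names simply mirror the field names above; the random input is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
# 200 observations x 10 variables with unequal variances (toy data).
X = rng.normal(size=(200, 10)) * np.arange(1, 11)
n_comps = 3

Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

X_pca = U[:, :n_comps] * s[:n_comps]   # (n_obs, n_comps), cf. .obsm['X_pca']
PCs = Vt[:n_comps].T                   # (n_vars, n_comps), cf. .varm['PCs']
variance = s[:n_comps] ** 2 / (X.shape[0] - 1)
variance_ratio = variance / Xc.var(axis=0, ddof=1).sum()

# The explained variances match the largest eigenvalues of the covariance matrix.
cov_eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1][:n_comps]
```

The variance ratio divides each explained variance by the total variance over all variables, so the ratios of the retained components sum to at most 1.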