scanpy.experimental.pp.normalize_pearson_residuals_pca(adata, *, theta=100, clip=None, n_comps=50, random_state=0, kwargs_pca={}, use_highly_variable=None, check_values=True, inplace=True)

Applies analytic Pearson residual normalization and PCA, based on [Lause21].

The residuals are based on a negative binomial offset model with overdispersion theta shared across genes. By default, residuals are clipped to sqrt(n_obs), overdispersion theta=100 is used, and PCA is run with 50 components.

Operates on the subset of highly variable genes in adata.var['highly_variable'] by default. Expects raw count input.

adata : AnnData

The annotated data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.

theta : float (default: 100)

The negative binomial overdispersion parameter theta for Pearson residuals. Higher values correspond to less overdispersion (var = mean + mean^2/theta), and theta=np.inf corresponds to a Poisson model.
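To make the effect of theta concrete, here is a minimal sketch that evaluates the variance formula above for a few values. The helper name nb_variance is hypothetical and is not part of scanpy.

```python
import math


def nb_variance(mean, theta=100.0):
    """Negative binomial variance: mean + mean**2 / theta."""
    return mean + mean**2 / theta


# Smaller theta means more overdispersion; an infinite theta
# removes the quadratic term and leaves the Poisson variance (= mean).
for theta in (10.0, 100.0, math.inf):
    print(f"theta={theta}: variance={nb_variance(5.0, theta)}")
```

For a mean of 5, the variance shrinks toward the Poisson value of 5 as theta grows.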

clip : Optional[float] (default: None)

Determines if and how residuals are clipped:

  • If None, residuals are clipped to the interval [-sqrt(n_obs), sqrt(n_obs)], where n_obs is the number of cells in the dataset (default behavior).

  • If any scalar c, residuals are clipped to the interval [-c, c]. Set clip=np.inf for no clipping.
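The residual computation and clipping can be sketched in plain Python as follows. This is an illustration only, not scanpy's implementation: it assumes the expected count for each cell and gene is (cell total × gene total) / grand total, as in the offset model of the cited paper, and it assumes every cell and every gene has at least one count. The function name pearson_residuals and the toy matrix are hypothetical.

```python
import math


def pearson_residuals(counts, theta=100.0, clip=None):
    """Clipped Pearson residuals for a list of rows (cells) of raw counts (genes)."""
    n_obs = len(counts)
    if clip is None:
        clip = math.sqrt(n_obs)  # default clipping bound described above
    row_sums = [sum(row) for row in counts]        # total counts per cell
    col_sums = [sum(col) for col in zip(*counts)]  # total counts per gene
    total = sum(row_sums)
    residuals = []
    for row, row_sum in zip(counts, row_sums):
        out = []
        for x, col_sum in zip(row, col_sums):
            mu = row_sum * col_sum / total         # assumed expected count
            r = (x - mu) / math.sqrt(mu + mu**2 / theta)
            out.append(max(-clip, min(clip, r)))   # clip to [-clip, clip]
        residuals.append(out)
    return residuals


# Toy data: 4 cells x 3 genes, so the default bound is sqrt(4) = 2.
counts = [[0, 5, 1], [2, 0, 3], [10, 1, 0], [1, 1, 1]]
clipped = pearson_residuals(counts)                   # clip=None
unclipped = pearson_residuals(counts, clip=math.inf)  # no clipping
```

With the default bound, the largest residual in this toy matrix (cell 0, gene 1) is cut back to 2, while clip=math.inf leaves it unchanged.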

n_comps : Optional[int] (default: 50)

Number of principal components to compute in the PCA step.

random_state : Optional[float] (default: 0)

Random seed for setting the initial states for the optimization in the PCA step.

kwargs_pca : Optional[dict] (default: {})

Dictionary of further keyword arguments passed on to scanpy.pp.pca().

use_highly_variable : Optional[bool] (default: None)

If True, uses the gene selection present in adata.var['highly_variable'] to subset the data before normalizing (default). Otherwise, proceeds on the full dataset.

check_values : bool (default: True)

If True, checks whether the counts in the selected layer are integers, as expected by this function, and issues a warning if non-integers are found. Otherwise, proceeds without checking. Setting this to False can speed up the code for large datasets.
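The kind of check described here can be sketched as below. This is a hypothetical stand-in, not the check scanpy performs internally; the helper name all_integer_valued and both toy matrices are assumptions made for illustration.

```python
import warnings


def all_integer_valued(matrix):
    """Return True if every entry of a list-of-rows matrix is a whole number."""
    return all(float(v).is_integer() for row in matrix for v in row)


raw = [[0, 3, 1], [2, 0, 7]]                     # raw counts
normalized = [[0.0, 1.5, 0.5], [0.4, 0.0, 1.4]]  # already transformed data

for name, matrix in (("raw", raw), ("normalized", normalized)):
    if not all_integer_valued(matrix):
        warnings.warn(f"{name}: expected raw integer counts, found non-integers")
```

Scanning every entry is what makes such a check costly on large datasets, which is why skipping it can save time.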

inplace : bool (default: True)

If True, updates adata with the results. Otherwise, returns the results. See below for details of what is returned.

Return type

Optional[AnnData]

If inplace=False, returns the Pearson residual-based PCA results (as AnnData object). If inplace=True, updates adata with the following fields:


The subset of highly variable genes, normalized by Pearson residuals.


The used value of the overdispersion parameter theta.


The used value of the clipping parameter.


PCA representation of data after gene selection (if applicable) and Pearson residual normalization.


The principal components containing the loadings. When inplace=True and use_highly_variable=True, this will contain empty rows for the genes not selected.


Ratio of explained variance.


Explained variance, equivalent to the eigenvalues of the covariance matrix.