scanpy.experimental.pp.normalize_pearson_residuals_pca
- scanpy.experimental.pp.normalize_pearson_residuals_pca(adata, *, theta=100, clip=None, n_comps=50, random_state=0, kwargs_pca={}, use_highly_variable=None, check_values=True, inplace=True)
Applies analytic Pearson residual normalization and PCA, based on [Lause21].
The residuals are based on a negative binomial offset model with overdispersion
theta shared across genes. By default, residuals are clipped to sqrt(n_obs), overdispersion theta=100 is used, and PCA is run with 50 components.
Operates on the subset of highly variable genes in
adata.var['highly_variable'] by default. Expects raw count input.
- Parameters:
- adata : AnnData
  The annotated data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.
- theta : float (default: 100)
  The negative binomial overdispersion parameter theta for Pearson residuals. Higher values correspond to less overdispersion (var = mean + mean^2/theta), and theta=np.inf corresponds to a Poisson model.
- clip : Optional[float] (default: None)
  Determines if and how residuals are clipped:
  - If None, residuals are clipped to the interval [-sqrt(n_obs), sqrt(n_obs)], where n_obs is the number of cells in the dataset (default behavior).
  - If any scalar c, residuals are clipped to the interval [-c, c]. Set clip=np.inf for no clipping.
- n_comps : Optional[int] (default: 50)
  Number of principal components to compute in the PCA step.
- random_state : Optional[float] (default: 0)
  Random seed for setting the initial states for the optimization in the PCA step.
- kwargs_pca : Optional[dict] (default: {})
  Dictionary of further keyword arguments passed on to scanpy.pp.pca().
- use_highly_variable : Optional[bool] (default: None)
  If True, uses gene selection present in adata.var['highly_variable'] to subset the data before normalizing (default). Otherwise, proceeds on the full dataset.
- check_values : bool (default: True)
  If True, checks whether counts in the selected layer are integers, as expected by this function, and issues a warning if non-integers are found. Otherwise, proceeds without checking. Setting this to False can speed up code for large datasets.
- inplace : bool (default: True)
  If True, updates adata with results. Otherwise, returns results. See below for details of what is returned.
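
To illustrate how theta and clip act on the residuals, here is a minimal NumPy sketch of analytic Pearson residuals under the offset model described above. It is not scanpy's implementation. The helper name pearson_residuals is hypothetical, the input is assumed to be a dense count matrix in which every cell and every gene has at least one count, and the expected counts are assumed to be cell total × gene total / grand total.

```python
import numpy as np


def pearson_residuals(counts, theta=100.0, clip=None):
    """Illustrative analytic Pearson residuals for a dense cells x genes matrix.

    Hypothetical helper for this sketch; not part of scanpy.
    """
    counts = np.asarray(counts, dtype=float)
    n_obs = counts.shape[0]

    # clip=None falls back to sqrt(n_obs), mirroring the documented default.
    if clip is None:
        clip = np.sqrt(n_obs)

    # Offset model (assumed form): expected count of gene j in cell i is
    # (total of cell i) * (total of gene j) / (grand total).
    cell_totals = counts.sum(axis=1, keepdims=True)   # shape (n_obs, 1)
    gene_totals = counts.sum(axis=0, keepdims=True)   # shape (1, n_vars)
    mu = cell_totals @ gene_totals / counts.sum()

    # Negative binomial variance: mean + mean^2 / theta.
    # With theta=np.inf the second term vanishes (Poisson model).
    variance = mu + mu**2 / theta

    residuals = (counts - mu) / np.sqrt(variance)
    return np.clip(residuals, -clip, clip)


# Synthetic counts, purely for demonstration.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=2.0, size=(200, 30))

res_default = pearson_residuals(counts)                             # theta=100, clip=sqrt(200)
res_poisson = pearson_residuals(counts, theta=np.inf, clip=np.inf)  # Poisson model, no clipping
```

Because the Poisson variance is never larger than the negative binomial variance for the same mean, the unclipped Poisson residuals are at least as large in magnitude as the theta=100 residuals.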
- Return type:
- Returns:
  If inplace=False, returns the Pearson residual-based PCA results (as an AnnData object). If inplace=True, updates adata with the following fields:
  - .uns['pearson_residuals_normalization']['pearson_residuals_df']
    The subset of highly variable genes, normalized by Pearson residuals.
  - .uns['pearson_residuals_normalization']['theta']
    The value of the overdispersion parameter theta that was used.
  - .uns['pearson_residuals_normalization']['clip']
    The value of the clipping parameter that was used.
  - .obsm['X_pca']
    PCA representation of data after gene selection (if applicable) and Pearson residual normalization.
  - .varm['PCs']
    The principal components containing the loadings. When inplace=True and use_highly_variable=True, this will contain empty rows for the genes not selected.
  - .uns['pca']['variance_ratio']
    Ratio of explained variance.
  - .uns['pca']['variance']
    Explained variance, equivalent to the eigenvalues of the covariance matrix.
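
The relationship between the PCA fields listed above can be sketched with plain NumPy. This is an illustration only, assuming PCA is computed by an SVD of the column-centered matrix. The random matrix stands in for a residual matrix, and the variable names merely echo the AnnData fields. scanpy's own PCA step may use a different solver.

```python
import numpy as np

rng = np.random.default_rng(0)
residuals = rng.normal(size=(100, 20))  # stand-in for a cells x genes residual matrix
n_comps = 5

# Center each gene, then decompose.
centered = residuals - residuals.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

X_pca = U[:, :n_comps] * S[:n_comps]   # cell coordinates, analogous to .obsm['X_pca']
PCs = Vt[:n_comps].T                   # gene loadings, analogous to .varm['PCs']

# Explained variance per component: the top eigenvalues of the covariance matrix.
n_obs = centered.shape[0]
variance = S[:n_comps] ** 2 / (n_obs - 1)

# Ratio of explained variance, relative to the total variance over all components.
total_variance = (S**2).sum() / (n_obs - 1)
variance_ratio = variance / total_variance
```

Projecting the centered data onto the loadings (centered @ PCs) reproduces X_pca, which is one way to check that the two fields are consistent with each other.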