scanpy.experimental.pp.normalize_pearson_residuals

scanpy.experimental.pp.normalize_pearson_residuals(adata, *, theta=100, clip=None, check_values=True, layer=None, inplace=True, copy=False)

Applies analytic Pearson residual normalization, based on [Lause21].

The residuals are based on a negative binomial offset model with an overdispersion parameter theta that is shared across genes. By default, residuals are clipped to the interval [-sqrt(n_obs), sqrt(n_obs)] and theta=100 is used.
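The computation described above can be sketched with dense NumPy arrays. This is a minimal illustration, not scanpy's actual implementation: the function name `pearson_residuals_sketch` is hypothetical, the expected counts are assumed to come from per-cell and per-gene totals as in [Lause21], and every gene is assumed to have at least one count.

```python
import numpy as np


def pearson_residuals_sketch(counts, theta=100.0, clip=None):
    """Illustrative dense-NumPy sketch (hypothetical helper, not scanpy code)."""
    counts = np.asarray(counts, dtype=float)
    n_obs = counts.shape[0]

    # Expected counts under the offset model: product of per-cell and
    # per-gene totals, divided by the grand total.
    cell_totals = counts.sum(axis=1, keepdims=True)
    gene_totals = counts.sum(axis=0, keepdims=True)
    mu = cell_totals @ gene_totals / counts.sum()

    # Negative binomial variance: mean + mean^2 / theta.
    residuals = (counts - mu) / np.sqrt(mu + mu**2 / theta)

    # With clip=None, the bound defaults to sqrt(n_obs).
    bound = np.sqrt(n_obs) if clip is None else clip
    return np.clip(residuals, -bound, bound)


# Toy raw-count matrix: 4 cells (rows) by 3 genes (columns).
counts = np.array([[4, 0, 1], [2, 3, 0], [0, 1, 5], [1, 2, 2]])
residuals = pearson_residuals_sketch(counts)
```

With four cells, the default bound is sqrt(4) = 2, so every entry of `residuals` lies in [-2, 2].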

Expects raw count input.

Parameters:
adata AnnData

The annotated data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.

theta float (default: 100)

The negative binomial overdispersion parameter theta for Pearson residuals. Higher values correspond to less overdispersion (var = mean + mean^2/theta), and theta=np.inf corresponds to a Poisson model.
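To make the role of theta concrete, the short sketch below evaluates the variance formula for a fixed mean. The helper `nb_variance` is hypothetical and only restates var = mean + mean^2/theta.

```python
import numpy as np


def nb_variance(mean, theta):
    """Variance of the negative binomial model: mean + mean^2 / theta."""
    return mean + mean**2 / theta


mean = 10.0
for theta in (1.0, 100.0, np.inf):
    # Larger theta means less overdispersion; theta=inf gives variance == mean.
    print(f"theta={theta}: variance={nb_variance(mean, theta)}")
```

For a mean of 10, the variance falls from 110 at theta=1 to 11 at theta=100, and equals the mean (the Poisson case) at theta=np.inf.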

clip float | None (default: None)

Determines if and how residuals are clipped:

  • If None, residuals are clipped to the interval [-sqrt(n_obs), sqrt(n_obs)], where n_obs is the number of cells in the dataset (default behavior).

  • If a scalar c, residuals are clipped to the interval [-c, c]. Set clip=np.inf for no clipping.
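The three clipping modes can be illustrated on a small vector of residuals. This is a sketch of the behavior described above, using a hypothetical helper `clip_residuals`, not scanpy's own code.

```python
import numpy as np


def clip_residuals(residuals, n_obs, clip=None):
    """Clip to [-sqrt(n_obs), sqrt(n_obs)] if clip is None, else to [-clip, clip]."""
    bound = np.sqrt(n_obs) if clip is None else clip
    return np.clip(residuals, -bound, bound)


residuals = np.array([-12.0, -3.0, 0.5, 7.0, 40.0])

default_clipped = clip_residuals(residuals, n_obs=100)            # bound sqrt(100) = 10
custom_clipped = clip_residuals(residuals, n_obs=100, clip=5.0)   # bound 5
unclipped = clip_residuals(residuals, n_obs=100, clip=np.inf)     # no clipping
```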

check_values bool (default: True)

If True, checks whether the counts in the selected layer are integers, as this function expects, and issues a warning if non-integers are found. Otherwise, proceeds without checking. Setting this to False can speed up the code for large datasets.
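A check of this kind could look roughly like the sketch below. The helper `has_integer_values` is hypothetical and only tests whether every entry is integer-valued; scanpy's internal check may differ in detail.

```python
import numpy as np


def has_integer_values(X):
    """Return True if every entry of X is integer-valued (hypothetical helper)."""
    X = np.asarray(X)
    return bool(np.all(np.mod(X, 1) == 0))


raw = np.array([[0, 3], [1, 2]])   # raw counts
scaled = raw / 2                   # already-normalized values such as 1.5

raw_ok = has_integer_values(raw)
scaled_ok = has_integer_values(scaled)
```

Here `raw_ok` is True and `scaled_ok` is False; the second case is the situation in which the function is documented to warn.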

layer str | None (default: None)

Layer to use as input instead of X. If None, X is used.

inplace bool (default: True)

If True, update adata with results. Otherwise, return results. See below for details of what is returned.

copy bool (default: False)

If True, the function runs on a copy of the input object and returns the modified copy. Otherwise, the input object is modified directly. Not compatible with inplace=False.

Return type:

AnnData | dict[str, ndarray] | None

Returns:

If inplace=True, adata.X or the selected layer in adata.layers is updated with the normalized values, and adata.uns is updated with the following fields. If inplace=False, the same fields are returned as a dictionary, with the normalized values in results_dict['X'].

.uns['pearson_residuals_normalization']['theta']

The value of the overdispersion parameter theta that was used.

.uns['pearson_residuals_normalization']['clip']

The used value of the clipping parameter.

.uns['pearson_residuals_normalization']['computed_on']

The name of the layer on which the residuals were computed.
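The two return modes might be exercised as in the following sketch. It assumes scanpy and anndata are installed; the wrapper `run_pearson_residuals_demo` and the simulated Poisson counts are hypothetical and serve only as illustration.

```python
import numpy as np


def run_pearson_residuals_demo(seed=0):
    """Hypothetical demo; imports are local so they are only needed when called."""
    import anndata as ad
    import scanpy as sc

    # Simulated raw counts: 50 cells by 20 genes (illustrative data only).
    rng = np.random.default_rng(seed)
    counts = rng.poisson(2.0, size=(50, 20)).astype(np.float32)
    adata = ad.AnnData(counts)

    # inplace=False: adata is left unchanged and a dictionary is returned.
    result = sc.experimental.pp.normalize_pearson_residuals(
        adata, theta=100, inplace=False
    )
    residuals = result["X"]

    # inplace=True (the default): adata.X is overwritten with the residuals
    # and the settings are stored in adata.uns.
    sc.experimental.pp.normalize_pearson_residuals(adata, theta=100)
    settings = adata.uns["pearson_residuals_normalization"]
    return residuals, settings
```

Calling `run_pearson_residuals_demo()` returns the residual matrix from the inplace=False call together with the entry stored under adata.uns['pearson_residuals_normalization'] by the in-place call.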