scanpy.external.pp.mnn_correct


scanpy.external.pp.mnn_correct(*datas, var_index=None, var_subset=None, batch_key='batch', index_unique='-', batch_categories=None, k=20, sigma=1.0, cos_norm_in=True, cos_norm_out=True, svd_dim=None, var_adj=True, compute_angle=False, mnn_order=None, svd_mode='rsvd', do_concatenate=True, save_raw=False, n_jobs=None, **kwargs)

Correct batch effects by matching mutual nearest neighbors [Haghverdi et al., 2018] [Kang, 2018].

This uses the implementation of mnnpy [Kang, 2018].

Depending on do_concatenate, returns either a single concatenated matrix or AnnData object, or the individual matrices or AnnData objects in their original order. In both cases the result contains the corrected expression values.

Note that using the corrected data matrices for differential expression testing is not advised.

For more information and to report bugs, see the mnnpy project [Kang, 2018].

Parameters:

datas AnnData | ndarray: Expression matrices or AnnData objects. Matrices should be shaped n_obs × n_vars (n_cell × n_gene) and have a consistent number of columns. AnnData objects should have the same number of variables.
var_index Collection[str] | None (default: None): The index (list of str) of vars (genes). Necessary when using only a subset of vars to perform MNN correction, and should be supplied with var_subset. When datas are AnnData objects, var_index is ignored.
var_subset Collection[str] | None (default: None): The subset of vars (list of str) to be used when performing MNN correction. Typically, a list of highly variable genes (HVGs). When set to None, uses all vars.
batch_key str (default: 'batch'): The batch_key for anndata.AnnData.concatenate. Only used when do_concatenate is True and AnnData objects are supplied.
index_unique str (default: '-'): The index_unique for anndata.AnnData.concatenate. Only used when do_concatenate is True and AnnData objects are supplied.
batch_categories Collection[Any] | None (default: None): The batch_categories for anndata.AnnData.concatenate. Only used when do_concatenate is True and AnnData objects are supplied.
k int (default: 20): Number of mutual nearest neighbors.
sigma float (default: 1.0): The bandwidth of the Gaussian smoothing kernel used to compute the correction vectors.
cos_norm_in bool (default: True): Whether cosine normalization should be performed on the input data prior to calculating distances between cells.
cos_norm_out bool (default: True): Whether cosine normalization should be performed prior to computing corrected expression values.
svd_dim int | None (default: None): The number of dimensions to use for summarizing biological substructure within each batch. If None, biological components will not be removed from the correction vectors.
var_adj bool (default: True): Whether to adjust the variance of the correction vectors. Note that this step takes most of the computing time.
compute_angle bool (default: False): Whether to compute the angle between each cell’s correction vector and the biological subspace of the reference batch.
mnn_order Sequence[int] | None (default: None): The order in which batches are to be corrected. When set to None, datas are corrected sequentially.
svd_mode Literal['svd', 'rsvd', 'irlb'] (default: 'rsvd'): 'svd' computes SVD using a non-randomized SVD-via-ID algorithm, while 'rsvd' uses a randomized version. 'irlb' performs truncated SVD by implicitly restarted Lanczos bidiagonalization (forked from airysen/irlbpy).
do_concatenate bool (default: True): Whether to concatenate the corrected matrices or AnnData objects.
save_raw bool (default: False): Whether to save the original expression data in the raw attribute.
n_jobs int | None (default: None): The number of jobs. When set to None, automatically uses scanpy.settings.n_jobs.
kwargs: optional keyword arguments for irlb.
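
To make the roles of k and cos_norm_in more concrete, the following is a minimal pure-Python sketch of how mutual nearest neighbor pairs between two batches can be found. It is an illustration only and not the mnnpy implementation: the helper names cosine_normalize, k_nearest and mutual_nearest_neighbors are hypothetical, the search is brute force, and the later steps (Gaussian smoothing with sigma, variance adjustment, SVD) are omitted.

```python
import math


def cosine_normalize(rows):
    """Scale each cell (row) to unit L2 norm; all-zero rows are left unchanged."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        out.append([v / norm for v in row] if norm > 0 else list(row))
    return out


def k_nearest(queries, targets, k):
    """For each query cell, return the index set of its k closest target cells."""
    result = []
    for q in queries:
        dists = [(math.dist(q, t), j) for j, t in enumerate(targets)]
        result.append({j for _, j in sorted(dists)[:k]})
    return result


def mutual_nearest_neighbors(batch_a, batch_b, k=20, cos_norm=True):
    """Return (i, j) pairs where cell i of batch_a and cell j of batch_b
    are each among the other's k nearest neighbors in the opposite batch."""
    if cos_norm:
        batch_a = cosine_normalize(batch_a)
        batch_b = cosine_normalize(batch_b)
    a_to_b = k_nearest(batch_a, batch_b, k)
    b_to_a = k_nearest(batch_b, batch_a, k)
    return [
        (i, j)
        for i, neighbors in enumerate(a_to_b)
        for j in sorted(neighbors)
        if i in b_to_a[j]
    ]


# Toy data: two batches of two cells each, measured on two genes.
batch_a = [[1.0, 0.0], [0.0, 1.0]]
batch_b = [[2.0, 0.1], [0.1, 2.0]]
pairs = mutual_nearest_neighbors(batch_a, batch_b, k=1)
print(pairs)  # [(0, 0), (1, 1)]
```

In this toy case the batches differ mainly in scale, so cosine normalization brings matching cells close together before distances are computed.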

Return type:

tuple[ndarray | AnnData, list[DataFrame], list[tuple[float | None, int]] | None]

Returns:

datas ndarray | AnnData: Corrected matrix/matrices or AnnData object/objects, depending on the input type and do_concatenate.
mnn_list list[DataFrame]: A list containing MNN pairing information as DataFrames in each iteration step.
angle_list list[tuple[float | None, int]] | None: A list containing angles of each batch.
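
A call might look like the following sketch, which assumes that scanpy and mnnpy are installed and that the inputs are AnnData objects with the same variables. The wrapper name correct_batches and its arguments adatas and hvgs are hypothetical; only the keyword arguments documented above are passed through.

```python
def correct_batches(adatas, hvgs=None):
    """Run MNN correction on a sequence of AnnData objects (hypothetical wrapper).

    adatas: AnnData objects, one per batch, with the same variables.
    hvgs:   optional list of gene names used to find the MNN pairs.
    """
    # Imported lazily so that defining this function does not require scanpy.
    import scanpy as sc

    corrected, mnn_list, angle_list = sc.external.pp.mnn_correct(
        *adatas,
        var_subset=hvgs,      # None means all variables are used
        batch_key="batch",    # passed on for concatenation
        k=20,
        do_concatenate=True,  # request a single concatenated result
    )
    # corrected:  corrected data
    # mnn_list:   MNN pairing information, one DataFrame per iteration step
    # angle_list: angles per batch, or None
    return corrected, mnn_list, angle_list
```

With do_concatenate=False the first element would instead hold the individual corrected objects in their original order.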
