scanpy.external.pp.mnn_correct
- scanpy.external.pp.mnn_correct(*datas, var_index=None, var_subset=None, batch_key='batch', index_unique='-', batch_categories=None, k=20, sigma=1.0, cos_norm_in=True, cos_norm_out=True, svd_dim=None, var_adj=True, compute_angle=False, mnn_order=None, svd_mode='rsvd', do_concatenate=True, save_raw=False, n_jobs=None, **kwargs)
Correct batch effects by matching mutual nearest neighbors [Haghverdi et al., 2018] [Kang, 2018].
This uses the implementation of mnnpy [Kang, 2018].
Depending on do_concatenate, returns either the corrected matrices or AnnData objects in their original order, or a single concatenated matrix or AnnData object, containing corrected expression values.
Note that using the corrected data matrices for differential expression testing is not advised.
More information and bug reports can be found in the mnnpy repository.
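A minimal usage sketch for AnnData input follows. The helper name correct_batches and its arguments are hypothetical; it assumes scanpy and the optional mnnpy dependency are installed and that every batch shares the same variables. It only uses keyword arguments from the signature above and relies on the documented three-part return value.

```python
def correct_batches(adatas, hvgs=None):
    """Run MNN correction on a sequence of AnnData batches (illustrative sketch).

    `adatas` is assumed to be a list of AnnData objects with identical variables;
    `hvgs` is an optional list of gene names passed through as `var_subset`.
    """
    # Imported lazily so this module can be loaded without scanpy/mnnpy installed.
    import scanpy.external as sce

    corrected, mnn_list, angle_list = sce.pp.mnn_correct(
        *adatas,
        var_subset=hvgs,      # None means all vars are used
        batch_key="batch",    # forwarded to concatenate()
        do_concatenate=True,  # ask for one concatenated AnnData object
    )
    return corrected
```

A call might look like correct_batches([adata_a, adata_b], hvgs=my_hvgs), where adata_a, adata_b and my_hvgs are placeholders for your own data.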
- Parameters:
- datas
AnnData | ndarray
Expression matrices or AnnData objects. Matrices should be shaped n_obs × n_vars (n_cell × n_gene) and have a consistent number of columns. AnnData objects should have the same number of variables.
- var_index
Collection[str] | None (default: None)
The index (list of str) of vars (genes). Necessary when using only a subset of vars to perform MNN correction, and should be supplied with var_subset. When datas are AnnData objects, var_index is ignored.
- var_subset
Collection[str] | None (default: None)
The subset of vars (list of str) to be used when performing MNN correction. Typically a list of highly variable genes (HVGs). When set to None, all vars are used.
- batch_key
str (default: 'batch')
The batch_key for concatenate(). Only used when do_concatenate is True and AnnData objects are supplied.
- index_unique
str (default: '-')
The index_unique for concatenate(). Only used when do_concatenate is True and AnnData objects are supplied.
- batch_categories
Collection[Any] | None (default: None)
The batch_categories for concatenate(). Only used when do_concatenate is True and AnnData objects are supplied.
- k
int (default: 20)
Number of mutual nearest neighbors.
- sigma
float (default: 1.0)
The bandwidth of the Gaussian smoothing kernel used to compute the correction vectors.
- cos_norm_in
bool (default: True)
Whether cosine normalization should be performed on the input data prior to calculating distances between cells.
- cos_norm_out
bool (default: True)
Whether cosine normalization should be performed prior to computing corrected expression values.
- svd_dim
int | None (default: None)
The number of dimensions to use for summarizing biological substructure within each batch. If None, biological components will not be removed from the correction vectors.
- var_adj
bool (default: True)
Whether to adjust the variance of the correction vectors. Note that this step takes most of the computing time.
- compute_angle
bool (default: False)
Whether to compute the angle between each cell’s correction vector and the biological subspace of the reference batch.
- mnn_order
Sequence[int] | None (default: None)
The order in which batches are to be corrected. When set to None, datas are corrected sequentially.
- svd_mode
Literal['svd', 'rsvd', 'irlb'] (default: 'rsvd')
'svd' computes SVD using a non-randomized SVD-via-ID algorithm, while 'rsvd' uses a randomized version. 'irlb' performs truncated SVD by implicitly restarted Lanczos bidiagonalization (forked from airysen/irlbpy).
- do_concatenate
bool (default: True)
Whether to concatenate the corrected matrices or AnnData objects.
- save_raw
bool (default: False)
Whether to save the original expression data in the raw attribute.
- n_jobs
int | None (default: None)
The number of jobs. When set to None, automatically uses scanpy._settings.ScanpyConfig.n_jobs.
- kwargs
Optional keyword arguments for irlb.
- Return type:
tuple[ndarray | AnnData, list[DataFrame], list[tuple[float | None, int]] | None]
- Returns:
- datas
ndarray | AnnData
Corrected matrix/matrices or AnnData object/objects, depending on the input type and do_concatenate.
- mnn_list
list[DataFrame]
A list of DataFrames containing the MNN pairing information from each iteration step.
- angle_list
list[tuple[float | None, int]] | None
A list containing the angles of each batch.
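For plain matrices, the parameter descriptions above suggest passing var_index together with var_subset and, if separate outputs are wanted, setting do_concatenate=False. The sketch below shows one way this might look. The helper correct_matrices and its argument names are hypothetical, and it assumes scanpy and mnnpy are installed and that all matrices are n_obs × n_vars arrays with identical columns.

```python
def correct_matrices(matrices, gene_names, hvgs):
    """Correct several ndarray batches and keep them separate (illustrative sketch).

    `matrices`: sequence of n_obs × n_vars arrays sharing the same columns.
    `gene_names`: list of str naming those columns (passed as `var_index`).
    `hvgs`: list of str, the subset of genes used for the correction.
    """
    # Imported lazily so this module can be loaded without scanpy/mnnpy installed.
    import scanpy.external as sce

    corrected, mnn_list, angle_list = sce.pp.mnn_correct(
        *matrices,
        var_index=gene_names,  # required alongside var_subset for ndarray input
        var_subset=hvgs,
        do_concatenate=False,  # return corrected batches in the original order
    )
    # mnn_list holds one DataFrame of MNN pairs per iteration step;
    # angle_list may be None according to the documented return type.
    return list(corrected), mnn_list, angle_list
```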