scanpy.external.pp.mnn_correct
- scanpy.external.pp.mnn_correct(*datas, var_index=None, var_subset=None, batch_key='batch', index_unique='-', batch_categories=None, k=20, sigma=1.0, cos_norm_in=True, cos_norm_out=True, svd_dim=None, var_adj=True, compute_angle=False, mnn_order=None, svd_mode='rsvd', do_concatenate=True, save_raw=False, n_jobs=None, **kwargs)
Correct batch effects by matching mutual nearest neighbors [Haghverdi18] [Kang18].
This uses the implementation of mnnpy [Kang18].
Depending on do_concatenate, returns matrices or AnnData objects in the original order containing corrected expression values, or a concatenated matrix or AnnData object.
Be reminded that it is not advised to use the corrected data matrices for differential expression testing.
For more information and to report bugs, see the mnnpy project [Kang18].
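A typical call can be sketched as follows. This is a minimal sketch, not an official example: the helper name correct_batches and its arguments are hypothetical, the scanpy and mnnpy packages are assumed to be installed, and the three-element unpacking simply follows the return type documented on this page.

```python
def correct_batches(adatas, hvgs=None, k=20):
    """Hypothetical helper: run MNN correction on a sequence of AnnData objects.

    `adatas` is assumed to hold AnnData objects that share the same variables;
    `hvgs` is an optional list of gene names passed on as `var_subset`.
    """
    # Imported lazily so that defining the helper does not require scanpy/mnnpy.
    import scanpy.external as sce

    corrected, mnn_list, angle_list = sce.pp.mnn_correct(
        *adatas,
        var_subset=hvgs,       # None means all vars are used
        batch_key="batch",     # only relevant when concatenating AnnData objects
        k=k,                   # number of mutual nearest neighbors
        do_concatenate=True,   # ask for a single concatenated result
    )
    return corrected, mnn_list, angle_list
```

With two hypothetical batches adata_a and adata_b, the call would look like correct_batches([adata_a, adata_b], hvgs=my_hvgs). As noted above, the corrected values are not advised for differential expression testing.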
- Parameters:
- datas : Union[AnnData, ndarray]
Expression matrices or AnnData objects. Matrices should be shaped n_obs × n_vars (n_cell × n_gene) and have a consistent number of columns. AnnData objects should have the same number of variables.
- var_index : Optional[Collection[str]] (default: None)
The index (list of str) of vars (genes). Necessary when using only a subset of vars to perform MNN correction, and should be supplied together with var_subset. When datas are AnnData objects, var_index is ignored.
- var_subset : Optional[Collection[str]] (default: None)
The subset of vars (list of str) to be used when performing MNN correction. Typically, a list of highly variable genes (HVGs). When set to None, uses all vars.
- batch_key : str (default: 'batch')
The batch_key for concatenate(). Only used when do_concatenate is True and AnnData objects are supplied.
- index_unique : str (default: '-')
The index_unique for concatenate(). Only used when do_concatenate is True and AnnData objects are supplied.
- batch_categories : Optional[Collection[Any]] (default: None)
The batch_categories for concatenate(). Only used when do_concatenate is True and AnnData objects are supplied.
- k : int (default: 20)
Number of mutual nearest neighbors.
- sigma : float (default: 1.0)
The bandwidth of the Gaussian smoothing kernel used to compute the correction vectors.
- cos_norm_in : bool (default: True)
Whether cosine normalization should be performed on the input data prior to calculating distances between cells.
- cos_norm_out : bool (default: True)
Whether cosine normalization should be performed prior to computing corrected expression values.
- svd_dim : Optional[int] (default: None)
The number of dimensions to use for summarizing biological substructure within each batch. If None, biological components will not be removed from the correction vectors.
- var_adj : bool (default: True)
Whether to adjust the variance of the correction vectors. Note that this step takes most of the computing time.
- compute_angle : bool (default: False)
Whether to compute the angle between each cell’s correction vector and the biological subspace of the reference batch.
- mnn_order : Optional[Sequence[int]] (default: None)
The order in which batches are to be corrected. When set to None, datas are corrected sequentially.
- svd_mode : Literal['svd', 'rsvd', 'irlb'] (default: 'rsvd')
'svd' computes SVD using a non-randomized SVD-via-ID algorithm, while 'rsvd' uses a randomized version. 'irlb' performs truncated SVD by implicitly restarted Lanczos bidiagonalization (forked from https://github.com/airysen/irlbpy).
- do_concatenate : bool (default: True)
Whether to concatenate the corrected matrices or AnnData objects.
- save_raw : bool (default: False)
Whether to save the original expression data in the raw attribute.
- n_jobs : Optional[int] (default: None)
The number of jobs. When set to None, automatically uses scanpy._settings.ScanpyConfig.n_jobs.
- kwargs
Optional keyword arguments for irlb.
- Return type:
Tuple[Union[ndarray, AnnData], List[DataFrame], Optional[List[Tuple[Optional[float], int]]]]
- Returns:
- datas
Corrected matrix/matrices or AnnData object/objects, depending on the input type and do_concatenate.
- mnn_list
A list containing MNN pairing information as DataFrames in each iteration step.
- angle_list
A list containing angles of each batch.
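The k, cos_norm_in and sigma parameters are easier to picture with a toy version of the steps they control. The sketch below is not the mnnpy implementation. It is a simplified NumPy illustration that assumes brute-force distances and kernel weights of the form exp(-d²/sigma), and all helper names are hypothetical.

```python
import numpy as np


def cosine_normalize(X):
    """Scale each row (cell) to unit L2 norm, as cosine normalization does."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, 1e-12)  # guard against all-zero rows


def knn_indices(query, ref, k):
    """For each row of `query`, return the indices of its k nearest rows in `ref`."""
    d2 = ((query[:, None, :] - ref[None, :, :]) ** 2).sum(axis=2)
    return np.argsort(d2, axis=1)[:, :k]


def mutual_nearest_neighbors(A, B, k):
    """Return pairs (i, j) where B[j] is among the k nearest neighbors of A[i] and vice versa."""
    a_to_b = knn_indices(A, B, k)
    b_to_a = knn_indices(B, A, k)
    pairs = []
    for i, neighbors in enumerate(a_to_b):
        for j in neighbors:
            if i in b_to_a[j]:
                pairs.append((i, int(j)))
    return pairs


def smoothed_correction(A, B, pairs, sigma):
    """Gaussian-weighted average of the pair vectors, one correction vector per cell in B."""
    idx_a = np.array([i for i, _ in pairs])
    idx_b = np.array([j for _, j in pairs])
    pair_vectors = A[idx_a] - B[idx_b]  # each vector points from a B cell to its A partner
    # Squared distance from every B cell to the B member of every pair.
    d2 = ((B[:, None, :] - B[idx_b][None, :, :]) ** 2).sum(axis=2)
    weights = np.exp(-d2 / sigma)  # assumed kernel form; the exact scaling may differ in mnnpy
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ pair_vectors


# Toy data: batch B is batch A shifted by one unit along the first axis.
A = np.array([[0.0, 0.0], [10.0, 0.0]])
B = A + np.array([1.0, 0.0])

pairs = mutual_nearest_neighbors(A, B, k=1)
correction = smoothed_correction(A, B, pairs, sigma=1.0)
print(pairs)
print(B + correction)
```

In this toy case each cell in B pairs with its shifted counterpart in A, so adding the smoothed correction moves B back onto A. The cosine_normalize helper is shown separately because normalizing these particular points would collapse them onto the same direction; on real expression data it would be applied before the distance computation when cos_norm_in is True.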