scanpy.external.pp.mnn_correct
- scanpy.external.pp.mnn_correct(*datas, var_index=None, var_subset=None, batch_key='batch', index_unique='-', batch_categories=None, k=20, sigma=1.0, cos_norm_in=True, cos_norm_out=True, svd_dim=None, var_adj=True, compute_angle=False, mnn_order=None, svd_mode='rsvd', do_concatenate=True, save_raw=False, n_jobs=None, **kwargs)
Correct batch effects by matching mutual nearest neighbors [Haghverdi18] [Kang18].
This uses the implementation of mnnpy [Kang18].
Depending on do_concatenate, returns matrices or AnnData objects in the original order containing corrected expression values, or a concatenated matrix or AnnData object.
Be reminded that it is not advised to use the corrected data matrices for differential expression testing.
More information and bug reports here.
- Parameters:
- datas :
Union[AnnData,ndarray] Expression matrices or AnnData objects. Matrices should be shaped like n_obs × n_vars (n_cell × n_gene) and have consistent number of columns. AnnData objects should have same number of variables.
- var_index :
Optional[Collection[str]] (default:None) The index (list of str) of vars (genes). Necessary when using only a subset of vars to perform MNN correction, and should be supplied together with var_subset. When datas are AnnData objects, var_index is ignored.
- var_subset :
Optional[Collection[str]] (default:None) The subset of vars (list of str) to be used when performing MNN correction. Typically, a list of highly variable genes (HVGs). When set to None, uses all vars.
- batch_key :
str (default:'batch') The batch_key for concatenate(). Only valid when do_concatenate is True and AnnData objects are supplied.
- index_unique :
str (default:'-') The index_unique for concatenate(). Only valid when do_concatenate is True and AnnData objects are supplied.
- batch_categories :
Optional[Collection[Any]] (default:None) The batch_categories for concatenate(). Only valid when do_concatenate is True and AnnData objects are supplied.
- k :
int (default:20) Number of mutual nearest neighbors.
- sigma :
float (default:1.0) The bandwidth of the Gaussian smoothing kernel used to compute the correction vectors.
- cos_norm_in :
bool (default:True) Whether cosine normalization should be performed on the input data prior to calculating distances between cells.
- cos_norm_out :
bool (default:True) Whether cosine normalization should be performed prior to computing corrected expression values.
- svd_dim :
Optional[int] (default:None) The number of dimensions to use for summarizing biological substructure within each batch. If None, biological components will not be removed from the correction vectors.
- var_adj :
bool (default:True) Whether to adjust the variance of the correction vectors. Note that this step takes most of the computing time.
- compute_angle :
bool (default:False) Whether to compute the angle between each cell’s correction vector and the biological subspace of the reference batch.
- mnn_order :
Optional[Sequence[int]] (default:None) The order in which batches are to be corrected. When set to None, datas are corrected sequentially.
- svd_mode :
Literal['svd','rsvd','irlb'] (default:'rsvd') 'svd' computes SVD using a non-randomized SVD-via-ID algorithm, while 'rsvd' uses a randomized version. 'irlb' performs truncated SVD by implicitly restarted Lanczos bidiagonalization (forked from https://github.com/airysen/irlbpy).
- do_concatenate :
bool (default:True) Whether to concatenate the corrected matrices or AnnData objects.
- save_raw :
bool (default:False) Whether to save the original expression data in the raw attribute.
- n_jobs :
Optional[int] (default:None) The number of jobs. When set to None, automatically uses scanpy._settings.ScanpyConfig.n_jobs.
- kwargs
Optional keyword arguments for irlb.
- Return type:
Tuple[Union[ndarray,AnnData],List[DataFrame],Optional[List[Tuple[Optional[float],int]]]]
- Returns:
- datas
Corrected matrix/matrices or AnnData object/objects, depending on the input type and do_concatenate.
- mnn_list
A list containing MNN pairing information as DataFrames in each iteration step.
- angle_list
A list containing angles of each batch.
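A call following the signature and return type documented above might look like the sketch below. It assumes that scanpy and mnnpy are installed and that both inputs are AnnData objects with the same variables. The wrapper name correct_two_batches and its hvgs argument are hypothetical, introduced only for illustration.

```python
def correct_two_batches(adata_a, adata_b, hvgs=None):
    """Run MNN correction on two AnnData batches and unpack the documented 3-tuple.

    hvgs is an optional list of gene names passed as var_subset
    (hypothetical argument name; None means all vars are used).
    """
    # Imported lazily because scanpy and mnnpy are optional third-party packages.
    import scanpy.external as sce

    corrected, mnn_list, angle_list = sce.pp.mnn_correct(
        adata_a,
        adata_b,
        var_subset=hvgs,      # restrict the MNN search to these genes, if given
        batch_key="batch",    # passed on to concatenate()
        k=20,                 # number of mutual nearest neighbors
        sigma=1.0,            # bandwidth of the Gaussian smoothing kernel
        do_concatenate=True,  # return one concatenated AnnData object
    )
    return corrected, mnn_list, angle_list
```

With do_concatenate=True and AnnData inputs, the first returned element is expected to be a single concatenated AnnData object. As noted above, the corrected values are not advised for differential expression testing.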