scanpy.external.pp.magic

scanpy.external.pp.magic(adata, name_list=None, *, knn=10, decay=15, t='auto', n_pca=100, knn_dist='euclidean', random_state=None, n_jobs=None, verbose=False, copy=None, **kwargs)

Markov Affinity-based Graph Imputation of Cells (MAGIC) API [vanDijk18].

MAGIC is an algorithm for denoising and transcript recovery in single-cell sequencing data. MAGIC builds a graph from the data and uses diffusion to smooth out noise and recover the data manifold.
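The graph-plus-diffusion idea can be illustrated with a toy dense NumPy/SciPy sketch. It roughly follows the adaptive Gaussian kernel of the original [vanDijk18] description rather than the kernel used by this function. `magic_diffuse` is a hypothetical name, not part of scanpy or magic. Its `metric` argument plays the role of `knn_dist` and accepts any metric name understood by `scipy.spatial.distance.cdist`.

```python
import numpy as np
from scipy.spatial.distance import cdist


def magic_diffuse(X, knn=5, t=3, metric="euclidean"):
    """Toy MAGIC-style smoothing of a dense (n_cells, n_genes) array.

    Hypothetical helper, for illustration only:
    1. pairwise distances with any scipy.spatial.distance metric;
    2. adaptive Gaussian affinities, where cell i's bandwidth is the
       distance to its knn-th nearest neighbor;
    3. symmetrize and row-normalize into a Markov matrix M;
    4. return M**t @ X, i.e. t steps of diffusion.
    """
    X = np.asarray(X, dtype=float)
    D = cdist(X, X, metric=metric)
    # column 0 of each sorted row is the cell itself (distance 0)
    sigma = np.maximum(np.sort(D, axis=1)[:, knn], 1e-12)
    K = np.exp(-(D / sigma[:, None]) ** 2)   # adaptive Gaussian affinities
    K = K + K.T                              # symmetric affinity graph
    M = K / K.sum(axis=1, keepdims=True)     # row-stochastic Markov matrix
    return np.linalg.matrix_power(M, t) @ X  # t diffusion steps
```

Because `M` is row-stochastic, each smoothed cell is a weighted average of cells in its neighborhood, and a larger `t` spreads that average further along the graph.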

The algorithm implemented here differs from the one described in [vanDijk18] in two main ways. First, it uses the adaptive kernel described in Moon et al., 2019 [Moon17] for improved stability. Second, data diffusion is applied in PCA space rather than in the data space, which improves speed and memory usage.
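Both changes can be sketched together, assuming dense inputs small enough for an all-pairs distance matrix. The first piece is an alpha-decaying kernel whose bandwidth adapts to each cell's `knn`-th neighbor distance, with `decay` as the exponent. The second is diffusion applied to PCA scores, which are then mapped back to gene space through the PCA loadings. The function names are hypothetical. The magic package's own implementation uses sparse neighbor graphs and likely differs in details.

```python
import numpy as np
from scipy.spatial.distance import cdist


def alpha_decay_kernel(Z, knn=10, decay=15):
    """Dense adaptive alpha-decaying affinity (hypothetical helper).

    sigma_i is the distance from point i to its knn-th nearest neighbor.
    K_ij averages exp(-(d_ij / sigma_i)**decay) and exp(-(d_ij / sigma_j)**decay),
    which makes K symmetric with ones on the diagonal.
    """
    D = cdist(Z, Z)
    sigma = np.maximum(np.sort(D, axis=1)[:, knn], 1e-12)
    K_row = np.exp(-(D / sigma[:, None]) ** decay)  # bandwidth of cell i
    K_col = np.exp(-(D / sigma[None, :]) ** decay)  # bandwidth of cell j
    return (K_row + K_col) / 2


def magic_pca_space(X, n_pca=20, knn=10, decay=15, t=3):
    """Diffuse PCA scores instead of the full matrix, then map back (sketch)."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    # PCA via SVD of the centered data
    U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
    n_pca = min(n_pca, S.shape[0])
    Z = U[:, :n_pca] * S[:n_pca]                  # PCA scores (n_cells, n_pca)
    K = alpha_decay_kernel(Z, knn=knn, decay=decay)
    M = K / K.sum(axis=1, keepdims=True)          # Markov diffusion operator
    Z_smooth = np.linalg.matrix_power(M, t) @ Z   # diffusion in PCA space
    return Z_smooth @ Vt[:n_pca] + mu             # back to gene space
```

Diffusing an `n_cells x n_pca` matrix instead of `n_cells x n_genes` is where the speed and memory savings come from. With `t=0` and all components kept, the sketch reproduces its input exactly.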

More information and bug reports are available from the MAGIC project. For help, visit <https://krishnaswamylab.org/get-help>.

Parameters
adata : AnnData

An AnnData object with a .raw attribute representing raw counts.

name_list : {'all_genes', 'pca_only'}, Sequence[str], None (default: None)

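Denoised genes to return. The default, 'all_genes' (equivalent to None), may require a large amount of memory if the input data is sparse, since the denoised matrix is dense. Alternatively, 'pca_only' returns only the PCA of the denoised data, stored in adata.obsm['X_magic'].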

knn : int (default: 10)

Number of nearest neighbors on which to build the kernel.

decay : int, None (default: 15)

Sets the decay rate of the kernel tails. If None, the alpha-decaying kernel is not used.

t : int, 'auto' (default: 'auto')

Power to which the diffusion operator is raised. This sets the level of diffusion. If 'auto', t is selected according to the Procrustes disparity of the diffused data.

n_pca : int (default: 100)

Number of principal components to use for calculating neighborhoods. For extremely large datasets, using n_pca < 20 allows neighborhoods to be calculated in roughly log(n_samples) time.

knn_dist : str (default: 'euclidean')

Distance metric for building the kNN graph. Recommended values: 'euclidean', 'cosine', 'precomputed'. Any metric from scipy.spatial.distance can be used. If 'precomputed', the data should be an n_samples x n_samples distance or affinity matrix.

random_state : int, RandomState, None (default: None)

Random seed. Defaults to the global NumPy random number generator.

n_jobs : int, None (default: None)

Number of threads to use in training. All cores are used by default.

verbose : bool (default: False)

If True or an integer >= 2, print status messages. If None, sc.settings.verbosity is used.

copy : bool, None (default: None)

If True, a copy of adata is returned. If None, copy is True if name_list is not 'all_genes' or 'pca_only'. Setting copy=False is only allowed when name_list is 'all_genes' or 'pca_only', because otherwise the resulting data would have different column names from the input data.

kwargs

Additional keyword arguments passed to magic.MAGIC.
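With t='auto', the number of diffusion steps is chosen from the Procrustes disparity of the diffused data. The sketch below shows one plausible reading of that rule: keep applying the diffusion operator until one more step changes the data, up to translation, scaling and rotation, by less than a threshold. The names `choose_t`, `t_max` and `threshold`, and their default values, are assumptions for illustration, not the magic package's API.

```python
import numpy as np
from scipy.spatial import procrustes


def choose_t(M, Z, t_max=20, threshold=1e-3):
    """Hypothetical t='auto' rule based on Procrustes disparity.

    M : (n, n) row-stochastic diffusion operator.
    Z : (n, d) data to diffuse, e.g. PCA scores.
    Returns (t, data diffused t times).
    """
    prev = Z
    for t in range(1, t_max + 1):
        curr = M @ prev
        # disparity is 0 when curr is an exact similarity transform of prev
        _, _, disparity = procrustes(prev, curr)
        if disparity < threshold:
            return t, curr
        prev = curr
    return t_max, prev
```

Comparing successive steps with Procrustes ignores uniform shrinkage of the data. As a result, the rule stops once diffusion has stopped changing the relative arrangement of cells, not once the values stop changing.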

Return type

AnnData, None

Returns

If copy is True, an AnnData object is returned.

If name_list is 'pca_only', the PCA of the MAGIC-denoised values is stored in adata.obsm['X_magic'] and adata.X is not modified.

The raw counts are stored in the .raw attribute of the AnnData object.
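The copy and name_list rules described above reduce to a small decision. The helper below is hypothetical and not a scanpy function. It only restates those rules. Raising ValueError for the disallowed case is an assumption of this sketch.

```python
def resolve_copy(name_list=None, copy=None):
    """Hypothetical restatement of the copy/name_list rules (not scanpy API)."""
    # None means 'all_genes'; only 'all_genes' and 'pca_only' keep adata's columns
    in_place_ok = name_list is None or (
        isinstance(name_list, str) and name_list in ("all_genes", "pca_only")
    )
    if copy is None:
        # default: copy only when the output columns would differ from the input
        return not in_place_ok
    if not copy and not in_place_ok:
        # assumption: the disallowed combination is reported as an error
        raise ValueError("copy=False requires name_list to be 'all_genes' or 'pca_only'")
    return bool(copy)
```

For example, resolve_copy(['Mpo', 'Klf1', 'Ifitm1']) gives True, matching the first call in the Examples below, where a new object is returned. resolve_copy('pca_only') gives False, so that call modifies adata in place.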

Examples

>>> import scanpy as sc
>>> import scanpy.external as sce
>>> adata = sc.datasets.paul15()
>>> sc.pp.normalize_per_cell(adata)
>>> sc.pp.sqrt(adata)  # or sc.pp.log1p(adata)
>>> adata_magic = sce.pp.magic(adata, name_list=['Mpo', 'Klf1', 'Ifitm1'], knn=5)
>>> adata_magic.shape
(2730, 3)
>>> sce.pp.magic(adata, name_list='pca_only', knn=5)
>>> adata.obsm['X_magic'].shape
(2730, 100)
>>> sce.pp.magic(adata, name_list='all_genes', knn=5)
>>> adata.X.shape
(2730, 3451)