scanpy.external.pp.magic
- scanpy.external.pp.magic(adata, name_list=None, *, knn=5, decay=1, knn_max=None, t=3, n_pca=100, solver='exact', knn_dist='euclidean', random_state=None, n_jobs=None, verbose=False, copy=None, **kwargs)
Markov Affinity-based Graph Imputation of Cells (MAGIC) API [vanDijk18].
MAGIC is an algorithm for denoising and transcript recovery of single cells, applied to single-cell sequencing data. MAGIC builds a graph from the data and uses diffusion to smooth out noise and recover the data manifold.
The algorithm implemented here differs from the one described in [vanDijk18] in two main ways. First, it uses the adaptive kernel described in Moon et al., 2019 [Moon17] for improved stability. Second, data diffusion is applied in the PCA space rather than the data space, which improves speed and memory usage.
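The graph-plus-diffusion idea can be sketched in a few lines of NumPy. This is a minimal dense illustration under stated assumptions, not the magic package's implementation: it computes all pairwise Euclidean distances (no kNN search, no knn_max truncation, no sparse matrices), takes each cell's bandwidth as the distance to its knn-th nearest neighbor, applies an alpha-decaying kernel exp(-(d/sigma)^decay), symmetrizes it by averaging, row-normalizes it into a Markov matrix, and raises that matrix to the power t. The function name magic_sketch is hypothetical. Optional PCA (via SVD) is used only to compute neighborhoods; the diffusion itself is applied to the input matrix.

```python
import numpy as np


def magic_sketch(X, knn=5, decay=1.0, t=3, n_pca=None):
    """Dense MAGIC-style imputation sketch (hypothetical helper, not the magic API).

    X: (n_cells, n_genes) array of normalized expression values.
    Returns an array of the same shape in which each cell is replaced by a
    diffusion-weighted average over similar cells.
    """
    X = np.asarray(X, dtype=float)
    # Neighborhoods are computed on the top n_pca principal components, if requested.
    if n_pca is not None and n_pca < X.shape[1]:
        Xc = X - X.mean(axis=0)
        U, S, _ = np.linalg.svd(Xc, full_matrices=False)
        Y = U[:, :n_pca] * S[:n_pca]  # PCA scores of each cell
    else:
        Y = X
    # Pairwise Euclidean distances between cells (dense, O(n_cells^2) memory).
    D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    # Adaptive bandwidth: distance to the knn-th nearest neighbor
    # (index 0 after sorting is the cell itself, at distance 0).
    sigma = np.maximum(np.sort(D, axis=1)[:, knn], 1e-12)
    # Alpha-decaying kernel; a larger `decay` makes the tails beyond sigma fall off faster.
    K = np.exp(-((D / sigma[:, None]) ** decay))
    K = (K + K.T) / 2  # symmetrize by averaging
    # Row-normalize into a Markov (diffusion) operator.
    P = K / K.sum(axis=1, keepdims=True)
    # Raise the operator to the power t and apply it to the original data.
    return np.linalg.matrix_power(P, t) @ X
```

Because every row of the powered operator is a probability distribution, each imputed value is a weighted average of observed values for that gene, and t = 0 returns the input unchanged; larger t spreads the averaging over a wider neighborhood.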
More information and bug reports here. For help, visit <https://krishnaswamylab.org/get-help>.
- Parameters
- adata : AnnData
  An AnnData object with a .raw attribute representing raw counts.
- name_list : {'all_genes', 'pca_only'} | Sequence[str] | None (default: None)
  Denoised genes to return. The default, 'all_genes'/None, may require a large amount of memory if the input data is sparse. Another possibility is 'pca_only'.
- knn : int (default: 5)
  Number of nearest neighbors on which to build the kernel.
- decay : float | None (default: 1)
  Sets the decay rate of the kernel tails. If None, the alpha-decaying kernel is not used.
- knn_max : int | None (default: None)
  Maximum number of nearest neighbors with a nonzero connection. If None, it is set to 3 * knn.
- t : {'auto'} | int (default: 3)
  Power to which the diffusion operator is raised. This sets the level of diffusion. If 'auto', t is selected according to the Procrustes disparity of the diffused data.
- n_pca : int | None (default: 100)
  Number of principal components to use for calculating neighborhoods. For extremely large datasets, using n_pca < 20 allows neighborhoods to be calculated in roughly log(n_samples) time. If None, no PCA is performed.
- solver : {'exact', 'approximate'} (default: 'exact')
  Which solver to use. 'exact' uses the implementation described in van Dijk et al. (2018) [vanDijk18]. 'approximate' uses a faster implementation that performs imputation in the PCA space and then projects back to the gene space. Note that the 'approximate' solver may return negative values.
- knn_dist : str (default: 'euclidean')
  Distance metric for building the kNN graph. Recommended values: 'euclidean', 'cosine', 'precomputed'. Any metric from scipy.spatial.distance can be used. If 'precomputed', data should be an n_samples x n_samples distance or affinity matrix.
- random_state : None | int | RandomState (default: None)
  Random seed. Defaults to the global numpy random number generator.
- n_jobs : int | None (default: None)
  Number of threads to use in training. All cores are used by default.
- verbose : bool (default: False)
  If True or an integer >= 2, print status messages. If None, sc.settings.verbosity is used.
- copy : bool | None (default: None)
  If True, a copy of the AnnData object is returned. If None, copy is True if name_list is not 'all_genes' or 'pca_only'. copy may only be False if name_list is 'all_genes' or 'pca_only', as the resulting data would otherwise have different column names from the input data.
- kwargs
  Additional arguments to magic.MAGIC.
- Return type
  AnnData | None
- Returns
  If copy is True, an AnnData object is returned. If name_list is 'pca_only', the PCA of the MAGIC-imputed values is stored in adata.obsm['X_magic'] and adata.X is not modified. The raw counts are stored in the .raw attribute of the AnnData object.
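The following NumPy sketch illustrates the difference between the 'exact' and 'approximate' solvers described in the parameter list, and one reason the approximate path can produce negative values. It is a hedged illustration, not the magic package's code: markov_operator and impute are hypothetical names, and the dense alpha-decay operator stands in for the operator MAGIC actually builds. The 'exact'-style path applies P^t to the data directly; the 'approximate'-style path applies it to the top n_pca principal-component scores and projects back with the PCA loadings. A truncated back-projection is not constrained to be non-negative.

```python
import numpy as np


def markov_operator(Y, knn=5, decay=1.0):
    """Dense alpha-decay Markov operator (illustrative stand-in, not the magic API)."""
    D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    sigma = np.maximum(np.sort(D, axis=1)[:, knn], 1e-12)  # adaptive bandwidth
    K = np.exp(-((D / sigma[:, None]) ** decay))
    K = (K + K.T) / 2  # symmetric affinities
    return K / K.sum(axis=1, keepdims=True)  # row-stochastic


def impute(X, P, t=3, n_pca=None, solver="exact"):
    """Apply P**t to X ('exact'-style) or to PCA scores ('approximate'-style)."""
    Pt = np.linalg.matrix_power(P, t)
    if solver == "exact":
        return Pt @ X
    # 'approximate'-style: diffuse the top n_pca PCA scores, then project back.
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    V = Vt[:n_pca].T  # (n_genes, n_pca) loadings; None keeps all components
    Z = (X - mu) @ V  # PCA scores of each cell
    return (Pt @ Z) @ V.T + mu  # back-projection may dip below zero
```

When all components are kept, the two paths agree exactly: P^t is row-stochastic, so it maps the column means onto themselves, and the approximation error comes only from discarding components.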
Examples
>>> import scanpy as sc
>>> import scanpy.external as sce
>>> adata = sc.datasets.paul15()
>>> sc.pp.normalize_per_cell(adata)
>>> sc.pp.sqrt(adata)  # or sc.pp.log1p(adata)
>>> adata_magic = sce.pp.magic(adata, name_list=['Mpo', 'Klf1', 'Ifitm1'], knn=5)
>>> adata_magic.shape
(2730, 3)
>>> sce.pp.magic(adata, name_list='pca_only', knn=5)
>>> adata.obsm['X_magic'].shape
(2730, 100)
>>> sce.pp.magic(adata, name_list='all_genes', knn=5)
>>> adata.X.shape
(2730, 3451)
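The rules linking name_list and copy, as described under the copy parameter and exercised in the examples above (a gene list returns a new object, while 'pca_only' and 'all_genes' update adata in place), can be summarized in a small pure-Python helper. resolve_magic_mode is a hypothetical function written for illustration and is not part of scanpy; in particular, raising ValueError for copy=False with a gene subset is an assumption about how the documented restriction is enforced.

```python
def resolve_magic_mode(name_list=None, copy=None):
    """Hypothetical helper mirroring the documented name_list/copy rules.

    Returns (mode, copy), where mode is 'all_genes', 'pca_only' or 'subset'.
    """
    if name_list is None:
        name_list = "all_genes"  # None behaves like 'all_genes'
    in_place_ok = isinstance(name_list, str) and name_list in ("all_genes", "pca_only")
    if copy is None:
        # Default: copy only when the output columns would differ from the input.
        copy = not in_place_ok
    elif not copy and not in_place_ok:
        # Assumption: an in-place call with a gene subset is rejected, because the
        # result would have different column names from the input data.
        raise ValueError("copy=False requires name_list to be 'all_genes' or 'pca_only'")
    mode = name_list if in_place_ok else "subset"
    return mode, copy
```

For example, resolve_magic_mode(['Mpo', 'Klf1']) yields ('subset', True), matching the first call in the examples, which returns a new three-gene object.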