Single-Cell Analysis in Python.
API
Import Scanpy as:
import scanpy as sc
Note
Wrappers to external functionality are found in scanpy.external.
Preprocessing: pp
Filtering of highly-variable genes, batch-effect correction, per-cell normalization, preprocessing recipes.
Any transformation of the data matrix that is not a tool. Unlike tools, preprocessing steps usually don’t return an easily interpretable annotation; they perform a basic transformation on the data matrix.
Basic Preprocessing
For visual quality control, see highest_expr_genes() and filter_genes_dispersion() in scanpy.plotting.
|
Calculate quality control metrics. |
|
Filter cell outliers based on counts and numbers of genes expressed. |
|
Filter genes based on number of cells or counts. |
|
Annotate highly variable genes [Satija15] [Zheng17] [Stuart19]. |
|
Logarithmize the data matrix. |
|
Principal component analysis [Pedregosa11]. |
|
Normalize counts per cell. |
|
Regress out (mostly) unwanted sources of variation. |
|
Scale data to unit variance and zero mean. |
|
Subsample to a fraction of the number of observations. |
|
Downsample counts from count matrix. |
Recipes
|
Normalization and filtering as of [Zheng17]. |
|
Normalization and filtering as of [Weinreb17]. |
|
Normalization and filtering as of Seurat [Satija15]. |
Batch effect correction
Also see Data integration. Note that a simple batch correction method is available via pp.regress_out(). Check out scanpy.external for more.
|
ComBat function for batch effect correction [Johnson07] [Leek12] [Pedersen12]. |
Neighbors
|
Compute a neighborhood graph of observations [McInnes18]. |
Tools: tl
Any transformation of the data matrix that is not preprocessing. In contrast to a preprocessing function, a tool usually adds an easily interpretable annotation to the data matrix, which can then be visualized with a corresponding plotting function.
Embeddings
|
Principal component analysis [Pedregosa11]. |
|
t-SNE [Maaten08] [Amir13] [Pedregosa11]. |
|
Embed the neighborhood graph using UMAP [McInnes18]. |
|
Force-directed graph drawing [Islam11] [Jacomy14] [Chippada18]. |
|
Diffusion Maps [Coifman05] [Haghverdi15] [Wolf18]. |
Compute densities on embeddings.
|
Calculate the density of cells in an embedding (per condition). |
Clustering and trajectory inference
|
Cluster cells into subgroups [Traag18]. |
|
Cluster cells into subgroups [Blondel08] [Levine15] [Traag17]. |
|
Computes a hierarchical clustering for the given |
|
Infer progression of cells through geodesic distance along the graph [Haghverdi16] [Wolf19]. |
|
Mapping out the coarse-grained connectivity structures of complex manifolds [Wolf19]. |
Data integration
|
Map labels and embeddings from reference data to new data. |
Marker genes
|
Rank genes for characterizing groups. |
|
Filters out genes based on fold change and fraction of cells expressing the gene within and outside the |
|
Calculate an overlap score between data-derived marker genes and provided markers. |
Gene scores, Cell cycle
|
Score a set of genes [Satija15]. |
|
Score cell cycle genes [Satija15]. |
Simulations
|
Simulate dynamic gene expression data [Wittmann09] [Wolf18]. |
Plotting: pl
The plotting module scanpy.plotting largely parallels the tl.* and a few of the pp.* functions.
For most tools and for some preprocessing functions, you’ll find a plotting function with the same name.
Plotting API |
Reading
Note
For reading annotation use pandas.read_… and add it to your anndata.AnnData object. The following read functions are intended for the numeric data in the data matrix X.
Read common file formats using
|
Read file and return |
Read 10x formatted hdf5 files and directories containing .mtx files using
|
Read 10x-Genomics-formatted hdf5 file. |
|
Read 10x-Genomics-formatted mtx directory. |
|
Read 10x-Genomics-formatted Visium dataset. |
Read other formats using functions borrowed from anndata
|
Read |
|
Read |
|
Read |
|
Read |
|
Read |
|
Read |
|
Read |
|
Read a gzipped condensed count matrix from umi_tools. |
Get object from AnnData: get
The module sc.get provides convenience functions for getting values back in useful formats.
|
Return values for observations in adata. |
|
Return values for variables in adata. |
|
|
Queries
This module provides useful queries for annotation and enrichment.
|
Retrieve gene annotations from ensembl biomart. |
|
Retrieve gene coordinates for specific organism through BioMart. |
|
Mitochondrial gene symbols for specific organism through BioMart. |
|
Get enrichment for DE results. |
Classes
AnnData is reexported from anndata.
Represent data as a neighborhood structure, usually a knn graph.
|
Data represented as graph of nearest neighbors. |
Settings
A convenience function for setting some default matplotlib.rcParams and a high-resolution Jupyter display backend, useful in notebooks.
|
Set resolution/size, styling and format of figures. |
An instance of ScanpyConfig is available as scanpy.settings and allows configuring Scanpy.
|
Config manager for scanpy. |
Some selected settings are discussed in the following.
Influence the global behavior of plotting functions. In non-interactive scripts, you’d usually want to set settings.autoshow to False.
Automatically show figures if |
|
Automatically save figures in |
The default directories for saving figures, caching files and storing datasets.
Directory for saving figures (default |
|
Directory for cache files (default |
|
Directory for example |
The verbosity of logging output, where verbosity levels have the following meaning: 0=’error’, 1=’warning’, 2=’info’, 3=’hint’, 4=more details, 5=even more details, etc.
Verbosity level (default |
Print versions of packages that might influence numerical results.
|
Versions that might influence the numerical results. |
|
Print versions of imported packages. |
Datasets
|
Gaussian Blobs. |
|
Load a dataset from the EBI Single Cell Expression Atlas. |
|
Simulated myeloid progenitors [Krumsiek11]. |
|
Hematopoiesis in early mouse embryos [Moignard15]. |
|
3k PBMCs from 10x Genomics. |
|
Processed 3k PBMCs from 10x Genomics. |
|
Subsampled and processed 68k PBMCs. |
|
Development of Myeloid Progenitors [Paul15]. |
|
Simulated toggleswitch. |
|
|
Processed Visium Spatial Gene Expression data from 10x Genomics. |
Deprecated functions
|
Extract highly variable genes [Satija15] [Zheng17]. |
|
Normalize total counts per cell. |