Single-Cell Analysis in Python.
API
Import Scanpy as:
import scanpy as sc
Note
Additional functionality is available in the broader ecosystem, with some tools being wrapped in the scanpy.external module.
Preprocessing: pp
Filtering of highly-variable genes, batch-effect correction, per-cell normalization, preprocessing recipes.
Any transformation of the data matrix that is not a tool. Unlike tools, preprocessing steps usually don’t return an easily interpretable annotation but perform a basic transformation of the data matrix.
Basic Preprocessing
For visual quality control, see highest_expr_genes() and filter_genes_dispersion() in scanpy.pl.
  | 
Calculate quality control metrics.  | 
  | 
Filter cell outliers based on counts and numbers of genes expressed.  | 
  | 
Filter genes based on number of cells or counts.  | 
  | 
Annotate highly variable genes [Satija15] [Zheng17] [Stuart19].  | 
  | 
Logarithmize the data matrix.  | 
  | 
Principal component analysis [Pedregosa11].  | 
  | 
Normalize counts per cell.  | 
  | 
Regress out (mostly) unwanted sources of variation.  | 
  | 
Scale data to unit variance and zero mean.  | 
  | 
Subsample to a fraction of the number of observations.  | 
  | 
Downsample counts from count matrix.  | 
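As a rough illustration of what three of the steps listed above compute (normalizing counts per cell, logarithmizing, and scaling to unit variance and zero mean), here is a minimal pure-Python sketch on a small dense matrix. It is not Scanpy's implementation: the helper names, the `target_sum` parameter, and the use of the sample standard deviation are assumptions made for this example.

```python
import math


def normalize_per_cell(counts, target_sum):
    """Rescale each row (cell) so that its counts sum to target_sum."""
    normalized = []
    for row in counts:
        total = sum(row)
        # All-zero cells stay all-zero; this avoids a division by zero.
        factor = target_sum / total if total > 0 else 0.0
        normalized.append([value * factor for value in row])
    return normalized


def log1p_matrix(matrix):
    """Apply log(1 + x) to every entry."""
    return [[math.log1p(value) for value in row] for row in matrix]


def scale_genes(matrix):
    """Shift each column (gene) to zero mean and rescale it to unit variance."""
    n_cells = len(matrix)
    n_genes = len(matrix[0])
    scaled = [[0.0] * n_genes for _ in range(n_cells)]
    for j in range(n_genes):
        column = [row[j] for row in matrix]
        mean = sum(column) / n_cells
        # Sample variance (divides by n - 1); needs at least two cells.
        variance = sum((v - mean) ** 2 for v in column) / (n_cells - 1)
        std = math.sqrt(variance)
        for i in range(n_cells):
            # Constant genes are mapped to zero instead of dividing by zero.
            scaled[i][j] = (column[i] - mean) / std if std > 0 else 0.0
    return scaled


# Toy count matrix: 3 cells (rows) x 3 genes (columns).
counts = [[1, 3, 0], [2, 2, 4], [0, 5, 5]]
normalized = normalize_per_cell(counts, target_sum=10.0)
logged = log1p_matrix(normalized)
scaled = scale_genes(logged)
```

After these steps every cell in `normalized` has the same total count, and every gene in `scaled` has mean zero and unit sample variance.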
Recipes
  | 
Normalization and filtering as of [Zheng17].  | 
  | 
Normalization and filtering as of [Weinreb17].  | 
  | 
Normalization and filtering as of Seurat [Satija15].  | 
Batch effect correction
Also see Data integration. Note that a simple batch correction method is available via pp.regress_out(). Check out scanpy.external for more.
  | 
ComBat function for batch effect correction [Johnson07] [Leek12] [Pedersen12].  | 
Neighbors
  | 
Compute a neighborhood graph of observations [McInnes18].  | 
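To make the idea of a neighborhood graph of observations concrete, the sketch below builds a brute-force k-nearest-neighbor list with Euclidean distances. It only illustrates the concept: it does not use the approximate search or the connectivity weighting of [McInnes18], and the function name `knn_graph` is hypothetical.

```python
import math


def knn_graph(points, n_neighbors):
    """Return, for each point, the indices of its n_neighbors nearest points."""
    neighbors = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, paired with the index.
        distances = [
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        ]
        distances.sort()
        neighbors.append([j for _, j in distances[:n_neighbors]])
    return neighbors


# Two well-separated groups of observations in a 2D feature space.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0)]
graph = knn_graph(points, n_neighbors=2)
```

Each entry of `graph` lists the nearest observations first, so points from the same group appear at the front of each other's neighbor lists.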
Tools: tl
Any transformation of the data matrix that is not preprocessing. In contrast to a preprocessing function, a tool usually adds an easily interpretable annotation to the data matrix, which can then be visualized with a corresponding plotting function.
Embeddings
  | 
Principal component analysis [Pedregosa11].  | 
  | 
t-SNE [Maaten08] [Amir13] [Pedregosa11].  | 
  | 
Embed the neighborhood graph using UMAP [McInnes18].  | 
  | 
Force-directed graph drawing [Islam11] [Jacomy14] [Chippada18].  | 
  | 
Diffusion Maps [Coifman05] [Haghverdi15] [Wolf18].  | 
Compute densities on embeddings.
  | 
Calculate the density of cells in an embedding (per condition).  | 
Clustering and trajectory inference
  | 
Cluster cells into subgroups [Traag18].  | 
  | 
Cluster cells into subgroups [Blondel08] [Levine15] [Traag17].  | 
  | 
Computes a hierarchical clustering for the given   | 
  | 
Infer progression of cells through geodesic distance along the graph [Haghverdi16] [Wolf19].  | 
  | 
Mapping out the coarse-grained connectivity structures of complex manifolds [Wolf19].  | 
Data integration
  | 
Map labels and embeddings from reference data to new data.  | 
Marker genes
  | 
Rank genes for characterizing groups.  | 
  | 
Filters out genes based on log fold change and fraction of genes expressing the gene within and outside the   | 
  | 
Calculate an overlap score between data-derived marker genes and provided markers.  | 
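One simple way to score the overlap between data-derived marker genes and provided markers is to count the shared genes for every pair of cluster and reference cell type. The sketch below assumes that definition; the function name `marker_overlap` and the toy gene lists are illustrative only, and other overlap measures (for example, normalized ones) are equally possible.

```python
def marker_overlap(data_markers, reference_markers):
    """Count shared genes between each cluster's markers and each reference set."""
    return {
        cluster: {
            cell_type: len(set(genes) & set(ref_genes))
            for cell_type, ref_genes in reference_markers.items()
        }
        for cluster, genes in data_markers.items()
    }


# Toy input: marker genes found per cluster, and known markers per cell type.
data_markers = {
    "0": ["CD3D", "CD3E", "IL7R"],
    "1": ["MS4A1", "CD79A", "CD3D"],
}
reference_markers = {
    "T cell": ["CD3D", "CD3E", "CD2"],
    "B cell": ["MS4A1", "CD79A", "CD79B"],
}
overlap = marker_overlap(data_markers, reference_markers)
```

A cluster can then be tentatively labeled with the cell type that has the largest count in its row of `overlap`.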
Gene scores, Cell cycle
  | 
Score a set of genes [Satija15].  | 
  | 
Score cell cycle genes [Satija15].  | 
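A gene-set score in the spirit of [Satija15] can be sketched as the average expression of the genes of interest minus the average expression of a reference set, computed per cell. The version below is a simplification under stated assumptions: the reference set is passed in explicitly rather than sampled, and the function name and arguments are hypothetical.

```python
def score_genes(expression, gene_names, gene_set, reference_set):
    """Per-cell mean expression of gene_set minus mean expression of reference_set."""
    index = {name: j for j, name in enumerate(gene_names)}
    set_cols = [index[g] for g in gene_set]
    ref_cols = [index[g] for g in reference_set]
    scores = []
    for row in expression:
        set_mean = sum(row[j] for j in set_cols) / len(set_cols)
        ref_mean = sum(row[j] for j in ref_cols) / len(ref_cols)
        scores.append(set_mean - ref_mean)
    return scores


# Toy expression matrix: 2 cells (rows) x 4 genes (columns).
gene_names = ["g1", "g2", "g3", "g4"]
expression = [
    [2.0, 4.0, 1.0, 1.0],
    [0.0, 0.0, 3.0, 1.0],
]
scores = score_genes(expression, gene_names, ["g1", "g2"], ["g3", "g4"])
```

A positive score means the cell expresses the gene set above the reference level, and a negative score means below it.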
Simulations
  | 
Simulate dynamic gene expression data [Wittmann09] [Wolf18].  | 
Plotting: pl
The plotting module scanpy.pl largely parallels the tl.* and a few of the pp.* functions.
For most tools and for some preprocessing functions, you’ll find a plotting function with the same name.
See the plotting/core tutorial for an overview of how to use these functions.
Note
See the Settings section for all important plotting configurations.
Generic
  | 
Scatter plot along observations or variables axes.  | 
  | 
Heatmap of the expression values of genes.  | 
  | 
Makes a dot plot of the expression values of   | 
  | 
In this type of plot each var_name is plotted as a filled line plot where the y values correspond to the var_name values and x is each of the cells.  | 
  | 
Violin plot.  | 
  | 
Stacked violin plots.  | 
  | 
Creates a heatmap of the mean expression values per group of each var_names.  | 
  | 
Hierarchically-clustered heatmap.  | 
  | 
Plot rankings.  | 
  | 
Plots a dendrogram of the categories defined in   | 
Classes
These classes allow fine tuning of visual parameters.
  | 
Allows the visualization of two values that are encoded as dot size and color.  | 
  | 
Allows the visualization of values using a color map.  | 
  | 
Stacked violin plots.  | 
Preprocessing
Methods for visualizing quality control and results of preprocessing functions.
  | 
Fraction of counts assigned to each gene over all cells.  | 
  | 
Plot dispersions versus means for genes.  | 
  | 
Plot dispersions or normalized variance versus means for genes.  | 
Tools
Methods that extract and visualize tool-specific annotation in an AnnData object. For any method in module tl, there is a method with the same name in pl.
PCA
  | 
Scatter plot in PCA coordinates.  | 
  | 
Rank genes according to contributions to PCs.  | 
  | 
Plot the variance ratio.  | 
  | 
Plot PCA results.  | 
Embeddings
  | 
Scatter plot in tSNE basis.  | 
  | 
Scatter plot in UMAP basis.  | 
  | 
Scatter plot in Diffusion Map basis.  | 
  | 
Scatter plot in graph-drawing basis.  | 
  | 
Scatter plot in spatial coordinates.  | 
  | 
Scatter plot for user specified embedding basis (e.g. umap, pca, etc).  | 
Compute densities on embeddings.
  | 
Plot the density of cells in an embedding (per condition).  | 
Branching trajectories and pseudotime, clustering
Visualize clusters with one of the embedding methods by passing the clustering key as color, e.g. color='louvain'.
  | 
Plot groups and pseudotime.  | 
  | 
Heatmap of pseudotime series.  | 
  | 
Plot the PAGA graph through thresholding low-connectivity edges.  | 
  | 
Gene expression and annotation changes along paths in the abstracted graph.  | 
  | 
Scatter and PAGA graph side-by-side.  | 
Marker genes
  | 
Plot ranking of genes.  | 
  | 
Plot ranking of genes for all tested comparisons.  | 
  | 
Plot ranking of genes using stacked_violin plot (see   | 
  | 
Plot ranking of genes using heatmap plot (see   | 
  | 
Plot ranking of genes using dotplot plot (see   | 
  | 
Plot ranking of genes using matrixplot plot (see   | 
  | 
Plot ranking of genes using heatmap plot (see   | 
Simulations
  | 
Plot results of simulation.  | 
Reading
Note
To read annotation, use pandas.read_… and add the result to your anndata.AnnData object. The following read functions are intended for the numeric data in the data matrix X.
Read common file formats using
  | 
Read file and return   | 
Read 10x formatted hdf5 files and directories containing .mtx files using
  | 
Read 10x-Genomics-formatted hdf5 file.  | 
  | 
Read 10x-Genomics-formatted mtx directory.  | 
  | 
Read 10x-Genomics-formatted Visium dataset.  | 
Read other formats using functions borrowed from anndata
  | 
Read   | 
  | 
Read   | 
  | 
Read   | 
  | 
Read   | 
  | 
Read   | 
  | 
Read   | 
  | 
Read   | 
  | 
Read a gzipped condensed count matrix from umi_tools.  | 
Get object from AnnData: get
The module sc.get provides convenience functions for getting values back in useful formats.
  | 
Return values for observations in adata.  | 
  | 
Return values for variables in adata.  | 
  | 
  | 
Queries
This module provides useful queries for annotation and enrichment.
  | 
Retrieve gene annotations from ensembl biomart.  | 
  | 
Retrieve gene coordinates for specific organism through BioMart.  | 
  | 
Mitochondrial gene symbols for specific organism through BioMart.  | 
  | 
Get enrichment for DE results.  | 
Metrics
Collections of useful measurements for evaluating results.
  | 
Given an original and new set of labels, create a labelled confusion matrix.  | 
  | 
|
  | 
Calculate Moran’s I Global Autocorrelation Statistic.  | 
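For reference, Moran's I for values x on a weighted graph w is I = (N / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where N is the number of observations and W is the sum of all weights. The sketch below evaluates this textbook formula on a dense weight matrix; it is not meant to mirror the library's implementation, and the function name `morans_i` is used here only for illustration.

```python
def morans_i(values, weights):
    """Moran's I global autocorrelation for values on a weighted graph."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    weight_sum = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations over all pairs (i, j).
    numerator = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    denominator = sum(d * d for d in deviations)
    return (n / weight_sum) * (numerator / denominator)


# Path graph on 4 nodes: 0 - 1 - 2 - 3 (symmetric, unit weights).
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = morans_i([1.0, 2.0, 3.0, 4.0], weights)
alternating = morans_i([1.0, -1.0, 1.0, -1.0], weights)
```

Values that vary smoothly along the graph give a positive statistic, while values that flip sign between neighbors give a negative one.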
Experimental
New methods in early development that are not (yet) integrated into Scanpy core.
Applies analytic Pearson residual normalization, based on [Lause21].  | 
|
Applies analytic Pearson residual normalization and PCA, based on [Lause21].  | 
|
  | 
Select highly variable genes using analytic Pearson residuals [Lause21].  | 
Full pipeline for HVG selection and normalization by analytic Pearson residuals ([Lause21]).  | 
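The analytic Pearson residuals of [Lause21] compare each count with its expected value under a negative binomial model whose mean is cell total times gene total divided by the grand total. A minimal dense sketch follows; the overdispersion `theta=100` and clipping to ±sqrt(number of cells) are the values commonly used with this method and should be treated as assumptions here, as should the function name.

```python
import math


def pearson_residuals(counts, theta=100.0):
    """Analytic Pearson residuals for a dense cells x genes count matrix.

    Assumes every cell and every gene has at least one count.
    """
    n_cells = len(counts)
    n_genes = len(counts[0])
    cell_totals = [sum(row) for row in counts]
    gene_totals = [sum(row[j] for row in counts) for j in range(n_genes)]
    grand_total = sum(cell_totals)
    clip = math.sqrt(n_cells)
    residuals = []
    for i in range(n_cells):
        row = []
        for j in range(n_genes):
            # Expected count under the null model.
            mu = cell_totals[i] * gene_totals[j] / grand_total
            # Negative binomial variance: mu + mu^2 / theta.
            residual = (counts[i][j] - mu) / math.sqrt(mu + mu * mu / theta)
            row.append(max(-clip, min(clip, residual)))
        residuals.append(row)
    return residuals


# Toy count matrix: 3 cells (rows) x 3 genes (columns).
counts = [[4, 0, 6], [1, 3, 6], [5, 2, 3]]
residuals = pearson_residuals(counts)
```

Counts that match the expected value exactly yield a residual of zero, so a matrix in which every cell has the same gene proportions produces all-zero residuals.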
Classes
AnnData is reexported from anndata.
Represent data as a neighborhood structure, usually a knn graph.
  | 
Data represented as graph of nearest neighbors.  | 
Settings
A convenience function sets some default matplotlib.rcParams and a high-resolution Jupyter display backend, which is useful in notebooks.
  | 
Set resolution/size, styling and format of figures.  | 
An instance of ScanpyConfig is available as scanpy.settings and allows configuring Scanpy.
  | 
Config manager for scanpy.  | 
Some selected settings are discussed in the following.
The following settings influence the global behavior of plotting functions. In non-interactive scripts, you’d usually want to set settings.autoshow to False.
Automatically show figures if   | 
|
Automatically save figures in   | 
The default directories for saving figures, caching files and storing datasets.
Directory for saving figures (default   | 
|
Directory for cache files (default   | 
|
Directory for example   | 
The verbosity of logging output, where verbosity levels have the following meaning: 0='error', 1='warning', 2='info', 3='hint', 4=more details, 5=even more details, etc.
Verbosity level (default   | 
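The settings discussed above might be combined as follows at the top of a non-interactive script. This is a configuration sketch, not a recommended default: only `autoshow` is named explicitly in this section, so the other attribute names (`autosave`, `figdir`, `cachedir`, `datasetdir`, `verbosity`) are assumed to be the corresponding ScanpyConfig attributes, and the paths are placeholders.

```python
import scanpy as sc

# Non-interactive script: do not display figures, write them to disk instead.
sc.settings.autoshow = False
sc.settings.autosave = True

# Output locations (placeholder paths chosen for this example).
sc.settings.figdir = "output/figures"
sc.settings.cachedir = "output/cache"
sc.settings.datasetdir = "output/data"

# 2 corresponds to 'info' in the verbosity levels listed above.
sc.settings.verbosity = 2
```

Check the attribute names against the ScanpyConfig entry of your installed version before relying on them.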
Print versions of packages that might influence numerical results.
  | 
Versions that might influence the numerical results.  | 
  | 
Print versions of imported packages, OS, and jupyter environment.  | 
Datasets
  | 
Gaussian Blobs.  | 
  | 
Load a dataset from the EBI Single Cell Expression Atlas.  | 
Simulated myeloid progenitors [Krumsiek11].  | 
|
Hematopoiesis in early mouse embryos [Moignard15].  | 
|
3k PBMCs from 10x Genomics.  | 
|
Processed 3k PBMCs from 10x Genomics.  | 
|
Subsampled and processed 68k PBMCs.  | 
|
Development of Myeloid Progenitors [Paul15].  | 
|
Simulated toggleswitch.  | 
|
  | 
Processed Visium Spatial Gene Expression data from 10x Genomics.  | 
Deprecated functions
  | 
Extract highly variable genes [Satija15] [Zheng17].  | 
  | 
Normalize total counts per cell.  |