Single-Cell Analysis in Python.
API
Import Scanpy as:
import scanpy as sc
Note
Additional functionality is available in the broader ecosystem, with some tools being wrapped in the scanpy.external module.
Preprocessing: pp
Filtering of highly-variable genes, batch-effect correction, per-cell normalization, preprocessing recipes.
Any transformation of the data matrix that is not a tool. Unlike tools, preprocessing steps usually don’t return an easily interpretable annotation but perform a basic transformation on the data matrix.
Basic Preprocessing
For visual quality control, see highest_expr_genes() and filter_genes_dispersion() in scanpy.pl.
|
Calculate quality control metrics. |
|
Filter cell outliers based on counts and numbers of genes expressed. |
|
Filter genes based on number of cells or counts. |
|
Annotate highly variable genes [Satija15] [Zheng17] [Stuart19]. |
|
Logarithmize the data matrix. |
|
Principal component analysis [Pedregosa11]. |
|
Normalize counts per cell. |
|
Regress out (mostly) unwanted sources of variation. |
|
Scale data to unit variance and zero mean. |
|
Subsample to a fraction of the number of observations. |
|
Downsample counts from count matrix. |
Recipes
|
Normalization and filtering as of [Zheng17]. |
|
Normalization and filtering as of [Weinreb17]. |
|
Normalization and filtering as of Seurat [Satija15]. |
Batch effect correction
Also see the Data integration section. Note that a simple batch correction method is available via pp.regress_out(). Check out scanpy.external for more.
|
ComBat function for batch effect correction [Johnson07] [Leek12] [Pedersen12]. |
Neighbors
|
Compute a neighborhood graph of observations [McInnes18]. |
Tools: tl
Any transformation of the data matrix that is not preprocessing. In contrast to a preprocessing function, a tool usually adds an easily interpretable annotation to the data matrix, which can then be visualized with a corresponding plotting function.
Embeddings
|
Principal component analysis [Pedregosa11]. |
|
t-SNE [Maaten08] [Amir13] [Pedregosa11]. |
|
Embed the neighborhood graph using UMAP [McInnes18]. |
|
Force-directed graph drawing [Islam11] [Jacomy14] [Chippada18]. |
|
Diffusion Maps [Coifman05] [Haghverdi15] [Wolf18]. |
Compute densities on embeddings.
|
Calculate the density of cells in an embedding (per condition). |
Clustering and trajectory inference
|
Cluster cells into subgroups [Traag18]. |
|
Cluster cells into subgroups [Blondel08] [Levine15] [Traag17]. |
|
Computes a hierarchical clustering for the given |
|
Infer progression of cells through geodesic distance along the graph [Haghverdi16] [Wolf19]. |
|
Mapping out the coarse-grained connectivity structures of complex manifolds [Wolf19]. |
Data integration
|
Map labels and embeddings from reference data to new data. |
Marker genes
|
Rank genes for characterizing groups. |
|
Filters out genes based on log fold change and fraction of cells expressing the gene within and outside the |
|
Calculate an overlap score between data-derived marker genes and provided markers. |
Gene scores, Cell cycle
|
Score a set of genes [Satija15]. |
|
Score cell cycle genes [Satija15]. |
Simulations
|
Simulate dynamic gene expression data [Wittmann09] [Wolf18]. |
Plotting: pl
The plotting module scanpy.pl largely parallels the tl.* and a few of the pp.* functions.
For most tools and for some preprocessing functions, you’ll find a plotting function with the same name.
See the tutorial plotting/core for an overview of how to use these functions.
Note
See the Settings section for all important plotting configurations.
Generic
|
Scatter plot along observations or variables axes. |
|
Heatmap of the expression values of genes. |
|
Makes a dot plot of the expression values of |
|
In this type of plot, each var_name is plotted as a filled line plot, where the y values correspond to the var_name values and the x axis spans the cells. |
|
Violin plot. |
|
Stacked violin plots. |
|
Creates a heatmap of the mean expression values per group of each var_names. |
|
Hierarchically-clustered heatmap. |
|
Plot rankings. |
|
Plots a dendrogram of the categories defined in |
Classes
These classes allow fine tuning of visual parameters.
|
Allows the visualization of two values that are encoded as dot size and color. |
|
Allows the visualization of values using a color map. |
|
Stacked violin plots. |
Preprocessing
Methods for visualizing quality control and results of preprocessing functions.
|
Fraction of counts assigned to each gene over all cells. |
|
Plot dispersions versus means for genes. |
|
Plot dispersions or normalized variance versus means for genes. |
Tools
Methods that extract and visualize tool-specific annotation in an AnnData object. For any method in module tl, there is a method with the same name in pl.
PCA
|
Scatter plot in PCA coordinates. |
|
Rank genes according to contributions to PCs. |
|
Plot the variance ratio. |
|
Plot PCA results. |
Embeddings
|
Scatter plot in tSNE basis. |
|
Scatter plot in UMAP basis. |
|
Scatter plot in Diffusion Map basis. |
|
Scatter plot in graph-drawing basis. |
|
Scatter plot in spatial coordinates. |
|
Scatter plot for user specified embedding basis (e.g. umap, pca, etc). |
Compute densities on embeddings.
|
Plot the density of cells in an embedding (per condition). |
Branching trajectories and pseudotime, clustering
Visualize clusters using one of the embedding methods, passing color='louvain'.
|
Plot groups and pseudotime. |
|
Heatmap of pseudotime series. |
|
Plot the PAGA graph through thresholding low-connectivity edges. |
|
Gene expression and annotation changes along paths in the abstracted graph. |
|
Scatter and PAGA graph side-by-side. |
Marker genes
|
Plot ranking of genes. |
|
Plot ranking of genes for all tested comparisons. |
|
Plot ranking of genes using stacked_violin plot (see |
|
Plot ranking of genes using heatmap plot (see |
|
Plot ranking of genes using dotplot plot (see |
|
Plot ranking of genes using matrixplot plot (see |
|
Plot ranking of genes using heatmap plot (see |
Simulations
|
Plot results of simulation. |
Reading
Note
For reading annotation, use pandas.read_… and add it to your anndata.AnnData object. The following read functions are intended for the numeric data in the data matrix X.
Read common file formats using
|
Read file and return |
Read 10x-formatted hdf5 files and directories containing .mtx files using
|
Read 10x-Genomics-formatted hdf5 file. |
|
Read 10x-Genomics-formatted mtx directory. |
|
Read 10x-Genomics-formatted Visium dataset. |
Read other formats using functions borrowed from anndata
|
Read |
|
Read |
|
Read |
|
Read |
|
Read |
|
Read |
|
Read |
|
Read a gzipped condensed count matrix from umi_tools. |
Get object from AnnData: get
The module sc.get provides convenience functions for getting values back in useful formats.
|
Return values for observations in adata. |
|
Return values for observations in adata. |
|
|
Queries
This module provides useful queries for annotation and enrichment.
|
Retrieve gene annotations from ensembl biomart. |
|
Retrieve gene coordinates for specific organism through BioMart. |
|
Mitochondrial gene symbols for specific organism through BioMart. |
|
Get enrichment for DE results. |
Metrics
Collections of useful measurements for evaluating results.
|
Given an original and new set of labels, create a labelled confusion matrix. |
|
|
|
Calculate Moran’s I Global Autocorrelation Statistic. |
Experimental
New methods that are in early development which are not (yet) integrated in Scanpy core.
Applies analytic Pearson residual normalization, based on [Lause21]. |
|
Applies analytic Pearson residual normalization and PCA, based on [Lause21]. |
|
|
Select highly variable genes using analytic Pearson residuals [Lause21]. |
Full pipeline for HVG selection and normalization by analytic Pearson residuals ([Lause21]). |
Classes
AnnData is reexported from anndata.
Represent data as a neighborhood structure, usually a knn graph.
|
Data represented as graph of nearest neighbors. |
Settings
A convenience function for setting some default matplotlib.rcParams and a high-resolution Jupyter display backend useful for notebooks.
|
Set resolution/size, styling and format of figures. |
An instance of ScanpyConfig is available as scanpy.settings and allows configuring Scanpy.
|
Config manager for scanpy. |
Some selected settings are discussed in the following.
Influence the global behavior of plotting functions. In non-interactive scripts, you’d usually want to set settings.autoshow to False.
Automatically show figures if |
|
Automatically save figures in |
The default directories for saving figures, caching files and storing datasets.
Directory for saving figures (default |
|
Directory for cache files (default |
|
Directory for example |
The verbosity of logging output, where verbosity levels have the following meaning: 0='error', 1='warning', 2='info', 3='hint', 4=more details, 5=even more details, etc.
Verbosity level (default |
Print versions of packages that might influence numerical results.
|
Versions that might influence the numerical results. |
|
Print versions of imported packages, OS, and jupyter environment. |
Datasets
|
Gaussian Blobs. |
|
Load a dataset from the EBI Single Cell Expression Atlas |
Simulated myeloid progenitors [Krumsiek11]. |
|
Hematopoiesis in early mouse embryos [Moignard15]. |
|
3k PBMCs from 10x Genomics. |
|
Processed 3k PBMCs from 10x Genomics. |
|
Subsampled and processed 68k PBMCs. |
|
Development of Myeloid Progenitors [Paul15]. |
|
Simulated toggleswitch. |
|
|
Processed Visium Spatial Gene Expression data from 10x Genomics. |
Deprecated functions
|
Extract highly variable genes [Satija15] [Zheng17]. |
|
Normalize total counts per cell. |