Single-Cell Analysis in Python.


Import Scanpy as:

import scanpy as sc


Wrappers to external functionality are found in scanpy.external.

Preprocessing: pp

Filtering of highly-variable genes, batch-effect correction, per-cell normalization, preprocessing recipes.

Any transformation of the data matrix that is not a tool. Unlike tools, preprocessing steps usually don’t return an easily interpretable annotation, but perform a basic transformation on the data matrix.

Basic Preprocessing

For visual quality control, see highest_expr_genes() and filter_genes_dispersion() in scanpy.plotting.

pp.calculate_qc_metrics(adata, *[, …])

Calculate quality control metrics.

pp.filter_cells(data[, min_counts, …])

Filter cell outliers based on counts and numbers of genes expressed.

pp.filter_genes(data[, min_counts, …])

Filter genes based on number of cells or counts.

pp.highly_variable_genes(adata[, layer, …])

Annotate highly variable genes [Satija15] [Zheng17] [Stuart19].

pp.log1p(X, *[, base, copy, chunked, …])

Logarithmize the data matrix.

pp.pca(data[, n_comps, zero_center, …])

Principal component analysis [Pedregosa11].

pp.normalize_total(adata[, target_sum, …])

Normalize counts per cell.

pp.regress_out(adata, keys[, n_jobs, copy])

Regress out (mostly) unwanted sources of variation.

pp.scale(X[, zero_center, max_value, copy, …])

Scale data to unit variance and zero mean.

pp.subsample(data[, fraction, n_obs, …])

Subsample to a fraction of the number of observations.

pp.downsample_counts(adata[, …])

Downsample counts from count matrix.
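As a rough illustration of what the cell filtering, per-cell normalization and log-transform entries above do to a count matrix, here is a plain-Python sketch on a toy matrix. It is not Scanpy's implementation: the helper names (filter_cells_min_genes, normalize_total_rows, log1p_rows) and the toy values are invented for this example. On an AnnData object the corresponding calls would be along the lines of sc.pp.filter_cells(adata, min_genes=2), sc.pp.normalize_total(adata, target_sum=100) and sc.pp.log1p(adata).

```python
import math


def filter_cells_min_genes(counts, min_genes):
    """Keep rows (cells) that express at least min_genes genes (count > 0)."""
    return [row for row in counts if sum(1 for v in row if v > 0) >= min_genes]


def normalize_total_rows(counts, target_sum):
    """Rescale every row (cell) so that its counts sum to target_sum."""
    normalized = []
    for row in counts:
        total = sum(row)
        # All-zero cells are left at zero to avoid dividing by zero.
        factor = target_sum / total if total > 0 else 0.0
        normalized.append([v * factor for v in row])
    return normalized


def log1p_rows(matrix):
    """Apply log(1 + x) element-wise."""
    return [[math.log1p(v) for v in row] for row in matrix]


# Toy count matrix: 4 cells (rows) x 4 genes (columns); values are made up.
counts = [
    [10, 0, 5, 5],
    [1, 1, 0, 2],
    [0, 30, 10, 0],
    [0, 0, 7, 0],  # expresses a single gene, so the filter removes it
]

kept = filter_cells_min_genes(counts, min_genes=2)
normalized = normalize_total_rows(kept, target_sum=100.0)
logged = log1p_rows(normalized)

for row in logged:
    print([round(v, 3) for v in row])
```

After normalization every remaining cell has the same total count, so differences between cells reflect relative rather than absolute expression; the log transform then compresses the range of large values.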


pp.recipe_zheng17(adata[, n_top_genes, log, …])

Normalization and filtering as of [Zheng17].

pp.recipe_weinreb17(adata[, log, …])

Normalization and filtering as of [Weinreb17].

pp.recipe_seurat(adata[, log, plot, copy])

Normalization and filtering as of Seurat [Satija15].

Batch effect correction

Also see Data integration. Note that a simple batch correction method is available via pp.regress_out(). Check out scanpy.external for more.

pp.combat(adata[, key, covariates, inplace])

ComBat function for batch effect correction [Johnson07] [Leek12] [Pedersen12].


pp.neighbors(adata[, n_neighbors, n_pcs, …])

Compute a neighborhood graph of observations [McInnes18].
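To make the idea of a neighborhood graph of observations concrete, the following is a minimal brute-force sketch that finds the k nearest neighbors of each point by Euclidean distance. It is an assumption-laden illustration only: pp.neighbors() works on an AnnData object and derives graph connectivities with the method cited above, whereas the helper knn_indices and the toy coordinates here are invented for the example.

```python
import math


def knn_indices(points, k):
    """Return, for every point, the indices of its k nearest other points."""
    neighbors = []
    for i, p in enumerate(points):
        # Distance to every other point, paired with that point's index.
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


# Toy 2D coordinates (e.g. cells in a reduced space): two well-separated groups.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0)]
graph = knn_indices(points, k=2)

for i, nbrs in enumerate(graph):
    print(i, nbrs)
```

Brute force is quadratic in the number of observations, which is fine for a toy example but is the reason approximate nearest-neighbor search is normally used on real datasets.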

Tools: tl

Any transformation of the data matrix that is not preprocessing. In contrast to a preprocessing function, a tool usually adds an easily interpretable annotation to the data matrix, which can then be visualized with a corresponding plotting function.


tl.pca(data[, n_comps, zero_center, …])

Principal component analysis [Pedregosa11].

tl.tsne(adata[, n_pcs, use_rep, perplexity, …])

t-SNE [Maaten08] [Amir13] [Pedregosa11].

tl.umap(adata[, min_dist, spread, …])

Embed the neighborhood graph using UMAP [McInnes18].

tl.draw_graph(adata[, layout, init_pos, …])

Force-directed graph drawing [Islam11] [Jacomy14] [Chippada18].

tl.diffmap(adata[, n_comps, neighbors_key, copy])

Diffusion Maps [Coifman05] [Haghverdi15] [Wolf18].

Compute densities on embeddings.

tl.embedding_density(adata[, basis, …])

Calculate the density of cells in an embedding (per condition).

Clustering and trajectory inference

tl.leiden(adata[, resolution, restrict_to, …])

Cluster cells into subgroups [Traag18].

tl.louvain(adata[, resolution, …])

Cluster cells into subgroups [Blondel08] [Levine15] [Traag17].

tl.dendrogram(adata, groupby[, n_pcs, …])

Compute a hierarchical clustering for the given groupby categories.

tl.dpt(adata[, n_dcs, n_branchings, …])

Infer progression of cells through geodesic distance along the graph [Haghverdi16] [Wolf19].

tl.paga(adata[, groups, use_rna_velocity, …])

Mapping out the coarse-grained connectivity structures of complex manifolds [Wolf19].

Data integration

tl.ingest(adata, adata_ref[, obs, …])

Map labels and embeddings from reference data to new data.

Marker genes

tl.rank_genes_groups(adata, groupby[, …])

Rank genes for characterizing groups.

tl.filter_rank_genes_groups(adata[, key, …])

Filter out genes based on fold change and fraction of cells expressing the gene within and outside the groupby categories.

tl.marker_gene_overlap(adata, …[, key, …])

Calculate an overlap score between data-derived marker genes and provided markers.
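The fraction-based criterion mentioned for tl.filter_rank_genes_groups() can be sketched in plain Python as follows. This is a simplified illustration, not the library's code: it covers only the "fraction of cells expressing the gene within and outside the group" part and leaves out the fold-change criterion. The helper name, the threshold names and the toy values are all hypothetical.

```python
def expressing_fraction(values):
    """Fraction of cells with non-zero expression of the gene."""
    return sum(1 for v in values if v > 0) / len(values)


# Expression of one hypothetical gene across six cells, with group labels.
expression = [3.0, 0.0, 2.5, 0.0, 0.0, 0.4]
groups = ["A", "A", "A", "B", "B", "B"]

in_group = [x for x, g in zip(expression, groups) if g == "A"]
out_group = [x for x, g in zip(expression, groups) if g != "A"]

frac_in = expressing_fraction(in_group)    # 2 of 3 cells in group A
frac_out = expressing_fraction(out_group)  # 1 of 3 cells outside group A

# Hypothetical thresholds, chosen only for this example.
min_in_fraction = 0.5
max_out_fraction = 0.5

# Keep the gene as a marker candidate for group A only if it is expressed
# in enough cells inside the group and in few enough cells outside it.
keep_gene = frac_in >= min_in_fraction and frac_out <= max_out_fraction
print(frac_in, frac_out, keep_gene)
```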

Gene scores, Cell cycle

tl.score_genes(adata, gene_list[, …])

Score a set of genes [Satija15].

tl.score_genes_cell_cycle(adata, s_genes, …)

Score cell cycle genes [Satija15].
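As a hedged sketch of what a gene-set score of this kind measures: the score for a cell is the average expression of the genes of interest minus the average expression of a reference set. The version below is deliberately simplified and uses a fixed, hand-picked reference set; tl.score_genes() instead samples its reference genes from expression-matched bins. The function name and gene names are made up for the example.

```python
from statistics import mean


def simple_gene_score(cell_expression, gene_set, reference_set):
    """Mean expression of gene_set minus mean expression of reference_set."""
    target = mean(cell_expression[g] for g in gene_set)
    reference = mean(cell_expression[g] for g in reference_set)
    return target - reference


# Expression values of one hypothetical cell, keyed by (made-up) gene name.
cell = {"GeneA": 4.0, "GeneB": 2.0, "GeneC": 1.0, "GeneD": 0.0, "GeneE": 2.0}

score = simple_gene_score(
    cell,
    gene_set=["GeneA", "GeneB"],
    reference_set=["GeneC", "GeneD", "GeneE"],
)
print(score)
```

A positive score means the gene set is expressed above the reference level in that cell; subtracting a reference makes scores less sensitive to overall differences in expression level between cells.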


tl.sim(model[, params_file, tmax, …])

Simulate dynamic gene expression data [Wittmann09] [Wolf18].

Plotting: pl

The plotting module scanpy.plotting largely parallels the tl.* and a few of the pp.* functions. For most tools and for some preprocessing functions, you’ll find a plotting function with the same name.


Plotting API



To read annotation, use pandas.read_… and add it to your anndata.AnnData object. The following read functions are intended for the numeric data in the data matrix X.

Read common file formats using

read(filename[, backed, sheet, ext, …])

Read file and return AnnData object.

Read 10x formatted hdf5 files and directories containing .mtx files using

read_10x_h5(filename[, genome, gex_only, …])

Read 10x-Genomics-formatted hdf5 file.

read_10x_mtx(path[, var_names, make_unique, …])

Read 10x-Genomics-formatted mtx directory.

read_visium(path[, genome, count_file, …])

Read 10x-Genomics-formatted Visium dataset.

Read other formats using functions borrowed from anndata

read_h5ad(filename[, backed, as_sparse, …])

Read .h5ad-formatted hdf5 file.

read_csv(filename[, delimiter, …])

Read .csv file.

read_excel(filename, sheet[, dtype])

Read .xlsx (Excel) file.

read_hdf(filename, key)

Read .h5 (hdf5) file.

read_loom(filename[, sparse, cleanup, …])

Read .loom-formatted hdf5 file.

read_mtx(filename[, dtype])

Read .mtx file.

read_text(filename[, delimiter, …])

Read .txt, .tab, .data (text) file.

read_umi_tools(filename[, dtype])

Read a gzipped condensed count matrix from umi_tools.

Get object from AnnData: get

The module sc.get provides convenience functions for getting values back in useful formats.

get.obs_df(adata[, keys, obsm_keys, layer, …])

Return values for observations in adata.

get.var_df(adata[, keys, varm_keys, layer])

Return values for variables in adata.

get.rank_genes_groups_df(adata, group, *[, …])

tl.rank_genes_groups() results in the form of a DataFrame.


This module provides useful queries for annotation and enrichment.

queries.biomart_annotations(org, attrs, *[, …])

Retrieve gene annotations from ensembl biomart.

queries.gene_coordinates(org, gene_name, *)

Retrieve gene coordinates for specific organism through BioMart.

queries.mitochondrial_genes(org, *[, …])

Mitochondrial gene symbols for specific organism through BioMart.

queries.enrich(container, *[, org, …])

Get enrichment for DE results.


AnnData is reexported from anndata.

Represent data as a neighborhood structure, usually a knn graph.

Neighbors(adata[, n_dcs, neighbors_key])

Data represented as graph of nearest neighbors.


A convenience function for setting some default matplotlib.rcParams and a high-resolution Jupyter display backend, useful in notebooks.

set_figure_params([scanpy, dpi, dpi_save, …])

Set resolution/size, styling and format of figures.

An instance of ScanpyConfig is available as scanpy.settings and allows configuring Scanpy.

_settings.ScanpyConfig(*[, verbosity, …])

Config manager for scanpy.

Some selected settings are discussed in the following.

Influence the global behavior of plotting functions. In non-interactive scripts, you’d usually want to set settings.autoshow to False.


Automatically show figures if autosave == False (default True).


Automatically save figures in figdir (default False).

The default directories for saving figures, caching files and storing datasets.


Directory for saving figures (default './figures/').


Directory for cache files (default './cache/').


Directory for example datasets (default './data/').

The verbosity of logging output, where verbosity levels have the following meaning: 0='error', 1='warning', 2='info', 3='hint', 4=more details, 5=even more details, etc.


Verbosity level (default warning)
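The verbosity levels are cumulative: a higher setting shows everything a lower setting shows, plus more. The plain-Python sketch below illustrates that reading of the level list above. It is only a model of the behavior, with invented names (VERBOSITY_NAMES, is_shown); in Scanpy itself the level is configured through the settings object, for example sc.settings.verbosity = 3.

```python
# Named levels as listed above; levels 4 and higher only add more detail.
VERBOSITY_NAMES = {0: "error", 1: "warning", 2: "info", 3: "hint"}


def is_shown(message_level, verbosity):
    """A message is shown when its level does not exceed the configured verbosity."""
    return message_level <= verbosity


verbosity = 1  # corresponds to 'warning'
shown = [name for level, name in VERBOSITY_NAMES.items() if is_shown(level, verbosity)]
print(shown)
```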

Print versions of packages that might influence numerical results.

logging.print_header(*[, file])

Versions that might influence the numerical results.

logging.print_versions(*[, file])

Print versions of imported packages.


datasets.blobs([n_variables, n_centers, …])

Gaussian Blobs.

datasets.ebi_expression_atlas(accession, *)

Load a dataset from the EBI Single Cell Expression Atlas.


Simulated myeloid progenitors [Krumsiek11].


Hematopoiesis in early mouse embryos [Moignard15].


3k PBMCs from 10x Genomics.


Processed 3k PBMCs from 10x Genomics.


Subsampled and processed 68k PBMCs.


Development of Myeloid Progenitors [Paul15].


Simulated toggleswitch.


Processed Visium Spatial Gene Expression data from 10x Genomics.

Deprecated functions

pp.filter_genes_dispersion(data[, flavor, …])

Extract highly variable genes [Satija15] [Zheng17].

pp.normalize_per_cell(data[, …])

Normalize total counts per cell.