Import Scanpy as:

import scanpy as sc


Wrappers to external functionality are found in scanpy.external. Previously, both core and external functionality were available through scanpy.api (deprecated since 1.3.7).

Preprocessing: PP

Filtering of highly-variable genes, batch-effect correction, per-cell normalization, preprocessing recipes.

Any transformation of the data matrix that is not a tool. Unlike tools, preprocessing steps usually don’t return an easily interpretable annotation; they perform a basic transformation of the data matrix.

Basic Preprocessing

For visual quality control, see highest_expr_genes() and filter_genes_dispersion() in scanpy.plotting.

pp.calculate_qc_metrics(adata[, expr_type, …]) Calculate quality control metrics.
pp.filter_cells(data[, min_counts, …]) Filter cell outliers based on counts and numbers of genes expressed.
pp.filter_genes(data[, min_counts, …]) Filter genes based on number of cells or counts.
pp.highly_variable_genes(adata[, min_disp, …]) Annotate highly variable genes [Satija15] [Zheng17].
pp.filter_genes_dispersion(data[, flavor, …]) Extract highly variable genes [Satija15] [Zheng17].
pp.log1p(data[, copy, chunked, chunk_size]) Logarithmize the data matrix.
pp.pca(data[, n_comps, zero_center, …]) Principal component analysis [Pedregosa11].
pp.normalize_per_cell(data[, …]) Normalize total counts per cell.
pp.regress_out(adata, keys[, n_jobs, copy]) Regress out unwanted sources of variation.
pp.scale(data[, zero_center, max_value, copy]) Scale data to unit variance and zero mean.
pp.subsample(data[, fraction, n_obs, …]) Subsample to a fraction of the number of observations.
pp.downsample_counts(adata[, target_counts, …]) Downsample counts so that each cell has no more than target_counts.


Recipes

pp.recipe_zheng17(adata[, n_top_genes, log, …]) Normalization and filtering as of [Zheng17].
pp.recipe_weinreb17(adata[, log, …]) Normalization and filtering as of [Weinreb17].
pp.recipe_seurat(adata[, log, plot, copy]) Normalization and filtering as of Seurat [Satija15].

Batch effect correction

Note that a simple batch correction method is available via pp.regress_out(). Check out scanpy.external for more.


Neighbors

pp.neighbors(adata[, n_neighbors, n_pcs, …]) Compute a neighborhood graph of observations [McInnes18].

Tools: TL

Any transformation of the data matrix that is not preprocessing. In contrast to a preprocessing function, a tool usually adds an easily interpretable annotation to the data matrix, which can then be visualized with a corresponding plotting function.


Embeddings

tl.pca(data[, n_comps, zero_center, …]) Principal component analysis [Pedregosa11].
tl.tsne(adata[, n_pcs, use_rep, perplexity, …]) t-SNE [Maaten08] [Amir13] [Pedregosa11].
tl.umap(adata[, min_dist, spread, …]) Embed the neighborhood graph using UMAP [McInnes18].
tl.draw_graph(adata[, layout, init_pos, …]) Force-directed graph drawing [Islam11] [Jacomy14] [Chippada18].
tl.diffmap(adata[, n_comps, copy]) Diffusion Maps [Coifman05] [Haghverdi15] [Wolf17].

Clustering and trajectory inference

tl.leiden(adata[, resolution, random_state, …]) Cluster cells into subgroups [Traag18].
tl.louvain(adata[, resolution, …]) Cluster cells into subgroups [Blondel08] [Levine15] [Traag17].
tl.dpt(adata[, n_dcs, n_branchings, …]) Infer progression of cells through geodesic distance along the graph [Haghverdi16] [Wolf17i].
tl.paga(adata[, groups, use_rna_velocity, …]) Generate cellular maps of differentiation manifolds with complex topologies [Wolf17i].

Marker genes

tl.rank_genes_groups(adata, groupby[, …]) Rank genes for characterizing groups.

Gene scores, Cell cycle

tl.score_genes(adata, gene_list[, …]) Score a set of genes [Satija15].
tl.score_genes_cell_cycle(adata, s_genes, …) Score cell cycle genes [Satija15].


Simulations

tl.sim(model[, params_file, tmax, …]) Simulate dynamic gene expression data [Wittmann09] [Wolf17].

Plotting: PL

The plotting module scanpy.plotting largely parallels the tl.* and a few of the pp.* functions. For most tools and for some preprocessing functions, you’ll find a plotting function with the same name.


Reading

Note: To read annotation, use pandas.read_… and add the result to your anndata.AnnData object. The following read functions are intended for the numeric data in the data matrix X.

Read common file formats using

read(filename[, backed, sheet, ext, …]) Read file and return AnnData object.

Read 10x formatted hdf5 files and directories containing .mtx files using

read_10x_h5(filename[, genome, gex_only]) Read 10x-Genomics-formatted hdf5 file.
read_10x_mtx(path[, var_names, make_unique, …]) Read 10x-Genomics-formatted mtx directory.

Read other formats using functions borrowed from anndata

read_h5ad(filename[, backed, chunk_size]) Read .h5ad-formatted hdf5 file.
read_csv(filename[, delimiter, …]) Read .csv file.
read_excel(filename, sheet[, dtype]) Read .xlsx (Excel) file.
read_hdf(filename, key) Read .h5 (hdf5) file.
read_loom(filename[, sparse, cleanup, …]) Read .loom-formatted hdf5 file.
read_mtx(filename[, dtype]) Read .mtx file.
read_text(filename[, delimiter, …]) Read .txt, .tab, .data (text) file.
read_umi_tools(filename[, dtype]) Read a gzipped condensed count matrix from umi_tools.


Queries

queries.mitochondrial_genes(host, org) Mitochondrial gene symbols for specific organism through BioMart.


Classes

AnnData is reexported from anndata.

Represent data as a neighborhood structure, usually a knn graph.

Neighbors(adata[, n_dcs]) Data represented as graph of nearest neighbors.


Settings

A convenience function for setting some default matplotlib.rcParams and a high-resolution jupyter display backend, useful in notebooks.

set_figure_params([scanpy, dpi, dpi_save, …]) Set resolution/size, styling and format of figures.

Influence the global behavior of plotting functions. In non-interactive scripts, you’d usually want to set settings.autoshow to False.

settings.autoshow Automatically show figures (default: True).
settings.autosave Automatically save figures (default: False).

The default directories for saving figures and caching files.

settings.figdir Directory for saving figures (default: './figures/').
settings.cachedir Directory for cache files (default: './cache/').

The verbosity of logging output, where verbosity levels have the following meaning: 0='error', 1='warning', 2='info', 3='hint', 4=more details, 5=even more details, etc.

settings.verbosity Verbosity level (default: 1).

Print versions of packages that might influence numerical results.

logging.print_versions() Versions that might influence the numerical results.


Datasets

datasets.blobs([n_variables, n_centers, …]) Gaussian Blobs.
datasets.krumsiek11() Simulated myeloid progenitors [Krumsiek11].
datasets.moignard15() Hematopoiesis in early mouse embryos [Moignard15].
datasets.pbmc3k() 3k PBMCs from 10x Genomics.
datasets.pbmc68k_reduced() Subsampled and processed 68k PBMCs.
datasets.paul15() Development of Myeloid Progenitors [Paul15].
datasets.toggleswitch() Simulated toggleswitch.

Further Modules

external External API
api Global API (deprecated)
plotting Plotting API