Global API (deprecated)


Deprecated since version 1.3.7: Use the top-level module instead: import scanpy as sc.

For the deprecated high-level API documented on this page, use import scanpy.api as sc.

Preprocessing: PP

Filtering of highly-variable genes, batch-effect correction, per-cell normalization, preprocessing recipes.

Basic Preprocessing

For visual quality control, see highest_expr_genes() and filter_genes_dispersion() in the plotting API.

pp.calculate_qc_metrics(adata, *[, …])

Calculate quality control metrics.

pp.filter_cells(data[, min_counts, …])

Filter cell outliers based on counts and numbers of genes expressed.

pp.filter_genes(data[, min_counts, …])

Filter genes based on number of cells or counts.

pp.highly_variable_genes(adata[, min_disp, …])

Annotate highly variable genes [Satija15] [Zheng17].

pp.filter_genes_dispersion(data[, flavor, …])

Extract highly variable genes [Satija15] [Zheng17].

pp.log1p(data[, copy, chunked, chunk_size, base])

Logarithmize the data matrix.

pp.pca(data[, n_comps, zero_center, …])

Principal component analysis [Pedregosa11].

pp.normalize_per_cell(data[, …])

Normalize total counts per cell.

pp.regress_out(adata, keys[, n_jobs, copy])

Regress out (mostly) unwanted sources of variation.

pp.scale(data[, zero_center, max_value, copy])

Scale data to unit variance and zero mean.

pp.subsample(data[, fraction, n_obs, …])

Subsample to a fraction of the number of observations.

pp.downsample_counts(adata[, …])

Downsample counts from count matrix.
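The arithmetic behind a few of the steps listed above can be sketched in plain Python. This is an illustration only, not scanpy's implementation. The helper names (filter_cells, normalize_per_cell, log1p, scale) merely mirror the entries above, and the list-of-lists count matrix, the median total count as normalization target, and the use of the sample standard deviation are all assumptions made for the example.

```python
import math
import statistics


def filter_cells(counts, min_counts):
    """Keep rows (cells) whose total count is at least min_counts."""
    return [row for row in counts if sum(row) >= min_counts]


def normalize_per_cell(counts, target=None):
    """Rescale each cell so that its counts sum to `target`.

    Assumption for this sketch: when no target is given, the median of the
    per-cell totals is used. Cells with a zero total must be filtered first.
    """
    totals = [sum(row) for row in counts]
    if target is None:
        target = statistics.median(totals)
    return [[v * target / t for v in row] for row, t in zip(counts, totals)]


def log1p(matrix):
    """Apply log(1 + x) to every entry."""
    return [[math.log1p(v) for v in row] for row in matrix]


def scale(matrix):
    """Center each column (gene) to zero mean and scale it to unit variance."""
    n_genes = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_genes)]
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.stdev(col) for col in columns]  # sample std (assumption)
    return [
        [(row[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0
         for j in range(n_genes)]
        for row in matrix
    ]


# Toy count matrix: rows are cells, columns are genes (made-up values).
counts = [
    [4, 0, 6],
    [1, 1, 0],   # total count 2, dropped by min_counts=5
    [10, 5, 5],
    [0, 3, 7],
]

kept = filter_cells(counts, min_counts=5)
normalized = normalize_per_cell(kept)
logged = log1p(normalized)
scaled = scale(logged)
```

Filtering comes first in this sketch so that no cell with a zero total reaches the normalization step.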


Recipes

pp.recipe_zheng17(adata[, n_top_genes, log, …])

Normalization and filtering as of [Zheng17].

pp.recipe_weinreb17(adata[, log, …])

Normalization and filtering as of [Weinreb17].

pp.recipe_seurat(adata[, log, plot, copy])

Normalization and filtering as of Seurat [Satija15].

Batch effect correction

Note that a simple batch correction method is available via scanpy.api.pp.regress_out().

pp.bbknn is just an alias for bbknn.bbknn(). Refer to it for the documentation.

pp.bbknn(adata[, batch_key, approx, metric, …])

Batch balanced kNN [Polanski19].

pp.mnn_correct(*datas[, var_index, …])

Correct batch effects by matching mutual nearest neighbors [Haghverdi18] [Kang18].


Imputation

Note that the fundamental limitations of imputation are still under debate (issue 189).

pp.dca(adata[, mode, ae_type, …])

Deep count autoencoder [Eraslan18].

pp.magic(adata[, name_list, knn, decay, t, …])

Markov Affinity-based Graph Imputation of Cells (MAGIC) API [vanDijk18].


Neighbors

pp.neighbors(adata[, n_neighbors, n_pcs, …])

Compute a neighborhood graph of observations [McInnes18].
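To make the idea of a neighborhood graph of observations concrete, here is a minimal brute-force sketch. It is not the method of [McInnes18] and not what pp.neighbors does internally: it only finds, for each observation, the indices of its nearest neighbors by Euclidean distance. The function name knn_graph and the toy points are assumptions made for the example.

```python
import math


def knn_graph(points, n_neighbors):
    """For each point, return the indices of its n_neighbors nearest other
    points under Euclidean distance (brute force, quadratic in len(points))."""
    graph = []
    for i, p in enumerate(points):
        distances = [
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        ]
        distances.sort()
        graph.append([j for _, j in distances[:n_neighbors]])
    return graph


# Two well-separated groups of observations in 2D (made-up coordinates).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0)]
graph = knn_graph(points, n_neighbors=2)

for i, neighbors in enumerate(graph):
    print(i, neighbors)
```

Each row of the result is an adjacency list. Because every observation is forced to have exactly n_neighbors neighbors, points in the small second group also link to a point in the first group.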

Tools: TL


Embeddings

tl.pca(data[, n_comps, zero_center, …])

Principal component analysis [Pedregosa11].

tl.tsne(adata[, n_pcs, use_rep, perplexity, …])

t-SNE [Maaten08] [Amir13] [Pedregosa11].

tl.umap(adata[, min_dist, spread, …])

Embed the neighborhood graph using UMAP [McInnes18].

tl.draw_graph(adata[, layout, init_pos, …])

Force-directed graph drawing [Islam11] [Jacomy14] [Chippada18].

tl.diffmap(adata[, n_comps, copy])

Diffusion Maps [Coifman05] [Haghverdi15] [Wolf18].

tl.phate(adata[, n_components, k, a, …])

PHATE [Moon17].

Clustering and trajectory inference

tl.leiden(adata[, resolution, restrict_to, …])

Cluster cells into subgroups [Traag18].

tl.louvain(adata[, resolution, …])

Cluster cells into subgroups [Blondel08] [Levine15] [Traag17].

tl.dpt(adata[, n_dcs, n_branchings, …])

Infer progression of cells through geodesic distance along the graph [Haghverdi16] [Wolf19].

tl.paga(adata[, groups, use_rna_velocity, …])

Mapping out the coarse-grained connectivity structures of complex manifolds [Wolf19].

Marker genes

tl.rank_genes_groups(adata, groupby[, …])

Rank genes for characterizing groups.

Gene scores, Cell cycle

tl.score_genes(adata, gene_list[, …])

Score a set of genes [Satija15].

tl.score_genes_cell_cycle(adata, s_genes, …)

Score cell cycle genes [Satija15].

tl.sandbag(adata[, annotation, fraction, …])

Calculate marker pairs of genes.

tl.cyclone(adata[, marker_pairs, …])

Assigns scores and predicted class to observations [Scialdone15] [Fechtner18].
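As a rough illustration of gene-set scoring in the spirit of tl.score_genes, the sketch below computes, per cell, the average expression of a gene set minus the average expression of a control set. It is a simplification under stated assumptions: the control genes are passed in explicitly rather than sampled, and the function signature, gene names, and expression values are hypothetical.

```python
from statistics import mean


def score_genes(expression, gene_names, gene_list, control_genes):
    """Per-cell score: mean expression over gene_list minus mean expression
    over control_genes. `expression` is a list of rows, one row per cell."""
    index = {name: i for i, name in enumerate(gene_names)}
    target = [index[g] for g in gene_list]
    control = [index[g] for g in control_genes]
    return [
        mean(row[i] for i in target) - mean(row[i] for i in control)
        for row in expression
    ]


# Hypothetical genes and expression values, two cells by four genes.
gene_names = ["A", "B", "C", "D"]
expression = [
    [2.0, 4.0, 1.0, 1.0],
    [0.0, 0.0, 3.0, 1.0],
]

scores = score_genes(
    expression, gene_names, gene_list=["A", "B"], control_genes=["C", "D"]
)
print(scores)
```

A positive score means the gene set is expressed above the control level in that cell, and a negative score means it is expressed below it.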


Simulations

tl.sim(model[, params_file, tmax, …])

Simulate dynamic gene expression data [Wittmann09] [Wolf18].

Plotting: PL

The plotting API largely parallels the tl.* and pp.* functions. For most tools and for some preprocessing functions, you’ll find a plotting function with the same name.


Reading

Note: To read annotations, use pandas.read_… and add them to your AnnData object. The following read functions are intended for the numeric data in the data matrix X.

Read common file formats using

read(filename[, backed, sheet, ext, …])

Read file and return AnnData object.

Read 10x-formatted hdf5 files and directories containing .mtx files using

read_10x_h5(filename[, genome, gex_only])

Read 10x-Genomics-formatted hdf5 file.

read_10x_mtx(path[, var_names, make_unique, …])

Read 10x-Genomics-formatted mtx directory.

Read other formats using functions borrowed from anndata

read_h5ad(filename[, backed, chunk_size])

Read .h5ad-formatted hdf5 file.

read_csv(filename[, delimiter, …])

Read .csv file.

read_excel(filename, sheet[, dtype])

Read .xlsx (Excel) file.

read_hdf(filename, key)

Read .h5 (hdf5) file.

read_loom(filename[, sparse, cleanup, …])

Read .loom-formatted hdf5 file.

read_mtx(filename[, dtype])

Read .mtx file.

read_text(filename[, delimiter, …])

Read .txt, .tab, .data (text) file.

read_umi_tools(filename[, dtype])

Read a gzipped condensed count matrix from umi_tools.


queries.mitochondrial_genes(org, *[, …])

Mitochondrial gene symbols for specific organism through BioMart.


AnnData is reexported from anndata.

Represent data as a neighborhood structure, usually a kNN graph.

Neighbors(adata[, n_dcs])

Data represented as graph of nearest neighbors.


An instance of the ScanpyConfig is available as scanpy.settings and allows configuring Scanpy.

A convenience function for setting some default matplotlib.rcParams and a high-resolution Jupyter display backend, useful in notebooks.

set_figure_params([scanpy, dpi, dpi_save, …])

Set resolution/size, styling and format of figures.

Print versions of packages that might influence numerical results.


Versions that might influence the numerical results.


Datasets

datasets.blobs([n_variables, n_centers, …])

Gaussian Blobs.


Simulated myeloid progenitors [Krumsiek11].


Hematopoiesis in early mouse embryos [Moignard15].


3k PBMCs from 10x Genomics.


Subsampled and processed 68k PBMCs.


Development of Myeloid Progenitors [Paul15].


Simulated toggleswitch.


export_to.spring_project(adata, project_dir, …)

Exports to a SPRING project directory [Weinreb17].