API

Import Scanpy as:

import scanpy as sc

Note

Wrappers to external functionality are found in scanpy.external. Previously, both core and external functionality were available through scanpy.api (deprecated since 1.3.7).

Preprocessing: pp

Filtering of highly-variable genes, batch-effect correction, per-cell normalization, preprocessing recipes.

Any transformation of the data matrix that is not a tool. Unlike tools, preprocessing steps usually don’t return an easily interpretable annotation; they perform a basic transformation on the data matrix.

Basic Preprocessing

For visual quality control, see highest_expr_genes() and filter_genes_dispersion() in scanpy.plotting.

pp.calculate_qc_metrics(adata, *[, …])

Calculate quality control metrics.

pp.filter_cells(data[, min_counts, …])

Filter cell outliers based on counts and numbers of genes expressed.

pp.filter_genes(data[, min_counts, …])

Filter genes based on number of cells or counts.

pp.highly_variable_genes(adata[, min_disp, …])

Annotate highly variable genes [Satija15] [Zheng17].

pp.log1p(data[, copy, chunked, chunk_size])

Logarithmize the data matrix.

pp.pca(data[, n_comps, zero_center, …])

Principal component analysis [Pedregosa11].

pp.normalize_total(adata[, target_sum, …])

Normalize counts per cell.

pp.regress_out(adata, keys[, n_jobs, copy])

Regress out unwanted sources of variation.

pp.scale(data[, zero_center, max_value, copy])

Scale data to unit variance and zero mean.

pp.subsample(data[, fraction, n_obs, …])

Subsample to a fraction of the number of observations.

pp.downsample_counts(adata[, …])

Downsample counts from count matrix.
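As a rough illustration of what the first few steps above do to a count matrix, the following pure-Python sketch computes two per-cell QC values, filters cells, normalizes per cell, and log-transforms. It is not Scanpy's implementation: the helper names only mirror the entries above, the matrix is a small made-up list of lists, and falling back to the median of the per-cell totals when no target sum is given is an assumption about the default behaviour.

```python
import math
from statistics import median

# Toy count matrix: rows are cells, columns are genes (made-up values).
counts = [
    [4, 0, 6, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [10, 5, 5, 0],
]


def qc_metrics(matrix):
    """Return (total counts, number of expressed genes) for each cell."""
    return [(sum(row), sum(1 for c in row if c > 0)) for row in matrix]


def filter_cells(matrix, min_counts=0, min_genes=0):
    """Keep cells with at least min_counts counts and min_genes expressed genes."""
    return [
        row
        for row, (total, n_genes) in zip(matrix, qc_metrics(matrix))
        if total >= min_counts and n_genes >= min_genes
    ]


def normalize_total(matrix, target_sum=None):
    """Rescale each cell so that its counts sum to target_sum."""
    totals = [sum(row) for row in matrix]
    if target_sum is None:
        # Assumed default: the median of the per-cell totals.
        target_sum = median(totals)
    return [
        [c * target_sum / total for c in row] if total else list(row)
        for row, total in zip(matrix, totals)
    ]


def log1p(matrix):
    """Apply log(1 + x) element-wise."""
    return [[math.log1p(c) for c in row] for row in matrix]


kept = filter_cells(counts, min_counts=2, min_genes=2)
normalized = normalize_total(kept, target_sum=10)
logged = log1p(normalized)
```

With real data the same sequence would be run on an AnnData object through the pp functions listed above, which also handle sparse matrices and annotations that this sketch ignores.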

Recipes

pp.recipe_zheng17(adata[, n_top_genes, log, …])

Normalization and filtering as of [Zheng17].

pp.recipe_weinreb17(adata[, log, …])

Normalization and filtering as of [Weinreb17].

pp.recipe_seurat(adata[, log, plot, copy])

Normalization and filtering as of Seurat [Satija15].

Batch effect correction

Note that a simple batch correction method is available via pp.regress_out(). Check out scanpy.external for more.

pp.combat(adata[, key, covariates, inplace])

ComBat function for batch effect correction [Johnson07] [Leek12] [Pedersen12].
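The idea behind regressing out a covariate, as pp.regress_out() is described above, can be sketched for a single gene and a single covariate: fit a least-squares line and keep the residuals. This is a plain-Python illustration of simple linear regression, not the code behind pp.regress_out(); the function name regress_out_one and the toy values are hypothetical.

```python
from statistics import mean


def regress_out_one(expression, covariate):
    """Return the residuals of a least-squares fit of expression on covariate."""
    x_bar, y_bar = mean(covariate), mean(expression)
    sxx = sum((x - x_bar) ** 2 for x in covariate)
    if sxx == 0:
        # A constant covariate explains nothing beyond the mean.
        return [y - y_bar for y in expression]
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(covariate, expression))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(covariate, expression)]


# Toy example: expression of one gene in five cells and the total counts
# of those cells (made-up numbers).
total_counts = [100.0, 200.0, 300.0, 400.0, 500.0]
gene = [1.1, 1.9, 3.2, 3.8, 5.0]
residuals = regress_out_one(gene, total_counts)
```

The residuals sum to zero and are uncorrelated with the covariate, which is what "regressing out" means in this simplified setting.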

Neighbors

pp.neighbors(adata[, n_neighbors, n_pcs, …])

Compute a neighborhood graph of observations [McInnes18].
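A neighborhood graph links each observation to its nearest neighbors. The brute-force sketch below builds such a k-nearest-neighbor graph with Euclidean distances in pure Python. It only illustrates the data structure; pp.neighbors() works on an AnnData object and is not implemented this way. The function name knn_graph and the 2-D points are hypothetical.

```python
import math


def knn_graph(points, n_neighbors):
    """Map each point index to the indices of its n_neighbors nearest points."""
    graph = {}
    for i, p in enumerate(points):
        # Distances to every other point, sorted from nearest to farthest.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        graph[i] = [j for _, j in dists[:n_neighbors]]
    return graph


# Two well-separated groups of toy 2-D observations.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0), (5.0, 5.3)]
graph = knn_graph(points, n_neighbors=2)
```

In this toy graph every point's neighbors lie within its own group, which is the kind of structure that embedding and clustering tools then build on.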

Tools: tl

Any transformation of the data matrix that is not preprocessing. In contrast to a preprocessing function, a tool usually adds an easily interpretable annotation to the data matrix, which can then be visualized with a corresponding plotting function.

Embeddings

tl.pca(data[, n_comps, zero_center, …])

Principal component analysis [Pedregosa11].

tl.tsne(adata[, n_pcs, use_rep, perplexity, …])

t-SNE [Maaten08] [Amir13] [Pedregosa11].

tl.umap(adata[, min_dist, spread, …])

Embed the neighborhood graph using UMAP [McInnes18].

tl.draw_graph(adata[, layout, init_pos, …])

Force-directed graph drawing [Islam11] [Jacomy14] [Chippada18].

tl.diffmap(adata[, n_comps, copy])

Diffusion Maps [Coifman05] [Haghverdi15] [Wolf18].

Clustering and trajectory inference

tl.leiden(adata[, resolution, restrict_to, …])

Cluster cells into subgroups [Traag18].

tl.louvain(adata[, resolution, …])

Cluster cells into subgroups [Blondel08] [Levine15] [Traag17].

tl.dendrogram(adata, groupby[, n_pcs, …])

Computes a hierarchical clustering for the given groupby categories.

tl.dpt(adata[, n_dcs, n_branchings, …])

Infer progression of cells through geodesic distance along the graph [Haghverdi16] [Wolf19].

tl.paga(adata[, groups, use_rna_velocity, …])

Mapping out the coarse-grained connectivity structures of complex manifolds [Wolf19].

Marker genes

tl.rank_genes_groups(adata, groupby[, …])

Rank genes for characterizing groups.

tl.filter_rank_genes_groups(adata[, key, …])

Filters out genes based on fold change and the fraction of cells expressing the gene within and outside the groupby categories.

tl.marker_gene_overlap(adata, …[, key, …])

Calculate an overlap score between data-derived marker genes and provided markers.
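The kind of score involved can be illustrated with plain sets: count the genes shared between the data-derived markers of each cluster and each reference marker list. The sketch below is not the tl.marker_gene_overlap() implementation; the function names overlap_counts and jaccard, the cluster labels, and the gene lists are made up for illustration.

```python
def overlap_counts(data_markers, reference_markers):
    """Count shared genes for every (reference cell type, cluster) pair."""
    return {
        cell_type: {
            cluster: len(set(ref_genes) & set(genes))
            for cluster, genes in data_markers.items()
        }
        for cell_type, ref_genes in reference_markers.items()
    }


def jaccard(a, b):
    """Jaccard index: shared genes divided by all genes in either set."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


# Hypothetical inputs: top-ranked genes per cluster and known markers.
data_markers = {
    "0": ["CD3D", "CD3E", "IL7R"],
    "1": ["MS4A1", "CD79A", "CD74"],
}
reference_markers = {
    "T cell": ["CD3D", "CD3E", "CD2"],
    "B cell": ["MS4A1", "CD79A", "CD19"],
}
scores = overlap_counts(data_markers, reference_markers)
```

A raw count favors long marker lists, so a normalized score such as the Jaccard index is sometimes easier to compare across cell types.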

Gene scores, Cell cycle

tl.score_genes(adata, gene_list[, …])

Score a set of genes [Satija15].

tl.score_genes_cell_cycle(adata, s_genes, …)

Score cell cycle genes [Satija15].

Simulations

tl.sim(model[, params_file, tmax, …])

Simulate dynamic gene expression data [Wittmann09] [Wolf18].

Plotting: pl

The plotting module scanpy.plotting largely parallels the tl.* and a few of the pp.* functions. For most tools and for some preprocessing functions, you’ll find a plotting function with the same name.

plotting

Plotting API

Reading

Note

For reading annotation use pandas.read_… and add it to your anndata.AnnData object. The following read functions are intended for the numeric data in the data matrix X.

Read common file formats using

read(filename[, backed, sheet, ext, …])

Read file and return AnnData object.

Read 10x formatted hdf5 files and directories containing .mtx files using

read_10x_h5(filename[, genome, gex_only])

Read 10x-Genomics-formatted hdf5 file.

read_10x_mtx(path[, var_names, make_unique, …])

Read 10x-Genomics-formatted mtx directory.

Read other formats using functions borrowed from anndata

read_h5ad(filename[, backed, chunk_size])

Read .h5ad-formatted hdf5 file.

read_csv(filename[, delimiter, …])

Read .csv file.

read_excel(filename, sheet[, dtype])

Read .xlsx (Excel) file.

read_hdf(filename, key)

Read .h5 (hdf5) file.

read_loom(filename[, sparse, cleanup, …])

Read .loom-formatted hdf5 file.

read_mtx(filename[, dtype])

Read .mtx file.

read_text(filename[, delimiter, …])

Read .txt, .tab, .data (text) file.

read_umi_tools(filename[, dtype])

Read a gzipped condensed count matrix from umi_tools.

Get object from AnnData: get

The module sc.get provides convenience functions for getting values back in useful formats.

get.obs_df(adata[, keys, obsm_keys, layer, …])

Return values for observations in adata.

get.var_df(adata[, keys, varm_keys, layer])

Return values for variables in adata.

get.rank_genes_groups_df(adata, group, *[, …])

scanpy.tl.rank_genes_groups() results in the form of a pd.DataFrame.

Queries

queries.mitochondrial_genes(host, org)

Mitochondrial gene symbols for specific organism through BioMart.

Classes

AnnData is reexported from anndata.

Represent data as a neighborhood structure, usually a knn graph.

Neighbors(adata[, n_dcs])

Data represented as graph of nearest neighbors.

Settings

A convenience function for setting some default matplotlib.rcParams and a high-resolution Jupyter display backend, which is useful in notebooks.

set_figure_params([scanpy, dpi, dpi_save, …])

Set resolution/size, styling and format of figures.

An instance of the ScanpyConfig is available as scanpy.settings and allows configuring Scanpy.

_settings.ScanpyConfig(*[, verbosity, …])

Config manager for scanpy.

Some selected settings are discussed in the following.

Influence the global behavior of plotting functions. In non-interactive scripts, you’d usually want to set settings.autoshow to False.

settings.autoshow

Automatically show figures (default: True).

settings.autosave

Automatically save figures (default: False).

The default directories for saving figures, caching files and storing datasets.

settings.figdir

Directory for saving figures (default: './figures/').

settings.cachedir

Directory for cache files (default: './cache/').

settings.datasetdir

Directory for example datasets (default: './data/').

The verbosity of logging output, where verbosity levels have the following meaning: 0='error', 1='warning', 2='info', 3='hint', 4=more details, 5=even more details, etc.

settings.verbosity

Verbosity level (default: 1).
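One way to read the levels listed above is that a message is shown when its level is at or below the configured verbosity. The sketch below encodes the named levels from this page in a small lookup; the helpers is_shown and shown_levels and the threshold rule are assumptions for illustration, not Scanpy code.

```python
# Named verbosity levels as listed above; levels 4 and higher add more detail.
VERBOSITY_LEVELS = {0: "error", 1: "warning", 2: "info", 3: "hint"}


def is_shown(message_level, verbosity):
    """Return True if a message of message_level is shown at this verbosity."""
    return message_level <= verbosity


def shown_levels(verbosity):
    """Names of the listed levels that are shown at the given verbosity."""
    return [
        name
        for level, name in sorted(VERBOSITY_LEVELS.items())
        if is_shown(level, verbosity)
    ]


# At the default verbosity of 1, only errors and warnings would be shown.
default_shown = shown_levels(1)
```

Under this reading, raising settings.verbosity to 3 would additionally surface info messages and hints.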

Print versions of packages that might influence numerical results.

logging.print_versions()

Versions that might influence the numerical results.

Datasets

datasets.blobs([n_variables, n_centers, …])

Gaussian Blobs.

datasets.ebi_expression_atlas(accession, *)

Load a dataset from the EBI Single Cell Expression Atlas.

datasets.krumsiek11()

Simulated myeloid progenitors [Krumsiek11].

datasets.moignard15()

Hematopoiesis in early mouse embryos [Moignard15].

datasets.pbmc3k()

3k PBMCs from 10x Genomics.

datasets.pbmc68k_reduced()

Subsampled and processed 68k PBMCs.

datasets.paul15()

Development of Myeloid Progenitors [Paul15].

datasets.toggleswitch()

Simulated toggleswitch.

Further modules

api

Global API (deprecated)

plotting

Plotting API

Deprecated functions

pp.filter_genes_dispersion(data[, flavor, …])

Extract highly variable genes [Satija15] [Zheng17].

pp.normalize_per_cell(data[, …])

Normalize total counts per cell.