Single-Cell Analysis in Python.

API

Import Scanpy as:

import scanpy as sc

Note

Additional functionality is available in the broader ecosystem, with some tools being wrapped in the scanpy.external module.

Preprocessing: `pp`

Filtering of highly-variable genes, batch-effect correction, per-cell normalization, preprocessing recipes.

Any transformation of the data matrix that is not a tool. Other than tools, preprocessing steps usually don’t return an easily interpretable annotation, but perform a basic transformation on the data matrix.

Basic Preprocessing

For visual quality control, see highest_expr_genes() and filter_genes_dispersion() in scanpy.pl.

`pp.calculate_qc_metrics`(adata, *[, ...])	Calculate quality control metrics.
`pp.filter_cells`(data[, min_counts, ...])	Filter cell outliers based on counts and numbers of genes expressed.
`pp.filter_genes`(data[, min_counts, ...])	Filter genes based on number of cells or counts.
`pp.highly_variable_genes`(adata[, layer, ...])	Annotate highly variable genes [Satija15] [Zheng17] [Stuart19].
`pp.log1p`(X, *[, base, copy, chunked, ...])	Logarithmize the data matrix.
`pp.pca`(data[, n_comps, zero_center, ...])	Principal component analysis [Pedregosa11].
`pp.normalize_total`(adata[, target_sum, ...])	Normalize counts per cell.
`pp.regress_out`(adata, keys[, n_jobs, copy])	Regress out (mostly) unwanted sources of variation.
`pp.scale`(X[, zero_center, max_value, copy, ...])	Scale data to unit variance and zero mean.
`pp.subsample`(data[, fraction, n_obs, ...])	Subsample to a fraction of the number of observations.
`pp.downsample_counts`(adata[, ...])	Downsample counts from count matrix.

Recipes

`pp.recipe_zheng17`(adata[, n_top_genes, log, ...])	Normalization and filtering as of [Zheng17].
`pp.recipe_weinreb17`(adata[, log, ...])	Normalization and filtering as of [Weinreb17].
`pp.recipe_seurat`(adata[, log, plot, copy])	Normalization and filtering as of Seurat [Satija15].

Batch effect correction

Also see [Data integration]. Note that a simple batch correction method is available via pp.regress_out(). Checkout scanpy.external for more.

pp.combat(adata[, key, covariates, inplace])

ComBat function for batch effect correction [Johnson07] [Leek12] [Pedersen12].

Neighbors

pp.neighbors(adata[, n_neighbors, n_pcs, ...])

Compute a neighborhood graph of observations [McInnes18].

Tools: `tl`

Any transformation of the data matrix that is not preprocessing. In contrast to a preprocessing function, a tool usually adds an easily interpretable annotation to the data matrix, which can then be visualized with a corresponding plotting function.

Embeddings

`tl.pca`(data[, n_comps, zero_center, ...])	Principal component analysis [Pedregosa11].
`tl.tsne`(adata[, n_pcs, use_rep, perplexity, ...])	t-SNE [Maaten08] [Amir13] [Pedregosa11].
`tl.umap`(adata[, min_dist, spread, ...])	Embed the neighborhood graph using UMAP [McInnes18].
`tl.draw_graph`(adata[, layout, init_pos, ...])	Force-directed graph drawing [Islam11] [Jacomy14] [Chippada18].
`tl.diffmap`(adata[, n_comps, neighbors_key, ...])	Diffusion Maps [Coifman05] [Haghverdi15] [Wolf18].

Compute densities on embeddings.

tl.embedding_density(adata[, basis, ...])

Calculate the density of cells in an embedding (per condition).

Clustering and trajectory inference

`tl.leiden`(adata[, resolution, restrict_to, ...])	Cluster cells into subgroups [Traag18].
`tl.louvain`(adata[, resolution, ...])	Cluster cells into subgroups [Blondel08] [Levine15] [Traag17].
`tl.dendrogram`(adata, groupby[, n_pcs, ...])	Computes a hierarchical clustering for the given `groupby` categories.
`tl.dpt`(adata[, n_dcs, n_branchings, ...])	Infer progression of cells through geodesic distance along the graph [Haghverdi16] [Wolf19].
`tl.paga`(adata[, groups, use_rna_velocity, ...])	Mapping out the coarse-grained connectivity structures of complex manifolds [Wolf19].

Data integration

tl.ingest(adata, adata_ref[, obs, ...])

Map labels and embeddings from reference data to new data.

Marker genes

`tl.rank_genes_groups`(adata, groupby[, ...])	Rank genes for characterizing groups.
`tl.filter_rank_genes_groups`(adata[, key, ...])	Filters out genes based on log fold change and fraction of genes expressing the gene within and outside the `groupby` categories.
`tl.marker_gene_overlap`(adata, ...[, key, ...])	Calculate an overlap score between data-deriven marker genes and provided markers

Gene scores, Cell cycle

`tl.score_genes`(adata, gene_list[, ...])	Score a set of genes [Satija15].
`tl.score_genes_cell_cycle`(adata, s_genes, ...)	Score cell cycle genes [Satija15].

Simulations

tl.sim(model[, params_file, tmax, ...])

Simulate dynamic gene expression data [Wittmann09] [Wolf18].

Plotting: `pl`

The plotting module scanpy.pl largely parallels the tl.* and a few of the pp.* functions. For most tools and for some preprocessing functions, you’ll find a plotting function with the same name.

See → tutorial: plotting/core for an overview of how to use these functions.

Note

See the Settings section for all important plotting configurations.

Generic

`pl.scatter`(adata[, x, y, color, use_raw, ...])	Scatter plot along observations or variables axes.
`pl.heatmap`(adata, var_names, groupby[, ...])	Heatmap of the expression values of genes.
`pl.dotplot`(adata, var_names, groupby[, ...])	Makes a dot plot of the expression values of `var_names`.
`pl.tracksplot`(adata, var_names, groupby[, ...])	In this type of plot each var_name is plotted as a filled line plot where the y values correspond to the var_name values and x is each of the cells.
`pl.violin`(adata, keys[, groupby, log, ...])	Violin plot.
`pl.stacked_violin`(adata, var_names, groupby)	Stacked violin plots.
`pl.matrixplot`(adata, var_names, groupby[, ...])	Creates a heatmap of the mean expression values per group of each var_names.
`pl.clustermap`(adata[, obs_keys, use_raw, ...])	Hierarchically-clustered heatmap.
`pl.ranking`(adata, attr, keys[, dictionary, ...])	Plot rankings.
`pl.dendrogram`(adata, groupby, *[, ...])	Plots a dendrogram of the categories defined in `groupby`.

Classes

These classes allow fine tuning of visual parameters.

`pl.DotPlot`(adata, var_names, groupby[, ...])	Allows the visualization of two values that are encoded as dot size and color.
`pl.MatrixPlot`(adata, var_names, groupby[, ...])	Allows the visualization of values using a color map.
`pl.StackedViolin`(adata, var_names, groupby)	Stacked violin plots.

Preprocessing

Methods for visualizing quality control and results of preprocessing functions.

`pl.highest_expr_genes`(adata[, n_top, show, ...])	Fraction of counts assigned to each gene over all cells.
`pl.filter_genes_dispersion`(result[, log, ...])	Plot dispersions versus means for genes.
`pl.highly_variable_genes`(adata_or_result[, ...])	Plot dispersions or normalized variance versus means for genes.

Tools

Methods that extract and visualize tool-specific annotation in an AnnData object. For any method in module tl, there is a method with the same name in pl.

PCA

`pl.pca`(adata, *[, annotate_var_explained, ...])	Scatter plot in PCA coordinates.
`pl.pca_loadings`(adata[, components, ...])	Rank genes according to contributions to PCs.
`pl.pca_variance_ratio`(adata[, n_pcs, log, ...])	Plot the variance ratio.
`pl.pca_overview`(adata, **params)	Plot PCA results.

Embeddings

`pl.tsne`(adata, **kwargs)	Scatter plot in tSNE basis.
`pl.umap`(adata, **kwargs)	Scatter plot in UMAP basis.
`pl.diffmap`(adata, **kwargs)	Scatter plot in Diffusion Map basis.
`pl.draw_graph`(adata, *[, layout])	Scatter plot in graph-drawing basis.
`pl.spatial`(adata, *[, basis, img, img_key, ...])	Scatter plot in spatial coordinates.
`pl.embedding`(adata, basis, *[, color, ...])	Scatter plot for user specified embedding basis (e.g. umap, pca, etc).

Compute densities on embeddings.

pl.embedding_density(adata[, basis, key, ...])

Plot the density of cells in an embedding (per condition).

Branching trajectories and pseudotime, clustering

Visualize clusters using one of the embedding methods passing color='louvain'.

`pl.dpt_groups_pseudotime`(adata[, color_map, ...])	Plot groups and pseudotime.
`pl.dpt_timeseries`(adata[, color_map, show, ...])	Heatmap of pseudotime series.
`pl.paga`(adata[, threshold, color, layout, ...])	Plot the PAGA graph through thresholding low-connectivity edges.
`pl.paga_path`(adata, nodes, keys[, use_raw, ...])	Gene expression and annotation changes along paths in the abstracted graph.
`pl.paga_compare`(adata[, basis, edges, ...])	Scatter and PAGA graph side-by-side.

Marker genes

`pl.rank_genes_groups`(adata[, groups, ...])	Plot ranking of genes.
`pl.rank_genes_groups_violin`(adata[, groups, ...])	Plot ranking of genes for all tested comparisons.
`pl.rank_genes_groups_stacked_violin`(adata[, ...])	Plot ranking of genes using stacked_violin plot (see `stacked_violin()`)
`pl.rank_genes_groups_heatmap`(adata[, ...])	Plot ranking of genes using heatmap plot (see `heatmap()`)
`pl.rank_genes_groups_dotplot`(adata[, ...])	Plot ranking of genes using dotplot plot (see `dotplot()`)
`pl.rank_genes_groups_matrixplot`(adata[, ...])	Plot ranking of genes using matrixplot plot (see `matrixplot()`)
`pl.rank_genes_groups_tracksplot`(adata[, ...])	Plot ranking of genes using heatmap plot (see `heatmap()`)

Simulations

pl.sim(adata[, tmax_realization, ...])

Plot results of simulation.

Reading

Note

For reading annotation use pandas.read_… and add it to your anndata.AnnData object. The following read functions are intended for the numeric data in the data matrix X.

Read common file formats using

read(filename[, backed, sheet, ext, ...])

Read file and return AnnData object.

Read 10x formatted hdf5 files and directories containing .mtx files using

`read_10x_h5`(filename[, genome, gex_only, ...])	Read 10x-Genomics-formatted hdf5 file.
`read_10x_mtx`(path[, var_names, make_unique, ...])	Read 10x-Genomics-formatted mtx directory.
`read_visium`(path[, genome, count_file, ...])	Read 10x-Genomics-formatted visum dataset.

Read other formats using functions borrowed from anndata

`read_h5ad`(filename[, backed, as_sparse, ...])	Read `.h5ad`-formatted hdf5 file.
`read_csv`(filename[, delimiter, ...])	Read `.csv` file.
`read_excel`(filename, sheet[, dtype])	Read `.xlsx` (Excel) file.
`read_hdf`(filename, key)	Read `.h5` (hdf5) file.
`read_loom`(filename, *[, sparse, cleanup, ...])	Read `.loom`-formatted hdf5 file.
`read_mtx`(filename[, dtype])	Read `.mtx` file.
`read_text`(filename[, delimiter, ...])	Read `.txt`, `.tab`, `.data` (text) file.
`read_umi_tools`(filename[, dtype])	Read a gzipped condensed count matrix from umi_tools.

Get object from `AnnData`: `get`

The module sc.get provides convenience functions for getting values back in useful formats.

`get.obs_df`(adata[, keys, obsm_keys, layer, ...])	Return values for observations in adata.
`get.var_df`(adata[, keys, varm_keys, layer])	Return values for observations in adata.
`get.rank_genes_groups_df`(adata, group, *[, ...])	`scanpy.tl.rank_genes_groups()` results in the form of a `DataFrame`.

Queries

This module provides useful queries for annotation and enrichment.

`queries.biomart_annotations`(org, attrs, *[, ...])	Retrieve gene annotations from ensembl biomart.
`queries.gene_coordinates`(org, gene_name, *)	Retrieve gene coordinates for specific organism through BioMart.
`queries.mitochondrial_genes`(org, *[, ...])	Mitochondrial gene symbols for specific organism through BioMart.
`queries.enrich`(container, *[, org, ...])	Get enrichment for DE results.

Metrics

Collections of useful measurements for evaluating results.

`metrics.confusion_matrix`(orig, new[, data, ...])	Given an original and new set of labels, create a labelled confusion matrix.
`metrics.gearys_c`(adata, *[, vals, ...])	Calculate Geary's C, as used by VISION.
`metrics.morans_i`(adata, *[, vals, ...])	Calculate Moran’s I Global Autocorrelation Statistic.

Experimental

New methods that are in early development which are not (yet) integrated in Scanpy core.

`experimental.pp.normalize_pearson_residuals`(...)	Applies analytic Pearson residual normalization, based on [Lause21].
`experimental.pp.normalize_pearson_residuals_pca`(...)	Applies analytic Pearson residual normalization and PCA, based on [Lause21].
`experimental.pp.highly_variable_genes`(adata, *)	Select highly variable genes using analytic Pearson residuals [Lause21].
`experimental.pp.recipe_pearson_residuals`(...)	Full pipeline for HVG selection and normalization by analytic Pearson residuals ([Lause21]).

Classes

AnnData is reexported from anndata.

Represent data as a neighborhood structure, usually a knn graph.

Neighbors(adata[, n_dcs, neighbors_key])

Data represented as graph of nearest neighbors.

Settings

A convenience function for setting some default matplotlib.rcParams and a high-resolution jupyter display backend useful for use in notebooks.

set_figure_params([scanpy, dpi, dpi_save, ...])

Set resolution/size, styling and format of figures.

An instance of the ScanpyConfig is available as scanpy.settings and allows configuring Scanpy.

_settings.ScanpyConfig(*[, verbosity, ...])

Config manager for scanpy.

Some selected settings are discussed in the following.

Influence the global behavior of plotting functions. In non-interactive scripts, you’d usually want to set settings.autoshow to False.

`autoshow`	Automatically show figures if `autosave == False` (default `True`).
`autosave`	Automatically save figures in `figdir` (default `False`).

The default directories for saving figures, caching files and storing datasets.

`figdir`	Directory for saving figures (default `'./figures/'`).
`cachedir`	Directory for cache files (default `'./cache/'`).
`datasetdir`	Directory for example `datasets` (default `'./data/'`).

The verbosity of logging output, where verbosity levels have the following meaning: 0=’error’, 1=’warning’, 2=’info’, 3=’hint’, 4=more details, 5=even more details, etc.

verbosity

Verbosity level (default warning)

Print versions of packages that might influence numerical results.

`logging.print_header`(*[, file])	Versions that might influence the numerical results.
`logging.print_versions`(*[, file])	Print versions of imported packages, OS, and jupyter environment.

Datasets

`datasets.blobs`([n_variables, n_centers, ...])	Gaussian Blobs.
`datasets.ebi_expression_atlas`(accession, *)	Load a dataset from the EBI Single Cell Expression Atlas
`datasets.krumsiek11`()	Simulated myeloid progenitors [Krumsiek11].
`datasets.moignard15`()	Hematopoiesis in early mouse embryos [Moignard15].
`datasets.pbmc3k`()	3k PBMCs from 10x Genomics.
`datasets.pbmc3k_processed`()	Processed 3k PBMCs from 10x Genomics.
`datasets.pbmc68k_reduced`()	Subsampled and processed 68k PBMCs.
`datasets.paul15`()	Development of Myeloid Progenitors [Paul15].
`datasets.toggleswitch`()	Simulated toggleswitch.
`datasets.visium_sge`([sample_id, ...])	Processed Visium Spatial Gene Expression data from 10x Genomics.

Deprecated functions

`pp.filter_genes_dispersion`(data[, flavor, ...])	Extract highly variable genes [Satija15] [Zheng17].
`pp.normalize_per_cell`(data[, ...])	Normalize total counts per cell.

API

Preprocessing: pp

Basic Preprocessing

Recipes

Batch effect correction

Neighbors

Tools: tl

Embeddings

Clustering and trajectory inference

Data integration

Marker genes

Gene scores, Cell cycle

Simulations

Plotting: pl

Generic

Classes

Preprocessing

Tools

PCA

Embeddings

Branching trajectories and pseudotime, clustering

Marker genes

Simulations

Reading

Get object from AnnData: get

Queries

Metrics

Experimental

Classes

Settings

Datasets

Deprecated functions

Preprocessing: `pp`

Tools: `tl`

Plotting: `pl`

Get object from `AnnData`: `get`