# API

Import Scanpy as:

import scanpy as sc


Note

Wrappers to external functionality are found in scanpy.external.

## Preprocessing: pp

Filtering of highly-variable genes, batch-effect correction, per-cell normalization, preprocessing recipes.

Any transformation of the data matrix that is not a tool. Unlike tools, preprocessing steps usually don’t return an easily interpretable annotation; they perform a basic transformation on the data matrix.

### Basic Preprocessing

For visual quality control, see highest_expr_genes() and filter_genes_dispersion() in scanpy.plotting.

| Function | Description |
| --- | --- |
| pp.calculate_qc_metrics(adata, *[, …]) | Calculate quality control metrics. |
| pp.filter_cells(data[, min_counts, …]) | Filter cell outliers based on counts and numbers of genes expressed. |
| pp.filter_genes(data[, min_counts, …]) | Filter genes based on number of cells or counts. |
| pp.highly_variable_genes(adata[, min_disp, …]) | Annotate highly variable genes [Satija15] [Zheng17]. |
| pp.log1p(X, *[, base, copy, chunked, …]) | Logarithmize the data matrix. |
| pp.pca(data[, n_comps, zero_center, …]) | Principal component analysis [Pedregosa11]. |
| pp.normalize_total(adata[, target_sum, …]) | Normalize counts per cell. |
| pp.regress_out(adata, keys[, n_jobs, copy]) | Regress out (mostly) unwanted sources of variation. |
| pp.scale(X[, zero_center, max_value, copy, …]) | Scale data to unit variance and zero mean. |
| pp.subsample(data[, fraction, n_obs, …]) | Subsample to a fraction of the number of observations. |
| pp.downsample_counts(adata[, …]) | Downsample counts from count matrix. |

### Recipes

| Function | Description |
| --- | --- |
| pp.recipe_zheng17(adata[, n_top_genes, log, …]) | Normalization and filtering as of [Zheng17]. |
| pp.recipe_weinreb17(adata[, log, …]) | Normalization and filtering as of [Weinreb17]. |
| pp.recipe_seurat(adata[, log, plot, copy]) | Normalization and filtering as of Seurat [Satija15]. |

### Batch effect correction

Also see Data integration. Note that a simple batch correction method is available via pp.regress_out(). Check out scanpy.external for more.

| Function | Description |
| --- | --- |
| pp.combat(adata[, key, covariates, inplace]) | ComBat function for batch effect correction [Johnson07] [Leek12] [Pedersen12]. |

### Neighbors

| Function | Description |
| --- | --- |
| pp.neighbors(adata[, n_neighbors, n_pcs, …]) | Compute a neighborhood graph of observations [McInnes18]. |

## Tools: tl

Any transformation of the data matrix that is not preprocessing. In contrast to a preprocessing function, a tool usually adds an easily interpretable annotation to the data matrix, which can then be visualized with a corresponding plotting function.

### Embeddings

| Function | Description |
| --- | --- |
| tl.pca(data[, n_comps, zero_center, …]) | Principal component analysis [Pedregosa11]. |
| tl.tsne(adata[, n_pcs, use_rep, perplexity, …]) | |
| tl.umap(adata[, min_dist, spread, …]) | Embed the neighborhood graph using UMAP [McInnes18]. |
| tl.draw_graph(adata[, layout, init_pos, …]) | Force-directed graph drawing [Islam11] [Jacomy14] [Chippada18]. |
| tl.diffmap(adata[, n_comps, neighbors_key, copy]) | Diffusion Maps [Coifman05] [Haghverdi15] [Wolf18]. |

Compute densities on embeddings.

| Function | Description |
| --- | --- |
| tl.embedding_density(adata[, basis, …]) | Calculate the density of cells in an embedding (per condition). |

### Clustering and trajectory inference

| Function | Description |
| --- | --- |
| tl.leiden(adata[, resolution, restrict_to, …]) | Cluster cells into subgroups [Traag18]. |
| tl.louvain(adata[, resolution, …]) | Cluster cells into subgroups [Blondel08] [Levine15] [Traag17]. |
| tl.dendrogram(adata, groupby[, n_pcs, …]) | Computes a hierarchical clustering for the given groupby categories. |
| tl.dpt(adata[, n_dcs, n_branchings, …]) | Infer progression of cells through geodesic distance along the graph [Haghverdi16] [Wolf19]. |
| tl.paga(adata[, groups, use_rna_velocity, …]) | Mapping out the coarse-grained connectivity structures of complex manifolds [Wolf19]. |

### Data integration

| Function | Description |
| --- | --- |
| tl.ingest(adata, adata_ref[, obs, …]) | Map labels and embeddings from reference data to new data. |

### Marker genes

| Function | Description |
| --- | --- |
| tl.rank_genes_groups(adata, groupby[, …]) | Rank genes for characterizing groups. |
| tl.filter_rank_genes_groups(adata[, key, …]) | Filters out genes based on fold change and fraction of cells expressing the gene within and outside the groupby categories. |
| tl.marker_gene_overlap(adata, …[, key, …]) | Calculate an overlap score between data-derived marker genes and provided markers. |

### Gene scores, Cell cycle

| Function | Description |
| --- | --- |
| tl.score_genes(adata, gene_list[, …]) | Score a set of genes [Satija15]. |
| tl.score_genes_cell_cycle(adata, s_genes, …) | Score cell cycle genes [Satija15]. |

### Simulations

| Function | Description |
| --- | --- |
| tl.sim(model[, params_file, tmax, …]) | Simulate dynamic gene expression data [Wittmann09] [Wolf18]. |

## Plotting: pl

The plotting module scanpy.plotting largely parallels the tl.* and a few of the pp.* functions. For most tools and for some preprocessing functions, you’ll find a plotting function with the same name.

| Module | Description |
| --- | --- |
| plotting | Plotting API |

## Reading

Note

To read annotation, use pandas.read_… and add the result to your anndata.AnnData object. The following read functions are intended for the numeric data in the data matrix X.

Read common file formats using

| Function | Description |
| --- | --- |
| read(filename[, backed, sheet, ext, …]) | Read file and return AnnData object. |

Read 10x formatted hdf5 files and directories containing .mtx files using

| Function | Description |
| --- | --- |
| read_10x_h5(filename[, genome, gex_only]) | Read 10x-Genomics-formatted hdf5 file. |
| read_10x_mtx(path[, var_names, make_unique, …]) | Read 10x-Genomics-formatted mtx directory. |
| read_visium(path[, genome, count_file, …]) | Read 10x-Genomics-formatted Visium dataset. |

Read other formats using functions borrowed from anndata

| Function | Description |
| --- | --- |
| read_h5ad(filename[, backed, as_sparse, …]) | Read .h5ad-formatted hdf5 file. |
| read_csv(filename[, delimiter, …]) | Read .csv file. |
| read_excel(filename, sheet[, dtype]) | Read .xlsx (Excel) file. |
| read_hdf(filename, key) | Read .h5 (hdf5) file. |
| read_loom(filename[, sparse, cleanup, …]) | Read .loom-formatted hdf5 file. |
| read_mtx(filename[, dtype]) | Read .mtx file. |
| read_text(filename[, delimiter, …]) | Read .txt, .tab, .data (text) file. |
| read_umi_tools(filename[, dtype]) | Read a gzipped condensed count matrix from umi_tools. |

## Get object from AnnData: get

The module sc.get provides convenience functions for getting values back in useful formats.

| Function | Description |
| --- | --- |
| get.obs_df(adata[, keys, obsm_keys, layer, …]) | Return values for observations in adata. |
| get.var_df(adata[, keys, varm_keys, layer]) | Return values for variables in adata. |
| get.rank_genes_groups_df(adata, group, *[, …]) | scanpy.tl.rank_genes_groups() results in the form of a DataFrame. |

## Queries

This module provides useful queries for annotation and enrichment.

| Function | Description |
| --- | --- |
| queries.biomart_annotations(org, attrs, *[, …]) | Retrieve gene annotations from ensembl biomart. |
| queries.gene_coordinates(org, gene_name, *) | Retrieve gene coordinates for specific organism through BioMart. |
| queries.mitochondrial_genes(org, *[, …]) | Mitochondrial gene symbols for specific organism through BioMart. |
| queries.enrich(container, *[, org, …]) | Get enrichment for DE results. |

## Classes

AnnData is reexported from anndata.

Represent data as a neighborhood structure, usually a knn graph.

| Class | Description |
| --- | --- |
| Neighbors(adata[, n_dcs, neighbors_key]) | Data represented as graph of nearest neighbors. |

## Settings

A convenience function that sets some default matplotlib.rcParams and a high-resolution Jupyter display backend, which is useful in notebooks.

| Function | Description |
| --- | --- |
| set_figure_params([scanpy, dpi, dpi_save, …]) | Set resolution/size, styling and format of figures. |

An instance of ScanpyConfig is available as scanpy.settings and allows configuring Scanpy.

| Class | Description |
| --- | --- |
| _settings.ScanpyConfig(*[, verbosity, …]) | Config manager for scanpy. |

Selected settings are discussed below.

Influence the global behavior of plotting functions. In non-interactive scripts, you’d usually want to set settings.autoshow to False.

| Setting | Description |
| --- | --- |
| autoshow | Automatically show figures if autosave == False (default True). |
| autosave | Automatically save figures in figdir (default False). |

The default directories for saving figures, caching files and storing datasets.

| Setting | Description |
| --- | --- |
| figdir | Directory for saving figures (default './figures/'). |
| cachedir | Directory for cache files (default './cache/'). |
| datasetdir | Directory for example datasets (default './data/'). |

The verbosity of logging output, where verbosity levels have the following meaning: 0='error', 1='warning', 2='info', 3='hint', 4=more details, 5=even more details, etc.

| Setting | Description |
| --- | --- |
| verbosity | Verbosity level (default warning) |

Print versions of packages that might influence numerical results.

 Versions that might influence the numerical results.

## Datasets

| Function | Description |
| --- | --- |
| datasets.blobs([n_variables, n_centers, …]) | Gaussian Blobs. |
| datasets.ebi_expression_atlas(accession, *) | Load a dataset from the EBI Single Cell Expression Atlas. |
| | Simulated myeloid progenitors [Krumsiek11]. |
| | Hematopoiesis in early mouse embryos [Moignard15]. |
| | 3k PBMCs from 10x Genomics. |
| | Processed 3k PBMCs from 10x Genomics. |
| | Subsampled and processed 68k PBMCs. |
| | Development of Myeloid Progenitors [Paul15]. |
| | Simulated toggleswitch. |
| datasets.visium_sge([sample_id]) | Processed Visium Spatial Gene Expression data from 10x Genomics. |

## Further modules

## Deprecated functions

| Function | Description |
| --- | --- |
| pp.filter_genes_dispersion(data[, flavor, …]) | Extract highly variable genes [Satija15] [Zheng17]. |
| pp.normalize_per_cell(data[, …]) | Normalize total counts per cell. |