API¶
Import Scanpy as:
import scanpy as sc
Note
Wrappers to external functionality are found in scanpy.external. Previously, both core and external functionality were available through scanpy.api (deprecated since 1.3.7).
Preprocessing: PP¶
Filtering of highly-variable genes, batch-effect correction, per-cell normalization, preprocessing recipes.
Any transformation of the data matrix that is not a tool. Unlike tools, preprocessing steps usually don’t return an easily interpretable annotation; instead, they perform a basic transformation on the data matrix.
Basic Preprocessing¶
For visual quality control, see highest_expr_genes() and filter_genes_dispersion() in scanpy.plotting.
pp.calculate_qc_metrics (adata[, expr_type, …]) |
Calculate quality control metrics. |
pp.filter_cells (data[, min_counts, …]) |
Filter cell outliers based on counts and numbers of genes expressed. |
pp.filter_genes (data[, min_counts, …]) |
Filter genes based on number of cells or counts. |
pp.highly_variable_genes (adata[, min_disp, …]) |
Annotate highly variable genes [Satija15] [Zheng17]. |
pp.filter_genes_dispersion (data[, flavor, …]) |
Extract highly variable genes [Satija15] [Zheng17]. |
pp.log1p (data[, copy, chunked, chunk_size]) |
Logarithmize the data matrix. |
pp.pca (data[, n_comps, zero_center, …]) |
Principal component analysis [Pedregosa11]. |
pp.normalize_per_cell (data[, …]) |
Normalize total counts per cell. |
pp.regress_out (adata, keys[, n_jobs, copy]) |
Regress out unwanted sources of variation. |
pp.scale (data[, zero_center, max_value, copy]) |
Scale data to unit variance and zero mean. |
pp.subsample (data[, fraction, n_obs, …]) |
Subsample to a fraction of the number of observations. |
pp.downsample_counts (adata[, target_counts, …]) |
Downsample counts so that each cell has no more than target_counts. |
Recipes¶
pp.recipe_zheng17 (adata[, n_top_genes, log, …]) |
Normalization and filtering as of [Zheng17]. |
pp.recipe_weinreb17 (adata[, log, …]) |
Normalization and filtering as of [Weinreb17]. |
pp.recipe_seurat (adata[, log, plot, copy]) |
Normalization and filtering as of Seurat [Satija15]. |
Batch effect correction¶
Note that a simple batch correction method is available via pp.regress_out(). Check out scanpy.external for more.
pp.combat (adata[, key, inplace]) |
ComBat function for batch effect correction [Johnson07] [Leek12] [Pedersen12]. |
Neighbors¶
pp.neighbors (adata[, n_neighbors, n_pcs, …]) |
Compute a neighborhood graph of observations [McInnes18]. |
Tools: TL¶
Any transformation of the data matrix that is not preprocessing. In contrast to a preprocessing function, a tool usually adds an easily interpretable annotation to the data matrix, which can then be visualized with a corresponding plotting function.
Embeddings¶
tl.pca (data[, n_comps, zero_center, …]) |
Principal component analysis [Pedregosa11]. |
tl.tsne (adata[, n_pcs, use_rep, perplexity, …]) |
t-SNE [Maaten08] [Amir13] [Pedregosa11]. |
tl.umap (adata[, min_dist, spread, …]) |
Embed the neighborhood graph using UMAP [McInnes18]. |
tl.draw_graph (adata[, layout, init_pos, …]) |
Force-directed graph drawing [Islam11] [Jacomy14] [Chippada18]. |
tl.diffmap (adata[, n_comps, copy]) |
Diffusion Maps [Coifman05] [Haghverdi15] [Wolf17]. |
Clustering and trajectory inference¶
tl.leiden (adata[, resolution, random_state, …]) |
Cluster cells into subgroups [Traag18]. |
tl.louvain (adata[, resolution, …]) |
Cluster cells into subgroups [Blondel08] [Levine15] [Traag17]. |
tl.dpt (adata[, n_dcs, n_branchings, …]) |
Infer progression of cells through geodesic distance along the graph [Haghverdi16] [Wolf17i]. |
tl.paga (adata[, groups, use_rna_velocity, …]) |
Generate cellular maps of differentiation manifolds with complex topologies [Wolf17i]. |
Marker genes¶
tl.rank_genes_groups (adata, groupby[, …]) |
Rank genes for characterizing groups. |
Gene scores, Cell cycle¶
tl.score_genes (adata, gene_list[, …]) |
Score a set of genes [Satija15]. |
tl.score_genes_cell_cycle (adata, s_genes, …) |
Score cell cycle genes [Satija15]. |
Simulations¶
tl.sim (model[, params_file, tmax, …]) |
Simulate dynamic gene expression data [Wittmann09] [Wolf17]. |
Plotting: PL¶
The plotting module scanpy.plotting largely parallels the tl.* and a few of the pp.* functions.
For most tools and for some preprocessing functions, you’ll find a plotting function with the same name.
Reading¶
Note: For reading annotation use pandas.read_… and add it to your anndata.AnnData object.
The following read functions are intended for the numeric data in the data matrix X.
Read common file formats using
read (filename[, backed, sheet, ext, …]) |
Read file and return AnnData object. |
Read 10x formatted hdf5 files and directories containing .mtx files using
read_10x_h5 (filename[, genome, gex_only]) |
Read 10x-Genomics-formatted hdf5 file. |
read_10x_mtx (path[, var_names, make_unique, …]) |
Read 10x-Genomics-formatted mtx directory. |
Read other formats using functions borrowed from anndata
read_h5ad (filename[, backed, chunk_size]) |
Read .h5ad-formatted hdf5 file. |
read_csv (filename[, delimiter, …]) |
Read .csv file. |
read_excel (filename, sheet[, dtype]) |
Read .xlsx (Excel) file. |
read_hdf (filename, key) |
Read .h5 (hdf5) file. |
read_loom (filename[, sparse, cleanup, …]) |
Read .loom-formatted hdf5 file. |
read_mtx (filename[, dtype]) |
Read .mtx file. |
read_text (filename[, delimiter, …]) |
Read .txt, .tab, .data (text) file. |
read_umi_tools (filename[, dtype]) |
Read a gzipped condensed count matrix from umi_tools. |
Queries¶
queries.mitochondrial_genes (host, org) |
Mitochondrial gene symbols for specific organism through BioMart. |
Classes¶
AnnData is reexported from anndata.
Represent data as a neighborhood structure, usually a knn graph.
Neighbors (adata[, n_dcs]) |
Data represented as graph of nearest neighbors. |
Settings¶
A convenience function for setting some default matplotlib.rcParams and a high-resolution jupyter display backend that is useful in notebooks.
set_figure_params ([scanpy, dpi, dpi_save, …]) |
Set resolution/size, styling and format of figures. |
Influence the global behavior of plotting functions. In non-interactive scripts, you’d usually want to set settings.autoshow to False.
settings.autoshow |
Automatically show figures (default: True). |
settings.autosave |
Automatically save figures (default: False). |
The default directories for saving figures and caching files.
settings.figdir |
Directory for saving figures (default: './figures/'). |
settings.cachedir |
Directory for cache files (default: './cache/'). |
The verbosity of logging output, where verbosity levels have the following meaning: 0='error', 1='warning', 2='info', 3='hint', 4=more details, 5=even more details, etc.
settings.verbosity |
Verbosity level (default: 1). |
Print versions of packages that might influence numerical results.
logging.print_versions () |
Versions that might influence the numerical results. |
Datasets¶
datasets.blobs ([n_variables, n_centers, …]) |
Gaussian Blobs. |
datasets.krumsiek11 () |
Simulated myeloid progenitors [Krumsiek11]. |
datasets.moignard15 () |
Hematopoiesis in early mouse embryos [Moignard15]. |
datasets.pbmc3k () |
3k PBMCs from 10x Genomics. |
datasets.pbmc68k_reduced () |
Subsampled and processed 68k PBMCs. |
datasets.paul15 () |
Development of Myeloid Progenitors [Paul15]. |
datasets.toggleswitch () |
Simulated toggleswitch. |