
Scanpy – Single-Cell Analysis in Python

Scanpy is a scalable toolkit for analyzing single-cell gene expression data. It includes preprocessing, visualization, clustering, pseudotime and trajectory inference and differential expression testing. The Python-based implementation efficiently deals with datasets of more than one million cells.

Report issues and see the code here. If Scanpy is useful for your research, consider citing Genome Biology (2018).


Also see the release notes of anndata.

Version 1.4.1 April 27, 2019

New functionality:


  • .layers support of scatter plots thanks to F Ramirez

  • fix double-logarithmization in the computation of the log fold change in rank_genes_groups() thanks to A Muñoz-Rojas

  • fix return sections of docs thanks to P Angerer
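
The double-logarithmization pitfall mentioned above can be illustrated with a small sketch. This is not Scanpy's source code: the function names, the `eps` guard and the example values are assumptions made for illustration, and the group means are assumed to be on the log1p scale.

```python
import math

def log2_fold_change(mean_log1p_a, mean_log1p_b, eps=1e-9):
    """log2 fold change between two groups whose means are on the log1p scale."""
    # Map the means back to the original scale before forming the ratio.
    a = math.expm1(mean_log1p_a)
    b = math.expm1(mean_log1p_b)
    # eps (an illustrative choice) guards against division by zero.
    return math.log2((a + eps) / (b + eps))

def double_log_fold_change(mean_log1p_a, mean_log1p_b, eps=1e-9):
    """The pitfall: a log ratio of values that are already logarithms."""
    return math.log2((mean_log1p_a + eps) / (mean_log1p_b + eps))

# Hypothetical group means: 8 and 2 on the original scale, stored as log1p.
mean_a = math.log1p(8.0)
mean_b = math.log1p(2.0)

correct = log2_fold_change(mean_a, mean_b)        # close to log2(8 / 2) = 2
doubled = double_log_fold_change(mean_a, mean_b)  # close to log2(ln 9 / ln 3) = 1
```

In this toy case the doubly logarithmized value understates the fold change by a factor of two on the log2 scale.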

Version 1.4 February 5, 2019

Major updates:

  • one can now import scanpy as sc instead of import scanpy.api as sc, see here 1.3.7

  • a new plotting gallery for visualizing marker genes, see here 1.3.6 thanks to F Ramirez

  • tutorials are integrated on ReadTheDocs, see simple clustering and simple trajectory inference 1.3.6

  • a fully distributed preprocessing backend 1.3.3 thanks to T White and the Laserson Lab

  • changed default compression to None in write_h5ad() to speed up reading and writing; disk space use is usually less critical anndata 0.6.16

  • performance gains in write_h5ad() due to better handling of strings and categories anndata 0.6.19 thanks to S Rybakov
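
As a minimal sketch of the compression default described above, the helper below writes an AnnData object either uncompressed or with gzip. The name `save_adata` and its `compress` flag are hypothetical; only `write_h5ad()` and its `compression` argument come from anndata.

```python
def save_adata(adata, path, compress=False):
    """Write an AnnData object to an .h5ad file.

    compress=False keeps the default (compression=None), which favors
    read and write speed; compress=True requests gzip for a smaller file.
    """
    compression = "gzip" if compress else None
    adata.write_h5ad(path, compression=compression)
    return compression
```

With Scanpy 1.4 or later installed, this could be used after `import scanpy as sc`, for example as `save_adata(sc.datasets.pbmc68k_reduced(), "pbmc.h5ad")`.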

Two new possibilities for interactive exploration of analysis results:

Further updates:

Version 1.3 September 3, 2018

RNA velocity in single cells [Manno18]:

  • Scanpy and AnnData support loom’s layers so that computations for single-cell RNA velocity [Manno18] become feasible thanks to S Rybakov and V Bergen

  • the package scvelo works seamlessly with Scanpy and can process loom files with splicing information produced by Velocyto [Manno18]; it runs a lot faster than the count matrix analysis of Velocyto and provides several conceptual developments (preprint to come)

Plotting of marker genes and quality control, see this section and scroll down. A few examples are:

  • dotplot() for visualizing genes across conditions and clusters, see here thanks to F Ramirez

  • heatmap() for pretty heatmaps, see here thanks to F Ramirez

  • violin() now produces very compact overview figures with many panels, see here thanks to F Ramirez

  • highest_expr_genes() for quality control, see here; plot genes with highest mean fraction of cells, similar to plotQC of Scater [McCarthy17] thanks to F Ramirez

There is a section on imputation:

Version 1.2 June 8, 2018

  • paga() improved, see theislab/paga; the default model changed. To restore the previous default model, pass model='v1.0'
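
A short sketch of switching between the two PAGA models follows. The wrapper `run_paga` and its `legacy_model` flag are hypothetical, and the sketch assumes that a neighbor graph and a clustering stored under the key passed as `groups` already exist on `adata`. Scanpy is imported inside the function so that defining it does not require the package.

```python
def run_paga(adata, groups="louvain", legacy_model=False):
    """Run PAGA with the current default model or the previous one."""
    import scanpy as sc  # lazy import; requires scanpy to be installed

    if legacy_model:
        # Restore the previous default model, as described above.
        sc.tl.paga(adata, groups=groups, model="v1.0")
    else:
        sc.tl.paga(adata, groups=groups)
    return adata
```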

Version 1.1 May 31, 2018

Version 1.0 March 28, 2018

Scanpy is much faster and more memory efficient. Preprocess, cluster and visualize 1.3M cells in 6 h, 130K cells in 14 min and 68K cells in 3 min.

The API gained a preprocessing function neighbors() and a class Neighbors() to which all basic graph computations are delegated.

Upgrading to 1.0 isn’t fully backwards compatible in the following changes:

  • the graph-based tools louvain(), dpt(), draw_graph(), umap(), diffmap() and paga() now require prior computation of the graph: sc.pp.neighbors(adata, n_neighbors=5); sc.tl.louvain(adata) instead of previously sc.tl.louvain(adata, n_neighbors=5)

  • install numba via conda install numba, which replaces cython

  • the default connectivity measure changed, so dpt will look different with default settings; setting method='gauss' in sc.pp.neighbors uses Gauss kernel connectivities and reproduces the previous behavior, see, for instance, this example

  • namings of returned annotation have changed for less bloated AnnData objects, which means that some of the unstructured annotation of old AnnData files is not recognized anymore

  • replace occurrences of group_by with groupby (consistency with pandas)

  • it is worth checking out the notebook examples to see changes, e.g., here

  • upgrading scikit-learn from 0.18 to 0.19 changed the implementation of PCA, some results might therefore look slightly different
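
The migration steps above can be sketched as one helper. The name `cluster_after_upgrade` and the `legacy_connectivities` flag are hypothetical; the sketch assumes `adata` is a preprocessed AnnData object and that the louvain package needed by `sc.tl.louvain` is installed.

```python
def cluster_after_upgrade(adata, n_neighbors=5, legacy_connectivities=False):
    """Cluster and rank genes with the graph computed up front."""
    import scanpy as sc  # lazy import; requires scanpy to be installed

    neighbors_kwargs = {"n_neighbors": n_neighbors}
    if legacy_connectivities:
        # Gauss kernel connectivities reproduce the previous behavior.
        neighbors_kwargs["method"] = "gauss"

    # The graph is computed once, before any graph-based tool is called.
    sc.pp.neighbors(adata, **neighbors_kwargs)
    sc.tl.louvain(adata)

    # groupby replaces the former group_by spelling.
    sc.tl.rank_genes_groups(adata, groupby="louvain")
    return adata
```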

Further changes are:

  • UMAP [McInnes18] can serve as a first visualization of the data just as tSNE; in contrast to tSNE, UMAP directly embeds the single-cell graph and is faster. UMAP is now also used for measuring connectivities and computing neighbors, see neighbors()

  • graph abstraction: AGA is renamed to PAGA: paga(). It now only measures connectivities between partitions of the single-cell graph; pseudotime and clustering need to be computed separately via louvain() and dpt(). The connectivity measure has been improved

  • logistic regression for finding marker genes: rank_genes_groups() with parameter method='logreg'

  • louvain() now provides a better implementation for reclustering via restrict_to

  • scanpy no longer modifies rcParams upon import, call settings.set_figure_params to set the ‘scanpy style’

  • default cache directory is ./cache/, set settings.cachedir to change this; nested directories in this are now avoided

  • show edges in scatter plots based on graph visualization draw_graph() and umap() by passing edges=True

  • downsample_counts() for downsampling counts thanks to MD Luecken

  • default ‘louvain_groups’ are now called ‘louvain’

  • ‘X_diffmap’ now contains the zero component, plotting remains unchanged
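
Several of the changes listed above can be combined in one sketch. The helper name `explore_clusters`, the output key `louvain_sub` and the cluster label `"0"` are assumptions for illustration. The sketch also assumes that `adata` already carries a neighbor graph, a `louvain` clustering and a UMAP embedding, and that scikit-learn is available for the logistic regression.

```python
def explore_clusters(adata, cachedir="./cache/"):
    """Apply the 'scanpy style', rank marker genes, recluster and plot."""
    import scanpy as sc  # lazy import; requires scanpy to be installed

    # rcParams are no longer changed on import, so the style is set explicitly.
    sc.settings.set_figure_params()
    sc.settings.cachedir = cachedir

    # Marker genes via logistic regression.
    sc.tl.rank_genes_groups(adata, groupby="louvain", method="logreg")

    # Recluster only the cells of cluster "0" (assumed to exist).
    sc.tl.louvain(adata, restrict_to=("louvain", ["0"]), key_added="louvain_sub")

    # Draw the graph edges on top of the UMAP scatter plot.
    sc.pl.umap(adata, color="louvain_sub", edges=True)
    return adata
```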

Version 0.4.4 February 26, 2018

Version 0.4.3 February 9, 2018

  • clustermap(): heatmap from hierarchical clustering, based on seaborn.clustermap() [Waskom16]

  • plotting functions only return a matplotlib.axes.Axes object when show=False, otherwise None
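
A minimal sketch of using the returned axes: the helper `titled_tsne` is hypothetical and assumes a tSNE embedding has already been computed on `adata`. With a single `color` key, the plotting call is expected to return one axes object.

```python
def titled_tsne(adata, color, title):
    """Draw a tSNE scatter plot and customize it through the returned axes."""
    import scanpy as sc  # lazy import; requires scanpy to be installed

    # show=False returns the axes instead of displaying the figure.
    ax = sc.pl.tsne(adata, color=color, show=False)
    ax.set_title(title)
    return ax
```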

Version 0.4.2 January 7, 2018

  • amendments in PAGA and its plotting functions

Version 0.4 December 23, 2017

Version 0.3.2 November 29, 2017

  • finding marker genes via rank_genes_groups_violin() improved: example

Version 0.3 November 16, 2017

Version 0.2.9 October 25, 2017

Initial release of partition-based graph abstraction (PAGA).

Version 0.2.1 July 24, 2017

Scanpy now includes preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing and simulation of gene regulatory networks. The implementation efficiently deals with datasets of more than one million cells.

Version 0.1 May 1, 2017

Scanpy computationally outperforms the Cell Ranger R kit and allows reproducing most of Seurat’s guided clustering tutorial.