Scanpy – Single-Cell Analysis in Python

Scanpy is a scalable toolkit for analyzing single-cell gene expression data. It includes preprocessing, visualization, clustering, pseudotime and trajectory inference, and differential expression testing. The Python-based implementation efficiently handles datasets of more than one million cells.

Report issues and see the code on GitHub. If Scanpy is useful for your research, consider citing Genome Biology (2018).


Also see the release notes of anndata.

On master July 16, 2018

Plotting of marker genes and quality control:

  • dotplot() for visualizing genes across conditions and clusters, see here; thanks to F Ramirez
  • heatmap() for pretty heatmaps, see here; thanks to F Ramirez
  • violin() now produces very compact overview figures with many panels, see here; thanks to F Ramirez
  • highest_expr_genes() for quality control, see here; plots the genes with the highest mean fraction of counts across cells, similar to plotQC of Scater [McCarthy17]; thanks to F Ramirez
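
As a rough illustration of how these plotting functions might be combined, here is a minimal sketch. The helper name plot_marker_overview, the marker_genes argument and the 'louvain' grouping column are assumptions made for this example; they do not come from the release notes. It assumes adata is an AnnData object that already carries a categorical cluster annotation.

```python
def plot_marker_overview(adata, marker_genes, groupby="louvain"):
    """Hypothetical helper: draw the marker-gene and QC plots listed above.

    `adata` is assumed to be an AnnData object whose `.obs` contains a
    categorical column named by `groupby` (an assumption of this sketch).
    """
    # Imported lazily so the sketch can be read without Scanpy installed.
    import scanpy as sc

    # Dot plot of the marker genes across the groups.
    sc.pl.dotplot(adata, marker_genes, groupby=groupby, show=False)

    # Heatmap of the same genes, with cells ordered by group.
    sc.pl.heatmap(adata, marker_genes, groupby=groupby, show=False)

    # Compact violin panels, one per gene, split by group.
    sc.pl.violin(adata, marker_genes, groupby=groupby, show=False)

    # Quality control: the 20 genes with the highest mean fraction of counts.
    sc.pl.highest_expr_genes(adata, n_top=20, show=False)
```

Passing show=False keeps the figures from being displayed immediately, which is convenient when they are saved or arranged later.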

There is now a section on imputation.

Version 1.2 June 8, 2018

  • paga() has been improved, see theislab/paga; the default model changed, and the previous default model can be restored by passing model='v1.0'

Version 1.1 May 31, 2018

Version 1.0 March 28, 2018

Scanpy is much faster and more memory efficient. Preprocess, cluster and visualize 1.3M cells in 6 h, 130K cells in 14 min and 68K cells in 3 min.

The API gained a preprocessing function neighbors() and a class Neighbors() to which all basic graph computations are delegated.

Upgrading to 1.0 is not fully backwards compatible because of the following changes:

  • the graph-based tools louvain(), dpt(), draw_graph(), umap(), diffmap() and paga() now require prior computation of the graph via sc.pp.neighbors(adata, n_neighbors=5), instead of passing n_neighbors=5 to the tool itself as before
  • install numba via conda install numba, which replaces cython
  • the default connectivity measure changed (dpt will look different with default settings); setting method='gauss' in sc.pp.neighbors uses Gauss kernel connectivities and reproduces the previous behavior, see, for instance, this example
  • names of returned annotations have changed to make AnnData objects less bloated, which means that some of the unstructured annotation in old AnnData files is no longer recognized
  • replace occurrences of group_by with groupby (consistency with pandas)
  • it is worth checking out the notebook examples to see changes, e.g., here
  • upgrading scikit-learn from 0.18 to 0.19 changed the implementation of PCA, some results might therefore look slightly different
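
The first and third points above can be sketched as follows. This is only an illustration under stated assumptions: the helper name cluster_on_precomputed_graph and its parameters are hypothetical, adata is assumed to be a preprocessed AnnData object, and louvain() additionally needs the optional louvain package installed.

```python
def cluster_on_precomputed_graph(adata, n_neighbors=5, method="umap"):
    """Hypothetical helper: build the neighborhood graph once, then reuse it.

    Pass method="gauss" to use Gauss kernel connectivities, which the
    release notes describe as reproducing the pre-1.0 behavior.
    """
    # Imported lazily so the sketch can be read without Scanpy installed.
    import scanpy as sc

    # Since 1.0, the graph is computed in a separate preprocessing step.
    sc.pp.neighbors(adata, n_neighbors=n_neighbors, method=method)

    # Graph-based tools then run on the stored graph and no longer
    # take n_neighbors themselves.
    sc.tl.louvain(adata)
    sc.tl.umap(adata)
    return adata
```

Because the graph is stored on the AnnData object, every graph-based tool called afterwards works on the same neighborhood graph.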

Further changes are:

  • UMAP [McInnes18] can serve as a first visualization of the data, just as tSNE; in contrast to tSNE, UMAP directly embeds the single-cell graph and is faster; UMAP is now also used for measuring connectivities and computing neighbors, see neighbors()
  • graph abstraction: AGA is renamed to PAGA, see paga(); it now only measures connectivities between partitions of the single-cell graph, so clustering and pseudotime need to be computed separately via louvain() and dpt(); the connectivity measure has been improved
  • logistic regression for finding marker genes: rank_genes_groups() with parameter method='logreg'
  • louvain() now provides a better implementation for reclustering via restrict_to
  • scanpy no longer modifies rcParams upon import; call settings.set_figure_params to set the ‘scanpy style’
  • default cache directory is ./cache/, set settings.cachedir to change this; nested directories in this are now avoided
  • show edges in scatter plots based on graph visualization draw_graph() and umap() by passing edges=True
  • downsample_counts() for downsampling counts thanks to MD Luecken
  • default ‘louvain_groups’ are now called ‘louvain’
  • ‘X_diffmap’ now contains the zero component, plotting remains unchanged
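
Several of the changes above (the explicit figure style, separate clustering before PAGA, and logistic-regression marker genes) can be seen together in a short sketch. The helper name paga_and_markers is hypothetical, adata is assumed to be a preprocessed AnnData object, and louvain() needs the optional louvain package; this is one plausible sequence, not a recommended pipeline.

```python
def paga_and_markers(adata):
    """Hypothetical helper: cluster, abstract the graph with PAGA, rank markers."""
    # Imported lazily so the sketch can be read without Scanpy installed.
    import scanpy as sc

    # The 'scanpy style' is no longer applied on import; opt in explicitly.
    sc.settings.set_figure_params()

    # Graph first, then clustering; the result is stored under 'louvain'
    # (formerly 'louvain_groups').
    sc.pp.neighbors(adata)
    sc.tl.louvain(adata)

    # PAGA only measures connectivities between the existing partitions.
    sc.tl.paga(adata, groups="louvain")

    # Marker genes per cluster via logistic regression.
    sc.tl.rank_genes_groups(adata, groupby="louvain", method="logreg")
    return adata
```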

Version 0.4.4 February 26, 2018

Version 0.4.3 February 9, 2018

  • clustermap(): heatmap from hierarchical clustering, based on seaborn.clustermap() [Waskom16]
  • plotting functions return matplotlib.Axis only when show=False, otherwise None

Version 0.4.2 January 7, 2018

  • amendments in PAGA and its plotting functions

Version 0.4 December 23, 2017

Version 0.3.2 November 29, 2017

Version 0.3 November 16, 2017

Version 0.2.9 October 25, 2017

Initial release of partition-based graph abstraction (PAGA).

Version 0.2.1 July 24, 2017

Scanpy now includes preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing and simulation of gene regulatory networks. The implementation efficiently deals with datasets of more than one million cells.

Version 0.1 May 1, 2017

Scanpy computationally outperforms the Cell Ranger R kit and allows reproducing most of Seurat’s guided clustering tutorial.