Scanpy – Single-Cell Analysis in Python

Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing. The Python-based implementation efficiently deals with datasets of more than one million cells.

Discuss usage on Discourse and development on GitHub.
Get started by browsing tutorials, usage principles or the main API.
Follow changes in the release notes.
Find tools that harmonize well with anndata & Scanpy via the external API and the ecosystem page.
Check out our contributing guide for development practices.
Consider citing Genome Biology (2018) along with original references.

News

Scanpy hits 100 contributors! 2022-03-31

100 people have contributed to Scanpy’s source code!

Of course, contributions to the project are not limited to direct modification of the source code. Many others have improved the project by building on top of it, participating in development discussions, helping others with usage, or by showing off what it’s helped them accomplish.

Thanks to all our contributors for making this project possible!

New community channels 2022-03-31

We’ve moved our forums and have a new publicly available chat!

Our Discourse forum has migrated to a joint scverse forum (discourse.scverse.org).
Our private developer Slack has been replaced by a public Zulip chat (scverse.zulipchat.com).

Toolkits for spatial (squidpy) and multimodal (muon) data published 2022-02-01

Two large toolkits extending our ecosystem to new modalities have had their manuscripts published!

Muon, a framework for multimodal data, has been published in Genome Biology.
Squidpy, a toolkit for working with spatial single-cell data, has been published in Nature Methods.

(past news)

Latest additions

Version 1.9

1.9.8 2024-01-26

Bug fixes

Fix handling of numpy array palettes for old numpy versions PR 2832 P Angerer

1.9.7 2024-01-25

Bug fixes

Fix handling of numpy array palettes (e.g. after write-read cycle) PR 2734 P Angerer
Specify correct version of matplotlib dependency PR 2733 P Fisher
Fix scanpy.pl.violin() usage of seaborn.catplot PR 2739 E Roellin
Fix scanpy.pp.highly_variable_genes() to handle the combinations of inplace and subset consistently PR 2757 E Roellin
Replace usage of various deprecated functionality from anndata and pandas PR 2678 PR 2779 P Angerer
Allow using the default n_top_genes with scanpy.pp.highly_variable_genes() flavor 'seurat_v3' PR 2782 P Angerer
Fix scanpy.read_10x_mtx()’s gex_only=True mode PR 2801 P Angerer

1.9.6 2023-10-31

Bug fixes

Allow scanpy.pl.scatter() to accept a str palette name PR 2571 P Angerer
Make scanpy.external.tl.palantir() compatible with palantir >=1.3 PR 2672 DJ Otto
Fix scanpy.pl.pca() when return_fig=True and annotate_var_explained=True PR 2682 J Wagner
Temp fix for issue 2680 by skipping seaborn version 0.13.0 PR 2661 P Angerer
Fix scanpy.pp.highly_variable_genes() to not modify the used layer when flavor=seurat PR 2698 E Roellin
Prevent pandas from causing infinite recursion when setting a slice of a categorical column PR 2719 P Angerer

1.9.5 2023-09-08

Bug fixes

Remove use of deprecated dtype argument to AnnData constructor PR 2658 I Virshup

1.9.4 2023-08-24

Bug fixes

Support scikit-learn 1.3 PR 2515 P Angerer
Deal with None value vanishing from things like .uns['log1p'] PR 2546 SP Shen
Depend on igraph instead of python-igraph PR 2566 P Angerer
rank_genes_groups() now handles unsorted groups as intended PR 2589 S Dicks
rank_genes_groups_df() now works for rank_genes_groups() with method="logreg" PR 2601 S Dicks
_choose_representation() now works with n_pcs if bigger than settings.N_PCS PR 2610 S Dicks

1.9.3 2023-03-02

Bug fixes

Variety of fixes against pandas 2.0.0rc0 PR 2434 I Virshup
Compatibility with anndata 0.9.0rc PR 2435 I Virshup

1.9.2 2023-02-16

Bug fixes

highly_variable_genes() layer argument now works in tandem with batches PR 2302 D Schaumont
highly_variable_genes() with flavor='cell_ranger' now handles the case in issue 2230 where the number of calculated dispersions is less than n_top_genes PR 2231 L Zappia
Fix compatibility with matplotlib 3.7 PR 2414 I Virshup P Fisher
Fix scrublet numpy matrix compatibility issue PR 2395 A Gayoso

1.9.1 2022-04-05

Bug fixes

normalize_total() works when Dask is not installed PR 2209 R Cannoodt
Fix embedding plots by bumping matplotlib dependency to version 3.4 PR 2212 I Virshup

1.9.0 2022-04-01

Tutorials

New tutorial on the usage of Pearson Residuals: tutorial_pearson_residuals J Lause, G Palla
Materials and recordings for Scanpy workshops by Maren Büttner

Experimental module

Added scanpy.experimental module! It currently contains functionality related to Pearson residuals in scanpy.experimental.pp PR 1715 J Lause, G Palla, I Virshup. This includes:
- normalize_pearson_residuals() for Pearson Residuals normalization
- highly_variable_genes() for HVG selection with Pearson Residuals
- normalize_pearson_residuals_pca() for Pearson Residuals normalization and dimensionality reduction with PCA
- recipe_pearson_residuals() for Pearson Residuals normalization, HVG selection and dimensionality reduction with PCA
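
The normalization behind these functions can be pictured with a minimal pure-Python sketch of analytic Pearson residuals under a negative binomial null model. This is an illustration and not Scanpy's implementation: the function name pearson_residuals, the list-of-lists input, the overdispersion value theta=100 and the clipping bound sqrt(number of cells) are all assumptions made for the example.

```python
import math


def pearson_residuals(counts, theta=100.0):
    """Sketch: analytic Pearson residuals for a cells x genes count matrix.

    `counts` is a list of rows (one row per cell). `theta` is an assumed
    negative binomial overdispersion parameter (hypothetical default).
    """
    n_cells = len(counts)
    n_genes = len(counts[0])
    cell_totals = [sum(row) for row in counts]
    gene_totals = [sum(counts[i][j] for i in range(n_cells)) for j in range(n_genes)]
    grand_total = sum(cell_totals)
    if grand_total == 0:
        raise ValueError("count matrix contains no counts")

    # Assumed clipping bound: residuals are limited to [-sqrt(n), sqrt(n)].
    clip = math.sqrt(n_cells)

    residuals = []
    for i in range(n_cells):
        row = []
        for j in range(n_genes):
            # Expected count if cell depth and gene abundance were independent.
            mu = cell_totals[i] * gene_totals[j] / grand_total
            if mu == 0:
                row.append(0.0)  # empty cell or gene: no information
                continue
            # Negative binomial variance: mu + mu^2 / theta.
            r = (counts[i][j] - mu) / math.sqrt(mu + mu * mu / theta)
            row.append(max(-clip, min(clip, r)))
        residuals.append(row)
    return residuals
```

Calling pearson_residuals([[1, 0], [0, 1]]) gives positive residuals on the diagonal and negative ones off it, since each gene is over-represented in exactly one cell.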

Features

filter_rank_genes_groups() now allows filtering by absolute values of log fold change PR 1649 S Rybakov
_choose_representation now subsets the provided representation to n_pcs, regardless of the name of the provided representation (should affect mostly neighbors()) PR 2179 I Virshup PG Majev
scanpy.external.pp.scrublet() (and related functions) can now be used on AnnData objects containing multiple batches PR 1965 J Manning
The number of variables plotted with pca_loadings() can now be controlled with the n_points argument. Additionally, variables are no longer repeated if the AnnData object has fewer than 30 variables PR 2075 Yves33
Dask arrays now work with scanpy.pp.normalize_total() PR 1663 G Buckley, I Virshup
embedding_density() now allows more than 10 groups PR 1936 A Wolf
Embedding plots can now pass colorbar_loc to specify the location of colorbar legend, or pass None to not show a colorbar PR 1821 A Schaar I Virshup
Embedding plots now have a dimensions argument, which lets users select which dimensions of their embedding to plot and uses the same broadcasting rules as other arguments PR 1538 I Virshup
print_versions() now uses session_info PR 2089 P Angerer I Virshup
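
As an illustration of the absolute log fold change filtering listed for filter_rank_genes_groups() above, here is a minimal sketch of the idea in plain Python. The helper filter_by_log_fold_change and its use_abs flag are hypothetical names chosen for this example and are not Scanpy's API.

```python
def filter_by_log_fold_change(genes, min_log_fold_change, use_abs=False):
    """Sketch: keep gene names whose log fold change passes a threshold.

    `genes` is a list of (name, log_fold_change) pairs. With use_abs=True,
    strongly down-regulated genes pass the filter as well.
    """
    kept = []
    for name, log_fold_change in genes:
        value = abs(log_fold_change) if use_abs else log_fold_change
        if value >= min_log_fold_change:
            kept.append(name)
    return kept


# Hypothetical marker table: gene B is strongly down-regulated.
genes = [("A", 2.5), ("B", -3.0), ("C", 0.4)]
up_only = filter_by_log_fold_change(genes, 1.0)
both_directions = filter_by_log_fold_change(genes, 1.0, use_abs=True)
```

Without the absolute value only gene A is kept; with it, gene B is kept too, which is the behavior the feature note describes.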

Ecosystem

Multiple packages have been added to our ecosystem page, including:

decoupler, for footprint analysis and pathway enrichment PR 2186 PB Mompel
dandelion, for B-cell receptor analysis PR 1953 Z Tuong
CIARA, a feature selection tool for identifying rare cell types PR 2175 M Stock

Bug fixes

Fixed finding variables with use_raw=True and basis=None in scanpy.pl.scatter() PR 2027 E Rice
Fixed scanpy.external.pp.scrublet() to address issue 1957 and ensure raw counts are used for simulation FlMai
Functions in scanpy.datasets no longer throw OldFormatWarnings when using anndata 0.8 PR 2096 I Virshup
Fixed use of scanpy.pp.neighbors() with method='rapids': RAPIDS cuML no longer returns a squared Euclidean distance matrix, so we should not square-root the kNN distance matrix. PR 1828 M Zaslavsky
Removed pytables dependency by implementing read_10x_h5 with h5py due to installation errors on Windows PR 2064
Fixed bug in scanpy.external.pp.hashsolo() where default value was set improperly PR 2190 B Reiz
Fixed bug in scanpy.pl.embedding() functions where an error could be raised when there were missing values and large numbers of categories PR 2187 I Virshup
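
The method='rapids' fix listed above comes down to taking a square root only when the backend actually reports squared distances. A minimal sketch of that reasoning follows; the helper names and the squared flag are assumptions for illustration and do not correspond to Scanpy or cuML functions.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_distances_to_euclidean(distances, squared):
    """Sketch: convert backend kNN distances to Euclidean distances.

    `squared` states whether the (hypothetical) backend returned squared
    Euclidean distances. Taking a square root of values that are already
    Euclidean would silently shrink every distance.
    """
    if squared:
        return [math.sqrt(d) for d in distances]
    return list(distances)


true_distance = euclidean((0.0, 0.0), (3.0, 4.0))
from_squared = knn_distances_to_euclidean([25.0], squared=True)
already_euclidean = knn_distances_to_euclidean([5.0], squared=False)
```

Both conversions recover the true distance between the two points. Applying the square root in the second case would instead yield roughly 2.24, which is the kind of distortion the fix avoids.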