Scanpy – Single-Cell Analysis in Python
Scanpy is a scalable toolkit for analyzing single-cell gene expression data, built jointly with anndata. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing. The Python-based implementation efficiently deals with datasets of more than one million cells.
Get started by browsing tutorials, usage principles or the main API.
Follow changes in the release notes.
Find tools that harmonize well with anndata & Scanpy via the external API and the ecosystem page.
Check out our contributing guide for development practices.
Consider citing Genome Biology (2018) along with original references.
News
Scanpy hits 100 contributors! 2022-03-31
100 people have contributed to Scanpy’s source code!
Of course, contributions to the project are not limited to direct modification of the source code. Many others have improved the project by building on top of it, participating in development discussions, helping others with usage, or by showing off what it’s helped them accomplish.
Thanks to all our contributors for making this project possible!
New community channels 2022-03-31
We’ve moved our forums and have a new publicly available chat!
Our discourse forum has migrated to a joint scverse forum (discourse.scverse.org).
Our private developer Slack has been replaced by a public Zulip chat (scverse.zulipchat.com).
Toolkits for spatial (squidpy) and multimodal (muon) data published 2022-02-01
Two large toolkits extending our ecosystem to new modalities have had their manuscripts published!
Muon, a framework for multimodal data, has been published in Genome Biology.
Squidpy, a toolkit for working with spatial single-cell data, has been published in Nature Methods.
Latest additions
Version 1.9
1.9.8 2024-01-26
Bug fixes
Fix handling of numpy array palettes for old numpy versions PR 2832 P Angerer
1.9.7 2024-01-25
Bug fixes
Fix handling of numpy array palettes (e.g. after write-read cycle) PR 2734 P Angerer
Specify correct version of matplotlib dependency PR 2733 P Fisher
Fix scanpy.pl.violin() usage of seaborn.catplot PR 2739 E Roellin
Fix scanpy.pp.highly_variable_genes() to handle the combinations of inplace and subset consistently PR 2757 E Roellin
Replace usage of various deprecated functionality from anndata and pandas PR 2678 PR 2779 P Angerer
Allow to use default n_top_genes when using scanpy.pp.highly_variable_genes() flavor 'seurat_v3' PR 2782 P Angerer
Fix scanpy.read_10x_mtx()’s gex_only=True mode PR 2801 P Angerer
1.9.6 2023-10-31
Bug fixes
Allow scanpy.pl.scatter() to accept a str palette name PR 2571 P Angerer
Make scanpy.external.tl.palantir() compatible with palantir >=1.3 PR 2672 DJ Otto
Fix scanpy.pl.pca() when return_fig=True and annotate_var_explained=True PR 2682 J Wagner
Temp fix for issue 2680 by skipping seaborn version 0.13.0 PR 2661 P Angerer
Fix scanpy.pp.highly_variable_genes() to not modify the used layer when flavor=seurat PR 2698 E Roellin
Prevent pandas from causing infinite recursion when setting a slice of a categorical column PR 2719 P Angerer
1.9.5 2023-09-08
Bug fixes
Remove use of deprecated dtype argument to AnnData constructor PR 2658 Isaac Virshup
1.9.4 2023-08-24
Bug fixes
Support scikit-learn 1.3 PR 2515 P Angerer
Deal with None value vanishing from things like .uns['log1p'] PR 2546 SP Shen
Depend on igraph instead of python-igraph PR 2566 P Angerer
rank_genes_groups() now handles unsorted groups as intended PR 2589 S Dicks
rank_genes_groups_df() now works for rank_genes_groups() with method="logreg" PR 2601 S Dicks
_choose_representation() now works with n_pcs if bigger than settings.N_PCS PR 2610 S Dicks
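The .uns['log1p'] entry above can be illustrated with plain dictionaries. This is a minimal sketch, not Scanpy's code. It assumes that the affected entry is a 'base' key holding None and that a write-read cycle drops None-valued entries. The helper drop_none_values is hypothetical and only simulates that behavior.

```python
def drop_none_values(mapping):
    """Simulate a write-read cycle that does not preserve None-valued entries.

    Hypothetical helper for illustration only; it is not part of Scanpy or anndata.
    """
    return {key: value for key, value in mapping.items() if value is not None}


# Assumed layout: a 'base' key set to None (meaning the natural logarithm).
uns = {"log1p": {"base": None}}

# After the simulated round trip, the 'base' key has vanished entirely.
uns_after_roundtrip = {"log1p": drop_none_values(uns["log1p"])}

# Indexing with ['base'] would now raise KeyError; .get() falls back to None,
# which matches the value that was stored before the round trip.
base = uns_after_roundtrip["log1p"].get("base")
```

Reading the key with .get() keeps downstream code working whether or not the None value survived serialization.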
1.9.3 2023-03-02
Bug fixes
1.9.2 2023-02-16
Bug fixes
highly_variable_genes() layer argument now works in tandem with batches PR 2302 D Schaumont
highly_variable_genes() with flavor='cell_ranger' now handles the case in issue 2230 where the number of calculated dispersions is less than n_top_genes PR 2231 L Zappia
Fix compatibility with matplotlib 3.7 PR 2414 I Virshup P Fisher
Fix scrublet numpy matrix compatibility issue PR 2395 A Gayoso
1.9.1 2022-04-05
Bug fixes
normalize_total() works when Dask is not installed PR 2209 R Cannoodt
Fix embedding plots by bumping matplotlib dependency to version 3.4 PR 2212 I Virshup
1.9.0 2022-04-01
Tutorials
New tutorial on the usage of Pearson Residuals: tutorial_pearson_residuals J Lause, G Palla
Materials and recordings for Scanpy workshops by Maren Büttner
Experimental module
Added scanpy.experimental module! Currently contains functionality related to Pearson residuals in scanpy.experimental.pp PR 1715 J Lause, G Palla, I Virshup. This includes:
normalize_pearson_residuals() for Pearson Residuals normalization
highly_variable_genes() for HVG selection with Pearson Residuals
normalize_pearson_residuals_pca() for Pearson Residuals normalization and dimensionality reduction with PCA
recipe_pearson_residuals() for Pearson Residuals normalization, HVG selection and dimensionality reduction with PCA
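To make the idea of Pearson residuals normalization concrete, here is a dependency-free sketch of analytic Pearson residuals on a small cells x genes count matrix. It is not the scanpy.experimental.pp implementation. The function name pearson_residuals, the negative binomial overdispersion theta=100, and the clipping at the square root of the number of cells are assumptions that this page does not state.

```python
import math


def pearson_residuals(counts, theta=100.0, clip=None):
    """Analytic Pearson residuals for a cells x genes matrix given as a list of lists.

    Illustrative sketch only; theta and the default clip value are assumptions.
    """
    n_cells = len(counts)
    n_genes = len(counts[0])
    cell_totals = [sum(row) for row in counts]
    gene_totals = [sum(counts[i][j] for i in range(n_cells)) for j in range(n_genes)]
    grand_total = sum(cell_totals)
    if clip is None:
        clip = math.sqrt(n_cells)  # assumed default clipping threshold

    residuals = []
    for i in range(n_cells):
        row = []
        for j in range(n_genes):
            # Expected count under a model with no biological variation.
            mu = cell_totals[i] * gene_totals[j] / grand_total
            if mu == 0:
                row.append(0.0)  # all-zero gene or cell: no residual to compute
                continue
            # Negative binomial variance: mu + mu^2 / theta.
            z = (counts[i][j] - mu) / math.sqrt(mu + mu * mu / theta)
            row.append(max(-clip, min(clip, z)))
        residuals.append(row)
    return residuals


counts = [[1, 0, 3], [2, 2, 0], [0, 5, 1]]
residuals = pearson_residuals(counts)
```

Positive residuals mark counts above what sequencing depth and overall gene abundance would predict, and negative residuals mark counts below it.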
Features
filter_rank_genes_groups() now allows filtering by absolute values of log fold change PR 1649 S Rybakov
_choose_representation now subsets the provided representation to n_pcs, regardless of the name of the provided representation (should affect mostly neighbors()) PR 2179 I Virshup PG Majev
scanpy.external.pp.scrublet() (and related functions) can now be used on AnnData objects containing multiple batches PR 1965 J Manning
Number of variables plotted with pca_loadings() can now be controlled with the n_points argument. Additionally, variables are no longer repeated if the anndata has fewer than 30 variables PR 2075 Yves33
Dask arrays now work with scanpy.pp.normalize_total() PR 1663 G Buckley, I Virshup
embedding_density() now allows more than 10 groups PR 1936 A Wolf
Embedding plots can now pass colorbar_loc to specify the location of the colorbar legend, or pass None to not show a colorbar PR 1821 A Schaar I Virshup
Embedding plots now have a dimensions argument, which lets users select which dimensions of their embedding to plot and uses the same broadcasting rules as other arguments PR 1538 I Virshup
print_versions() now uses session_info PR 2089 P Angerer I Virshup
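The first feature above, filtering by absolute values of log fold change, is easiest to see on a toy example. The sketch below is not Scanpy's filter_rank_genes_groups(). The function filter_genes_by_log_fold_change and its parameters min_log_fold_change and use_abs are hypothetical names chosen for illustration.

```python
def filter_genes_by_log_fold_change(log_fold_changes, min_log_fold_change, use_abs=False):
    """Keep genes whose (optionally absolute) log fold change exceeds a threshold.

    Hypothetical helper; names and semantics are assumptions, not Scanpy's API.
    """
    kept = {}
    for gene, lfc in log_fold_changes.items():
        score = abs(lfc) if use_abs else lfc
        if score > min_log_fold_change:
            kept[gene] = lfc
    return kept


example = {"GeneA": 2.5, "GeneB": -3.0, "GeneC": 0.4}

# Signed comparison keeps only up-regulated genes.
signed_hits = filter_genes_by_log_fold_change(example, 1.0)

# Absolute comparison also keeps strongly down-regulated genes.
absolute_hits = filter_genes_by_log_fold_change(example, 1.0, use_abs=True)
```

With a signed comparison, a strongly down-regulated gene such as GeneB is discarded. Comparing absolute values retains it.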
Ecosystem
Multiple packages have been added to our ecosystem page, including:
decoupler, a tool for footprint analysis and pathway enrichment PR 2186 PB Mompel
CIARA, a feature selection tool for identifying rare cell types PR 2175 M Stock
Bug fixes
Fixed finding variables with use_raw=True and basis=None in scanpy.pl.scatter() PR 2027 E Rice
Fixed scanpy.external.pp.scrublet() to address issue 1957 FlMai and ensure raw counts are used for simulation
Functions in scanpy.datasets no longer throw OldFormatWarnings when using anndata 0.8 PR 2096 I Virshup
Fixed use of scanpy.pp.neighbors() with method='rapids': RAPIDS cuML no longer returns a squared Euclidean distance matrix, so we should not square-root the kNN distance matrix. PR 1828 M Zaslavsky
Removed pytables dependency by implementing read_10x_h5 with h5py due to installation errors on Windows PR 2064
Fixed bug in scanpy.external.pp.hashsolo() where default value was set improperly PR 2190 B Reiz
Fixed bug in scanpy.pl.embedding() functions where an error could be raised when there were missing values and large numbers of categories PR 2187 I Virshup
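The scanpy.pp.neighbors() fix for method='rapids' above comes down to whether a backend reports Euclidean or squared Euclidean distances. The following sketch uses only the standard library to show why a square root must be applied exactly once. The two points are made up for illustration, and nothing here calls RAPIDS or Scanpy.

```python
import math


def squared_euclidean(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


query = (0.0, 0.0)
neighbor = (3.0, 4.0)

true_distance = math.dist(query, neighbor)       # Euclidean distance (Python 3.8+)
squared = squared_euclidean(query, neighbor)     # what a squared-distance backend returns

# Correct: the backend returned squared distances, so take the square root once.
recovered = math.sqrt(squared)

# Incorrect: the backend already returned Euclidean distances, and taking
# the square root again shrinks every distance.
distorted = math.sqrt(true_distance)
```

Here the true distance is 5, while the doubly rooted value is about 2.24, so any graph weights derived from it would be wrong.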