Scanpy – Single-Cell Analysis in Python
Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing. The Python-based implementation efficiently deals with datasets of more than one million cells.
Get started by browsing tutorials, usage principles or the main API.
Follow changes in the release notes.
Find tools that harmonize well with anndata & Scanpy via the external API and the ecosystem page.
Check out our contributing guide for development practices.
Consider citing Genome Biology (2018) along with original references.
News
Scanpy hits 100 contributors! 2022-03-31
100 people have contributed to Scanpy’s source code!
Of course, contributions to the project are not limited to direct modification of the source code. Many others have improved the project by building on top of it, participating in development discussions, helping others with usage, or by showing off what it’s helped them accomplish.
Thanks to all our contributors for making this project possible!
New community channels 2022-03-31
We’ve moved our forums and have a new publicly available chat!
Our discourse forum has migrated to a joint scverse forum (discourse.scverse.org).
Our private developer Slack has been replaced by a public Zulip chat (scverse.zulipchat.com).
Toolkits for spatial (squidpy) and multimodal (muon) data published 2022-02-01
Two large toolkits extending our ecosystem to new modalities have had their manuscripts published!
Muon, a framework for multimodal data, has been published in Genome Biology.
Squidpy, a toolkit for working with spatial single-cell data, has been published in Nature Methods.
Latest additions
Version 1.9
1.9.8 2024-01-26
Bug fixes
Fix handling of numpy array palettes for old numpy versions PR 2832 P Angerer
1.9.7 2024-01-25
Bug fixes
Fix handling of numpy array palettes (e.g. after write-read cycle) PR 2734 P Angerer
Specify correct version of matplotlib dependency PR 2733 P Fisher
Fix scanpy.pl.violin() usage of seaborn.catplot PR 2739 E Roellin
Fix scanpy.pp.highly_variable_genes() to handle the combinations of inplace and subset consistently PR 2757 E Roellin
Replace usage of various deprecated functionality from anndata and pandas PR 2678 PR 2779 P Angerer
Allow to use default n_top_genes when using scanpy.pp.highly_variable_genes() flavor 'seurat_v3' PR 2782 P Angerer
Fix scanpy.read_10x_mtx()’s gex_only=True mode PR 2801 P Angerer
1.9.6 2023-10-31
Bug fixes
Allow scanpy.pl.scatter() to accept a str palette name PR 2571 P Angerer
Make scanpy.external.tl.palantir() compatible with palantir >=1.3 PR 2672 DJ Otto
Fix scanpy.pl.pca() when return_fig=True and annotate_var_explained=True PR 2682 J Wagner
Temp fix for issue 2680 by skipping seaborn version 0.13.0 PR 2661 P Angerer
Fix scanpy.pp.highly_variable_genes() to not modify the used layer when flavor=seurat PR 2698 E Roellin
Prevent pandas from causing infinite recursion when setting a slice of a categorical column PR 2719 P Angerer
1.9.5 2023-09-08
Bug fixes
Remove use of deprecated dtype argument to AnnData constructor PR 2658 Isaac Virshup
1.9.4 2023-08-24
Bug fixes
Support scikit-learn 1.3 PR 2515 P Angerer
Deal with None value vanishing from things like .uns['log1p'] PR 2546 SP Shen
Depend on igraph instead of python-igraph PR 2566 P Angerer
rank_genes_groups() now handles unsorted groups as intended PR 2589 S Dicks
rank_genes_groups_df() now works for rank_genes_groups() with method="logreg" PR 2601 S Dicks
_choose_representation() now works with n_pcs if bigger than settings.N_PCS PR 2610 S Dicks
1.9.3 2023-03-02
Bug fixes
1.9.2 2023-02-16
Bug fixes
highly_variable_genes() layer argument now works in tandem with batches PR 2302 D Schaumont
highly_variable_genes() with flavor='cell_ranger' now handles the case in issue 2230 where the number of calculated dispersions is less than n_top_genes PR 2231 L Zappia
Fix compatibility with matplotlib 3.7 PR 2414 I Virshup P Fisher
Fix scrublet numpy matrix compatibility issue PR 2395 A Gayoso
1.9.1 2022-04-05
Bug fixes
normalize_total() works when Dask is not installed PR 2209 R Cannoodt
Fix embedding plots by bumping matplotlib dependency to version 3.4 PR 2212 I Virshup
1.9.0 2022-04-01
Tutorials
New tutorial on the usage of Pearson Residuals (tutorial_pearson_residuals) J Lause, G Palla
Materials and recordings for Scanpy workshops by Maren Büttner
Experimental module
Added scanpy.experimental module! Currently contains functionality related to Pearson residuals in scanpy.experimental.pp PR 1715 J Lause, G Palla, I Virshup. This includes:
normalize_pearson_residuals() for Pearson Residuals normalization
highly_variable_genes() for HVG selection with Pearson Residuals
normalize_pearson_residuals_pca() for Pearson Residuals normalization and dimensionality reduction with PCA
recipe_pearson_residuals() for Pearson Residuals normalization, HVG selection and dimensionality reduction with PCA
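For readers unfamiliar with the idea behind the Pearson residuals functions listed above, the following is a minimal pure-Python sketch of analytic Pearson residuals on a tiny cells-by-genes count matrix. It is not Scanpy's implementation. The function name pearson_residuals, the overdispersion value theta=100, the clipping bound sqrt(number of cells), and the toy counts are all assumptions chosen for illustration.

```python
import math


def pearson_residuals(counts, theta=100.0, clip=None):
    """Illustrative analytic Pearson residuals for a cells x genes matrix.

    `counts` is a list of rows (one per cell). `theta` and `clip` are
    assumed illustration values, not settings taken from the release notes.
    """
    n_cells = len(counts)
    n_genes = len(counts[0])
    cell_totals = [sum(row) for row in counts]
    gene_totals = [sum(counts[i][j] for i in range(n_cells)) for j in range(n_genes)]
    grand_total = sum(cell_totals)
    if clip is None:
        clip = math.sqrt(n_cells)  # assumed clipping bound

    residuals = []
    for i in range(n_cells):
        row = []
        for j in range(n_genes):
            # Expected count under a model with per-cell depth and per-gene abundance.
            mu = cell_totals[i] * gene_totals[j] / grand_total
            if mu == 0:
                row.append(0.0)  # empty cell or gene: nothing to standardize
                continue
            # Negative binomial variance: mu + mu^2 / theta.
            r = (counts[i][j] - mu) / math.sqrt(mu + mu * mu / theta)
            row.append(max(-clip, min(clip, r)))
        residuals.append(row)
    return residuals


toy_counts = [
    [10, 0, 5],
    [2, 8, 5],
    [0, 4, 6],
]
toy_residuals = pearson_residuals(toy_counts)
for row in toy_residuals:
    print([round(value, 3) for value in row])
```

Each residual measures how far an observed count lies from its expected value in units of the model's standard deviation, and clipping limits the influence of extreme values.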
Features
filter_rank_genes_groups() now allows to filter with absolute values of log fold change PR 1649 S Rybakov
_choose_representation now subsets the provided representation to n_pcs, regardless of the name of the provided representation (should affect mostly neighbors()) PR 2179 I Virshup PG Majev
scanpy.external.pp.scrublet() (and related functions) can now be used on AnnData objects containing multiple batches PR 1965 J Manning
Number of variables plotted with pca_loadings() can now be controlled with n_points argument. Additionally, variables are no longer repeated if the anndata has less than 30 variables PR 2075 Yves33
Dask arrays now work with scanpy.pp.normalize_total() PR 1663 G Buckley, I Virshup
embedding_density() now allows more than 10 groups PR 1936 A Wolf
Embedding plots can now pass colorbar_loc to specify the location of colorbar legend, or pass None to not show a colorbar PR 1821 A Schaar I Virshup
Embedding plots now have a dimensions argument, which lets users select which dimensions of their embedding to plot and uses the same broadcasting rules as other arguments PR 1538 I Virshup
print_versions() now uses session_info PR 2089 P Angerer I Virshup
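To illustrate why filtering on the absolute value of the log fold change (the filter_rank_genes_groups() entry above) matters, here is a small pure-Python sketch. It does not call Scanpy. The helper filter_by_fold_change, its parameters, and the gene names are hypothetical and exist only for this example.

```python
def filter_by_fold_change(log_fold_changes, min_lfc, use_abs):
    """Keep genes whose (optionally absolute) log fold change reaches min_lfc.

    Hypothetical helper for illustration; not part of Scanpy.
    """
    kept = {}
    for gene, lfc in log_fold_changes.items():
        value = abs(lfc) if use_abs else lfc
        if value >= min_lfc:
            kept[gene] = lfc
    return kept


# Made-up log fold changes for three placeholder genes.
example_lfc = {"GENE_UP": 2.5, "GENE_DOWN": -3.0, "GENE_FLAT": 0.2}

signed_hits = filter_by_fold_change(example_lfc, min_lfc=1.0, use_abs=False)
absolute_hits = filter_by_fold_change(example_lfc, min_lfc=1.0, use_abs=True)

print(sorted(signed_hits))    # only the up-regulated gene passes
print(sorted(absolute_hits))  # strongly down-regulated genes pass too
```

With a signed threshold, strongly down-regulated genes are discarded even though their change is large. Comparing absolute values keeps both directions.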
Ecosystem
Multiple packages have been added to our ecosystem page, including:
decoupler, for footprint analysis and pathway enrichment PR 2186 PB Mompel
CIARA, a feature selection tool for identifying rare cell types PR 2175 M Stock
Bug fixes
Fixed finding variables with use_raw=True and basis=None in scanpy.pl.scatter() PR 2027 E Rice
Fixed scanpy.external.pp.scrublet() to address issue 1957 FlMai and ensure raw counts are used for simulation
Functions in scanpy.datasets no longer throw OldFormatWarnings when using anndata 0.8 PR 2096 I Virshup
Fixed use of scanpy.pp.neighbors() with method='rapids': RAPIDS cuML no longer returns a squared Euclidean distance matrix, so we should not square-root the kNN distance matrix. PR 1828 M Zaslavsky
Removed pytables dependency by implementing read_10x_h5 with h5py due to installation errors on Windows PR 2064
Fixed bug in scanpy.external.pp.hashsolo() where default value was set improperly PR 2190 B Reiz
Fixed bug in scanpy.pl.embedding() functions where an error could be raised when there were missing values and large numbers of categories PR 2187 I Virshup
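The reasoning behind the method='rapids' fix in the list above can be shown with plain arithmetic. The sketch below uses only the standard library and does not call Scanpy or RAPIDS. The two points and the helper names are assumptions made for illustration.

```python
import math


def squared_euclidean(a, b):
    """Sum of squared coordinate differences (no square root taken)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Ordinary Euclidean distance."""
    return math.sqrt(squared_euclidean(a, b))


point_a = (0.0, 0.0)
point_b = (3.0, 4.0)

# If a backend returns squared distances, one square root recovers the distance.
from_squared = math.sqrt(squared_euclidean(point_a, point_b))

# If the backend already returns Euclidean distances, taking the square root
# again shrinks them and distorts the neighbor graph weights.
double_rooted = math.sqrt(euclidean(point_a, point_b))

print(from_squared)
print(double_rooted)
```

The square root is only correct when the input is a squared distance, so once the backend started returning plain Euclidean distances the extra square root had to be dropped.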