Scanpy – Single-Cell Analysis in Python¶
Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing. The Python-based implementation efficiently deals with datasets of more than one million cells.
Get started by browsing tutorials, usage principles or the main API.
Follow changes in the release notes.
Find tools that harmonize well with anndata & Scanpy via the external API and the ecosystem page.
Consider citing Genome Biology (2018) along with original references.
News¶
scVelo on the cover of Nature Biotechnology 2020-12-01¶
Scanpy’s counterpart for RNA velocity, scVelo, made it onto the cover of Nature Biotechnology [tweet].
Scanpy selected among 20 papers for 20 years of Genome Biology 2020-08-01¶
Genome Biology: Celebrating 20 Years of Genome Biology selected the initial Scanpy paper for the year 2018 among 20 papers for 20 years [tweet].
COVID-19 datasets distributed as h5ad 2020-04-01¶
In a joint initiative, the Wellcome Sanger Institute, the Human Cell Atlas, and the CZI distribute datasets related to COVID-19 via anndata’s h5ad files: covid19cellatlas.org. It wasn’t anticipated that the initial idea of sharing and backing an on-disk representation of AnnData would become so widely adopted. Curious? Read up more on the format.
Latest additions¶
Version 1.7¶
1.7.2 2021-04-07¶
Bug fixes
scanpy.logging.print_versions() now works when python<3.8 PR 1691 I Virshup
scanpy.pp.regress_out() now uses joblib as the parallel backend, and should stop oversubscribing threads PR 1694 I Virshup
scanpy.pp.highly_variable_genes() with flavor="seurat_v3" now returns correct gene means and variances when used with batch_key PR 1732 J Lause
scanpy.pp.highly_variable_genes() now throws a warning instead of an error when non-integer values are passed for method "seurat_v3". The check can be skipped by passing check_values=False. PR 1679 G Palla
Ecosystem
1.7.1 2021-02-24¶
Documentation
More twitter handles for core devs PR 1676 G Eraslan
Bug fixes
dendrogram() now uses 1 - correlation as the distance matrix to compute the dendrogram PR 1614 F Ramirez
Fixed obs_df()/var_df() erroring when keys not passed PR 1637 I Virshup
Fixed argument handling for scanpy.external.pp.scrublet() J Manning
Fixed passing of kwargs to scanpy.pl.violin() when stripplot was also used PR 1655 M van den Beek
Fixed colorbar creation in scanpy.pl.timeseries_as_heatmap PR 1654 M van den Beek
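To make the "1 - correlation" distance from the dendrogram fix concrete, here is a small sketch using NumPy and SciPy directly. It is not Scanpy's implementation; the group profiles and the choice of complete linkage are assumptions for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Hypothetical mean expression profiles, one row per group (assumption).
profiles = np.array([
    [1.0, 2.0, 3.0, 4.0],  # group A
    [2.0, 4.0, 6.0, 8.0],  # group B: perfectly correlated with A
    [4.0, 3.0, 2.0, 1.0],  # group C: perfectly anti-correlated with A
])

corr = np.corrcoef(profiles)   # pairwise Pearson correlation between groups
dist = 1.0 - corr              # 0 = same pattern, 2 = opposite pattern
dist = (dist + dist.T) / 2.0   # remove any floating-point asymmetry
np.fill_diagonal(dist, 0.0)    # each group has zero distance to itself

# linkage expects the condensed (upper-triangle) form of the distance matrix.
Z = linkage(squareform(dist, checks=False), method="complete")
print(np.round(dist, 3))
```

With this distance, groups A and B merge first because their profiles have the same shape even though their magnitudes differ, which is the behavior a correlation-based dendrogram is meant to capture.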
1.7.0 2021-02-03¶
Features
Add new 10x Visium datasets to visium_sge() PR 1473 G Palla
Enable download of source image for 10x visium datasets in visium_sge() PR 1506 H Spitzer
Refactor of scanpy.pl.spatial(). Better support for plotting without an image, as well as directly providing images PR 1512 G Palla
Dict input for scanpy.queries.enrich() PR 1488 G Eraslan
rank_genes_groups_df() can now return fraction of cells in a group expressing a gene, and allows retrieving values for multiple groups at once PR 1388 G Eraslan
Color annotations for gene sets in heatmap() are now matched to color for cluster PR 1511 L Sikkema
PCA plots can now annotate axes with variance explained PR 1470 bfurtwa
Plots with groupby arguments can now group by values in the index by passing the index’s name (like pd.DataFrame.groupby). PR 1583 F Ramirez
Added na_color and na_in_legend keyword arguments to embedding() plots. Allows specifying color for missing or filtered values in plots like umap() or spatial() PR 1356 I Virshup
embedding() plots now support passing dict of {cluster_name: cluster_color, ...} for palette argument PR 1392 I Virshup
External tools (new)
Add Scanorama integration to scanpy external API (scanorama_integrate()) [Hie19] PR 1332 B Hie
Scrublet [Wolock19] integration: scrublet(), scrublet_simulate_doublets(), and plotting method scrublet_score_distribution() PR 1476 J Manning
hashsolo() for HTO demultiplexing [Bernstein20] PR 1432 NJ Bernstein
Added scirpy (sc-AIRR analysis) to ecosystem page PR 1453 G Sturm
Added scvi-tools to ecosystem page PR 1421 A Gayoso
External tools (changes)
Updates for palantir() and palantir_results() PR 1245 A Mousa
Fixes to harmony_timeseries() docs PR 1248 A Mousa
Support for leiden clustering by scanpy.external.tl.phenograph() PR 1080 A Mousa
Updated default params of sam() to work with larger data PR 1540 A Tarashansky
Documentation
New contribution guide PR 1544 I Virshup
zsh installation instructions PR 1444 P Angerer
Performance
Speed up read_10x_h5() PR 1402 P Weiler
Bugfixes
Consistent fold-change, fractions calculation for filter_rank_genes_groups PR 1391 S Rybakov
Fixed bug where score_genes would error if one gene was passed PR 1398 I Virshup
Fixed log1p inplace on integer dense arrays PR 1400 I Virshup
Fix docstring formatting for rank_genes_groups() PR 1417 P Weiler
Removed PendingDeprecationWarnings from use of np.matrix PR 1424 P Weiler
Fixed indexing bug in scanpy.pp.highly_variable_genes PR 1456 V Bergen
Fix default number of genes for marker_genes_overlap PR 1464 MD Luecken
Fixed passing groupby and dendrogram_key to dendrogram() PR 1465 M Varma
Fixed download path of pbmc3k_processed PR 1472 D Strobl
Better error message when computing DE with a group of size 1 PR 1490 J Manning
Update cugraph API usage for v0.16 PR 1494 R Ilango
Fixed marker_gene_overlap default value for top_n_markers PR 1464 MD Luecken
Pass random_state to RAPIDs UMAP PR 1474 C Nolet
Fixed anndata version requirement for concat() (re-exported from scanpy as sc.concat) PR 1491 I Virshup
Fixed the width of the progress bar when downloading data PR 1507 M Klein
Updated link for moignard15 dataset PR 1542 I Virshup
Fixed bug where calling set_figure_params could block if IPython was installed, but not used. PR 1547 I Virshup
violin() no longer fails if .raw not present PR 1548 I Virshup
spatial() refactoring and better handling of spatial data PR 1512 G Palla
Compatibility with UMAP v0.5 PR 1601 PR 1589 S Rybakov, I Virshup