Release notes#
Version 1.11#
1.11.2 2025-05-28#
Bug fixes#
Fix zappy compatibility for clip_array P Angerer (#3351)
Fixes an error where regress_out would fail to work with integer types S Dicks (#3461)
Prevent plotting with mask_obs from mutating data V Menon (#3496)
Prevent scanpy.pp.scale() from creating a dask Array with numpy.matrix chunks P Angerer (#3597)
Allow using sklearn ≥1.6, Dask ≥2024.8, and sphinx ≥8.2.1 P Angerer (#3611)
Fixed handling of ext argument in scanpy.read() I Gold #3643
Fix error message when trying to use sc.pp.pca(x, zero_center=False) with a sparse dask array. P Angerer (#3646)
Documentation#
Clarify use of implementations in scanpy.pp.pca() docs. P Angerer (#3655)
Performance#
Speed up for a categorical regressor in regress_out() S Dicks I Gold (#3353)
In pp.normalize_total, the median is now computed in-memory when using Dask S Dicks (#3379)
Speed up pp.normalize_total with a numba kernel for csr-matrices S Dicks (#3571)
1.11.1 2025-03-31#
Bug fixes#
Features#
Allow covariance_eigh as a solver option for pca() with dask.array.Array dense data ilan-gold (#3528)
Performance#
Speed up wilcoxon rank-sum test with numba G Wu (#3529)
1.11.0 2025-02-14#
Release candidates:
rc2 2025-01-24
rc1 2024-12-20
Features#
rc1 sample() supports both upsampling and downsampling of observations and variables. subsample() is now deprecated. G Eraslan & P Angerer (#943)
rc1 Add layer argument to scanpy.tl.score_genes() and scanpy.tl.score_genes_cell_cycle() L Zappia (#2921)
rc1 Prevent raw conflict with layer in score_genes() S Dicks (#3155)
rc1 Add support for median as an aggregation function to aggregate(). This allows for median-based aggregation of data (e.g., pseudobulk), complementing existing methods like mean- and sum-based aggregation M Dehkordi (Farhad) (#3180)
rc1 Add key_added argument to pca(), tsne() and umap() P Angerer (#3184)
rc1 Support running scanpy.pp.pca() on sparse Dask arrays with the 'covariance_eigh' solver P Angerer (#3263)
rc1 Use upstreamed PCA implementation for csr_array and csr_matrix (see scikit-learn Version 1.4.0) P Angerer (#3267)
rc1 Add explicit support to scanpy.pp.pca() for svd_solver='covariance_eigh' P Angerer (#3296)
rc1 Add support for dask.array.Array to scanpy.pp.calculate_qc_metrics() I Gold (#3307)
rc1 Support layer parameter in scanpy.pl.highest_expr_genes() P Angerer (#3324)
rc1 Run numba functions single-threaded when called from inside of a ThreadPool P Angerer (#3335)
rc1 Switch print_header() and print_versions() to session_info2 P Angerer (#3384)
rc1 Add sampling probabilities/mask parameter p to sample() P Angerer (#3410)
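To illustrate how median-based aggregation differs from the mean- and sum-based methods mentioned in the aggregate() entry, here is a minimal pure-Python sketch of pseudobulking. It does not call scanpy: the helper name pseudobulk and the toy data are hypothetical and only show the idea of grouping observations and reducing each gene within a group.

```python
from collections import defaultdict
from statistics import mean, median


def pseudobulk(rows, groups, func):
    """Aggregate per-cell expression rows by group label (hypothetical helper)."""
    by_group = defaultdict(list)
    for row, group in zip(rows, groups):
        by_group[group].append(row)
    # Reduce each gene (column) within each group with the chosen function.
    return {
        group: [func(column) for column in zip(*members)]
        for group, members in by_group.items()
    }


# Toy data: 5 cells x 2 genes; group "a" contains one outlier cell.
cells = [[1, 0], [2, 0], [90, 3], [4, 4], [6, 8]]
labels = ["a", "a", "a", "b", "b"]

by_median = pseudobulk(cells, labels, median)
by_mean = pseudobulk(cells, labels, mean)
by_sum = pseudobulk(cells, labels, sum)
```

In this toy case the outlier cell dominates the mean and the sum of group "a", while the median stays close to the typical cell.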
Performance#
rc1 Speed up regress_out() P Ashish, P Angerer & S Dicks (#3284)
Documentation#
rc1 Improve harmony_integrate() docs D Kühl (#3362)
rc1 Raise FutureWarning when calling deprecated scanpy.pp functions P Angerer (#3380)
rc1 P Angerer (#3407)
Bug fixes#
rc1 Upper-bound sklearn <1.6.0 due to dask/dask-ml#1002 Ilan Gold (#3393)
rc2 Fix rank_genes_groups() compatibility with data >10M cells P Angerer (#3426)
rc2 Fix scanpy.pl.rank_genes_groups()’s ax parameter P Angerer (#3428)
Development Process#
rc2 Fix version number inference in development environments (CI and local) P Angerer (#3441)
Version 1.10#
1.10.4 2024-11-12#
Breaking changes#
Remove Python 3.9 support P Angerer (#3283)
Bug fixes#
Fix scanpy.pl.DotPlot.style(), scanpy.pl.MatrixPlot.style(), and scanpy.pl.StackedViolin.style() resetting all non-specified parameters P Angerer (#3206)
Accept 'group' instead of 'obs' for standard_scale parameter in stacked_violin() P Angerer (#3243)
Use density_norm instead of scale (cont. from #2844) in violin() and stacked_violin() P Angerer (#3244)
Switched all compatibility adapters for positional parameters to FutureWarning P Angerer (#3264)
Catch PerfectSeparationWarning during regress_out() J Wagner (#3275)
Fix scanpy.pp.highly_variable_genes() for batches of size 1 P Angerer (#3286)
Fix scanpy.pl.scatter()’s color parameter to take collections as advertised P Angerer (#3299)
Fix scanpy.pl.highest_expr_genes() when used with a categorical gene symbol column P Angerer (#3302)
1.10.3 2024-09-17#
Bug fixes#
Prevent empty control gene set in score_genes() M Müller (#2875)
Fix subset=True of highly_variable_genes() when flavor is seurat or cell_ranger, and batch_key!=None E Roellin (#3042)
Add compatibility with numpy 2.0 P Angerer #3065 and (#3115)
Fix legend_loc argument in scanpy.pl.embedding() not accepting matplotlib parameters P Angerer (#3163)
Fix dispersion cutoff in highly_variable_genes() in presence of NaNs P Angerer (#3176)
Fix axis labeling for swapped axes in rank_genes_groups_stacked_violin() Ilan Gold (#3196)
Upper bound dask on account of scverse/anndata#1579 Ilan Gold (#3217)
The fa2-modified package replaces forceatlas2 for the latter’s lack of maintenance A Alam (#3220)
1.10.2 2024-06-25#
Development Process#
Add performance benchmarking #2977 R Shrestha, P Angerer
Documentation#
Bug fixes#
Compatibility with matplotlib 3.9 #2999 I Virshup
Add clear errors where backed mode-like matrices (i.e., from sparse_dataset) are not supported #3048 I gold
Write out full pca results when _choose_representation is called i.e., neighbors() without pca() #3078 I gold
Fix deprecated use of .A with sparse matrices #3084 P Angerer
Fix zappy support #3089 P Angerer
Performance#
1.10.1 2024-04-09#
Documentation#
Added how-to example on plotting with Marsilea #2974 Y Zheng
Bug fixes#
Fix aggregate when aggregating by more than two groups #2965 I Virshup
Performance#
1.10.0 2024-03-26#
scanpy 1.10 brings a large number of new features, performance improvements, and improved documentation.
Some highlights:
Improved support for out-of-core workflows via dask. See new tutorial: Using dask with Scanpy demonstrating counts-to-clusters for 1.4 million cells in <10 min.
A new basic clustering tutorial demonstrating an updated workflow.
Opt-in increased performance for neighbor search and clustering (how to guide).
Ability to mask observations or variables from a number of methods (see Customizing Scanpy plots for an example with plotting embeddings)
A new function aggregate() for computing aggregations of your data, very useful for pseudo bulking!
Features#
scrublet() and scrublet_simulate_doublets() were moved from scanpy.external.pp to scanpy.pp. The scrublet implementation is now maintained as part of scanpy #2703 P Angerer
scanpy.pp.pca(), scanpy.pp.scale(), scanpy.pl.embedding(), and scanpy.experimental.pp.normalize_pearson_residuals_pca() now support a mask parameter #2272 C Bright, T Marcella, & P Angerer
Enhanced dask support for some internal utilities, paving the way for more extensive dask support #2696 P Angerer
scanpy.pp.highly_variable_genes() supports dask for the default seurat and cell_ranger flavors #2809 P Angerer
New function scanpy.get.aggregate() which allows grouped aggregations over your data. Useful for pseudobulking! #2590 Isaac Virshup Ilan Gold Jon Bloom
scanpy.pp.neighbors() now has a transformer argument allowing the use of different ANN/ KNN libraries #2536 P Angerer
scanpy.experimental.pp.highly_variable_genes() using flavor='pearson_residuals' now uses numba for variance computation and is faster #2612 S Dicks & P Angerer
scanpy.tl.leiden() now offers igraph’s implementation of the leiden algorithm via flavor when set to igraph. leidenalg’s implementation is still default, but discouraged. #2815 I Gold
scanpy.pp.highly_variable_genes() has new flavor seurat_v3_paper that is in its implementation consistent with the paper description in Stuart et al 2018. #2792 E Roellin
scanpy.datasets.blobs() now accepts a random_state argument #2683 E Roellin
scanpy.pp.pca() and scanpy.pp.regress_out() now accept a layer argument #2588 S Dicks
scanpy.pp.subsample() with copy=True can now be called in backed mode #2624 E Roellin
scanpy.external.pp.harmony_integrate() now runs with 64 bit floats improving reproducibility #2655 S Dicks
scanpy.tl.rank_genes_groups() no longer warns that its default was changed from t-test_overestim_var to t-test #2798 L Heumos
scanpy.pp.calculate_qc_metrics now allows qc_vars to be passed as a string #2859 N Teyssier
scanpy.tl.leiden() and scanpy.tl.louvain() now store clustering parameters in the key provided by the key_added parameter instead of always writing to (or overwriting) a default key #2864 J Fan
scanpy.pp.scale() now clips np.ndarray also at -max_value for zero-centering #2913 S Dicks
Support sparse chunks in dask scale(), normalize_total() and highly_variable_genes() (seurat and cell-ranger tested) #2856 ilan-gold
Documentation#
Doc style overhaul #2220 A Gayoso
Re-add search-as-you-type, this time via readthedocs-sphinx-search #2805 P Angerer
Fixed a lot of broken usage examples #2605 P Angerer
Improved harmonization of return field of sc.pp and sc.tl functions #2742 E Roellin
Improved docs for percent_top argument of calculate_qc_metrics() #2849 I Virshup
New basic clustering tutorial (Preprocessing and clustering), based on one from scverse-tutorials #2901 I Virshup
Overhauled Tutorials page, and added new How to section to docs #2901 I Virshup
Added a new tutorial on working with dask (Using dask with Scanpy) #2901 I Gold I Virshup
Bug fixes#
Updated read_visium() such that it can read spaceranger 2.0 files L Lehner
Fix normalize_total() for dask #2466 P Angerer
Fix setting sc.settings.verbosity in some cases #2605 P Angerer
Fix all remaining pandas warnings #2789 P Angerer
Fix some annoying plotting warnings around violin plots #2844 P Angerer
Scanpy now has a test job which tests against the minimum versions of the dependencies. In the process of implementing this, many bugs associated with using older versions of pandas, anndata, numpy, and matplotlib were fixed. #2816 I Virshup
Fix warnings caused by internal usage of pandas.DataFrame.stack with pandas>=2.1 #2864 I Virshup
scanpy.get.aggregate() now always returns numpy.ndarray #2893 S Dicks
Removes self from array of neighbors for use_approx_neighbors = True in scrublet() #2896 S Dicks
Compatibility with scipy 1.13 #2943 I Virshup
Fix use of dendrogram() on highly correlated low precision data #2928 P Angerer
Fix pytest deprecation warning #2879 P Angerer
Development Process#
Deprecations#
Dropped support for Python 3.8. More details here. #2695 P Angerer
Deprecated specifying large numbers of function parameters by position as opposed to by name/keyword in all public APIs. e.g. prefer sc.tl.umap(adata, min_dist=0.1, spread=0.8) over sc.tl.umap(adata, 0.1, 0.8) #2702 P Angerer
Dropped support for umap<0.5 for performance reasons. #2870 P Angerer
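The positional-parameter deprecation above can be pictured with a small decorator that still accepts positional calls but warns about them. This is only a sketch under stated assumptions: warn_on_extra_positionals and toy_umap are hypothetical names, the warning category is an illustrative choice, and none of this is scanpy's actual compatibility adapter.

```python
import warnings
from functools import wraps


def warn_on_extra_positionals(n_positional):
    """Hypothetical decorator: warn when more than n_positional args are passed by position."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if len(args) > n_positional:
                warnings.warn(
                    f"{func.__name__}() received {len(args)} positional arguments; "
                    f"pass all but the first {n_positional} by keyword.",
                    FutureWarning,
                    stacklevel=2,
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@warn_on_extra_positionals(1)
def toy_umap(data, min_dist=0.5, spread=1.0):
    """Stand-in for a public API function; it only echoes its parameters."""
    return {"min_dist": min_dist, "spread": spread}


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    by_keyword = toy_umap([1, 2, 3], min_dist=0.1, spread=0.8)  # preferred style
    by_position = toy_umap([1, 2, 3], 0.1, 0.8)  # still works, but warns
```

Both calls return the same result; only the positional one emits a warning, which gives callers time to migrate before positional use is removed.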
Version 1.9#
1.9.8 2024-01-26#
Bug fixes#
Fix handling of numpy array palettes for old numpy versions #2832 P Angerer
1.9.7 2024-01-25#
Bug fixes#
Fix handling of numpy array palettes (e.g. after write-read cycle) #2734 P Angerer
Specify correct version of matplotlib dependency #2733 P Fisher
Fix scanpy.pl.violin() usage of seaborn.catplot #2739 E Roellin
Fix scanpy.pp.highly_variable_genes() to handle the combinations of inplace and subset consistently #2757 E Roellin
Replace usage of various deprecated functionality from anndata and pandas #2678 #2779 P Angerer
Allow to use default n_top_genes when using scanpy.pp.highly_variable_genes() flavor 'seurat_v3' #2782 P Angerer
Fix scanpy.read_10x_mtx()’s gex_only=True mode #2801 P Angerer
1.9.6 2023-10-31#
Bug fixes#
Allow scanpy.pl.scatter() to accept a str palette name #2571 P Angerer
Make scanpy.external.tl.palantir() compatible with palantir >=1.3 #2672 DJ Otto
Fix scanpy.pl.pca() when return_fig=True and annotate_var_explained=True #2682 J Wagner
Temp fix for #2680 by skipping seaborn version 0.13.0 #2661 P Angerer
Fix scanpy.pp.highly_variable_genes() to not modify the used layer when flavor=seurat #2698 E Roellin
Prevent pandas from causing infinite recursion when setting a slice of a categorical column #2719 P Angerer
1.9.5 2023-09-08#
Bug fixes#
Remove use of deprecated dtype argument to AnnData constructor #2658 Isaac Virshup
1.9.4 2023-08-24#
Bug fixes#
Support scikit-learn 1.3 #2515 P Angerer
Deal with None value vanishing from things like .uns['log1p'] #2546 SP Shen
Depend on igraph instead of python-igraph #2566 P Angerer
rank_genes_groups() now handles unsorted groups as intended #2589 S Dicks
rank_genes_groups_df() now works for rank_genes_groups() with method="logreg" #2601 S Dicks
scanpy.tl._utils._choose_representation now works with n_pcs if bigger than settings.N_PCS #2610 S Dicks
1.9.3 2023-03-02#
Bug fixes#
Variety of fixes against pandas 2.0.0rc0 #2434 I Virshup
1.9.2 2023-02-16#
Bug fixes#
highly_variable_genes() layer argument now works in tandem with batches #2302 D Schaumont
highly_variable_genes() with flavor='cell_ranger' now handles the case in #2230 where the number of calculated dispersions is less than n_top_genes #2231 L Zappia
Fix compatibility with matplotlib 3.7 #2414 I Virshup P Fisher
Fix scrublet numpy matrix compatibility issue #2395 A Gayoso
1.9.1 2022-04-05#
Bug fixes#
normalize_total() works when Dask is not installed #2209 R Cannoodt
Fix embedding plots by bumping matplotlib dependency to version 3.4 #2212 I Virshup
1.9.0 2022-04-01#
Tutorials#
New tutorial on the usage of Pearson Residuals: How to preprocess UMI count data with analytic Pearson residuals J Lause, G Palla
Materials and recordings for Scanpy workshops by Maren Büttner
Experimental module#
Added scanpy.experimental module! Currently contains functionality related to pearson residuals in scanpy.experimental.pp #1715 J Lause, G Palla, I Virshup. This includes:
normalize_pearson_residuals() for Pearson Residuals normalization
highly_variable_genes() for HVG selection with Pearson Residuals
normalize_pearson_residuals_pca() for Pearson Residuals normalization and dimensionality reduction with PCA
recipe_pearson_residuals() for Pearson Residuals normalization, HVG selection and dimensionality reduction with PCA
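For readers unfamiliar with the transformation behind these functions, the sketch below computes analytic Pearson residuals on a tiny dense count matrix in pure Python. It assumes the commonly described negative binomial null model, with expected count mu = cell total × gene total / grand total and residual (x − mu) / sqrt(mu + mu²/theta), clipped to ±sqrt(number of cells). The function name, the theta value used here, and the clipping rule are assumptions for illustration, not a statement of what scanpy.experimental.pp does internally.

```python
from math import sqrt


def pearson_residuals(counts, theta=100.0):
    """Analytic Pearson residuals for a cells x genes count matrix (illustrative sketch).

    Assumes every cell and every gene has at least one count, so mu > 0.
    """
    n_cells = len(counts)
    cell_totals = [sum(row) for row in counts]
    gene_totals = [sum(col) for col in zip(*counts)]
    grand_total = sum(cell_totals)
    clip = sqrt(n_cells)  # assumption: clip residuals to +/- sqrt(n_cells)

    residuals = []
    for row, cell_total in zip(counts, cell_totals):
        out = []
        for x, gene_total in zip(row, gene_totals):
            mu = cell_total * gene_total / grand_total  # expected count under the null
            z = (x - mu) / sqrt(mu + mu * mu / theta)  # negative binomial variance
            out.append(min(max(z, -clip), clip))
        residuals.append(out)
    return residuals


flat = pearson_residuals([[1, 1], [1, 1]])  # counts match expectation exactly
skewed = pearson_residuals([[4, 0], [0, 4]])  # each cell expresses only one gene
```

When the observed counts equal the expected counts, all residuals are zero; genes that deviate from the depth-based expectation get large positive or negative residuals.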
Features#
filter_rank_genes_groups() now allows to filter with absolute values of log fold change #1649 S Rybakov
_choose_representation now subsets the provided representation to n_pcs, regardless of the name of the provided representation (should affect mostly neighbors()) #2179 I Virshup PG Majev
scanpy.pp.scrublet() (and related functions) can now be used on AnnData objects containing multiple batches #1965 J Manning
Number of variables plotted with pca_loadings() can now be controlled with n_points argument. Additionally, variables are no longer repeated if the anndata has less than 30 variables #2075 Yves33
Dask arrays now work with scanpy.pp.normalize_total() #1663 G Buckley, I Virshup
embedding_density() now allows more than 10 groups #1936 A Wolf
Embedding plots can now pass colorbar_loc to specify the location of colorbar legend, or pass None to not show a colorbar #1821 A Schaar I Virshup
Embedding plots now have a dimensions argument, which lets users select which dimensions of their embedding to plot and uses the same broadcasting rules as other arguments #1538 I Virshup
print_versions() now uses session_info #2089 P Angerer I Virshup
Ecosystem#
Multiple packages have been added to our ecosystem page, including:
Bug fixes#
Fixed finding variables with use_raw=True and basis=None in scanpy.pl.scatter() #2027 E Rice
Fixed scanpy.pp.scrublet() to address #1957 FlMai and ensure raw counts are used for simulation
Functions in scanpy.datasets no longer throw OldFormatWarnings when using anndata 0.8 #2096 I Virshup
Fixed use of scanpy.pp.neighbors() with method='rapids': RAPIDS cuML no longer returns a squared Euclidean distance matrix, so we should not square-root the kNN distance matrix. #1828 M Zaslavsky
Removed pytables dependency by implementing read_10x_h5 with h5py due to installation errors on Windows #2064
Fixed bug in scanpy.external.pp.hashsolo() where default value was set improperly #2190 B Reiz
Fixed bug in scanpy.pl.embedding() functions where an error could be raised when there were missing values and large numbers of categories #2187 I Virshup
Version 1.8#
1.8.2 2021-11-03#
Documentation#
Update conda installation instructions #1974 L Heumos
Bug fixes#
Fix plotting after scanpy.tl.filter_rank_genes_groups() #1942 S Rybakov
Fix use_raw=None using anndata.AnnData.var_names if anndata.AnnData.raw is present in scanpy.tl.score_genes() #1999 M Klein
Fix compatibility with UMAP 0.5.2 #2028 L Mcinnes
Fixed non-determinism in scanpy.pl.paga() node positions #1922 I Virshup
Ecosystem#
Added PASTE (a tool to align and integrate spatial transcriptomics data) to scanpy ecosystem.
1.8.1 2021-07-07#
Bug fixes#
Fixed reproducibility of scanpy.tl.score_genes(). Calculation and output is now float64 type. #1890 I Kucinski
Workarounds for some changes/ bugs in pandas 1.3 #1918 I Virshup
Fixed bug where sc.pl.paga_compare could mislabel nodes on the paga graph #1898 I Virshup
Fixed handling of use_raw with scanpy.tl.rank_genes_groups() #1934 I Virshup
1.8.0 2021-06-28#
Metrics module#
Added scanpy.metrics module!
Added scanpy.metrics.gearys_c() for spatial autocorrelation #915 I Virshup
Added scanpy.metrics.morans_i() for global spatial autocorrelation #1740 I Virshup, G Palla
Added scanpy.metrics.confusion_matrix() for comparing labellings #915 I Virshup
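As background for the two autocorrelation metrics, the sketch below implements the textbook definitions of Moran's I and Geary's C on a dense weight matrix in pure Python. The lowercase helper functions and the toy path graph are illustrative stand-ins, not the scanpy.metrics implementations, which may differ in inputs and performance.

```python
def morans_i(weights, x):
    """Moran's I: (N / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2."""
    n = len(x)
    mu = sum(x) / n
    dev = [v - mu for v in x]
    w_total = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_total) * num / den


def gearys_c(weights, x):
    """Geary's C: ((N - 1) / (2 W)) * sum_ij w_ij (x_i - x_j)^2 / sum_i (x_i - mean)^2."""
    n = len(x)
    mu = sum(x) / n
    w_total = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    den = sum((v - mu) ** 2 for v in x)
    return ((n - 1) / (2 * w_total)) * num / den


# Toy graph: 4 nodes on a path (0-1-2-3), symmetric unit weights.
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = [1, 1, 5, 5]  # neighbours mostly share values
alternating = [1, 5, 1, 5]  # neighbours always differ

i_smooth, i_alt = morans_i(path, smooth), morans_i(path, alternating)
c_smooth, c_alt = gearys_c(path, smooth), gearys_c(path, alternating)
```

A signal that varies smoothly over the graph gives a positive Moran's I and a Geary's C below 1, while a signal that alternates between neighbours gives a negative I and a C above 1.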
Features#
Added layer and copy kwargs to normalize_total() #1667 I Virshup
Added vcenter and norm arguments to the plotting functions #1551 G Eraslan
Standardized and expanded available arguments to the sc.pl.rank_genes_groups* family of functions. #1529 F Ramirez I Virshup
See examples sections of rank_genes_groups_dotplot() and rank_genes_groups_matrixplot() for demonstrations.
scanpy.tl.tsne() now supports the metric argument and records the passed parameters #1854 I Virshup
scanpy.pl.scrublet_score_distribution() now uses same API as other scanpy functions for saving/ showing plots #1741 J Manning
Ecosystem#
Documentation#
Added rendered examples to many plotting functions #1664 A Schaar L Zappia bio-la L Hetzel L Dony M Buttner K Hrovatin F Ramirez I Virshup LouisK92 mayarali
Integrated DocSearch, a find-as-you-type documentation index search. #1754 P Angerer
Reorganized reference docs #1753 I Virshup
Clarified docs issues for neighbors(), diffmap(), calculate_qc_metrics() #1680 G Palla
Fixed typos in grouped plot doc-strings #1877 C Rands
Extended examples for differential expression plotting. #1529 F Ramirez
See rank_genes_groups_dotplot() or rank_genes_groups_matrixplot() for examples.
Bug fixes#
Fix scanpy.pl.paga_path() TypeError with recent versions of anndata #1047 P Angerer
Fix detection of whether IPython is running #1844 I Virshup
Fixed reproducibility of scanpy.tl.diffmap() (added random_state) #1858 I Kucinski
Fixed errors and warnings from embedding plots with small numbers of categories after sns.set_palette was called #1886 I Virshup
Fixed handling of gene_symbols argument in a number of sc.pl.rank_genes_groups* functions #1529 F Ramirez I Virshup
Fixed handling of use_raw for sc.tl.rank_genes_groups when no .raw is present #1895 I Virshup
scanpy.pl.rank_genes_groups_violin() now works for raw=False #1669 M van den Beek
scanpy.pl.dotplot() now uses smallest_dot argument correctly #1771 S Flemming
Development Process#
Switched to flit for building and deploying the package, a simple tool with an easy to understand command line interface and metadata #1527 P Angerer
Use pre-commit for style checks #1684 #1848 L Heumos I Virshup
Deprecations#
Dropped support for Python 3.6. More details here. #1897 I Virshup
Deprecated layers and layers_norm kwargs to normalize_total() #1667 I Virshup
Deprecated MulticoreTSNE backend for scanpy.tl.tsne() #1854 I Virshup
Version 1.7#
1.7.2 2021-04-07#
Bug fixes#
scanpy.logging.print_versions() now works when python<3.8 #1691 I Virshup
scanpy.pp.regress_out() now uses joblib as the parallel backend, and should stop oversubscribing threads #1694 I Virshup
scanpy.pp.highly_variable_genes() with flavor="seurat_v3" now returns correct gene means and -variances when used with batch_key #1732 J Lause
scanpy.pp.highly_variable_genes() now throws a warning instead of an error when non-integer values are passed for method "seurat_v3". The check can be skipped by passing check_values=False. #1679 G Palla
Ecosystem#
1.7.1 2021-02-24#
Documentation#
More twitter handles for core devs #1676 G Eraslan
Bug fixes#
dendrogram() now uses 1 - correlation as distance matrix to compute the dendrogram #1614 F Ramirez
Fixed obs_df() / var_df() erroring when keys not passed #1637 I Virshup
Fixed argument handling for scanpy.pp.scrublet() J Manning
Fixed passing of kwargs to scanpy.pl.violin() when stripplot was also used #1655 M van den Beek
Fixed colorbar creation in scanpy.pl.timeseries_as_heatmap #1654 M van den Beek
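The dendrogram() entry above refers to using 1 - correlation as a distance. The pure-Python sketch below shows what that distance looks like for a few toy group profiles. The helper names and the data are hypothetical, Pearson correlation is assumed, and the hierarchical clustering step that would consume the matrix is left out.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def correlation_distance_matrix(profiles):
    """Pairwise 1 - correlation between named profiles (hypothetical helper)."""
    names = list(profiles)
    return {
        (p, q): 1 - pearson(profiles[p], profiles[q]) for p in names for q in names
    }


# Toy mean-expression profiles per group over four genes.
profiles = {
    "T cells": [1, 2, 3, 4],
    "NK cells": [2, 4, 6, 8],  # same shape as T cells, different scale
    "B cells": [4, 3, 2, 1],  # opposite trend
}
dist = correlation_distance_matrix(profiles)
```

Perfectly correlated profiles end up at distance 0 and anti-correlated ones at distance 2, so groups with similar expression patterns are merged first regardless of overall scale.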
1.7.0 2021-02-03#
Features#
Add new 10x Visium datasets to visium_sge() #1473 G Palla
Enable download of source image for 10x visium datasets in visium_sge() #1506 H Spitzer
Refactor of scanpy.pl.spatial(). Better support for plotting without an image, as well as directly providing images #1512 G Palla
Dict input for scanpy.queries.enrich() #1488 G Eraslan
rank_genes_groups_df() can now return fraction of cells in a group expressing a gene, and allows retrieving values for multiple groups at once #1388 G Eraslan
Color annotations for gene sets in heatmap() are now matched to color for cluster #1511 L Sikkema
PCA plots can now annotate axes with variance explained #1470 bfurtwa
Plots with groupby arguments can now group by values in the index by passing the index’s name (like pd.DataFrame.groupby). #1583 F Ramirez
Added na_color and na_in_legend keyword arguments to embedding() plots. Allows specifying color for missing or filtered values in plots like umap() or spatial() #1356 I Virshup
embedding() plots now support passing dict of {cluster_name: cluster_color, ...} for palette argument #1392 I Virshup
External tools (new)#
Add Scanorama integration to scanpy external API (scanorama_integrate(), Hie et al. [2019]) #1332 B Hie
Scrublet [Wolock et al., 2019] integration: scrublet(), scrublet_simulate_doublets(), and plotting method scrublet_score_distribution() #1476 J Manning
hashsolo() for HTO demultiplexing [Bernstein et al., 2020] #1432 NJ Bernstein
Added scirpy (sc-AIRR analysis) to ecosystem page #1453 G Sturm
Added scvi-tools to ecosystem page #1421 A Gayoso
External tools (changes)#
Updates for palantir() and palantir_results() #1245 A Mousa
Fixes to harmony_timeseries() docs #1248 A Mousa
Support for leiden clustering by scanpy.external.tl.phenograph() #1080 A Mousa
Deprecate scanpy.external.pp.scvi #1554 G Xing
Updated default params of sam() to work with larger data #1540 A Tarashansky
Documentation#
New contribution guide #1544 I Virshup
zsh installation instructions #1444 P Angerer
Performance#
Speed up read_10x_h5() #1402 P Weiler
Bugfixes#
Consistent fold-change, fractions calculation for filter_rank_genes_groups #1391 S Rybakov
Fixed bug where score_genes would error if one gene was passed #1398 I Virshup
Fixed log1p inplace on integer dense arrays #1400 I Virshup
Fix docstring formatting for rank_genes_groups() #1417 P Weiler
Removed PendingDeprecationWarnings from use of np.matrix #1424 P Weiler
Fixed indexing bug in scanpy.pp.highly_variable_genes #1456 V Bergen
Fix default number of genes for marker_genes_overlap #1464 MD Luecken
Fixed passing groupby and dendrogram_key to dendrogram() #1465 M Varma
Fixed download path of pbmc3k_processed #1472 D Strobl
Better error message when computing DE with a group of size 1 #1490 J Manning
Update cugraph API usage for v0.16 #1494 R Ilango
Fixed marker_gene_overlap default value for top_n_markers #1464 MD Luecken
Pass random_state to RAPIDs UMAP #1474 C Nolet
Fixed anndata version requirement for concat() (re-exported from scanpy as sc.concat) #1491 I Virshup
Fixed the width of the progress bar when downloading data #1507 M Klein
Updated link for moignard15 dataset #1542 I Virshup
Fixed bug where calling set_figure_params could block if IPython was installed, but not used. #1547 I Virshup
violin() no longer fails if .raw not present #1548 I Virshup
spatial() refactoring and better handling of spatial data #1512 G Palla
Version 1.6#
1.6.0 2020-08-15#
This release includes an overhaul of dotplot(), matrixplot(), and stacked_violin() (#1210 F Ramirez), and of the internals of rank_genes_groups() (#1156 S Rybakov).
Overhaul of dotplot(), matrixplot(), and stacked_violin() #1210 F Ramirez#
An overhauled tutorial Core plotting functions.
New plotting classes can be accessed directly (e.g., DotPlot) or using the return_fig param.
It is possible to plot log fold change and p-values in the rank_genes_groups_dotplot() family of functions.
Added ax parameter which allows embedding the plot in other images.
Added option to include a bar plot instead of the dendrogram containing the cell/observation totals per category.
Return a dictionary of axes for further manipulation. This includes the main plot, legend and dendrogram to totals
Legends can be removed.
The groupby param can take a list of categories, e.g., groupby=['tissue', 'cell type'].
Added padding parameter to dotplot and stacked_violin. #1270
Added title for colorbar and positioned as in dotplot for matrixplot().
dotplot() changes:
Improved the colorbar and size legend for dotplots. Now the colorbar and size have titles, which can be modified using the colorbar_title and size_title params. They also align at the bottom of the image and do not shrink if the dotplot image is smaller.
Allow plotting genes in rows and categories in columns (swap_axes).
Using DotPlot, the dot_edge_color and line width can be modified, a grid can be added, and other modifications are enabled.
A new style was added in which the dots are replaced by an empty circle and the square behind the circle is colored (like in matrixplots).
stacked_violin() changes:
Violin colors can be colored based on average gene expression as in dotplots.
The linewidth of the violin plots is thinner.
Removed the ticks for the y-axis as they tend to overlap with each other. Using the style method they can be displayed if needed.
Additions#
concat() is now exported from scanpy, see Concatenation for more info. #1338 I Virshup
Added highly variable gene selection strategy from Seurat v3 #1204 A Gayoso
Added backup_url param to read_10x_h5() #1296 A Gayoso
Allow prefix for read_10x_mtx() #1250 G Sturm
Optional tie correction for the 'wilcoxon' method in rank_genes_groups() #1330 S Rybakov
Use sinfo for print_versions() and add print_header() to do what it previously did. #1338 I Virshup #1373
Bug fixes#
Avoid warning in rank_genes_groups() if ‘t-test’ is passed #1303 A Wolf
Restrict sphinx version to <3.1, >3.0 #1297 I Virshup
Clean up _ranks and fix dendrogram for scipy 1.5 #1290 S Rybakov
Use .raw to translate gene symbols if applicable #1278 E Rice
Fix diffmap (#1262) G Eraslan
Fix neighbors in spring_project #1260 S Rybakov
Bumped version requirement of scipy to scipy>1.4 to support rmatmat argument of LinearOperator #1246 I Virshup
Fix asymmetry of scores for the 'wilcoxon' method in rank_genes_groups() #754 S Rybakov
Avoid trimming of gene names in rank_genes_groups() #753 S Rybakov
Version 1.5#
1.5.1 2020-05-21#
Bug fixes#
1.5.0 2020-05-15#
The 1.5.0 release adds a lot of new functionality, much of which takes advantage of anndata updates 0.7.0 - 0.7.2. Highlights of this release include support for spatial data, dedicated handling of graphs in AnnData, sparse PCA, an interface with scvi, and others.
Spatial data support#
Tutorials for basic analysis and integration with single cell data G Palla
read_visium() read 10x Visium data #1034 G Palla, P Angerer, I Virshup
visium_sge() load Visium data directly from 10x Genomics #1013 M Mirkazemi, G Palla, P Angerer
New functionality#
External tools#
Performance#
pca() now uses efficient implicit centering for sparse matrices. This can lead to significantly improved performance for large datasets #1066 A Tarashansky
score_genes() now has an efficient implementation for sparse matrices with missing values #1196 redst4r.
Code design#
stacked_violin() can now be used as a subplot #1084 P Angerer
score_genes() has improved logging #1119 G Eraslan
scale() now saves mean and standard deviation in the var #1173 A Wolf
harmony_timeseries() #1091 A Mousa
Bug fixes#
combat() now works when obs_names aren’t unique. #1215 I Virshup
scale() can now be used on dense arrays without centering #1160 simonwm
regress_out() now works when some features are constant #1194 simonwm
normalize_total() errored if the passed object was a view #1200 I Virshup
neighbors() sometimes ignored the n_pcs param #1124 V Bergen
ebi_expression_atlas() which contained some out-of-date URLs #1102 I Virshup
highly_variable_genes() which could lead to incorrect results when the batch_key argument was used #1180 G Eraslan
ingest() where an inconsistent number of neighbors was used #1111 S Rybakov
Version 1.4#
1.4.6 2020-03-17#
Functionality in external#
sam() self-assembling manifolds [Tarashansky et al., 2019] #903 A Tarashansky
harmony_timeseries() for trajectory inference on discrete time points #994 A Mousa
wishbone() for trajectory inference (bifurcations) #1063 A Mousa
Code design#
Bug fixes#
1.4.5 2019-12-30#
Please install scanpy==1.4.5.post3 instead of scanpy==1.4.5.
New functionality#
ingest() maps labels and embeddings of reference data to new data, see Integrating data using ingest and BBKNN #651 S Rybakov, A Wolf
queries received many updates including enrichment through gprofiler and more advanced biomart queries #467 I Virshup
set_figure_params() allows setting figsize and accepts facecolor='white', useful for working in dark mode A Wolf
Code design#
downsample_counts now always preserves the dtype of its input, instead of converting floats to ints #865 I Virshup
run neighbors on a GPU using rapids #830 T White
param docs from typed params P Angerer
embedding_density() now only takes one positional argument; similar for embedding_density(), which gains a param groupby #965 A Wolf
webpage overhaul, ecosystem page, release notes, tutorials overhaul #960 #966 A Wolf
Warning
changed default solver in pca() from auto to arpack
changed default use_raw in score_genes() from False to None
1.4.4 2019-07-20#
New functionality#
scanpy.get adds helper functions for extracting data in convenient formats #619 I Virshup
Bug fixes#
Stopped deprecation warnings from AnnData 0.6.22 I Virshup
Code design#
normalize_total() gains param exclude_highly_expressed, and fraction is renamed to max_fraction with better docs A Wolf
1.4.3 2019-05-14#
Bug fixes#
neighbors() correctly infers n_neighbors again from params, which was temporarily broken in v1.4.2 I Virshup
Code design#
calculate_qc_metrics() is single threaded by default for datasets under 300,000 cells – allowing cached compilation #615 I Virshup
1.4.2 2019-05-06#
New functionality#
combat() supports additional covariates which may include adjustment variables or biological condition #618 G Eraslan
highly_variable_genes() has a batch_key option which performs HVG selection in each batch separately to avoid selecting genes that vary strongly across batches #622 G Eraslan
Bug fixes#
rank_genes_groups() t-test implementation doesn’t return NaN when variance is 0, also changed to scipy’s implementation #621 I Virshup
umap() with init_pos='paga' detects correct dtype A Wolf
louvain() and leiden() auto-generate key_added=louvain_R upon passing restrict_to, which was temporarily changed in 1.4.1 A Wolf
Code design#
neighbors() and umap() got rid of UMAP legacy code and introduced UMAP as a dependency #576 S Rybakov
1.4.1 2019-04-26#
New functionality#
Scanpy has a command line interface again. Invoking it with scanpy somecommand [args] calls scanpy-somecommand [args], except for builtin commands (currently scanpy settings) #604 P Angerer
ebi_expression_atlas() allows convenient download of EBI expression atlas I Virshup
marker_gene_overlap() computes overlaps of marker genes M Luecken
filter_rank_genes_groups() filters out genes based on fold change and fraction of cells expressing genes F Ramirez
normalize_total() replaces normalize_per_cell(), is more efficient and provides a parameter to only normalize using a fraction of expressed genes S Rybakov
downsample_counts() has been sped up, changed default value of replace parameter to False #474 I Virshup
embedding_density() computes densities on embeddings #543 M Luecken
palantir() interfaces Palantir [Setty et al., 2019] #493 A Mousa
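The command line dispatch described in the first entry above (scanpy somecommand [args] calling scanpy-somecommand [args], with scanpy settings as a builtin) follows a pattern that can be sketched as below. This is not Scanpy's actual CLI code; resolve_command and BUILTIN_COMMANDS are hypothetical names used only for illustration.

```python
import shutil

# The notes name `scanpy settings` as the only builtin command.
BUILTIN_COMMANDS = {"settings"}


def resolve_command(argv, which=shutil.which):
    """Map the arguments after `scanpy` to a builtin or to `scanpy-<command>`.

    Hypothetical helper: returns ("builtin", argv) or
    ("external", ["scanpy-<command>", *args]).
    """
    if not argv:
        raise SystemExit("usage: scanpy <command> [args]")
    command, *args = argv
    if command in BUILTIN_COMMANDS:
        return ("builtin", [command, *args])
    executable = f"scanpy-{command}"
    # External commands must be discoverable as executables on PATH.
    if which(executable) is None:
        raise SystemExit(f"{executable} not found on PATH")
    return ("external", [executable, *args])
```

For an external command, the returned argument list could be handed to subprocess.run; the which parameter exists only so the lookup can be replaced in tests.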
Code design#
.layers support of scatter plots F Ramirez
fix double-logarithmization in compute of log fold change in rank_genes_groups() A Muñoz-Rojas
fix return sections of docs P Angerer
Version 1.3#
1.3.8 2019-02-05#
various documentation and dev process improvements
Added combat() function for batch effect correction [Johnson et al., 2006, Leek et al., 2017, Pedersen, 2012] #398 M Lange
1.3.7 2019-01-02#
API changed from import scanpy.api as sc to import scanpy as sc.
phenograph() wraps the graph clustering package Phenograph [Levine et al., 2015] thanks to A Mousa
1.3.6 2018-12-11#
Major updates#
a new plotting gallery for visualizing-marker-genes F Ramirez
tutorials are integrated on ReadTheDocs, pbmc3k and paga-paul15 A Wolf
Interactive exploration of analysis results through manifold viewers#
CZI’s cellxgene directly reads .h5ad files the cellxgene developers
the UCSC Single Cell Browser requires exporting via cellbrowser() M Haeussler
Code design#
highly_variable_genes() supersedes filter_genes_dispersion(), it gives the same results but, by default, expects logarithmized data and doesn’t subset A Wolf
1.3.5 2018-12-09#
uncountable figure improvements #369 F Ramirez
1.3.4 2018-11-24#
leiden() wraps the recent graph clustering package by Traag et al. [2019] K Polanski
bbknn() wraps the recent batch correction package [Polański et al., 2019] K Polanski
calculate_qc_metrics() calculates a number of quality control metrics, similar to calculateQCMetrics from Scater [McCarthy et al., 2017] I Virshup
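To give a sense of the kind of per-cell quality control metrics meant here, the following pure-Python sketch computes three common ones. It is not the calculate_qc_metrics() implementation; the function name qc_metrics_sketch, the list-based input, and the dictionary keys are illustrative assumptions.

```python
def qc_metrics_sketch(counts, gene_names, subset=()):
    """Compute simple per-cell QC metrics from a list of count rows.

    Hypothetical helper: subset names a group of genes (for example
    mitochondrial genes) whose share of the counts is reported per cell.
    """
    subset = set(subset)
    metrics = []
    for row in counts:
        total = sum(row)
        in_subset = sum(v for v, g in zip(row, gene_names) if g in subset)
        metrics.append({
            "total_counts": total,
            "n_genes_by_counts": sum(1 for v in row if v > 0),
            # Guard against empty cells to avoid division by zero.
            "pct_counts_subset": 100.0 * in_subset / total if total else 0.0,
        })
    return metrics


per_cell = qc_metrics_sketch(
    [[2, 0, 8], [0, 0, 0]], ["a", "b", "mt-1"], subset=["mt-1"]
)
```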
1.3.3 2018-11-05#
Major updates#
a fully distributed preprocessing backend T White and the Laserson Lab
Code design#
read_10x_h5() and read_10x_mtx() read Cell Ranger 3.0 outputs #334 Q Gong
Note
Also see changes in anndata 0.6.
changed default compression to None in write_h5ad() to speed up read and write, disk space use is usually less critical
performance gains in write_h5ad() due to better handling of strings and categories S Rybakov
1.3.1 2018-09-03#
RNA velocity in single cells [La Manno et al., 2018]#
Scanpy and AnnData support loom’s layers so that computations for single-cell RNA velocity [La Manno et al., 2018] become feasible S Rybakov and V Bergen
scvelo harmonizes with Scanpy and is able to process loom files with splicing information produced by Velocyto [La Manno et al., 2018], it runs a lot faster than the count matrix analysis of Velocyto and provides several conceptual developments
Plotting (Generic)#
There now is a section on imputation in external:#
magic() for imputation using data diffusion [van Dijk et al., 2018] #187 S Gigante
dca() for imputation and latent space construction using an autoencoder [Eraslan et al., 2019] #186 G Eraslan
Version 1.2#
1.2.1 2018-06-08#
Plotting of Generic marker genes and quality control.#
highest_expr_genes() for quality control; plot genes with highest mean fraction of cells, similar to plotQC of Scater [McCarthy et al., 2017] #169 F Ramirez
1.2.0 2018-06-08#
Version 1.1#
1.1.0 2018-06-01#
set_figure_params() by default passes vector_friendly=True and allows you to produce reasonably sized pdfs by rasterizing large scatter plots A Wolf
draw_graph() defaults to the ForceAtlas2 layout [Chippada, 2018, Jacomy et al., 2014], which is often more visually appealing and whose computation is much faster S Wollock
scatter() also plots along variables axis MD Luecken
regress_out() is back to multiprocessing F Ramirez
read() reads compressed text files G Eraslan
mitochondrial_genes() for querying mito genes FG Brundu
mnn_correct() for batch correction [Haghverdi et al., 2018, Kang, 2018]
phate() for low-dimensional embedding [Moon et al., 2019] S Gigante
sandbag(), cyclone() for scoring genes [Fechtner, 2018, Scialdone et al., 2015]
Version 1.0#
1.0.0 2018-03-30#
Major updates#
Scanpy is much faster and more memory efficient: preprocess, cluster and visualize 1.3M cells in 6h, 130K cells in 14min, and 68K cells in 3min A Wolf
the API gained a preprocessing function neighbors() and a class Neighbors() to which all basic graph computations are delegated A Wolf
Warning
Upgrading to 1.0 isn’t fully backwards compatible in the following changes:
the graph-based tools louvain(), dpt(), draw_graph(), umap(), diffmap() and paga() require prior computation of the graph: sc.pp.neighbors(adata, n_neighbors=5); sc.tl.louvain(adata) instead of previously sc.tl.louvain(adata, n_neighbors=5)
install numba via conda install numba, which replaces cython
the default connectivity measure (dpt will look different using default settings) changed. Setting method='gauss' in sc.pp.neighbors uses gauss kernel connectivities and reproduces the previous behavior, see, for instance, the example paul15.
namings of returned annotation have changed for less bloated AnnData objects, which means that some of the unstructured annotation of old AnnData files is not recognized anymore
replace occurrences of group_by with groupby (consistency with pandas)
it is worth checking out the notebook examples to see changes, e.g. the seurat example.
upgrading scikit-learn from 0.18 to 0.19 changed the implementation of PCA, some results might therefore look slightly different
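The first incompatibility listed above, computing the graph before calling a graph-based tool, can be sketched as follows. The two sc.* calls are taken from the note itself; the wrapper cluster_louvain is a hypothetical helper added for illustration, and scanpy is imported inside it only so that the snippet can be defined without scanpy installed.

```python
def cluster_louvain(adata, n_neighbors=5):
    """Run Louvain clustering using the call order required since 1.0."""
    import scanpy as sc  # lazy import; assumes scanpy >= 1.0 is installed

    # Before 1.0 this was a single call: sc.tl.louvain(adata, n_neighbors=5)
    sc.pp.neighbors(adata, n_neighbors=n_neighbors)  # compute the graph once
    sc.tl.louvain(adata)  # graph-based tools reuse the stored graph
    return adata
```

The same neighbors() result is then shared by the other graph-based tools named in the list, so the graph does not need to be recomputed for each of them.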
Further updates#
UMAP [McInnes et al., 2018] can serve as a first visualization of the data just as tSNE; in contrast to tSNE, UMAP directly embeds the single-cell graph and is faster; UMAP is also used for measuring connectivities and computing neighbors, see neighbors() A Wolf
graph abstraction: AGA is renamed to PAGA: paga(); now, it only measures connectivities between partitions of the single-cell graph, pseudotime and clustering need to be computed separately via louvain() and dpt(), the connectivity measure has been improved A Wolf
logistic regression for finding marker genes rank_genes_groups() with parameter method='logreg' A Wolf
louvain() provides a better implementation for reclustering via restrict_to A Wolf
scanpy no longer modifies rcParams upon import, call settings.set_figure_params to set the ‘scanpy style’ A Wolf
default cache directory is ./cache/, set settings.cachedir to change this; nested directories in this are avoided A Wolf
show edges in scatter plots based on graph visualization draw_graph() and umap() by passing edges=True A Wolf
downsample_counts() for downsampling counts MD Luecken
default 'louvain_groups' are called 'louvain' A Wolf
'X_diffmap' contains the zero component, plotting remains unchanged A Wolf
Version 0.4#
0.4.4 2018-02-26#
embed cells using umap() [McInnes et al., 2018] #92 G Eraslan
score sets of genes, e.g. for cell cycle, using score_genes() [Satija et al., 2015]: notebook
0.4.3 2018-02-09#
clustermap(): heatmap from hierarchical clustering, based on seaborn.clustermap() [Waskom et al., 2016] A Wolf
only return matplotlib.axes.Axes in plotting functions of sc.pl when show=False, otherwise None A Wolf
0.4.2 2018-01-07#
amendments in PAGA and its plotting functions A Wolf
0.4.0 2017-12-23#
export to SPRING [Weinreb et al., 2017] for interactive visualization of data: spring tutorial S Wollock
Version 0.3#
0.3.2 2017-11-29#
finding marker genes via rank_genes_groups_violin() improved, see #51 F Ramirez
0.3.0 2017-11-16#
AnnData gains method concatenate() A Wolf
AnnData is available as the separate anndata package P Angerer, A Wolf
results of PAGA simplified A Wolf
Version 0.2#
0.2.9 2017-10-25#
Initial release of the new trajectory inference method PAGA#
paga() computes an abstracted, coarse-grained (PAGA) graph of the neighborhood graph A Wolf
paga_compare() plots this graph next to an embedding A Wolf
paga_path() plots a heatmap through a node sequence in the PAGA graph A Wolf
0.2.1 2017-07-24#
Scanpy includes preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing and simulation of gene regulatory networks. The implementation efficiently deals with datasets of more than one million cells. A Wolf, P Angerer
Version 0.1#
0.1.0 2017-05-17#
Scanpy computationally outperforms and allows reproducing both the Cell Ranger R kit’s and most of Seurat’s clustering workflows. A Wolf, P Angerer