Release notes#
Version 1.11#
1.11.0.dev165+g751eafac 2024-11-19#
Documentation#
Improve harmony_integrate() docs D Kühl (pr3362)
Features#
Add layer argument to scanpy.tl.score_genes() and scanpy.tl.score_genes_cell_cycle() L Zappia (pr2921)
Prevent raw conflict with layer in score_genes() S Dicks (pr3155)
Add support for median as an aggregation function to the Aggregation class in scanpy.get._aggregated.py. This allows for median-based aggregation of data (e.g., pseudobulk), complementing existing methods like mean- and sum-based aggregation M Dehkordi (Farhad) (pr3180)
Add key_added argument to pca(), tsne() and umap() P Angerer (pr3184)
Support running scanpy.pp.pca() on sparse Dask arrays with the 'covariance_eigh' solver P Angerer (pr3263)
Use upstreamed PCA implementation for csr_array and csr_matrix (see Version 1.4.0) P Angerer (pr3267)
Add explicit support to scanpy.pp.pca() for svd_solver='covariance_eigh' P Angerer (pr3296)
Add support for dask.array.Array to scanpy.pp.calculate_qc_metrics() I Gold (pr3307)
Support layer parameter in scanpy.pl.highest_expr_genes() P Angerer (pr3324)
Run numba functions single-threaded when called from inside of a ThreadPool P Angerer (pr3335)
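The median-aggregation entry above can be illustrated with a small standard-library sketch. It does not use scanpy's Aggregation class; the function name pseudobulk and the toy data are hypothetical, and the only point is to show why a median-based pseudobulk is less sensitive to a single outlier cell than a mean-based one.

```python
from collections import defaultdict
from statistics import mean, median


def pseudobulk(values, groups, func=median):
    """Aggregate per-cell rows into one row per group (hypothetical helper).

    values: list of per-cell rows, each a list of per-gene numbers
    groups: list of group labels, one per cell
    func:   aggregation applied per gene within each group
    """
    rows_by_group = defaultdict(list)
    for row, label in zip(values, groups):
        rows_by_group[label].append(row)
    # zip(*rows) transposes the rows so func sees one gene's values at a time.
    return {
        label: [func(gene_values) for gene_values in zip(*rows)]
        for label, rows in rows_by_group.items()
    }


# Toy data: 5 cells x 2 genes; the third cell in group "A" is an outlier.
expression = [[1, 0], [2, 0], [100, 3], [4, 4], [6, 8]]
labels = ["A", "A", "A", "B", "B"]

by_median = pseudobulk(expression, labels, func=median)
by_mean = pseudobulk(expression, labels, func=mean)
```

In this toy case the median for the first gene in group "A" stays at the typical value, while the mean is pulled towards the outlier.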
Performance#
Speed up regress_out() P Ashish, P Angerer & S Dicks (pr3284)
Version 1.10#
1.10.4 2024-11-12#
Breaking changes#
Remove Python 3.9 support P Angerer (pr3283)
Bug fixes#
Fix scanpy.pl.DotPlot.style(), scanpy.pl.MatrixPlot.style(), and scanpy.pl.StackedViolin.style() resetting all non-specified parameters P Angerer (pr3206)
Accept 'group' instead of 'obs' for standard_scale parameter in stacked_violin() P Angerer (pr3243)
Use density_norm instead of scale (cont. from pr2844) in violin() and stacked_violin() P Angerer (pr3244)
Switched all compatibility adapters for positional parameters to FutureWarning P Angerer (pr3264)
Catch PerfectSeparationWarning during regress_out() J Wagner (pr3275)
Fix scanpy.pp.highly_variable_genes() for batches of size 1 P Angerer (pr3286)
Fix scanpy.pl.scatter()’s color parameter to take collections as advertised P Angerer (pr3299)
Fix scanpy.pl.highest_expr_genes() when used with a categorical gene symbol column P Angerer (pr3302)
1.10.3 2024-09-17#
Bug fixes#
Prevent empty control gene set in score_genes() M Müller (pr2875)
Fix subset=True of highly_variable_genes() when flavor is seurat or cell_ranger, and batch_key!=None E Roellin (pr3042)
Add compatibility with numpy 2.0 P Angerer (pr3065 and pr3115)
Fix legend_loc argument in scanpy.pl.embedding() not accepting matplotlib parameters P Angerer (pr3163)
Fix dispersion cutoff in highly_variable_genes() in presence of NaNs P Angerer (pr3176)
Fix axis labeling for swapped axes in rank_genes_groups_stacked_violin() Ilan Gold (pr3196)
Upper bound dask on account of issue scverse/anndata#1579 Ilan Gold (pr3217)
The fa2-modified package replaces forceatlas2 for the latter’s lack of maintenance A Alam (pr3220)
1.10.2 2024-06-25#
Development Process#
Add performance benchmarking pr2977 R Shrestha, P Angerer
Documentation#
Bug fixes#
Compatibility with matplotlib 3.9 pr2999 I Virshup
Add clear errors where backed mode-like matrices (i.e., from sparse_dataset) are not supported pr3048 I Gold
Write out full pca results when _choose_representation is called i.e., neighbors() without pca() pr3078 I Gold
Fix deprecated use of .A with sparse matrices pr3084 P Angerer
Fix zappy support pr3089 P Angerer
Performance#
1.10.1 2024-04-09#
Documentation#
Added how-to example on plotting with Marsilea pr2974 Y Zheng
Bug fixes#
Fix aggregate when aggregating by more than two groups pr2965 I Virshup
Performance#
1.10.0 2024-03-26#
scanpy 1.10 brings a large amount of new features, performance improvements, and improved documentation.
Some highlights:
Improved support for out-of-core workflows via dask. See new tutorial: Using dask with Scanpy demonstrating counts-to-clusters for 1.4 million cells in <10 min.
A new basic clustering tutorial demonstrating an updated workflow.
Opt-in increased performance for neighbor search and clustering (how to guide).
Ability to mask observations or variables from a number of methods (see Customizing Scanpy plots for an example with plotting embeddings)
A new function aggregate() for computing aggregations of your data, very useful for pseudo bulking!
Features#
scrublet() and scrublet_simulate_doublets() were moved from scanpy.external.pp to scanpy.pp. The scrublet implementation is now maintained as part of scanpy pr2703 P Angerer
scanpy.pp.pca(), scanpy.pp.scale(), scanpy.pl.embedding(), and scanpy.experimental.pp.normalize_pearson_residuals_pca() now support a mask parameter pr2272 C Bright, T Marcella, & P Angerer
Enhanced dask support for some internal utilities, paving the way for more extensive dask support pr2696 P Angerer
scanpy.pp.highly_variable_genes() supports dask for the default seurat and cell_ranger flavors pr2809 P Angerer
New function scanpy.get.aggregate() which allows grouped aggregations over your data. Useful for pseudobulking! pr2590 Isaac Virshup Ilan Gold Jon Bloom
scanpy.pp.neighbors() now has a transformer argument allowing the use of different ANN/ KNN libraries pr2536 P Angerer
scanpy.experimental.pp.highly_variable_genes() using flavor='pearson_residuals' now uses numba for variance computation and is faster pr2612 S Dicks & P Angerer
scanpy.tl.leiden() now offers igraph’s implementation of the leiden algorithm via flavor when set to igraph. leidenalg’s implementation is still default, but discouraged. pr2815 I Gold
scanpy.pp.highly_variable_genes() has new flavor seurat_v3_paper that is in its implementation consistent with the paper description in Stuart et al 2018. pr2792 E Roellin
scanpy.datasets.blobs() now accepts a random_state argument pr2683 E Roellin
scanpy.pp.pca() and scanpy.pp.regress_out() now accept a layer argument pr2588 S Dicks
scanpy.pp.subsample() with copy=True can now be called in backed mode pr2624 E Roellin
scanpy.external.pp.harmony_integrate() now runs with 64 bit floats improving reproducibility pr2655 S Dicks
scanpy.tl.rank_genes_groups() no longer warns that its default was changed from t-test_overestim_var to t-test pr2798 L Heumos
scanpy.pp.calculate_qc_metrics now allows qc_vars to be passed as a string pr2859 N Teyssier
scanpy.tl.leiden() and scanpy.tl.louvain() now store clustering parameters in the key provided by the key_added parameter instead of always writing to (or overwriting) a default key pr2864 J Fan
scanpy.pp.scale() now clips np.ndarray also at -max_value for zero-centering pr2913 S Dicks
Support sparse chunks in dask scale(), normalize_total() and highly_variable_genes() (seurat and cell-ranger tested) pr2856 ilan-gold
Documentation#
Doc style overhaul pr2220 A Gayoso
Re-add search-as-you-type, this time via readthedocs-sphinx-search pr2805 P Angerer
Fixed a lot of broken usage examples pr2605 P Angerer
Improved harmonization of return field of sc.pp and sc.tl functions pr2742 E Roellin
Improved docs for percent_top argument of calculate_qc_metrics() pr2849 I Virshup
New basic clustering tutorial (Preprocessing and clustering), based on one from scverse-tutorials pr2901 I Virshup
Overhauled Tutorials page, and added new How to section to docs pr2901 I Virshup
Added a new tutorial on working with dask (Using dask with Scanpy) pr2901 I Gold I Virshup
Bug fixes#
Updated read_visium() such that it can read spaceranger 2.0 files L Lehner
Fix normalize_total() for dask pr2466 P Angerer
Fix setting sc.settings.verbosity in some cases pr2605 P Angerer
Fix all remaining pandas warnings pr2789 P Angerer
Fix some annoying plotting warnings around violin plots pr2844 P Angerer
Scanpy now has a test job which tests against the minimum versions of the dependencies. In the process of implementing this, many bugs associated with using older versions of pandas, anndata, numpy, and matplotlib were fixed. pr2816 I Virshup
Fix warnings caused by internal usage of pandas.DataFrame.stack with pandas>=2.1 pr2864 I Virshup
scanpy.get.aggregate() now always returns numpy.ndarray pr2893 S Dicks
Removes self from array of neighbors for use_approx_neighbors = True in scrublet() pr2896 S Dicks
Compatibility with scipy 1.13 pr2943 I Virshup
Fix use of dendrogram() on highly correlated low precision data pr2928 P Angerer
Fix pytest deprecation warning pr2879 P Angerer
Development Process#
Deprecations#
Dropped support for Python 3.8. More details here. pr2695 P Angerer
Deprecated specifying large numbers of function parameters by position as opposed to by name/keyword in all public APIs. e.g. prefer sc.tl.umap(adata, min_dist=0.1, spread=0.8) over sc.tl.umap(adata, 0.1, 0.8) pr2702 P Angerer
Dropped support for umap<0.5 for performance reasons. pr2870 P Angerer
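The positional-parameter deprecation above can be pictured as a thin compatibility wrapper that still accepts the old call style but warns about it. The sketch below is not scanpy's adapter: the decorator warn_on_extra_positionals and the stand-in function toy_umap are hypothetical, and the choice of FutureWarning as the warning category is an assumption of this example.

```python
import functools
import warnings


def warn_on_extra_positionals(max_positional):
    """Hypothetical decorator: accept extra positional args, but warn about them."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if len(args) > max_positional:
                warnings.warn(
                    f"{func.__name__}() received {len(args)} positional arguments; "
                    f"pass all but the first {max_positional} by keyword.",
                    FutureWarning,
                    stacklevel=2,
                )
            # The call itself is forwarded unchanged, so old code keeps working.
            return func(*args, **kwargs)

        return wrapper

    return decorator


@warn_on_extra_positionals(1)
def toy_umap(data, min_dist=0.5, spread=1.0):
    """Stand-in for a function with several tunable parameters."""
    return {"n": len(data), "min_dist": min_dist, "spread": spread}
```

Calling toy_umap(data, min_dist=0.1, spread=0.8) is silent, while toy_umap(data, 0.1, 0.8) returns the same result and emits the warning, which mirrors the preferred and discouraged call styles named in the entry.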
Version 1.9#
1.9.8 2024-01-26#
Bug fixes#
Fix handling of numpy array palettes for old numpy versions pr2832 P Angerer
1.9.7 2024-01-25#
Bug fixes#
Fix handling of numpy array palettes (e.g. after write-read cycle) pr2734 P Angerer
Specify correct version of matplotlib dependency pr2733 P Fisher
Fix scanpy.pl.violin() usage of seaborn.catplot pr2739 E Roellin
Fix scanpy.pp.highly_variable_genes() to handle the combinations of inplace and subset consistently pr2757 E Roellin
Replace usage of various deprecated functionality from anndata and pandas pr2678 pr2779 P Angerer
Allow to use default n_top_genes when using scanpy.pp.highly_variable_genes() flavor 'seurat_v3' pr2782 P Angerer
Fix scanpy.read_10x_mtx()’s gex_only=True mode pr2801 P Angerer
1.9.6 2023-10-31#
Bug fixes#
Allow scanpy.pl.scatter() to accept a str palette name pr2571 P Angerer
Make scanpy.external.tl.palantir() compatible with palantir >=1.3 pr2672 DJ Otto
Fix scanpy.pl.pca() when return_fig=True and annotate_var_explained=True pr2682 J Wagner
Temp fix for issue2680 by skipping seaborn version 0.13.0 pr2661 P Angerer
Fix scanpy.pp.highly_variable_genes() to not modify the used layer when flavor=seurat pr2698 E Roellin
Prevent pandas from causing infinite recursion when setting a slice of a categorical column pr2719 P Angerer
1.9.5 2023-09-08#
Bug fixes#
Remove use of deprecated dtype argument to AnnData constructor pr2658 Isaac Virshup
1.9.4 2023-08-24#
Bug fixes#
Support scikit-learn 1.3 pr2515 P Angerer
Deal with None value vanishing from things like .uns['log1p'] pr2546 SP Shen
Depend on igraph instead of python-igraph pr2566 P Angerer
rank_genes_groups() now handles unsorted groups as intended pr2589 S Dicks
rank_genes_groups_df() now works for rank_genes_groups() with method="logreg" pr2601 S Dicks
scanpy.tl._utils._choose_representation now works with n_pcs if bigger than settings.N_PCS pr2610 S Dicks
1.9.3 2023-03-02#
Bug fixes#
Variety of fixes against pandas 2.0.0rc0 pr2434 I Virshup
1.9.2 2023-02-16#
Bug fixes#
highly_variable_genes() layer argument now works in tandem with batches pr2302 D Schaumont
highly_variable_genes() with flavor='cell_ranger' now handles the case in issue2230 where the number of calculated dispersions is less than n_top_genes pr2231 L Zappia
Fix compatibility with matplotlib 3.7 pr2414 I Virshup P Fisher
Fix scrublet numpy matrix compatibility issue pr2395 A Gayoso
1.9.1 2022-04-05#
Bug fixes#
normalize_total() works when Dask is not installed pr2209 R Cannoodt
Fix embedding plots by bumping matplotlib dependency to version 3.4 pr2212 I Virshup
1.9.0 2022-04-01#
Tutorials#
New tutorial on the usage of Pearson Residuals: How to preprocess UMI count data with analytic Pearson residuals J Lause, G Palla
Materials and recordings for Scanpy workshops by Maren Büttner
Experimental module#
Added scanpy.experimental module! Currently contains functionality related to pearson residuals in scanpy.experimental.pp pr1715 J Lause, G Palla, I Virshup. This includes:
normalize_pearson_residuals() for Pearson Residuals normalization
highly_variable_genes() for HVG selection with Pearson Residuals
normalize_pearson_residuals_pca() for Pearson Residuals normalization and dimensionality reduction with PCA
recipe_pearson_residuals() for Pearson Residuals normalization, HVG selection and dimensionality reduction with PCA
Features#
filter_rank_genes_groups() now allows to filter with absolute values of log fold change pr1649 S Rybakov
_choose_representation now subsets the provided representation to n_pcs, regardless of the name of the provided representation (should affect mostly neighbors()) pr2179 I Virshup PG Majev
scanpy.pp.scrublet() (and related functions) can now be used on AnnData objects containing multiple batches pr1965 J Manning
Number of variables plotted with pca_loadings() can now be controlled with n_points argument. Additionally, variables are no longer repeated if the anndata has less than 30 variables pr2075 Yves33
Dask arrays now work with scanpy.pp.normalize_total() pr1663 G Buckley, I Virshup
embedding_density() now allows more than 10 groups pr1936 A Wolf
Embedding plots can now pass colorbar_loc to specify the location of colorbar legend, or pass None to not show a colorbar pr1821 A Schaar I Virshup
Embedding plots now have a dimensions argument, which lets users select which dimensions of their embedding to plot and uses the same broadcasting rules as other arguments pr1538 I Virshup
print_versions() now uses session_info pr2089 P Angerer I Virshup
Ecosystem#
Multiple packages have been added to our ecosystem page, including:
Bug fixes#
Fixed finding variables with use_raw=True and basis=None in scanpy.pl.scatter() pr2027 E Rice
Fixed scanpy.pp.scrublet() to address issue1957 FlMai and ensure raw counts are used for simulation
Functions in scanpy.datasets no longer throw OldFormatWarnings when using anndata 0.8 pr2096 I Virshup
Fixed use of scanpy.pp.neighbors() with method='rapids': RAPIDS cuML no longer returns a squared Euclidean distance matrix, so we should not square-root the kNN distance matrix. pr1828 M Zaslavsky
Removed pytables dependency by implementing read_10x_h5 with h5py due to installation errors on Windows pr2064
Fixed bug in scanpy.external.pp.hashsolo() where default value was set improperly pr2190 B Reiz
Fixed bug in scanpy.pl.embedding() functions where an error could be raised when there were missing values and large numbers of categories pr2187 I Virshup
Version 1.8#
1.8.2 2021-11-3#
Documentation#
Update conda installation instructions pr1974 L Heumos
Bug fixes#
Fix plotting after scanpy.tl.filter_rank_genes_groups() pr1942 S Rybakov
Fix use_raw=None using anndata.AnnData.var_names if anndata.AnnData.raw is present in scanpy.tl.score_genes() pr1999 M Klein
Fix compatibility with UMAP 0.5.2 pr2028 L Mcinnes
Fixed non-determinism in scanpy.pl.paga() node positions pr1922 I Virshup
Ecosystem#
Added PASTE (a tool to align and integrate spatial transcriptomics data) to scanpy ecosystem.
1.8.1 2021-07-07#
Bug fixes#
Fixed reproducibility of scanpy.tl.score_genes(). Calculation and output is now float64 type. pr1890 I Kucinski
Workarounds for some changes/ bugs in pandas 1.3 pr1918 I Virshup
Fixed bug where sc.pl.paga_compare could mislabel nodes on the paga graph pr1898 I Virshup
Fixed handling of use_raw with scanpy.tl.rank_genes_groups() pr1934 I Virshup
1.8.0 2021-06-28#
Metrics module#
Added scanpy.metrics module!
Added scanpy.metrics.gearys_c() for spatial autocorrelation pr915 I Virshup
Added scanpy.metrics.morans_i() for global spatial autocorrelation pr1740 I Virshup, G Palla
Added scanpy.metrics.confusion_matrix() for comparing labellings pr915 I Virshup
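For readers unfamiliar with the two autocorrelation statistics named above, the sketch below computes the textbook definitions of Moran's I and Geary's C on a dense weight matrix using only the standard library. It is not the scanpy.metrics implementation, and whether scanpy applies exactly these normalizations is not stated in these notes; the toy graph and values are made up.

```python
def morans_i(weights, x):
    """Textbook Moran's I: (N / W) * sum_ij w_ij d_i d_j / sum_i d_i^2."""
    n = len(x)
    mean_x = sum(x) / n
    dev = [v - mean_x for v in x]
    w_total = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / w_total) * cross / sum(d * d for d in dev)


def gearys_c(weights, x):
    """Textbook Geary's C: ((N - 1) / (2 W)) * sum_ij w_ij (x_i - x_j)^2 / sum_i d_i^2."""
    n = len(x)
    mean_x = sum(x) / n
    w_total = sum(sum(row) for row in weights)
    sq_diff = sum(
        weights[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n)
    )
    return ((n - 1) / (2 * w_total)) * sq_diff / sum((v - mean_x) ** 2 for v in x)


# Path graph 0 - 1 - 2 - 3 with symmetric unit weights.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
signal = [0, 0, 1, 1]  # neighbouring nodes mostly share a value

moran = morans_i(path_graph, signal)
geary = gearys_c(path_graph, signal)
```

For this spatially smooth signal Moran's I is positive and Geary's C falls below 1, the usual signature of positive spatial autocorrelation under these definitions.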
Features#
Added layer and copy kwargs to normalize_total() pr1667 I Virshup
Added vcenter and norm arguments to the plotting functions pr1551 G Eraslan
Standardized and expanded available arguments to the sc.pl.rank_genes_groups* family of functions. pr1529 F Ramirez I Virshup
See examples sections of rank_genes_groups_dotplot() and rank_genes_groups_matrixplot() for demonstrations.
scanpy.tl.tsne() now supports the metric argument and records the passed parameters pr1854 I Virshup
scanpy.pl.scrublet_score_distribution() now uses same API as other scanpy functions for saving/ showing plots pr1741 J Manning
Ecosystem#
Documentation#
Added rendered examples to many plotting functions issue1664 A Schaar L Zappia bio-la L Hetzel L Dony M Buttner K Hrovatin F Ramirez I Virshup LouisK92 mayarali
Integrated DocSearch, a find-as-you-type documentation index search. pr1754 P Angerer
Reorganized reference docs pr1753 I Virshup
Clarified docs issues for neighbors(), diffmap(), calculate_qc_metrics() pr1680 G Palla
Fixed typos in grouped plot doc-strings pr1877 C Rands
Extended examples for differential expression plotting. pr1529 F Ramirez
See rank_genes_groups_dotplot() or rank_genes_groups_matrixplot() for examples.
Bug fixes#
Fix scanpy.pl.paga_path() TypeError with recent versions of anndata pr1047 P Angerer
Fix detection of whether IPython is running pr1844 I Virshup
Fixed reproducibility of scanpy.tl.diffmap() (added random_state) pr1858 I Kucinski
Fixed errors and warnings from embedding plots with small numbers of categories after sns.set_palette was called pr1886 I Virshup
Fixed handling of gene_symbols argument in a number of sc.pl.rank_genes_groups* functions pr1529 F Ramirez I Virshup
Fixed handling of use_raw for sc.tl.rank_genes_groups when no .raw is present pr1895 I Virshup
scanpy.pl.rank_genes_groups_violin() now works for raw=False pr1669 M van den Beek
scanpy.pl.dotplot() now uses smallest_dot argument correctly pr1771 S Flemming
Development Process#
Switched to flit for building and deploying the package, a simple tool with an easy to understand command line interface and metadata pr1527 P Angerer
Use pre-commit for style checks pr1684 pr1848 L Heumos I Virshup
Deprecations#
Dropped support for Python 3.6. More details here. pr1897 I Virshup
Deprecated layers and layers_norm kwargs to normalize_total() pr1667 I Virshup
Deprecated MulticoreTSNE backend for scanpy.tl.tsne() pr1854 I Virshup
Version 1.7#
1.7.2 2021-04-07#
Bug fixes#
scanpy.logging.print_versions() now works when python<3.8 pr1691 I Virshup
scanpy.pp.regress_out() now uses joblib as the parallel backend, and should stop oversubscribing threads pr1694 I Virshup
scanpy.pp.highly_variable_genes() with flavor="seurat_v3" now returns correct gene means and -variances when used with batch_key pr1732 J Lause
scanpy.pp.highly_variable_genes() now throws a warning instead of an error when non-integer values are passed for method "seurat_v3". The check can be skipped by passing check_values=False. pr1679 G Palla
Ecosystem#
1.7.1 2021-02-24#
Documentation#
More twitter handles for core devs pr1676 G Eraslan
Bug fixes#
dendrogram() uses 1 - correlation as distance matrix to compute the dendrogram pr1614 F Ramirez
Fixed obs_df()/var_df() erroring when keys not passed pr1637 I Virshup
Fixed argument handling for scanpy.pp.scrublet() J Manning
Fixed passing of kwargs to scanpy.pl.violin() when stripplot was also used pr1655 M van den Beek
Fixed colorbar creation in scanpy.pl.timeseries_as_heatmap pr1654 M van den Beek
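The dendrogram() fix above relies on turning a correlation into a distance via 1 - correlation. A minimal standard-library sketch of that conversion follows; the helper names and the toy group profiles are hypothetical, Pearson correlation is an assumption of this example, and the hierarchical clustering step that would consume the distances is left out.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def correlation_distance(profiles):
    """Pairwise 1 - correlation between named profiles (0 = identical trend, 2 = opposite)."""
    names = list(profiles)
    return {
        (p, q): 1.0 - pearson(profiles[p], profiles[q]) for p in names for q in names
    }


# Hypothetical mean-expression profiles per group over four genes.
group_profiles = {
    "T cells": [1, 2, 3, 4],
    "NK cells": [2, 4, 6, 8],
    "B cells": [4, 3, 2, 1],
}
distances = correlation_distance(group_profiles)
```

Perfectly correlated groups end up at distance 0 and perfectly anti-correlated groups at distance 2, so similar groups are joined first when such a matrix is clustered.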
1.7.0 2021-02-03#
Features#
Add new 10x Visium datasets to visium_sge() pr1473 G Palla
Enable download of source image for 10x visium datasets in visium_sge() pr1506 H Spitzer
Refactor of scanpy.pl.spatial(). Better support for plotting without an image, as well as directly providing images pr1512 G Palla
Dict input for scanpy.queries.enrich() pr1488 G Eraslan
rank_genes_groups_df() can now return fraction of cells in a group expressing a gene, and allows retrieving values for multiple groups at once pr1388 G Eraslan
Color annotations for gene sets in heatmap() are now matched to color for cluster pr1511 L Sikkema
PCA plots can now annotate axes with variance explained pr1470 bfurtwa
Plots with groupby arguments can now group by values in the index by passing the index’s name (like pd.DataFrame.groupby). pr1583 F Ramirez
Added na_color and na_in_legend keyword arguments to embedding() plots. Allows specifying color for missing or filtered values in plots like umap() or spatial() pr1356 I Virshup
embedding() plots now support passing dict of {cluster_name: cluster_color, ...} for palette argument pr1392 I Virshup
External tools (new)#
Add Scanorama integration to scanpy external API (scanorama_integrate(), Hie et al. [2019]) pr1332 B Hie
Scrublet [Wolock et al., 2019] integration: scrublet(), scrublet_simulate_doublets(), and plotting method scrublet_score_distribution() pr1476 J Manning
hashsolo() for HTO demultiplexing [Bernstein et al., 2020] pr1432 NJ Bernstein
Added scirpy (sc-AIRR analysis) to ecosystem page pr1453 G Sturm
Added scvi-tools to ecosystem page pr1421 A Gayoso
External tools (changes)#
Updates for palantir() and palantir_results() pr1245 A Mousa
Fixes to harmony_timeseries() docs pr1248 A Mousa
Support for leiden clustering by scanpy.external.tl.phenograph() pr1080 A Mousa
Deprecate scanpy.external.pp.scvi pr1554 G Xing
Updated default params of sam() to work with larger data pr1540 A Tarashansky
Documentation#
New contribution guide pr1544 I Virshup
zsh installation instructions pr1444 P Angerer
Performance#
Speed up read_10x_h5() pr1402 P Weiler
Bugfixes#
Consistent fold-change, fractions calculation for filter_rank_genes_groups pr1391 S Rybakov
Fixed bug where score_genes would error if one gene was passed pr1398 I Virshup
Fixed log1p inplace on integer dense arrays pr1400 I Virshup
Fix docstring formatting for rank_genes_groups() pr1417 P Weiler
Removed PendingDeprecationWarnings from use of np.matrix pr1424 P Weiler
Fixed indexing bug in scanpy.pp.highly_variable_genes pr1456 V Bergen
Fix default number of genes for marker_genes_overlap pr1464 MD Luecken
Fixed passing groupby and dendrogram_key to dendrogram() pr1465 M Varma
Fixed download path of pbmc3k_processed pr1472 D Strobl
Better error message when computing DE with a group of size 1 pr1490 J Manning
Update cugraph API usage for v0.16 pr1494 R Ilango
Fixed marker_gene_overlap default value for top_n_markers pr1464 MD Luecken
Pass random_state to RAPIDs UMAP pr1474 C Nolet
Fixed anndata version requirement for concat() (re-exported from scanpy as sc.concat) pr1491 I Virshup
Fixed the width of the progress bar when downloading data pr1507 M Klein
Updated link for moignard15 dataset pr1542 I Virshup
Fixed bug where calling set_figure_params could block if IPython was installed, but not used. pr1547 I Virshup
violin() no longer fails if .raw not present pr1548 I Virshup
spatial() refactoring and better handling of spatial data pr1512 G Palla
Version 1.6#
1.6.0 2020-08-15#
This release includes an overhaul of dotplot(), matrixplot(), and stacked_violin() (pr1210 F Ramirez), and of the internals of rank_genes_groups() (pr1156 S Rybakov).
Overhaul of dotplot(), matrixplot(), and stacked_violin() pr1210 F Ramirez#
An overhauled tutorial Core plotting functions.
New plotting classes can be accessed directly (e.g., DotPlot) or using the return_fig param.
It is possible to plot log fold change and p-values in the rank_genes_groups_dotplot() family of functions.
Added ax parameter which allows embedding the plot in other images.
Added option to include a bar plot instead of the dendrogram containing the cell/observation totals per category.
Return a dictionary of axes for further manipulation. This includes the main plot, legend and dendrogram to totals
Legends can be removed.
The groupby param can take a list of categories, e.g., groupby=['tissue', 'cell type'].
Added padding parameter to dotplot and stacked_violin. pr1270
Added title for colorbar and positioned as in dotplot for matrixplot().
dotplot() changes:
Improved the colorbar and size legend for dotplots. Now the colorbar and size have titles, which can be modified using the colorbar_title and size_title params. They also align at the bottom of the image and do not shrink if the dotplot image is smaller.
Allow plotting genes in rows and categories in columns (swap_axes).
Using DotPlot, the dot_edge_color and line width can be modified, a grid can be added, and other modifications are enabled.
A new style was added in which the dots are replaced by an empty circle and the square behind the circle is colored (like in matrixplots).
stacked_violin() changes:
Violin colors can be colored based on average gene expression as in dotplots.
The linewidth of the violin plots is thinner.
Removed the ticks for the y-axis as they tend to overlap with each other. Using the style method they can be displayed if needed.
Additions#
concat() is now exported from scanpy, see Concatenation for more info. pr1338 I Virshup
Added highly variable gene selection strategy from Seurat v3 pr1204 A Gayoso
Added backup_url param to read_10x_h5() pr1296 A Gayoso
Allow prefix for read_10x_mtx() pr1250 G Sturm
Optional tie correction for the 'wilcoxon' method in rank_genes_groups() pr1330 S Rybakov
Use sinfo for print_versions() and add print_header() to do what it previously did. pr1338 I Virshup pr1373
Bug fixes#
Avoid warning in rank_genes_groups() if ‘t-test’ is passed pr1303 A Wolf
Restrict sphinx version to <3.1, >3.0 pr1297 I Virshup
Clean up _ranks and fix dendrogram for scipy 1.5 pr1290 S Rybakov
Use .raw to translate gene symbols if applicable pr1278 E Rice
Fix diffmap (issue1262) G Eraslan
Fix neighbors in spring_project issue1260 S Rybakov
Fix default size of dot in spatial plots pr1255 issue1253 giovp
Bumped version requirement of scipy to scipy>1.4 to support rmatmat argument of LinearOperator issue1246 I Virshup
Fix asymmetry of scores for the 'wilcoxon' method in rank_genes_groups() issue754 S Rybakov
Avoid trimming of gene names in rank_genes_groups() issue753 S Rybakov
Version 1.5#
1.5.1 2020-05-21#
Bug fixes#
1.5.0 2020-05-15#
The 1.5.0 release adds a lot of new functionality, much of which takes advantage of anndata updates 0.7.0 - 0.7.2. Highlights of this release include support for spatial data, dedicated handling of graphs in AnnData, sparse PCA, an interface with scvi, and others.
Spatial data support#
Basic analysis Analysis and visualization of spatial transcriptomics data and integration with single cell data Integrating spatial data with scRNA-seq using scanorama G Palla
read_visium() read 10x Visium data pr1034 G Palla, P Angerer, I Virshup
visium_sge() load Visium data directly from 10x Genomics pr1013 M Mirkazemi, G Palla, P Angerer
New functionality#
External tools#
Performance#
pca() now uses efficient implicit centering for sparse matrices. This can lead to significantly improved performance for large datasets pr1066 A Tarashansky
score_genes() now has an efficient implementation for sparse matrices with missing values pr1196 redst4r.
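The idea behind implicit centering can be shown without any numerical library: subtracting column means from a sparse matrix would make it dense, but the product of the centered matrix with a vector can be obtained as X v minus the scalar (mean · v) for every row. The sketch below illustrates only that identity; the row-as-dict storage and both helper names are hypothetical, and it is not a description of how pca() is implemented internally.

```python
def column_means(sparse_rows, n_cols):
    """Column means of a matrix stored as a list of {column: value} dicts."""
    n_rows = len(sparse_rows)
    means = [0.0] * n_cols
    for row in sparse_rows:
        for col, value in row.items():
            means[col] += value / n_rows
    return means


def centered_matvec(sparse_rows, means, v):
    """Compute (X - 1 * means^T) @ v while touching only the stored non-zeros of X."""
    shift = sum(m * x for m, x in zip(means, v))  # means . v, identical for every row
    return [
        sum(value * v[col] for col, value in row.items()) - shift
        for row in sparse_rows
    ]


# 3 x 3 toy matrix with four stored non-zeros.
X_sparse = [{0: 2.0}, {1: 4.0}, {0: 4.0, 2: 3.0}]
mu = column_means(X_sparse, n_cols=3)
implicit = centered_matvec(X_sparse, mu, [1.0, 0.0, 2.0])
```

An iterative eigen- or SVD solver only needs such matrix-vector products, which is why the centered matrix never has to be materialised.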
Code design#
stacked_violin() can now be used as a subplot pr1084 P Angerer
score_genes() has improved logging pr1119 G Eraslan
scale() now saves mean and standard deviation in the var pr1173 A Wolf
harmony_timeseries() pr1091 A Mousa
Bug fixes#
combat() now works when obs_names aren’t unique. pr1215 I Virshup
scale() can now be used on dense arrays without centering pr1160 simonwm
regress_out() now works when some features are constant pr1194 simonwm
normalize_total() errored if the passed object was a view pr1200 I Virshup
neighbors() sometimes ignored the n_pcs param pr1124 V Bergen
ebi_expression_atlas() which contained some out-of-date URLs pr1102 I Virshup
highly_variable_genes() which could lead to incorrect results when the batch_key argument was used pr1180 G Eraslan
ingest() where an inconsistent number of neighbors was used pr1111 S Rybakov
Version 1.4#
1.4.6 2020-03-17#
Functionality in external#
sam() self-assembling manifolds [Tarashansky et al., 2019] pr903 A Tarashansky
harmony_timeseries() for trajectory inference on discrete time points pr994 A Mousa
wishbone() for trajectory inference (bifurcations) pr1063 A Mousa
Code design#
Bug fixes#
1.4.5 2019-12-30#
Please install scanpy==1.4.5.post3 instead of scanpy==1.4.5.
New functionality#
ingest() maps labels and embeddings of reference data to new data Integrating data using ingest and BBKNN pr651 S Rybakov, A Wolf
queries received many updates including enrichment through gprofiler and more advanced biomart queries pr467 I Virshup
set_figure_params() allows setting figsize and accepts facecolor='white', useful for working in dark mode A Wolf
Code design#
downsample_counts now always preserves the dtype of its input, instead of converting floats to ints pr865 I Virshup
run neighbors on a GPU using rapids pr830 T White
param docs from typed params P Angerer
embedding_density() now only takes one positional argument; similar for embedding_density(), which gains a param groupby pr965 A Wolf
webpage overhaul, ecosystem page, release notes, tutorials overhaul pr960 pr966 A Wolf
Warning
changed default solver in pca() from auto to arpack
changed default use_raw in score_genes() from False to None
1.4.4 2019-07-20#
New functionality#
scanpy.get adds helper functions for extracting data in convenient formats pr619 I Virshup
Bug fixes#
Stopped deprecation warnings from AnnData 0.6.22 I Virshup
Code design#
normalize_total() gains param exclude_highly_expressed, and fraction is renamed to max_fraction with better docs A Wolf
1.4.3 2019-05-14#
Bug fixes#
neighbors() correctly infers n_neighbors again from params, which was temporarily broken in v1.4.2 I Virshup
Code design#
calculate_qc_metrics() is single threaded by default for datasets under 300,000 cells – allowing cached compilation pr615 I Virshup
1.4.2 2019-05-06#
New functionality#
combat() supports additional covariates which may include adjustment variables or biological condition pr618 G Eraslan
highly_variable_genes() has a batch_key option which performs HVG selection in each batch separately to avoid selecting genes that vary strongly across batches pr622 G Eraslan
Bug fixes#
rank_genes_groups() t-test implementation doesn’t return NaN when variance is 0, also changed to scipy’s implementation pr621 I Virshup
umap() with init_pos='paga' detects correct dtype A Wolf
louvain() and leiden() auto-generate key_added=louvain_R upon passing restrict_to, which was temporarily changed in 1.4.1 A Wolf
Code design#
neighbors() and umap() got rid of UMAP legacy code and introduced UMAP as a dependency pr576 S Rybakov
1.4.1 2019-04-26#
New functionality#
Scanpy has a command line interface again. Invoking it with scanpy somecommand [args] calls scanpy-somecommand [args], except for builtin commands (currently scanpy settings) pr604 P Angerer
ebi_expression_atlas() allows convenient download of EBI expression atlas I Virshup
marker_gene_overlap() computes overlaps of marker genes M Luecken
filter_rank_genes_groups() filters out genes based on fold change and fraction of cells expressing genes F Ramirez
normalize_total() replaces normalize_per_cell(), is more efficient and provides a parameter to only normalize using a fraction of expressed genes S Rybakov
downsample_counts() has been sped up, changed default value of replace parameter to False pr474 I Virshup
embedding_density() computes densities on embeddings pr543 M Luecken
palantir() interfaces Palantir [Setty et al., 2019] pr493 A Mousa
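The command line dispatch rule described in the first entry of this list (scanpy somecommand [args] calls scanpy-somecommand [args], except for builtins) can be sketched as a small resolver. The function resolve_command and the tuple it returns are hypothetical and only model the naming rule; actually locating and launching the external executable is left out.

```python
# The notes name `scanpy settings` as the only builtin command at that time.
BUILTIN_COMMANDS = {"settings"}


def resolve_command(argv):
    """Map the arguments after `scanpy` to a builtin or an external executable name."""
    if not argv:
        raise ValueError("no subcommand given")
    command, *args = argv
    if command in BUILTIN_COMMANDS:
        return ("builtin", command, args)
    # Any other subcommand is delegated to an executable named scanpy-<command>.
    return ("external", f"scanpy-{command}", args)


external = resolve_command(["somecommand", "--flag", "value"])
builtin = resolve_command(["settings"])
```

This is the same delegation pattern git uses for git-<name> executables: third-party tools can add subcommands simply by installing a suitably named program.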
Code design#
.layers support of scatter plots F Ramirez
fix double-logarithmization in compute of log fold change in rank_genes_groups() A Muñoz-Rojas
fix return sections of docs P Angerer
Version 1.3#
1.3.8 2019-02-05#
various documentation and dev process improvements
Added combat() function for batch effect correction [Johnson et al., 2006, Leek et al., 2017, Pedersen, 2012] pr398 M Lange
1.3.7 2019-01-02#
API changed from import scanpy.api as sc to import scanpy as sc.
phenograph() wraps the graph clustering package Phenograph [Levine et al., 2015] thanks to A Mousa
1.3.6 2018-12-11#
Major updates#
a new plotting gallery for visualizing-marker-genes F Ramirez
tutorials are integrated on ReadTheDocs, pbmc3k and paga-paul15 A Wolf
Interactive exploration of analysis results through manifold viewers#
CZI’s cellxgene directly reads .h5ad files the cellxgene developers
the UCSC Single Cell Browser requires exporting via cellbrowser() M Haeussler
Code design#
highly_variable_genes() supersedes filter_genes_dispersion(), it gives the same results but, by default, expects logarithmized data and doesn’t subset A Wolf
1.3.5 2018-12-09#
uncountable figure improvements pr369 F Ramirez
1.3.4 2018-11-24#
leiden() wraps the recent graph clustering package by Traag et al. [2019] K Polanski
bbknn() wraps the recent batch correction package [Polański et al., 2019] K Polanski
calculate_qc_metrics() calculates a number of quality control metrics, similar to calculateQCMetrics from Scater [McCarthy et al., 2017] I Virshup
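As a rough illustration of the kind of per-cell quality control metrics meant here, the sketch below computes total counts, the number of detected genes and the percentage of counts in mitochondrial genes. It is plain Python on a list of per-cell count lists. The function name, the output keys and the "MT-" prefix convention are assumptions for this example and do not describe the actual output of calculate_qc_metrics().

```python
def qc_metrics_sketch(counts, gene_names, mito_prefix="MT-"):
    """Per-cell QC metrics for a cells x genes count table (list of lists)."""
    # Assumption: mitochondrial genes are recognizable by a name prefix.
    is_mito = [name.startswith(mito_prefix) for name in gene_names]
    metrics = []
    for cell in counts:
        total = sum(cell)
        n_genes = sum(1 for count in cell if count > 0)
        mito = sum(count for count, flag in zip(cell, is_mito) if flag)
        metrics.append(
            {
                "total_counts": total,
                "n_genes_detected": n_genes,
                # Empty cells get 0.0 instead of a division by zero.
                "pct_counts_mito": 100.0 * mito / total if total else 0.0,
            }
        )
    return metrics


qc = qc_metrics_sketch(
    [[3, 0, 1], [0, 0, 0]],
    gene_names=["MT-CO1", "ACTB", "GAPDH"],
)
```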
1.3.3 2018-11-05#
Major updates#
a fully distributed preprocessing backend T White and the Laserson Lab
Code design#
read_10x_h5() and read_10x_mtx() read Cell Ranger 3.0 outputs pr334 Q Gong
Note
Also see changes in anndata 0.6.
changed default compression to None in write_h5ad() to speed up read and write, disk space use is usually less critical
performance gains in write_h5ad() due to better handling of strings and categories S Rybakov
1.3.1 2018-09-03#
RNA velocity in single cells [La Manno et al., 2018]#
Scanpy and AnnData support loom’s layers so that computations for single-cell RNA velocity [La Manno et al., 2018] become feasible S Rybakov and V Bergen
scvelo harmonizes with Scanpy and is able to process loom files with splicing information produced by Velocyto [La Manno et al., 2018], it runs a lot faster than the count matrix analysis of Velocyto and provides several conceptual developments
Plotting (Generic)#
There now is a section on imputation in external:#
magic() for imputation using data diffusion [van Dijk et al., 2018] pr187 S Gigante
dca() for imputation and latent space construction using an autoencoder [Eraslan et al., 2019] pr186 G Eraslan
Version 1.2#
1.2.1 2018-06-08#
Plotting of Generic marker genes and quality control.#
highest_expr_genes() for quality control; plot genes with highest mean fraction of cells, similar to plotQC of Scater [McCarthy et al., 2017] pr169 F Ramirez
1.2.0 2018-06-08#
Version 1.1#
1.1.0 2018-06-01#
set_figure_params() by default passes vector_friendly=True and allows you to produce reasonably sized PDFs by rasterizing large scatter plots A Wolf
draw_graph() defaults to the ForceAtlas2 layout [Chippada, 2018, Jacomy et al., 2014], which is often more visually appealing and whose computation is much faster S Wollock
scatter() also plots along variables axis MD Luecken
regress_out() is back to multiprocessing F Ramirez
read() reads compressed text files G Eraslan
mitochondrial_genes() for querying mito genes FG Brundu
mnn_correct() for batch correction [Haghverdi et al., 2018, Kang, 2018]
phate() for low-dimensional embedding [Moon et al., 2019] S Gigante
sandbag(), cyclone() for scoring genes [Fechtner, 2018, Scialdone et al., 2015]
Version 1.0#
1.0.0 2018-03-30#
Major updates#
Scanpy is much faster and more memory efficient: preprocess, cluster and visualize 1.3M cells in 6h, 130K cells in 14min, and 68K cells in 3min A Wolf
the API gained a preprocessing function neighbors() and a class Neighbors() to which all basic graph computations are delegated A Wolf
Warning
Upgrading to 1.0 isn’t fully backwards compatible in the following changes
the graph-based tools louvain(), dpt(), draw_graph(), umap(), diffmap(), paga() require prior computation of the graph: sc.pp.neighbors(adata, n_neighbors=5); sc.tl.louvain(adata) instead of previously sc.tl.louvain(adata, n_neighbors=5)
install numba via conda install numba, which replaces cython
the default connectivity measure (dpt will look different using default settings) changed. Setting method='gauss' in sc.pp.neighbors uses gauss kernel connectivities and reproduces the previous behavior, see, for instance, the example paul15.
namings of returned annotation have changed for less bloated AnnData objects, which means that some of the unstructured annotation of old AnnData files is not recognized anymore
replace occurrences of group_by with groupby (consistency with pandas)
it is worth checking out the notebook examples to see changes, e.g. the seurat example.
upgrading scikit-learn from 0.18 to 0.19 changed the implementation of PCA, some results might therefore look slightly different
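The graph-related change in the list above can be sketched as a small helper: the neighborhood graph is computed once with sc.pp.neighbors and the clustering call no longer takes n_neighbors. The two scanpy calls are the ones quoted in the list; the helper name cluster_after_neighbors is hypothetical, scanpy is imported inside the function so that the definition itself has no import-time dependency, and sc.tl.louvain may need an additional clustering package to be installed.

```python
def cluster_after_neighbors(adata, n_neighbors=5):
    """Run the 1.0-style workflow: graph first, then a graph-based tool."""
    import scanpy as sc  # imported lazily; requires scanpy to be installed

    # 1.0 and later: compute the neighborhood graph explicitly ...
    sc.pp.neighbors(adata, n_neighbors=n_neighbors)
    # ... and let the tool reuse it (previously:
    # sc.tl.louvain(adata, n_neighbors=5)).
    sc.tl.louvain(adata)
    return adata
```

Other graph-based tools from the list, such as umap() or diffmap(), would reuse the same precomputed graph in the same way.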
Further updates#
UMAP [McInnes et al., 2018] can serve as a first visualization of the data just as tSNE; in contrast to tSNE, UMAP directly embeds the single-cell graph and is faster; UMAP is also used for measuring connectivities and computing neighbors, see neighbors() A Wolf
graph abstraction: AGA is renamed to PAGA: paga(); now, it only measures connectivities between partitions of the single-cell graph; pseudotime and clustering need to be computed separately via louvain() and dpt(); the connectivity measure has been improved A Wolf
logistic regression for finding marker genes rank_genes_groups() with parameter method='logreg' A Wolf
louvain() provides a better implementation for reclustering via restrict_to A Wolf
scanpy no longer modifies rcParams upon import, call settings.set_figure_params to set the ‘scanpy style’ A Wolf
default cache directory is ./cache/, set settings.cachedir to change this; nested directories in this are avoided A Wolf
show edges in scatter plots based on graph visualization draw_graph() and umap() by passing edges=True A Wolf
downsample_counts() for downsampling counts MD Luecken
default 'louvain_groups' are called 'louvain' A Wolf
'X_diffmap' contains the zero component, plotting remains unchanged A Wolf
Version 0.4#
0.4.4 2018-02-26#
embed cells using umap() [McInnes et al., 2018] pr92 G Eraslan
score sets of genes, e.g. for cell cycle, using score_genes() [Satija et al., 2015]: notebook
0.4.3 2018-02-09#
clustermap(): heatmap from hierarchical clustering, based on seaborn.clustermap() [Waskom et al., 2016] A Wolf
only return matplotlib.axes.Axes in plotting functions of sc.pl when show=False, otherwise None A Wolf
0.4.2 2018-01-07#
amendments in PAGA and its plotting functions A Wolf
0.4.0 2017-12-23#
export to SPRING [Weinreb et al., 2017] for interactive visualization of data: spring tutorial S Wollock
Version 0.3#
0.3.2 2017-11-29#
finding marker genes via rank_genes_groups_violin() improved, see issue51 F Ramirez
0.3.0 2017-11-16#
AnnData gains method concatenate() A Wolf
AnnData is available as the separate anndata package P Angerer, A Wolf
results of PAGA simplified A Wolf
Version 0.2#
0.2.9 2017-10-25#
Initial release of the new trajectory inference method PAGA#
paga() computes an abstracted, coarse-grained (PAGA) graph of the neighborhood graph A Wolf
paga_compare() plots this graph next to an embedding A Wolf
paga_path() plots a heatmap through a node sequence in the PAGA graph A Wolf
0.2.1 2017-07-24#
Scanpy includes preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing and simulation of gene regulatory networks. The implementation efficiently deals with datasets of more than one million cells. A Wolf, P Angerer
Version 0.1#
0.1.0 2017-05-17#
Scanpy computationally outperforms and allows reproducing both the Cell Ranger R kit’s and most of Seurat’s clustering workflows. A Wolf, P Angerer