Release notes#

Version 1.11#

1.11.0.dev11+g0cfd0224 2024-12-16#

Bug fixes#

Raise FutureWarning when calling deprecated scanpy.pp functions P Angerer (pr3380)
Upper-bound sklearn <1.6.0 due to issue dask/dask-ml#1002 Ilan Gold (pr3393)

Documentation#

Improve harmony_integrate() docs D Kühl (pr3362)

Performance#

Speed up regress_out() P Ashish, P Angerer & S Dicks (pr3284)

Version 1.10#

1.10.4 2024-11-12#

Breaking changes#

Remove Python 3.9 support P Angerer (pr3283)

Bug fixes#

Fix scanpy.pl.DotPlot.style(), scanpy.pl.MatrixPlot.style(), and scanpy.pl.StackedViolin.style() resetting all non-specified parameters P Angerer (pr3206)
Accept 'group' instead of 'obs' for standard_scale parameter in stacked_violin() P Angerer (pr3243)
Use density_norm instead of scale (cont. from pr2844) in violin() and stacked_violin() P Angerer (pr3244)
Switched all compatibility adapters for positional parameters to FutureWarning P Angerer (pr3264)
Catch PerfectSeparationWarning during regress_out() J Wagner (pr3275)
Fix scanpy.pp.highly_variable_genes() for batches of size 1 P Angerer (pr3286)
Fix scanpy.pl.scatter()’s color parameter to take collections as advertised P Angerer (pr3299)
Fix scanpy.pl.highest_expr_genes() when used with a categorical gene symbol column P Angerer (pr3302)

1.10.3 2024-09-17#

Bug fixes#

Prevent empty control gene set in score_genes() M Müller (pr2875)
Fix subset=True of highly_variable_genes() when flavor is seurat or cell_ranger, and batch_key!=None E Roellin (pr3042)
Add compatibility with numpy 2.0 P Angerer (pr3065 and pr3115)
Fix legend_loc argument in scanpy.pl.embedding() not accepting matplotlib parameters P Angerer (pr3163)
Fix dispersion cutoff in highly_variable_genes() in presence of NaNs P Angerer (pr3176)
Fix axis labeling for swapped axes in rank_genes_groups_stacked_violin() Ilan Gold (pr3196)
Upper bound dask on account of issue scverse/anndata#1579 Ilan Gold (pr3217)
The fa2-modified package replaces forceatlas2 for the latter’s lack of maintenance A Alam (pr3220)

1.10.2 2024-06-25#

Development Process#

Add performance benchmarking pr2977 R Shrestha, P Angerer

Documentation#

Document several missing parameters in docstring pr2888 S Cheney
Fixed incorrect instructions in “testing” dev docs pr2994 I Virshup
Update marsilea tutorial to use group_ methods pr3001 I Virshup
Fixed citations pr3032 P Angerer
Improve dataset documentation pr3060 P Angerer

Bug fixes#

Compatibility with matplotlib 3.9 pr2999 I Virshup
Add clear errors where backed mode-like matrices (i.e., from sparse_dataset) are not supported pr3048 I Gold
Write out full pca results when _choose_representation is called, i.e., when neighbors() is run without pca() pr3079 I Gold
Fix deprecated use of .A with sparse matrices pr3084 P Angerer
Fix zappy support pr3089 P Angerer
Fix dotplot group order with pandas 1.x pr3101 P Angerer

Performance#

sparse_mean_variance_axis now uses all cores for the calculations pr3015 S Dicks
pp.highly_variable_genes with flavor=seurat_v3 now uses a numba kernel pr3017 S Dicks
Speed up scrublet() pr3044 S Dicks and pr3056 P Angerer
Speed up clipping of array in scale() pr3100 P Ashish & S Dicks

1.10.1 2024-04-09#

Documentation#

Added how-to example on plotting with Marsilea pr2974 Y Zheng

Bug fixes#

Fix aggregate when aggregating by more than two groups pr2965 I Virshup

Performance#

scale() now uses numba kernels for sparse.csr_matrix and sparse.csc_matrix when zero_center==False and mask_obs is provided. This greatly speeds up execution pr2942 S Dicks

1.10.0 2024-03-26#

scanpy 1.10 brings a large number of new features, performance improvements, and improved documentation.

Some highlights:

Improved support for out-of-core workflows via dask. See new tutorial: Using dask with Scanpy demonstrating counts-to-clusters for 1.4 million cells in <10 min.
A new basic clustering tutorial demonstrating an updated workflow.
Opt-in increased performance for neighbor search and clustering (how to guide).
Ability to mask observations or variables from a number of methods (see Customizing Scanpy plots for an example with plotting embeddings)
A new function aggregate() for computing aggregations of your data, very useful for pseudo bulking!
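
As a rough illustration of what pseudobulking means (summing counts over all cells that share a group label), here is a NumPy-only sketch. The helper name `pseudobulk_sum` and the toy data are hypothetical, and this is not how aggregate() is implemented; it only shows the idea.

```python
import numpy as np


def pseudobulk_sum(counts, labels):
    """Sum a cells-by-genes count matrix over cells sharing a label.

    Hypothetical helper for illustration; not scanpy's implementation.
    """
    counts = np.asarray(counts)
    # Map each label to an integer group index (groups come back sorted).
    groups, group_idx = np.unique(labels, return_inverse=True)
    out = np.zeros((len(groups), counts.shape[1]), dtype=counts.dtype)
    # Accumulate every cell's row into the row of its group.
    np.add.at(out, group_idx, counts)
    return groups, out


# Four cells, two genes, two cell types.
counts = np.array([[1, 0], [2, 3], [0, 5], [4, 1]])
labels = np.array(["T", "B", "T", "B"])
groups, bulk = pseudobulk_sum(counts, labels)
print(groups, bulk, sep="\n")
```

Each row of `bulk` is one pseudobulk profile, in the order given by `groups`.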

Features#

scrublet() and scrublet_simulate_doublets() were moved from scanpy.external.pp to scanpy.pp. The scrublet implementation is now maintained as part of scanpy pr2703 P Angerer
scanpy.pp.pca(), scanpy.pp.scale(), scanpy.pl.embedding(), and scanpy.experimental.pp.normalize_pearson_residuals_pca() now support a mask parameter pr2272 C Bright, T Marcella, & P Angerer
Enhanced dask support for some internal utilities, paving the way for more extensive dask support pr2696 P Angerer
scanpy.pp.highly_variable_genes() supports dask for the default seurat and cell_ranger flavors pr2809 P Angerer
New function scanpy.get.aggregate() which allows grouped aggregations over your data. Useful for pseudobulking! pr2590 Isaac Virshup Ilan Gold Jon Bloom
scanpy.pp.neighbors() now has a transformer argument allowing the use of different ANN/ KNN libraries pr2536 P Angerer
scanpy.experimental.pp.highly_variable_genes() using flavor='pearson_residuals' now uses numba for variance computation and is faster pr2612 S Dicks & P Angerer
scanpy.tl.leiden() now offers igraph’s implementation of the leiden algorithm via the flavor parameter when set to igraph. leidenalg’s implementation is still default, but discouraged. pr2815 I Gold
scanpy.pp.highly_variable_genes() has new flavor seurat_v3_paper that is in its implementation consistent with the paper description in Stuart et al 2018. pr2792 E Roellin
scanpy.datasets.blobs() now accepts a random_state argument pr2683 E Roellin
scanpy.pp.pca() and scanpy.pp.regress_out() now accept a layer argument pr2588 S Dicks
scanpy.pp.subsample() with copy=True can now be called in backed mode pr2624 E Roellin
scanpy.external.pp.harmony_integrate() now runs with 64 bit floats improving reproducibility pr2655 S Dicks
scanpy.tl.rank_genes_groups() no longer warns that its default was changed from t-test_overestim_var to t-test pr2798 L Heumos
scanpy.pp.calculate_qc_metrics now allows qc_vars to be passed as a string pr2859 N Teyssier
scanpy.tl.leiden() and scanpy.tl.louvain() now store clustering parameters in the key provided by the key_added parameter instead of always writing to (or overwriting) a default key pr2864 J Fan
scanpy.pp.scale() now clips np.ndarray also at - max_value for zero-centering pr2913 S Dicks
Support sparse chunks in dask scale(), normalize_total() and highly_variable_genes() (seurat and cell-ranger tested) pr2856 ilan-gold
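
The symmetric clipping mentioned above for scanpy.pp.scale() can be pictured with plain NumPy. This is a minimal sketch under assumptions: the helper name `scale_and_clip` is hypothetical, and the use of the sample standard deviation (ddof=1) is an assumption rather than something stated in these notes.

```python
import numpy as np


def scale_and_clip(x, max_value):
    """Zero-center and unit-scale each column, then clip to [-max_value, max_value].

    Hypothetical helper; ddof=1 is an assumption, not taken from the notes.
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    z = (x - mean) / std
    # After zero-centering, values can be strongly negative as well,
    # so both tails are clipped.
    return np.clip(z, -max_value, max_value)


column = np.array([[-10.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [10.0]])
z = scale_and_clip(column, max_value=2.0)
print(z.min(), z.max())
```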

Documentation#

Doc style overhaul pr2220 A Gayoso
Re-add search-as-you-type, this time via readthedocs-sphinx-search pr2805 P Angerer
Fixed a lot of broken usage examples pr2605 P Angerer
Improved harmonization of return field of sc.pp and sc.tl functions pr2742 E Roellin
Improved docs for percent_top argument of calculate_qc_metrics() pr2849 I Virshup
New basic clustering tutorial (Preprocessing and clustering), based on one from scverse-tutorials pr2901 I Virshup
Overhauled Tutorials page, and added new How to section to docs pr2901 I Virshup
Added a new tutorial on working with dask (Using dask with Scanpy) pr2901 I Gold I Virshup

Bug fixes#

Updated read_visium() such that it can read spaceranger 2.0 files L Lehner
Fix normalize_total() for dask pr2466 P Angerer
Fix setting sc.settings.verbosity in some cases pr2605 P Angerer
Fix all remaining pandas warnings pr2789 P Angerer
Fix some annoying plotting warnings around violin plots pr2844 P Angerer
Scanpy now has a test job which tests against the minimum versions of the dependencies. In the process of implementing this, many bugs associated with using older versions of pandas, anndata, numpy, and matplotlib were fixed. pr2816 I Virshup
Fix warnings caused by internal usage of pandas.DataFrame.stack with pandas>=2.1 pr2864 I Virshup
scanpy.get.aggregate() now always returns numpy.ndarray pr2893 S Dicks
Removes self from array of neighbors for use_approx_neighbors = True in scrublet() pr2896 S Dicks
Compatibility with scipy 1.13 pr2943 I Virshup
Fix use of dendrogram() on highly correlated low precision data pr2928 P Angerer
Fix pytest deprecation warning pr2879 P Angerer

Development Process#

Scanpy is now tested against python 3.12 pr2863 ivirshup
Fix testing package build pr2468 P Angerer

Deprecations#

Dropped support for Python 3.8. More details here. pr2695 P Angerer
Deprecated specifying large numbers of function parameters by position as opposed to by name/keyword in all public APIs. e.g. prefer sc.tl.umap(adata, min_dist=0.1, spread=0.8) over sc.tl.umap(adata, 0.1, 0.8) pr2702 P Angerer
Dropped support for umap<0.5 for performance reasons. pr2870 P Angerer
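
To make the positional-parameter deprecation concrete, the following standard-library sketch shows one way such a warning could be raised. The decorator `warn_on_extra_positionals` and the stand-in function `toy_umap` are hypothetical; scanpy's actual compatibility adapter may work differently.

```python
import functools
import warnings


def warn_on_extra_positionals(n_allowed):
    """Hypothetical decorator: warn when more than `n_allowed` args are positional."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if len(args) > n_allowed:
                warnings.warn(
                    f"{func.__name__}() received {len(args)} positional arguments; "
                    f"pass all but the first {n_allowed} by keyword.",
                    FutureWarning,
                    stacklevel=2,
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@warn_on_extra_positionals(1)
def toy_umap(data, min_dist=0.5, spread=1.0):
    """Stand-in for a function with many tunable parameters."""
    return {"min_dist": min_dist, "spread": spread}


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    toy_umap([1, 2, 3], min_dist=0.1, spread=0.8)  # keyword style: no warning
    n_after_keyword_call = len(caught)
    toy_umap([1, 2, 3], 0.1, 0.8)  # positional style: FutureWarning
    n_after_positional_call = len(caught)
```

Both calls still return the same result; only the positional one emits the warning, which gives callers time to migrate to keywords.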

Version 1.9#

1.9.8 2024-01-26#

Bug fixes#

Fix handling of numpy array palettes for old numpy versions pr2832 P Angerer

1.9.7 2024-01-25#

Bug fixes#

Fix handling of numpy array palettes (e.g. after write-read cycle) pr2734 P Angerer
Specify correct version of matplotlib dependency pr2733 P Fisher
Fix scanpy.pl.violin() usage of seaborn.catplot pr2739 E Roellin
Fix scanpy.pp.highly_variable_genes() to handle the combinations of inplace and subset consistently pr2757 E Roellin
Replace usage of various deprecated functionality from anndata and pandas pr2678 pr2779 P Angerer
Allow using the default n_top_genes when using scanpy.pp.highly_variable_genes() flavor 'seurat_v3' pr2782 P Angerer
Fix scanpy.read_10x_mtx()’s gex_only=True mode pr2801 P Angerer

1.9.6 2023-10-31#

Bug fixes#

Allow scanpy.pl.scatter() to accept a str palette name pr2571 P Angerer
Make scanpy.external.tl.palantir() compatible with palantir >=1.3 pr2672 DJ Otto
Fix scanpy.pl.pca() when return_fig=True and annotate_var_explained=True pr2682 J Wagner
Temp fix for issue2680 by skipping seaborn version 0.13.0 pr2661 P Angerer
Fix scanpy.pp.highly_variable_genes() to not modify the used layer when flavor=seurat pr2698 E Roellin
Prevent pandas from causing infinite recursion when setting a slice of a categorical column pr2719 P Angerer

1.9.5 2023-09-08#

Bug fixes#

Remove use of deprecated dtype argument to AnnData constructor pr2658 Isaac Virshup

1.9.4 2023-08-24#

Bug fixes#

Support scikit-learn 1.3 pr2515 P Angerer
Deal with None value vanishing from things like .uns['log1p'] pr2546 SP Shen
Depend on igraph instead of python-igraph pr2566 P Angerer
rank_genes_groups() now handles unsorted groups as intended pr2589 S Dicks
rank_genes_groups_df() now works for rank_genes_groups() with method="logreg" pr2601 S Dicks
scanpy.tl._utils._choose_representation now works with n_pcs if bigger than settings.N_PCS pr2610 S Dicks

1.9.3 2023-03-02#

Bug fixes#

Variety of fixes against pandas 2.0.0rc0 pr2434 I Virshup

1.9.2 2023-02-16#

Bug fixes#

highly_variable_genes() layer argument now works in tandem with batches pr2302 D Schaumont
highly_variable_genes() with flavor='cell_ranger' now handles the case in issue2230 where the number of calculated dispersions is less than n_top_genes pr2231 L Zappia
Fix compatibility with matplotlib 3.7 pr2414 I Virshup P Fisher
Fix scrublet numpy matrix compatibility issue pr2395 A Gayoso

1.9.1 2022-04-05#

Bug fixes#

normalize_total() works when Dask is not installed pr2209 R Cannoodt
Fix embedding plots by bumping matplotlib dependency to version 3.4 pr2212 I Virshup

1.9.0 2022-04-01#

Tutorials#

New tutorial on the usage of Pearson Residuals: How to preprocess UMI count data with analytic Pearson residuals J Lause, G Palla
Materials and recordings for Scanpy workshops by Maren Büttner

Experimental module#

Added scanpy.experimental module! Currently contains functionality related to pearson residuals in scanpy.experimental.pp pr1715 J Lause, G Palla, I Virshup. This includes:
- normalize_pearson_residuals() for Pearson Residuals normalization
- highly_variable_genes() for HVG selection with Pearson Residuals
- normalize_pearson_residuals_pca() for Pearson Residuals normalization and dimensionality reduction with PCA
- recipe_pearson_residuals() for Pearson Residuals normalization, HVG selection and dimensionality reduction with PCA

Features#

filter_rank_genes_groups() now allows filtering by absolute values of log fold change pr1649 S Rybakov
_choose_representation now subsets the provided representation to n_pcs, regardless of the name of the provided representation (should affect mostly neighbors()) pr2179 I Virshup PG Majev
scanpy.pp.scrublet() (and related functions) can now be used on AnnData objects containing multiple batches pr1965 J Manning
Number of variables plotted with pca_loadings() can now be controlled with n_points argument. Additionally, variables are no longer repeated if the anndata has fewer than 30 variables pr2075 Yves33
Dask arrays now work with scanpy.pp.normalize_total() pr1663 G Buckley, I Virshup
embedding_density() now allows more than 10 groups pr1936 A Wolf
Embedding plots can now pass colorbar_loc to specify the location of colorbar legend, or pass None to not show a colorbar pr1821 A Schaar I Virshup
Embedding plots now have a dimensions argument, which lets users select which dimensions of their embedding to plot and uses the same broadcasting rules as other arguments pr1538 I Virshup
print_versions() now uses session_info pr2089 P Angerer I Virshup

Ecosystem#

Multiple packages have been added to our ecosystem page, including:

decoupler for footprint analysis and pathway enrichment pr2186 PB Mompel
dandelion for B-cell receptor analysis pr1953 Z Tuong
CIARA, a feature selection tool for identifying rare cell types pr2175 M Stock

Bug fixes#

Fixed finding variables with use_raw=True and basis=None in scanpy.pl.scatter() pr2027 E Rice
Fixed scanpy.pp.scrublet() to address issue1957 FlMai and ensure raw counts are used for simulation
Functions in scanpy.datasets no longer throw OldFormatWarnings when using anndata 0.8 pr2096 I Virshup
Fixed use of scanpy.pp.neighbors() with method='rapids': RAPIDS cuML no longer returns a squared Euclidean distance matrix, so we should not square-root the kNN distance matrix. pr1828 M Zaslavsky
Removed pytables dependency by implementing read_10x_h5 with h5py due to installation errors on Windows pr2064
Fixed bug in scanpy.external.pp.hashsolo() where default value was set improperly pr2190 B Reiz
Fixed bug in scanpy.pl.embedding() functions where an error could be raised when there were missing values and large numbers of categories pr2187 I Virshup

Version 1.8#

1.8.2 2021-11-03#

Documentation#

Update conda installation instructions pr1974 L Heumos

Bug fixes#

Fix plotting after scanpy.tl.filter_rank_genes_groups() pr1942 S Rybakov
Fix use_raw=None using anndata.AnnData.var_names if anndata.AnnData.raw is present in scanpy.tl.score_genes() pr1999 M Klein
Fix compatibility with UMAP 0.5.2 pr2028 L Mcinnes
Fixed non-determinism in scanpy.pl.paga() node positions pr1922 I Virshup

Ecosystem#

Added PASTE (a tool to align and integrate spatial transcriptomics data) to scanpy ecosystem.

1.8.1 2021-07-07#

Bug fixes#

Fixed reproducibility of scanpy.tl.score_genes(). Calculation and output is now float64 type. pr1890 I Kucinski
Workarounds for some changes/ bugs in pandas 1.3 pr1918 I Virshup
Fixed bug where sc.pl.paga_compare could mislabel nodes on the paga graph pr1898 I Virshup
Fixed handling of use_raw with scanpy.tl.rank_genes_groups() pr1934 I Virshup

1.8.0 2021-06-28#

Metrics module#

Added scanpy.metrics module!
- Added scanpy.metrics.gearys_c() for spatial autocorrelation pr915 I Virshup
- Added scanpy.metrics.morans_i() for global spatial autocorrelation pr1740 I Virshup, G Palla
- Added scanpy.metrics.confusion_matrix() for comparing labellings pr915 I Virshup
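
For readers unfamiliar with the two autocorrelation statistics named above, here is a dense NumPy sketch of their textbook definitions. The function names `morans_i_dense` and `gearys_c_dense` are hypothetical and chosen to avoid confusion with the scanpy.metrics functions, whose signatures and sparse-graph handling are not reproduced here.

```python
import numpy as np


def morans_i_dense(w, x):
    """Moran's I: (N / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean(x)."""
    w = np.asarray(w, dtype=float)
    z = np.asarray(x, dtype=float) - np.mean(x)
    return (len(z) / w.sum()) * (z @ w @ z) / (z @ z)


def gearys_c_dense(w, x):
    """Geary's C: (N - 1) * sum_ij w_ij (x_i - x_j)^2 / (2 W sum_i z_i^2)."""
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    diff_sq = (x[:, None] - x[None, :]) ** 2
    return (len(x) - 1) * (w * diff_sq).sum() / (2 * w.sum() * (z @ z))


# Symmetric adjacency of a 4-node chain: 0 - 1 - 2 - 3.
w = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ]
)
smooth = [0, 0, 1, 1]  # neighbors mostly agree
alternating = [0, 1, 0, 1]  # neighbors always disagree

print(morans_i_dense(w, smooth), gearys_c_dense(w, smooth))
print(morans_i_dense(w, alternating), gearys_c_dense(w, alternating))
```

Positive autocorrelation shows up as Moran's I above 0 and Geary's C below 1; the alternating signal gives the opposite pattern.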

Features#

Added layer and copy kwargs to normalize_total() pr1667 I Virshup
Added vcenter and norm arguments to the plotting functions pr1551 G Eraslan
Standardized and expanded available arguments to the sc.pl.rank_genes_groups* family of functions. pr1529 F Ramirez I Virshup
- See examples sections of rank_genes_groups_dotplot() and rank_genes_groups_matrixplot() for demonstrations.
scanpy.tl.tsne() now supports the metric argument and records the passed parameters pr1854 I Virshup
scanpy.pl.scrublet_score_distribution() now uses same API as other scanpy functions for saving/ showing plots pr1741 J Manning

Ecosystem#

Added Cubé to ecosystem page pr1878 C Lambden
Added triku, a feature selection method, to the ecosystem page pr1722 AM Ascensión
Added dorothea and progeny to the ecosystem page pr1767 P Badia-i-Mompel

Documentation#

Added Community page to docs pr1856 I Virshup
Added rendered examples to many plotting functions issue1664 A Schaar L Zappia bio-la L Hetzel L Dony M Buttner K Hrovatin F Ramirez I Virshup LouisK92 mayarali
Integrated DocSearch, a find-as-you-type documentation index search. pr1754 P Angerer
Reorganized reference docs pr1753 I Virshup
Clarified docs issues for neighbors(), diffmap(), calculate_qc_metrics() pr1680 G Palla
Fixed typos in grouped plot doc-strings pr1877 C Rands
Extended examples for differential expression plotting. pr1529 F Ramirez
- See rank_genes_groups_dotplot() or rank_genes_groups_matrixplot() for examples.

Bug fixes#

Fix scanpy.pl.paga_path() TypeError with recent versions of anndata pr1047 P Angerer
Fix detection of whether IPython is running pr1844 I Virshup
Fixed reproducibility of scanpy.tl.diffmap() (added random_state) pr1858 I Kucinski
Fixed errors and warnings from embedding plots with small numbers of categories after sns.set_palette was called pr1886 I Virshup
Fixed handling of gene_symbols argument in a number of sc.pl.rank_genes_groups* functions pr1529 F Ramirez I Virshup
Fixed handling of use_raw for sc.tl.rank_genes_groups when no .raw is present pr1895 I Virshup
scanpy.pl.rank_genes_groups_violin() now works for raw=False pr1669 M van den Beek
scanpy.pl.dotplot() now uses smallest_dot argument correctly pr1771 S Flemming

Development Process#

Switched to flit for building and deploying the package, a simple tool with an easy to understand command line interface and metadata pr1527 P Angerer
Use pre-commit for style checks pr1684 pr1848 L Heumos I Virshup

Deprecations#

Dropped support for Python 3.6. More details here. pr1897 I Virshup
Deprecated layers and layers_norm kwargs to normalize_total() pr1667 I Virshup
Deprecated MulticoreTSNE backend for scanpy.tl.tsne() pr1854 I Virshup

Version 1.7#

1.7.2 2021-04-07#

Bug fixes#

scanpy.logging.print_versions() now works when python<3.8 pr1691 I Virshup
scanpy.pp.regress_out() now uses joblib as the parallel backend, and should stop oversubscribing threads pr1694 I Virshup
scanpy.pp.highly_variable_genes() with flavor="seurat_v3" now returns correct gene means and -variances when used with batch_key pr1732 J Lause
scanpy.pp.highly_variable_genes() now throws a warning instead of an error when non-integer values are passed for method "seurat_v3". The check can be skipped by passing check_values=False. pr1679 G Palla

Ecosystem#

Added triku, a feature selection method, to the ecosystem page pr1722 AM Ascensión
Added dorothea and progeny to the ecosystem page pr1767 P Badia-i-Mompel

1.7.1 2021-02-24#

Documentation#

More twitter handles for core devs pr1676 G Eraslan

Bug fixes#

dendrogram() now uses 1 - correlation as the distance matrix to compute the dendrogram pr1614 F Ramirez
Fixed obs_df()/ var_df() erroring when keys not passed pr1637 I Virshup
Fixed argument handling for scanpy.pp.scrublet() J Manning
Fixed passing of kwargs to scanpy.pl.violin() when stripplot was also used pr1655 M van den Beek
Fixed colorbar creation in scanpy.pl.timeseries_as_heatmap pr1654 M van den Beek
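
The "1 - correlation" distance mentioned for dendrogram() above can be illustrated in a few lines of NumPy. This is only a sketch of the distance itself, with a hypothetical helper name and toy profiles; it does not reproduce how dendrogram() builds its input or performs the clustering.

```python
import numpy as np


def correlation_distance(profiles):
    """Pairwise 1 - Pearson correlation between the rows of `profiles`."""
    return 1.0 - np.corrcoef(profiles)


# Three toy group profiles: the second is a scaled copy of the first,
# the third runs in the opposite direction.
profiles = np.array(
    [
        [1.0, 2.0, 3.0],
        [2.0, 4.0, 6.0],
        [3.0, 2.0, 1.0],
    ]
)
dist = correlation_distance(profiles)
print(np.round(dist, 6))
```

Perfectly correlated profiles end up at distance 0 and perfectly anti-correlated ones at distance 2, so the values form a proper dissimilarity for hierarchical clustering.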

1.7.0 2021-02-03#

Features#

Add new 10x Visium datasets to visium_sge() pr1473 G Palla
Enable download of source image for 10x visium datasets in visium_sge() pr1506 H Spitzer
Refactor of scanpy.pl.spatial(). Better support for plotting without an image, as well as directly providing images pr1512 G Palla
Dict input for scanpy.queries.enrich() pr1488 G Eraslan
rank_genes_groups_df() can now return fraction of cells in a group expressing a gene, and allows retrieving values for multiple groups at once pr1388 G Eraslan
Color annotations for gene sets in heatmap() are now matched to color for cluster pr1511 L Sikkema
PCA plots can now annotate axes with variance explained pr1470 bfurtwa
Plots with groupby arguments can now group by values in the index by passing the index’s name (like pd.DataFrame.groupby). pr1583 F Ramirez
Added na_color and na_in_legend keyword arguments to embedding() plots. Allows specifying color for missing or filtered values in plots like umap() or spatial() pr1356 I Virshup
embedding() plots now support passing dict of {cluster_name: cluster_color, ...} for palette argument pr1392 I Virshup

External tools (new)#

Add Scanorama integration to scanpy external API (scanorama_integrate(), Hie et al. [2019]) pr1332 B Hie
Scrublet [Wolock et al., 2019] integration: scrublet(), scrublet_simulate_doublets(), and plotting method scrublet_score_distribution() pr1476 J Manning
hashsolo() for HTO demultiplexing [Bernstein et al., 2020] pr1432 NJ Bernstein
Added scirpy (sc-AIRR analysis) to ecosystem page pr1453 G Sturm
Added scvi-tools to ecosystem page pr1421 A Gayoso

External tools (changes)#

Updates for palantir() and palantir_results() pr1245 A Mousa
Fixes to harmony_timeseries() docs pr1248 A Mousa
Support for leiden clustering by scanpy.external.tl.phenograph() pr1080 A Mousa
Deprecate scanpy.external.pp.scvi pr1554 G Xing
Updated default params of sam() to work with larger data pr1540 A Tarashansky

Documentation#

New contribution guide pr1544 I Virshup
zsh installation instructions pr1444 P Angerer

Performance#

Speed up read_10x_h5() pr1402 P Weiler
Speed ups for obs_df() pr1499 F Ramirez

Bug fixes#

Consistent fold-change, fractions calculation for filter_rank_genes_groups pr1391 S Rybakov
Fixed bug where score_genes would error if one gene was passed pr1398 I Virshup
Fixed log1p inplace on integer dense arrays pr1400 I Virshup
Fix docstring formatting for rank_genes_groups() pr1417 P Weiler
Removed PendingDeprecationWarnings from use of np.matrix pr1424 P Weiler
Fixed indexing bug in scanpy.pp.highly_variable_genes() pr1456 V Bergen
Fix default number of genes for marker_gene_overlap pr1464 MD Luecken
Fixed passing groupby and dendrogram_key to dendrogram() pr1465 M Varma
Fixed download path of pbmc3k_processed pr1472 D Strobl
Better error message when computing DE with a group of size 1 pr1490 J Manning
Update cugraph API usage for v0.16 pr1494 R Ilango
Fixed marker_gene_overlap default value for top_n_markers pr1464 MD Luecken
Pass random_state to RAPIDs UMAP pr1474 C Nolet
Fixed anndata version requirement for concat() (re-exported from scanpy as sc.concat) pr1491 I Virshup
Fixed the width of the progress bar when downloading data pr1507 M Klein
Updated link for moignard15 dataset pr1542 I Virshup
Fixed bug where calling set_figure_params could block if IPython was installed, but not used. pr1547 I Virshup
violin() no longer fails if .raw not present pr1548 I Virshup
spatial() refactoring and better handling of spatial data pr1512 G Palla
pca() works with chunked=True again pr1592 I Virshup
ingest() now works with umap-learn 0.5.0 pr1601 S Rybakov

Version 1.6#

1.6.0 2020-08-15#

This release includes an overhaul of dotplot(), matrixplot(), and stacked_violin() (pr1210 F Ramirez), and of the internals of rank_genes_groups() (pr1156 S Rybakov).

Overhaul of `dotplot()`, `matrixplot()`, and `stacked_violin()` pr1210 F Ramirez#

An overhauled tutorial Core plotting functions.
New plotting classes can be accessed directly (e.g., DotPlot) or using the return_fig param.
It is possible to plot log fold change and p-values in the rank_genes_groups_dotplot() family of functions.
Added ax parameter which allows embedding the plot in other images.
Added option to include a bar plot instead of the dendrogram containing the cell/observation totals per category.
Return a dictionary of axes for further manipulation. This includes the main plot, the legend, and the dendrogram or totals plot.
Legends can be removed.
The groupby param can take a list of categories, e.g., groupby=['tissue', 'cell type'].
Added padding parameter to dotplot and stacked_violin. pr1270
Added title for colorbar and positioned as in dotplot for matrixplot().
dotplot() changes:
- Improved the colorbar and size legend for dotplots. Now the colorbar and size have titles, which can be modified using the colorbar_title and size_title params. They also align at the bottom of the image and do not shrink if the dotplot image is smaller.
- Allow plotting genes in rows and categories in columns (swap_axes).
- Using DotPlot, the dot_edge_color and line width can be modified, a grid can be added, and other modifications are enabled.
- A new style was added in which the dots are replaced by an empty circle and the square behind the circle is colored (like in matrixplots).
stacked_violin() changes:
- Violin colors can be colored based on average gene expression as in dotplots.
- The linewidth of the violin plots is thinner.
- Removed the ticks for the y-axis as they tend to overlap with each other. Using the style method they can be displayed if needed.

Additions#

concat() is now exported from scanpy, see Concatenation for more info. pr1338 I Virshup
Added highly variable gene selection strategy from Seurat v3 pr1204 A Gayoso
Added CellRank to scanpy ecosystem pr1304 giovp
Added backup_url param to read_10x_h5() pr1296 A Gayoso
Allow prefix for read_10x_mtx() pr1250 G Sturm
Optional tie correction for the 'wilcoxon' method in rank_genes_groups() pr1330 S Rybakov
Use sinfo for print_versions() and add print_header() to do what it previously did. pr1338 I Virshup pr1373

Bug fixes#

Avoid warning in rank_genes_groups() if 't-test' is passed pr1303 A Wolf
Restrict sphinx version to <3.1, >3.0 pr1297 I Virshup
Clean up _ranks and fix dendrogram for scipy 1.5 pr1290 S Rybakov
Use .raw to translate gene symbols if applicable pr1278 E Rice
Fix diffmap (issue1262) G Eraslan
Fix neighbors in spring_project issue1260 S Rybakov
Fix default size of dot in spatial plots pr1255 issue1253 giovp
Bumped version requirement of scipy to scipy>1.4 to support rmatmat argument of LinearOperator issue1246 I Virshup
Fix asymmetry of scores for the 'wilcoxon' method in rank_genes_groups() issue754 S Rybakov
Avoid trimming of gene names in rank_genes_groups() issue753 S Rybakov

Version 1.5#

1.5.1 2020-05-21#

Bug fixes#

Fixed a bug in pca(), where random_state did not have an effect for sparse input pr1240 I Virshup
Fixed docstring in pca() which included an unused argument pr1240 I Virshup

1.5.0 2020-05-15#

The 1.5.0 release adds a lot of new functionality, much of which takes advantage of anndata updates 0.7.0 - 0.7.2. Highlights of this release include support for spatial data, dedicated handling of graphs in AnnData, sparse PCA, an interface with scvi, and others.

Spatial data support#

Tutorials: basic analysis (Analysis and visualization of spatial transcriptomics data) and integration with single cell data (Integrating spatial data with scRNA-seq using scanorama) G Palla
read_visium() read 10x Visium data pr1034 G Palla, P Angerer, I Virshup
visium_sge() load Visium data directly from 10x Genomics pr1013 M Mirkazemi, G Palla, P Angerer
spatial() plot spatial data pr1012 G Palla, P Angerer

New functionality#

Many functions, like neighbors() and umap(), now store cell-by-cell graphs in obsp pr1118 S Rybakov
scale() and log1p() can be used on any element in layers or obsm pr1173 I Virshup

External tools#

scanpy.external.pp.scvi for preprocessing with scVI pr1085 G Xing
Guide for using Scanpy in R pr1186 L Zappia

Performance#

pca() now uses efficient implicit centering for sparse matrices. This can lead to significantly improved performance for large datasets pr1066 A Tarashansky
score_genes() now has an efficient implementation for sparse matrices with missing values pr1196 redst4r.
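
The idea behind implicit centering in pca() can be sketched with a small NumPy check. It shows only the algebraic identity that makes centering unnecessary as an explicit step; it is not the code from pr1066, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.poisson(0.3, size=(6, 4)).astype(float)  # mostly zeros, like count data
v = rng.normal(size=4)  # a vector in gene space
u = rng.normal(size=6)  # a vector in cell space
mu = x.mean(axis=0)  # per-gene means

# Explicit centering: x - mu is dense even when x is sparse.
explicit = (x - mu) @ v
explicit_t = (x - mu).T @ u

# Implicit centering: multiply by the uncentered matrix, then correct.
# (X - 1 mu^T) v   = X v   - (mu . v) 1
# (X - 1 mu^T)^T u = X^T u - mu (1 . u)
implicit = x @ v - (mu @ v)
implicit_t = x.T @ u - mu * u.sum()

print(np.allclose(explicit, implicit), np.allclose(explicit_t, implicit_t))
```

Because an iterative solver only needs these matrix-vector products, the sparse matrix never has to be densified.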

Warning

The new pca() implementation can result in slightly different results for sparse matrices. See the pr (pr1066) and documentation for more info.

Code design#

stacked_violin() can now be used as a subplot pr1084 P Angerer
score_genes() has improved logging pr1119 G Eraslan
scale() now saves mean and standard deviation in the var pr1173 A Wolf
harmony_timeseries() pr1091 A Mousa

Bug fixes#

combat() now works when obs_names aren’t unique. pr1215 I Virshup
scale() can now be used on dense arrays without centering pr1160 simonwm
regress_out() now works when some features are constant pr1194 simonwm
normalize_total() no longer errors if the passed object is a view pr1200 I Virshup
neighbors() no longer ignores the n_pcs param in some cases pr1124 V Bergen
Fixed ebi_expression_atlas(), which contained some out-of-date URLs pr1102 I Virshup
Fixed ingest() for UMAP 0.4 pr1165 S Rybakov
Fixed louvain() for Louvain 0.6 pr1197 I Virshup
Fixed highly_variable_genes(), which could lead to incorrect results when the batch_key argument was used pr1180 G Eraslan
Fixed ingest(), where an inconsistent number of neighbors was used pr1111 S Rybakov

Version 1.4#

1.4.6 2020-03-17#

Functionality in `external`#

sam() self-assembling manifolds [Tarashansky et al., 2019] pr903 A Tarashansky
harmony_timeseries() for trajectory inference on discrete time points pr994 A Mousa
wishbone() for trajectory inference (bifurcations) pr1063 A Mousa

Code design#

violin now reads .uns['colors_...'] pr1029 michalk8

Bug fixes#

adapt ingest() for UMAP 0.4 pr1038 pr1106 S Rybakov
compat with matplotlib 3.1 and 3.2 pr1090 I Virshup, P Angerer
fix PAGA for new igraph pr1037 P Angerer
fix rapids compat of louvain pr1079 LouisFaure

1.4.5 2019-12-30#

Please install scanpy==1.4.5.post3 instead of scanpy==1.4.5.

New functionality#

ingest() maps labels and embeddings of reference data to new data (see tutorial: Integrating data using ingest and BBKNN) pr651 S Rybakov, A Wolf
queries received many updates including enrichment through gprofiler and more advanced biomart queries pr467 I Virshup
set_figure_params() allows setting figsize and accepts facecolor='white', useful for working in dark mode A Wolf

Code design#

downsample_counts now always preserves the dtype of its input, instead of converting floats to ints pr865 I Virshup
allow specifying a base for log1p() pr931 G Eraslan
run neighbors on a GPU using rapids pr830 T White
param docs from typed params P Angerer
scanpy.pl.embedding_density() now only takes one positional argument; similar for scanpy.tl.embedding_density(), which gains a param groupby pr965 A Wolf
webpage overhaul, ecosystem page, release notes, tutorials overhaul pr960 pr966 A Wolf

Warning

changed default solver in pca() from auto to arpack
changed default use_raw in score_genes() from False to None

1.4.4 2019-07-20#

New functionality#

scanpy.get adds helper functions for extracting data in convenient formats pr619 I Virshup

Bug fixes#

Stopped deprecations warnings from AnnData 0.6.22 I Virshup

Code design#

normalize_total() gains param exclude_highly_expressed, and fraction is renamed to max_fraction with better docs A Wolf

1.4.3 2019-05-14#

Bug fixes#

neighbors() correctly infers n_neighbors again from params, which was temporarily broken in v1.4.2 I Virshup

Code design#

calculate_qc_metrics() is single threaded by default for datasets under 300,000 cells – allowing cached compilation pr615 I Virshup

1.4.2 2019-05-06#

New functionality#

combat() supports additional covariates which may include adjustment variables or biological condition pr618 G Eraslan
highly_variable_genes() has a batch_key option which performs HVG selection in each batch separately to avoid selecting genes that vary strongly across batches pr622 G Eraslan

Bug fixes#

rank_genes_groups() t-test implementation doesn’t return NaN when variance is 0, also changed to scipy’s implementation pr621 I Virshup
umap() with init_pos='paga' detects correct dtype A Wolf
louvain() and leiden() auto-generate key_added=louvain_R upon passing restrict_to, which was temporarily changed in 1.4.1 A Wolf

Code design#

neighbors() and umap() got rid of UMAP legacy code and introduced UMAP as a dependency pr576 S Rybakov

1.4.1 2019-04-26#

New functionality#

Scanpy has a command line interface again. Invoking it with scanpy somecommand [args] calls scanpy-somecommand [args], except for builtin commands (currently scanpy settings) pr604 P Angerer
ebi_expression_atlas() allows convenient download of EBI expression atlas I Virshup
marker_gene_overlap() computes overlaps of marker genes M Luecken
filter_rank_genes_groups() filters out genes based on fold change and fraction of cells expressing genes F Ramirez
normalize_total() replaces normalize_per_cell(), is more efficient and provides a parameter to only normalize using a fraction of expressed genes S Rybakov
downsample_counts() has been sped up, changed default value of replace parameter to False pr474 I Virshup
embedding_density() computes densities on embeddings pr543 M Luecken
palantir() interfaces Palantir [Setty et al., 2019] pr493 A Mousa
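
The command line dispatch described above (scanpy somecommand [args] calling scanpy-somecommand [args], except for builtin commands) follows a common pattern for extensible CLIs. A minimal sketch, assuming hypothetical names resolve_command and BUILTIN_COMMANDS; it illustrates the pattern and is not scanpy's CLI code:

```python
import shutil
import subprocess

# Assumption for this sketch: the only builtin is "settings", as in the note above.
BUILTIN_COMMANDS = {"settings"}


def resolve_command(argv, which=shutil.which):
    """Map ['somecommand', *args] to ['<path>/scanpy-somecommand', *args].

    Returns None for builtin commands, which would be handled in-process.
    """
    if not argv:
        raise SystemExit("usage: scanpy <command> [args]")
    command, *args = argv
    if command in BUILTIN_COMMANDS:
        return None
    executable = which(f"scanpy-{command}")  # look the external tool up on PATH
    if executable is None:
        raise SystemExit(f"unknown command: {command}")
    return [executable, *args]


def main(argv):
    resolved = resolve_command(argv)
    if resolved is None:
        print("builtin command:", argv[0])
        return 0
    # hand over to the external executable and propagate its exit code
    return subprocess.call(resolved)
```

Looking up scanpy-somecommand on PATH lets third-party packages add subcommands simply by installing an executable with that name.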

Code design#

.layers support of scatter plots F Ramirez
fix double-logarithmization in compute of log fold change in rank_genes_groups() A Muñoz-Rojas
fix return sections of docs P Angerer

Version 1.3#

1.3.8 2019-02-05#

various documentation and dev process improvements
Added combat() function for batch effect correction [Johnson et al., 2006, Leek et al., 2017, Pedersen, 2012] pr398 M Lange

1.3.7 2019-01-02#

API changed from import scanpy.api as sc to import scanpy as sc.
phenograph() wraps the graph clustering package Phenograph [Levine et al., 2015] thanks to A Mousa

1.3.6 2018-12-11#

Major updates#

a new plotting gallery for visualizing-marker-genes F Ramirez
tutorials are integrated on ReadTheDocs, pbmc3k and paga-paul15 A Wolf

Interactive exploration of analysis results through manifold viewers#

CZI’s cellxgene directly reads .h5ad files the cellxgene developers
the UCSC Single Cell Browser requires exporting via cellbrowser() M Haeussler

Code design#

highly_variable_genes() supersedes filter_genes_dispersion(), it gives the same results but, by default, expects logarithmized data and doesn’t subset A Wolf

1.3.5 2018-12-09#

countless figure improvements pr369 F Ramirez

1.3.4 2018-11-24#

leiden() wraps the recent graph clustering package by Traag et al. [2019] K Polanski
bbknn() wraps the recent batch correction package [Polański et al., 2019] K Polanski
calculate_qc_metrics() calculates a number of quality control metrics, similar to calculateQCMetrics from Scater [McCarthy et al., 2017] I Virshup

1.3.3 2018-11-05#

Major updates#

a fully distributed preprocessing backend T White and the Laserson Lab

Code design#

read_10x_h5() and read_10x_mtx() read Cell Ranger 3.0 outputs pr334 Q Gong

Note

Also see changes in anndata 0.6.

changed default compression to None in write_h5ad() to speed up reading and writing; disk space use is usually less critical
performance gains in write_h5ad() due to better handling of strings and categories S Rybakov

1.3.1 2018-09-03#

RNA velocity in single cells [La Manno et al., 2018]#

Scanpy and AnnData support loom’s layers so that computations for single-cell RNA velocity [La Manno et al., 2018] become feasible S Rybakov and V Bergen
scvelo harmonizes with Scanpy and is able to process loom files with splicing information produced by Velocyto [La Manno et al., 2018]; it runs a lot faster than the count matrix analysis of Velocyto and provides several conceptual developments

Plotting (Generic)#

dotplot() for visualizing genes across conditions and clusters, see here pr199 F Ramirez
heatmap() for pretty heatmaps pr175 F Ramirez
violin() produces very compact overview figures with many panels pr175 F Ramirez

There now is a section on imputation in external:#

magic() for imputation using data diffusion [van Dijk et al., 2018] pr187 S Gigante
dca() for imputation and latent space construction using an autoencoder [Eraslan et al., 2019] pr186 G Eraslan

Version 1.2#

1.2.1 2018-06-08#

Plotting of Generic marker genes and quality control.#

highest_expr_genes() for quality control; plot genes with highest mean fraction of cells, similar to plotQC of Scater [McCarthy et al., 2017] pr169 F Ramirez
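
The quantity behind this plot can be sketched in plain Python: for each gene, take its fraction of the total counts in each cell and average that over cells. The function name and input layout below are assumptions for illustration only.

```python
def highest_expr_genes_sketch(counts, gene_names, n_top=2):
    """counts: list of cells, each a list of per-gene counts (illustrative only)."""
    n_cells = len(counts)
    mean_fraction = [0.0] * len(gene_names)
    for row in counts:
        total = sum(row)
        if total == 0:
            continue  # an empty cell contributes nothing
        for gene, count in enumerate(row):
            # fraction of this cell's counts taken by the gene, averaged over cells
            mean_fraction[gene] += count / total / n_cells
    ranked = sorted(zip(gene_names, mean_fraction), key=lambda pair: pair[1], reverse=True)
    return ranked[:n_top]


top = highest_expr_genes_sketch(
    [[8, 1, 1], [6, 3, 1]], ["MALAT1", "ACTB", "GAPDH"], n_top=2
)
print([name for name, _ in top])
```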

1.2.0 2018-06-08#

paga() improved, see PAGA; the default model changed, and the previous default model can be restored by passing model='v1.0'

Version 1.1#

1.1.0 2018-06-01#

set_figure_params() by default passes vector_friendly=True and allows you to produce reasonably sized pdfs by rasterizing large scatter plots A Wolf
draw_graph() defaults to the ForceAtlas2 layout [Chippada, 2018, Jacomy et al., 2014], which is often more visually appealing and whose computation is much faster S Wollock
scatter() also plots along variables axis MD Luecken
pca() and log1p() support chunk processing S Rybakov
regress_out() is back to multiprocessing F Ramirez
read() reads compressed text files G Eraslan
mitochondrial_genes() for querying mito genes FG Brundu
mnn_correct() for batch correction [Haghverdi et al., 2018, Kang, 2018]
phate() for low-dimensional embedding [Moon et al., 2019] S Gigante
sandbag(), cyclone() for scoring genes [Fechtner, 2018, Scialdone et al., 2015]

Version 1.0#

1.0.0 2018-03-30#

Major updates#

Scanpy is much faster and more memory efficient: preprocess, cluster and visualize 1.3M cells in 6h, 130K cells in 14min, and 68K cells in 3min A Wolf
the API gained a preprocessing function neighbors() and a class Neighbors() to which all basic graph computations are delegated A Wolf

Warning

Upgrading to 1.0 isn’t fully backwards compatible; note the following changes:

the graph-based tools louvain() dpt() draw_graph() umap() diffmap() paga() require prior computation of the graph: sc.pp.neighbors(adata, n_neighbors=5); sc.tl.louvain(adata) instead of previously sc.tl.louvain(adata, n_neighbors=5)
install numba via conda install numba, which replaces cython
the default connectivity measure changed (dpt will look different using default settings); setting method='gauss' in sc.pp.neighbors uses gauss kernel connectivities and reproduces the previous behavior, see, for instance, the example paul15.
namings of returned annotation have changed for less bloated AnnData objects, which means that some of the unstructured annotation of old AnnData files is not recognized anymore
replace occurrences of group_by with groupby (consistency with pandas)
it is worth checking out the notebook examples to see changes, e.g. the seurat example.
upgrading scikit-learn from 0.18 to 0.19 changed the implementation of PCA, some results might therefore look slightly different
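
For the group_by to groupby rename listed above, a small rewrite helper can do most of the mechanical work on existing scripts. This is a hedged sketch with a hypothetical function name; it only touches keyword arguments of the form group_by=..., and the result should still be reviewed by hand.

```python
import re

# Matches "group_by" used as a keyword argument ("group_by=" or "group_by ="),
# but not a comparison ("group_by ==") and not longer names such as "my_group_by".
_GROUP_BY_KWARG = re.compile(r"\bgroup_by(\s*=)(?!=)")


def rename_group_by(source):
    """Return source code with group_by= keyword arguments renamed to groupby=."""
    return _GROUP_BY_KWARG.sub(r"groupby\1", source)


old_call = "sc.tl.rank_genes_groups(adata, group_by='louvain')"
print(rename_group_by(old_call))
```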

Further updates#

UMAP [McInnes et al., 2018] can serve as a first visualization of the data just as tSNE; in contrast to tSNE, UMAP directly embeds the single-cell graph and is faster. UMAP is also used for measuring connectivities and computing neighbors, see neighbors() A Wolf
graph abstraction: AGA is renamed to PAGA: paga(); it now only measures connectivities between partitions of the single-cell graph; pseudotime and clustering need to be computed separately via louvain() and dpt(); the connectivity measure has been improved A Wolf
logistic regression for finding marker genes rank_genes_groups() with parameter method='logreg' A Wolf
louvain() provides a better implementation for reclustering via restrict_to A Wolf
scanpy no longer modifies rcParams upon import, call settings.set_figure_params to set the ‘scanpy style’ A Wolf
default cache directory is ./cache/, set settings.cachedir to change this; nested directories in this are avoided A Wolf
show edges in scatter plots based on graph visualization draw_graph() and umap() by passing edges=True A Wolf
downsample_counts() for downsampling counts MD Luecken
default 'louvain_groups' are called 'louvain' A Wolf
'X_diffmap' contains the zero component, plotting remains unchanged A Wolf

Version 0.4#

0.4.4 2018-02-26#

embed cells using umap() [McInnes et al., 2018] pr92 G Eraslan
score sets of genes, e.g. for cell cycle, using score_genes() [Satija et al., 2015]: notebook

0.4.3 2018-02-09#

clustermap(): heatmap from hierarchical clustering, based on seaborn.clustermap() [Waskom et al., 2016] A Wolf
only return matplotlib.axes.Axes in plotting functions of sc.pl when show=False, otherwise None A Wolf

0.4.2 2018-01-07#

amendments in PAGA and its plotting functions A Wolf

0.4.0 2017-12-23#

export to SPRING [Weinreb et al., 2017] for interactive visualization of data: spring tutorial S Wollock

Version 0.3#

0.3.2 2017-11-29#

finding marker genes via rank_genes_groups_violin() improved, see issue51 F Ramirez

0.3.0 2017-11-16#

AnnData gains method concatenate() A Wolf
AnnData is available as the separate anndata package P Angerer, A Wolf
results of PAGA simplified A Wolf

Version 0.2#

0.2.9 2017-10-25#

Initial release of the new trajectory inference method PAGA#

paga() computes an abstracted, coarse-grained (PAGA) graph of the neighborhood graph A Wolf
paga_compare() plots this graph next to an embedding A Wolf
paga_path() plots a heatmap through a node sequence in the PAGA graph A Wolf
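
What "abstracted, coarse-grained graph" means can be sketched in a few lines: collapse each partition of the neighborhood graph into one node and weight the edges between partitions. The sketch below simply counts inter-partition edges; PAGA's actual connectivity measure is a statistical one and is not reproduced here. The function name and inputs are assumptions for illustration.

```python
from collections import Counter


def coarse_grain_sketch(edges, partition):
    """edges: iterable of (u, v) node pairs; partition: dict node -> group label."""
    weights = Counter()
    for u, v in edges:
        group_u, group_v = partition[u], partition[v]
        if group_u != group_v:
            # undirected edge between two groups, stored under a sorted key
            weights[tuple(sorted((group_u, group_v)))] += 1
    return dict(weights)


edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 3)]
partition = {0: "A", 1: "A", 2: "B", 3: "B", 4: "C"}
abstracted = coarse_grain_sketch(edges, partition)
print(abstracted)
```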

0.2.1 2017-07-24#

Scanpy includes preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing and simulation of gene regulatory networks. The implementation efficiently deals with datasets of more than one million cells. A Wolf, P Angerer

Version 0.1#

0.1.0 2017-05-17#

Scanpy computationally outperforms and allows reproducing both the Cell Ranger R kit’s and most of Seurat’s clustering workflows. A Wolf, P Angerer
