Release notes

Release notes#

Version 1.11#

1.11.0.dev11+g0cfd0224 2024-12-16#

Bug fixes#

Documentation#

Performance#

Version 1.10#

1.10.4 2024-11-12#

Breaking changes#

  • Remove Python 3.9 support P Angerer (pr3283)

Bug fixes#

1.10.3 2024-09-17#

Bug fixes#

1.10.2 2024-06-25#

Development Process#

  • Add performance benchmarking pr2977 R Shrestha, P Angerer

Documentation#

  • Document several missing parameters in docstring pr2888 S Cheney

  • Fixed incorrect instructions in “testing” dev docs pr2994 I Virshup

  • Update marsilea tutorial to use group_ methods pr3001 I Virshup

  • Fixed citations pr3032 P Angerer

  • Improve dataset documentation pr3060 P Angerer

Bug fixes#

  • Compatibility with matplotlib 3.9 pr2999 I Virshup

  • Add clear errors where backed mode-like matrices (i.e., from sparse_dataset) are not supported pr3048 I Gold

  • Write out full pca results when _choose_representation is called, i.e., when neighbors() runs without a prior pca() pr3079 I Gold

  • Fix deprecated use of .A with sparse matrices pr3084 P Angerer

  • Fix zappy support pr3089 P Angerer

  • Fix dotplot group order with pandas 1.x pr3101 P Angerer

Performance#

  • sparse_mean_variance_axis now uses all cores for the calculations pr3015 S Dicks

  • pp.highly_variable_genes with flavor=seurat_v3 now uses a numba kernel pr3017 S Dicks

  • Speed up scrublet() pr3044 S Dicks and pr3056 P Angerer

  • Speed up clipping of array in scale() pr3100 P Ashish & S Dicks

1.10.1 2024-04-09#

Documentation#

Bug fixes#

  • Fix aggregate when aggregating by more than two groups pr2965 I Virshup

Performance#

  • scale() now uses numba kernels for sparse.csr_matrix and sparse.csc_matrix when zero_center==False and mask_obs is provided. This greatly speeds up execution pr2942 S Dicks
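To make the combination of options above concrete, here is a minimal plain-Python sketch of what scaling without centering on a masked subset computes. It is not the numba kernel from pr2942; `scale_without_centering` is a hypothetical name, and the handling of constant columns is an assumption. The useful property is that zero entries stay zero when no mean is subtracted, so a sparse matrix can keep its structure.

```python
import statistics


def scale_without_centering(rows, mask):
    """Divide each column by its standard deviation without subtracting the mean.

    The standard deviations are derived from the masked rows only, and only
    those rows are rescaled; unmasked rows are returned unchanged.
    """
    selected = [row for row, keep in zip(rows, mask) if keep]
    stds = [statistics.stdev(column) for column in zip(*selected)]
    # Assumption: constant columns are left unscaled rather than divided by zero.
    stds = [s if s > 0 else 1.0 for s in stds]
    return [
        [value / s for value, s in zip(row, stds)] if keep else list(row)
        for row, keep in zip(rows, mask)
    ]


rows = [[0.0, 2.0], [4.0, 0.0], [0.0, 6.0], [8.0, 1.0]]
mask = [True, True, True, False]  # the last observation is excluded
scaled = scale_without_centering(rows, mask)
```

After the call, the masked rows have unit standard deviation per column, their zeros are still zeros, and the excluded row is untouched.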

1.10.0 2024-03-26#

scanpy 1.10 brings a large number of new features, performance improvements, and improved documentation.

Some highlights:

  • Improved support for out-of-core workflows via dask. See new tutorial: Using dask with Scanpy demonstrating counts-to-clusters for 1.4 million cells in <10 min.

  • A new basic clustering tutorial demonstrating an updated workflow.

  • Opt-in increased performance for neighbor search and clustering (how to guide).

  • Ability to mask observations or variables from a number of methods (see Customizing Scanpy plots for an example with plotting embeddings)

  • A new function aggregate() for computing aggregations of your data, very useful for pseudo bulking!
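As a rough illustration of what pseudobulking means, the sketch below sums per-cell count vectors within each group label. It is a plain-Python stand-in, not the aggregate() implementation; `pseudobulk_sum` and the toy data are hypothetical.

```python
def pseudobulk_sum(counts, groups):
    """Sum per-cell count vectors within each group label."""
    n_genes = len(counts[0])
    totals = {}
    for row, group in zip(counts, groups):
        # Start each group at zero the first time its label is seen.
        accumulator = totals.setdefault(group, [0] * n_genes)
        for j, value in enumerate(row):
            accumulator[j] += value
    return totals


# Four cells, three genes; labels are illustrative only.
counts = [[1, 0, 3], [2, 1, 0], [0, 5, 1], [4, 0, 0]]
groups = ["T cell", "B cell", "T cell", "B cell"]
pseudobulk = pseudobulk_sum(counts, groups)
```

Each group ends up with one summed expression profile, which is the shape of input that bulk-style downstream methods expect.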

Features#

Documentation#

Bug fixes#

  • Updated read_visium() such that it can read spaceranger 2.0 files L Lehner

  • Fix normalize_total() for dask pr2466 P Angerer

  • Fix setting sc.settings.verbosity in some cases pr2605 P Angerer

  • Fix all remaining pandas warnings pr2789 P Angerer

  • Fix some annoying plotting warnings around violin plots pr2844 P Angerer

  • Scanpy now has a test job which tests against the minimum versions of the dependencies. In the process of implementing this, many bugs associated with using older versions of pandas, anndata, numpy, and matplotlib were fixed. pr2816 I Virshup

  • Fix warnings caused by internal usage of pandas.DataFrame.stack with pandas>=2.1 pr2864 I Virshup

  • scanpy.get.aggregate() now always returns numpy.ndarray pr2893 S Dicks

  • Removes self from array of neighbors for use_approx_neighbors = True in scrublet() pr2896 S Dicks

  • Compatibility with scipy 1.13 pr2943 I Virshup

  • Fix use of dendrogram() on highly correlated low precision data pr2928 P Angerer

  • Fix pytest deprecation warning pr2879 P Angerer

Development Process#

  • Scanpy is now tested against python 3.12 pr2863 I Virshup

  • Fix testing package build pr2468 P Angerer

Deprecations#

  • Dropped support for Python 3.8. More details here. pr2695 P Angerer

  • Deprecated specifying large numbers of function parameters by position as opposed to by name/keyword in all public APIs. e.g. prefer sc.tl.umap(adata, min_dist=0.1, spread=0.8) over sc.tl.umap(adata, 0.1, 0.8) pr2702 P Angerer

  • Dropped support for umap<0.5 for performance reasons. pr2870 P Angerer
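The positional-argument deprecation above can be pictured with a small decorator that warns when too many arguments arrive by position. This is a generic sketch, not Scanpy's actual mechanism; `warn_extra_positionals` and `umap_like` are hypothetical names, and `umap_like` is only a stand-in for a public API function.

```python
import functools
import warnings


def warn_extra_positionals(max_positional):
    """Warn when a call passes more than `max_positional` positional arguments."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if len(args) > max_positional:
                warnings.warn(
                    f"{func.__name__}() received {len(args)} positional arguments; "
                    f"pass everything after the first {max_positional} by keyword.",
                    FutureWarning,
                    stacklevel=2,
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Hypothetical stand-in for a public function; not sc.tl.umap itself.
@warn_extra_positionals(max_positional=1)
def umap_like(data, min_dist=0.5, spread=1.0):
    return {"min_dist": min_dist, "spread": spread}


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    by_keyword = umap_like([1, 2, 3], min_dist=0.1, spread=0.8)  # preferred form
    n_after_keyword_call = len(caught)
    by_position = umap_like([1, 2, 3], 0.1, 0.8)  # deprecated form, warns
    n_after_positional_call = len(caught)
```

Both calls return the same result; only the positional one emits a warning, which gives callers time to switch before positional use is removed.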

Version 1.9#

1.9.8 2024-01-26#

Bug fixes#

  • Fix handling of numpy array palettes for old numpy versions pr2832 P Angerer

1.9.7 2024-01-25#

Bug fixes#

1.9.6 2023-10-31#

Bug fixes#

1.9.5 2023-09-08#

Bug fixes#

  • Remove use of deprecated dtype argument to AnnData constructor pr2658 I Virshup

1.9.4 2023-08-24#

Bug fixes#

1.9.3 2023-03-02#

Bug fixes#

  • Variety of fixes against pandas 2.0.0rc0 pr2434 I Virshup

1.9.2 2023-02-16#

Bug fixes#

1.9.1 2022-04-05#

Bug fixes#

  • normalize_total() works when Dask is not installed pr2209 R Cannoodt

  • Fix embedding plots by bumping matplotlib dependency to version 3.4 pr2212 I Virshup

1.9.0 2022-04-01#

Tutorials#

Experimental module#

Features#

  • filter_rank_genes_groups() now allows filtering by absolute values of log fold change pr1649 S Rybakov

  • _choose_representation now subsets the provided representation to n_pcs, regardless of the name of the provided representation (should affect mostly neighbors()) pr2179 I Virshup PG Majev

  • scanpy.pp.scrublet() (and related functions) can now be used on AnnData objects containing multiple batches pr1965 J Manning

  • Number of variables plotted with pca_loadings() can now be controlled with n_points argument. Additionally, variables are no longer repeated if the anndata has fewer than 30 variables pr2075 Yves33

  • Dask arrays now work with scanpy.pp.normalize_total() pr1663 G Buckley, I Virshup

  • embedding_density() now allows more than 10 groups pr1936 A Wolf

  • Embedding plots can now pass colorbar_loc to specify the location of colorbar legend, or pass None to not show a colorbar pr1821 A Schaar I Virshup

  • Embedding plots now have a dimensions argument, which lets users select which dimensions of their embedding to plot and uses the same broadcasting rules as other arguments pr1538 I Virshup

  • print_versions() now uses session_info pr2089 P Angerer I Virshup

Ecosystem#

Multiple packages have been added to our ecosystem page, including:

  • decoupler for footprint analysis and pathway enrichment pr2186 PB Mompel

  • dandelion for B-cell receptor analysis pr1953 Z Tuong

  • CIARA, a feature selection tool for identifying rare cell types pr2175 M Stock

Bug fixes#

  • Fixed finding variables with use_raw=True and basis=None in scanpy.pl.scatter() pr2027 E Rice

  • Fixed scanpy.pp.scrublet() to address issue1957 FlMai and ensure raw counts are used for simulation

  • Functions in scanpy.datasets no longer throw OldFormatWarnings when using anndata 0.8 pr2096 I Virshup

  • Fixed use of scanpy.pp.neighbors() with method='rapids': RAPIDS cuML no longer returns a squared Euclidean distance matrix, so we should not square-root the kNN distance matrix. pr1828 M Zaslavsky

  • Removed pytables dependency by implementing read_10x_h5 with h5py due to installation errors on Windows pr2064

  • Fixed bug in scanpy.external.pp.hashsolo() where default value was set improperly pr2190 B Reiz

  • Fixed bug in scanpy.pl.embedding() functions where an error could be raised when there were missing values and large numbers of categories pr2187 I Virshup

Version 1.8#

1.8.2 2021-11-03#

Documentation#

  • Update conda installation instructions pr1974 L Heumos

Bug fixes#

Ecosystem#

  • Added PASTE (a tool to align and integrate spatial transcriptomics data) to scanpy ecosystem.

1.8.1 2021-07-07#

Bug fixes#

1.8.0 2021-06-28#

Metrics module#

Features#

Ecosystem#

  • Added Cubé to ecosystem page pr1878 C Lambden

  • Added triku, a feature selection method, to the ecosystem page pr1722 AM Ascensión

  • Added dorothea and progeny to the ecosystem page pr1767 P Badia-i-Mompel

Documentation#

Bug fixes#

  • Fix scanpy.pl.paga_path() TypeError with recent versions of anndata pr1047 P Angerer

  • Fix detection of whether IPython is running pr1844 I Virshup

  • Fixed reproducibility of scanpy.tl.diffmap() (added random_state) pr1858 I Kucinski

  • Fixed errors and warnings from embedding plots with small numbers of categories after sns.set_palette was called pr1886 I Virshup

  • Fixed handling of gene_symbols argument in a number of sc.pl.rank_genes_groups* functions pr1529 F Ramirez I Virshup

  • Fixed handling of use_raw for sc.tl.rank_genes_groups when no .raw is present pr1895 I Virshup

  • scanpy.pl.rank_genes_groups_violin() now works for raw=False pr1669 M van den Beek

  • scanpy.pl.dotplot() now uses smallest_dot argument correctly pr1771 S Flemming

Development Process#

  • Switched to flit for building and deploying the package, a simple tool with an easy to understand command line interface and metadata pr1527 P Angerer

  • Use pre-commit for style checks pr1684 pr1848 L Heumos I Virshup

Deprecations#

Version 1.7#

1.7.2 2021-04-07#

Bug fixes#

Ecosystem#

  • Added triku, a feature selection method, to the ecosystem page pr1722 AM Ascensión

  • Added dorothea and progeny to the ecosystem page pr1767 P Badia-i-Mompel

1.7.1 2021-02-24#

Documentation#

  • More twitter handles for core devs pr1676 G Eraslan

Bug fixes#

1.7.0 2021-02-03#

Features#

  • Add new 10x Visium datasets to visium_sge() pr1473 G Palla

  • Enable download of source image for 10x visium datasets in visium_sge() pr1506 H Spitzer

  • Refactor of scanpy.pl.spatial(). Better support for plotting without an image, as well as directly providing images pr1512 G Palla

  • Dict input for scanpy.queries.enrich() pr1488 G Eraslan

  • rank_genes_groups_df() can now return fraction of cells in a group expressing a gene, and allows retrieving values for multiple groups at once pr1388 G Eraslan

  • Color annotations for gene sets in heatmap() are now matched to color for cluster pr1511 L Sikkema

  • PCA plots can now annotate axes with variance explained pr1470 bfurtwa

  • Plots with groupby arguments can now group by values in the index by passing the index’s name (like pd.DataFrame.groupby). pr1583 F Ramirez

  • Added na_color and na_in_legend keyword arguments to embedding() plots. Allows specifying color for missing or filtered values in plots like umap() or spatial() pr1356 I Virshup

  • embedding() plots now support passing dict of {cluster_name: cluster_color, ...} for palette argument pr1392 I Virshup

External tools (new)#

External tools (changes)#

Documentation#

Performance#

Bug fixes#

  • Consistent fold-change, fractions calculation for filter_rank_genes_groups pr1391 S Rybakov

  • Fixed bug where score_genes would error if one gene was passed pr1398 I Virshup

  • Fixed log1p inplace on integer dense arrays pr1400 I Virshup

  • Fix docstring formatting for rank_genes_groups() pr1417 P Weiler

  • Removed PendingDeprecationWarnings from use of np.matrix pr1424 P Weiler

  • Fixed indexing bug in scanpy.pp.highly_variable_genes pr1456 V Bergen

  • Fix default number of genes for marker_gene_overlap pr1464 MD Luecken

  • Fixed passing groupby and dendrogram_key to dendrogram() pr1465 M Varma

  • Fixed download path of pbmc3k_processed pr1472 D Strobl

  • Better error message when computing DE with a group of size 1 pr1490 J Manning

  • Update cugraph API usage for v0.16 pr1494 R Ilango

  • Fixed marker_gene_overlap default value for top_n_markers pr1464 MD Luecken

  • Pass random_state to RAPIDS UMAP pr1474 C Nolet

  • Fixed anndata version requirement for concat() (re-exported from scanpy as sc.concat) pr1491 I Virshup

  • Fixed the width of the progress bar when downloading data pr1507 M Klein

  • Updated link for moignard15 dataset pr1542 I Virshup

  • Fixed bug where calling set_figure_params could block if IPython was installed, but not used. pr1547 I Virshup

  • violin() no longer fails if .raw not present pr1548 I Virshup

  • spatial() refactoring and better handling of spatial data pr1512 G Palla

  • pca() works with chunked=True again pr1592 I Virshup

  • ingest() now works with umap-learn 0.5.0 pr1601 S Rybakov

Version 1.6#

1.6.0 2020-08-15#

This release includes an overhaul of dotplot(), matrixplot(), and stacked_violin() (pr1210 F Ramirez), and of the internals of rank_genes_groups() (pr1156 S Rybakov).

Overhaul of dotplot(), matrixplot(), and stacked_violin() pr1210 F Ramirez#

  • An overhauled tutorial Core plotting functions.

  • New plotting classes can be accessed directly (e.g., DotPlot) or using the return_fig param.

  • It is possible to plot log fold change and p-values in the rank_genes_groups_dotplot() family of functions.

  • Added ax parameter which allows embedding the plot in other images.

  • Added option to include a bar plot instead of the dendrogram containing the cell/observation totals per category.

  • Return a dictionary of axes for further manipulation. This includes the main plot, the legend, and the dendrogram or totals.

  • Legends can be removed.

  • The groupby param can take a list of categories, e.g., groupby=['tissue', 'cell type'].

  • Added padding parameter to dotplot and stacked_violin. pr1270

  • Added title for colorbar and positioned as in dotplot for matrixplot().

  • dotplot() changes:

    • Improved the colorbar and size legend for dotplots. Now the colorbar and size have titles, which can be modified using the colorbar_title and size_title params. They also align at the bottom of the image and do not shrink if the dotplot image is smaller.

    • Allow plotting genes in rows and categories in columns (swap_axes).

    • Using DotPlot, the dot_edge_color and line width can be modified, a grid can be added, and other modifications are enabled.

    • A new style was added in which the dots are replaced by an empty circle and the square behind the circle is colored (like in matrixplots).

  • stacked_violin() changes:

    • Violins can be colored based on average gene expression as in dotplots.

    • The linewidth of the violin plots is thinner.

    • Removed the ticks for the y-axis as they tend to overlap with each other. Using the style method they can be displayed if needed.

Additions#

Bug fixes#

Version 1.5#

1.5.1 2020-05-21#

Bug fixes#

  • Fixed a bug in pca(), where random_state did not have an effect for sparse input pr1240 I Virshup

  • Fixed docstring in pca() which included an unused argument pr1240 I Virshup

1.5.0 2020-05-15#

The 1.5.0 release adds a lot of new functionality, much of which takes advantage of anndata updates 0.7.0 - 0.7.2. Highlights of this release include support for spatial data, dedicated handling of graphs in AnnData, sparse PCA, an interface with scvi, and others.

Spatial data support#

New functionality#

External tools#

  • scanpy.external.pp.scvi for preprocessing with scVI pr1085 G Xing

  • Guide for using Scanpy in R pr1186 L Zappia

Performance#

  • pca() now uses efficient implicit centering for sparse matrices. This can lead to significantly improved performance for large datasets pr1066 A Tarashansky

  • score_genes() now has an efficient implementation for sparse matrices with missing values pr1196 redst4r.
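The idea behind implicit centering for PCA can be shown with a matrix-vector product: centering a sparse matrix explicitly makes it dense, but the centered product can be obtained from the uncentered one by subtracting a single scalar. The sketch below uses plain Python lists for clarity and is not the implementation from pr1066; all function names are hypothetical.

```python
def matvec(rows, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(x * w for x, w in zip(row, v)) for row in rows]


def column_means(rows):
    return [sum(column) / len(rows) for column in zip(*rows)]


def centered_matvec_implicit(rows, v):
    """Compute (X - 1 mu^T) v as X v - (mu . v) without forming the centered matrix."""
    shift = sum(m * w for m, w in zip(column_means(rows), v))
    return [y - shift for y in matvec(rows, v)]


def centered_matvec_explicit(rows, v):
    """Reference version: densify by subtracting the column means, then multiply."""
    means = column_means(rows)
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    return matvec(centered, v)


# A mostly-zero toy matrix: 4 observations, 3 variables.
X = [[0.0, 0.0, 3.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]]
v = [1.0, -2.0, 0.5]
implicit = centered_matvec_implicit(X, v)
explicit = centered_matvec_explicit(X, v)
```

Both routes give the same vector, but the implicit one only ever touches the original entries of X, which is what keeps the cost proportional to the number of nonzeros when X is stored sparsely.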

Warning

The new pca() implementation can result in slightly different results for sparse matrices. See the pr (pr1066) and documentation for more info.

Code design#

Bug fixes#

Version 1.4#

1.4.6 2020-03-17#

Functionality in external#

Code design#

Bug fixes#

  • adapt ingest() for UMAP 0.4 pr1038 pr1106 S Rybakov

  • compat with matplotlib 3.1 and 3.2 pr1090 I Virshup, P Angerer

  • fix PAGA for new igraph pr1037 P Angerer

  • fix rapids compat of louvain pr1079 LouisFaure

1.4.5 2019-12-30#

Please install scanpy==1.4.5.post3 instead of scanpy==1.4.5.

New functionality#

Code design#

  • downsample_counts now always preserves the dtype of its input, instead of converting floats to ints pr865 I Virshup

  • allow specifying a base for log1p() pr931 G Eraslan

  • run neighbors on a GPU using rapids pr830 T White

  • param docs from typed params P Angerer

  • embedding_density() now only takes one positional argument; similar for embedding_density(), which gains a param groupby pr965 A Wolf

  • webpage overhaul, ecosystem page, release notes, tutorials overhaul pr960 pr966 A Wolf
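For the log1p() base option listed above, a minimal sketch of the arithmetic follows. It assumes the base is applied with the usual change-of-base rule, log(1 + x) / log(base); `log1p_with_base` is a hypothetical helper, not the library function.

```python
import math


def log1p_with_base(x, base=None):
    """Return log(1 + x), in the natural base by default or in `base` if given."""
    value = math.log1p(x)
    if base is None:
        return value
    # Change of base: log_b(1 + x) = ln(1 + x) / ln(b)
    return value / math.log(base)


natural = log1p_with_base(3)        # ln(4)
binary = log1p_with_base(3, base=2)  # log2(4)
decimal = log1p_with_base(9, base=10)  # log10(10)
```

Choosing base 2 makes differences between transformed values read directly as doublings.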

Warning

  • changed default solver in pca() from auto to arpack

  • changed default use_raw in score_genes() from False to None

1.4.4 2019-07-20#

New functionality#

  • scanpy.get adds helper functions for extracting data in convenient formats pr619 I Virshup

Bug fixes#

  • Stopped deprecation warnings from AnnData 0.6.22 I Virshup

Code design#

  • normalize_total() gains param exclude_highly_expressed, and fraction is renamed to max_fraction with better docs A Wolf
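A small sketch may help picture how excluding highly expressed genes changes the normalization. It assumes a gene counts as highly expressed when it exceeds max_fraction of the total counts in at least one cell, and that the target defaults to the median size factor; both are stated here as assumptions, and `normalize_total_sketch` is a hypothetical name rather than the library function.

```python
import statistics


def normalize_total_sketch(counts, max_fraction=0.05, target_sum=None):
    """Normalize each cell using size factors that ignore highly expressed genes."""
    totals = [sum(row) for row in counts]
    n_genes = len(counts[0])
    # Assumption: flag a gene if it exceeds max_fraction of the total in any cell.
    highly_expressed = [
        any(row[j] > max_fraction * total for row, total in zip(counts, totals))
        for j in range(n_genes)
    ]
    # Size factor per cell: counts summed over the remaining genes only.
    size_factors = [
        sum(v for v, flagged in zip(row, highly_expressed) if not flagged)
        for row in counts
    ]
    if target_sum is None:
        target_sum = statistics.median(size_factors)
    normalized = [
        [v * target_sum / size for v in row] if size > 0 else list(row)
        for row, size in zip(counts, size_factors)
    ]
    return normalized, highly_expressed


counts = [[90, 5, 5], [10, 10, 20], [80, 10, 10]]
normalized, highly_expressed = normalize_total_sketch(counts, max_fraction=0.5)
```

In this toy input the first gene dominates two of the cells, so it is left out of the size factors; the remaining genes then sum to the same target in every cell, while the dominant gene is still rescaled along with them.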

1.4.3 2019-05-14#

Bug fixes#

  • neighbors() correctly infers n_neighbors again from params, which was temporarily broken in v1.4.2 I Virshup

Code design#

  • calculate_qc_metrics() is single threaded by default for datasets under 300,000 cells – allowing cached compilation pr615 I Virshup

1.4.2 2019-05-06#

New functionality#

  • combat() supports additional covariates which may include adjustment variables or biological condition pr618 G Eraslan

  • highly_variable_genes() has a batch_key option which performs HVG selection in each batch separately to avoid selecting genes that vary strongly across batches pr622 G Eraslan

Bug fixes#

  • rank_genes_groups() t-test implementation doesn’t return NaN when variance is 0, also changed to scipy’s implementation pr621 I Virshup

  • umap() with init_pos='paga' detects correct dtype A Wolf

  • louvain() and leiden() auto-generate key_added=louvain_R upon passing restrict_to, which was temporarily changed in 1.4.1 A Wolf

Code design#

1.4.1 2019-04-26#

New functionality#

Code design#

  • .layers support of scatter plots F Ramirez

  • fix double-logarithmization in compute of log fold change in rank_genes_groups() A Muñoz-Rojas

  • fix return sections of docs P Angerer

Version 1.3#

1.3.8 2019-02-05#

1.3.7 2019-01-02#

  • API changed from import scanpy.api as sc to import scanpy as sc.

  • phenograph() wraps the graph clustering package Phenograph [Levine et al., 2015] thanks to A Mousa

1.3.6 2018-12-11#

Major updates#

  • a new plotting gallery for visualizing-marker-genes F Ramirez

  • tutorials are integrated on ReadTheDocs, pbmc3k and paga-paul15 A Wolf

Interactive exploration of analysis results through manifold viewers#

Code design#

1.3.5 2018-12-09#

  • uncountable figure improvements pr369 F Ramirez

1.3.4 2018-11-24#

1.3.3 2018-11-05#

Major updates#

  • a fully distributed preprocessing backend T White and the Laserson Lab

Code design#

Note

Also see changes in anndata 0.6.

  • changed default compression to None in write_h5ad() to speed up read and write; disk space use is usually less critical

  • performance gains in write_h5ad() due to better handling of strings and categories S Rybakov

1.3.1 2018-09-03#

RNA velocity in single cells [La Manno et al., 2018]#

  • Scanpy and AnnData support loom’s layers so that computations for single-cell RNA velocity [La Manno et al., 2018] become feasible S Rybakov and V Bergen

  • scvelo harmonizes with Scanpy and is able to process loom files with splicing information produced by Velocyto [La Manno et al., 2018]; it runs a lot faster than the count matrix analysis of Velocyto and provides several conceptual developments

Plotting (Generic)#

There now is a section on imputation in external:#

Version 1.2#

1.2.1 2018-06-08#

Plotting of Generic marker genes and quality control.#

1.2.0 2018-06-08#

  • paga() improved, see PAGA; the default model changed, restore the previous default model by passing model='v1.0'

Version 1.1#

1.1.0 2018-06-01#

Version 1.0#

1.0.0 2018-03-30#

Major updates#

  • Scanpy is much faster and more memory efficient: preprocess, cluster and visualize 1.3M cells in 6h, 130K cells in 14min, and 68K cells in 3min A Wolf

  • the API gained a preprocessing function neighbors() and a class Neighbors() to which all basic graph computations are delegated A Wolf

Warning

Upgrading to 1.0 isn’t fully backwards compatible because of the following changes:

  • the graph-based tools louvain() dpt() draw_graph() umap() diffmap() paga() require prior computation of the graph: sc.pp.neighbors(adata, n_neighbors=5); sc.tl.louvain(adata) instead of previously sc.tl.louvain(adata, n_neighbors=5)

  • install numba via conda install numba, which replaces cython

  • the default connectivity measure changed (dpt will look different using default settings). Setting method='gauss' in sc.pp.neighbors uses gauss kernel connectivities and reproduces the previous behavior; see, for instance, the example paul15.

  • namings of returned annotation have changed for less bloated AnnData objects, which means that some of the unstructured annotation of old AnnData files is not recognized anymore

  • replace occurrences of group_by with groupby (consistency with pandas)

  • it is worth checking out the notebook examples to see changes, e.g. the seurat example.

  • upgrading scikit-learn from 0.18 to 0.19 changed the implementation of PCA, some results might therefore look slightly different

Further updates#

  • UMAP [McInnes et al., 2018] can serve as a first visualization of the data just as tSNE; in contrast to tSNE, UMAP directly embeds the single-cell graph and is faster. UMAP is also used for measuring connectivities and computing neighbors, see neighbors() A Wolf

  • graph abstraction: AGA is renamed to PAGA: paga(). It now only measures connectivities between partitions of the single-cell graph; pseudotime and clustering need to be computed separately via louvain() and dpt(). The connectivity measure has been improved A Wolf

  • logistic regression for finding marker genes rank_genes_groups() with parameter method='logreg' A Wolf

  • louvain() provides a better implementation for reclustering via restrict_to A Wolf

  • scanpy no longer modifies rcParams upon import, call settings.set_figure_params to set the ‘scanpy style’ A Wolf

  • default cache directory is ./cache/, set settings.cachedir to change this; nested directories in this are avoided A Wolf

  • show edges in scatter plots based on graph visualization draw_graph() and umap() by passing edges=True A Wolf

  • downsample_counts() for downsampling counts MD Luecken

  • default 'louvain_groups' are called 'louvain' A Wolf

  • 'X_diffmap' contains the zero component, plotting remains unchanged A Wolf

Version 0.4#

0.4.4 2018-02-26#

0.4.3 2018-02-09#

0.4.2 2018-01-07#

  • amendments in PAGA and its plotting functions A Wolf

0.4.0 2017-12-23#

Version 0.3#

0.3.2 2017-11-29#

0.3.0 2017-11-16#

Version 0.2#

0.2.9 2017-10-25#

Initial release of the new trajectory inference method PAGA#

  • paga() computes an abstracted, coarse-grained (PAGA) graph of the neighborhood graph A Wolf

  • paga_compare() plots this graph next to an embedding A Wolf

  • paga_path() plots a heatmap through a node sequence in the PAGA graph A Wolf

0.2.1 2017-07-24#

Scanpy includes preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing and simulation of gene regulatory networks. The implementation efficiently deals with datasets of more than one million cells. A Wolf, P Angerer

Version 0.1#

0.1.0 2017-05-17#

Scanpy computationally outperforms and allows reproducing both the Cell Ranger R kit’s and most of Seurat’s clustering workflows. A Wolf, P Angerer