Release notes

Version 1.9

1.9.8 2024-01-26

Bug fixes

Fix handling of numpy array palettes for old numpy versions PR 2832 P Angerer

1.9.7 2024-01-25

Bug fixes

Fix handling of numpy array palettes (e.g. after write-read cycle) PR 2734 P Angerer
Specify correct version of matplotlib dependency PR 2733 P Fisher
Fix scanpy.pl.violin() usage of seaborn.catplot PR 2739 E Roellin
Fix scanpy.pp.highly_variable_genes() to handle the combinations of inplace and subset consistently PR 2757 E Roellin
Replace usage of various deprecated functionality from anndata and pandas PR 2678 PR 2779 P Angerer
Allow using the default n_top_genes with scanpy.pp.highly_variable_genes() flavor 'seurat_v3' PR 2782 P Angerer
Fix scanpy.read_10x_mtx()’s gex_only=True mode PR 2801 P Angerer

1.9.6 2023-10-31

Bug fixes

Allow scanpy.pl.scatter() to accept a str palette name PR 2571 P Angerer
Make scanpy.external.tl.palantir() compatible with palantir >=1.3 PR 2672 DJ Otto
Fix scanpy.pl.pca() when return_fig=True and annotate_var_explained=True PR 2682 J Wagner
Temp fix for issue 2680 by skipping seaborn version 0.13.0 PR 2661 P Angerer
Fix scanpy.pp.highly_variable_genes() to not modify the used layer when flavor=seurat PR 2698 E Roellin
Prevent pandas from causing infinite recursion when setting a slice of a categorical column PR 2719 P Angerer

1.9.5 2023-09-08

Bug fixes

Remove use of deprecated dtype argument to AnnData constructor PR 2658 Isaac Virshup

1.9.4 2023-08-24

Bug fixes

Support scikit-learn 1.3 PR 2515 P Angerer
Deal with None value vanishing from things like .uns['log1p'] PR 2546 SP Shen
Depend on igraph instead of python-igraph PR 2566 P Angerer
rank_genes_groups() now handles unsorted groups as intended PR 2589 S Dicks
rank_genes_groups_df() now works for rank_genes_groups() with method="logreg" PR 2601 S Dicks
_choose_representation() now works with n_pcs if bigger than settings.N_PCS PR 2610 S Dicks

1.9.3 2023-03-02

Bug fixes

Variety of fixes against pandas 2.0.0rc0 PR 2434 I Virshup
Compatibility with anndata 0.9.0rc PR 2435 I Virshup

1.9.2 2023-02-16

Bug fixes

highly_variable_genes() layer argument now works in tandem with batches PR 2302 D Schaumont
highly_variable_genes() with flavor='cell_ranger' now handles the case in issue 2230 where the number of calculated dispersions is less than n_top_genes PR 2231 L Zappia
Fix compatibility with matplotlib 3.7 PR 2414 I Virshup P Fisher
Fix scrublet numpy matrix compatibility issue PR 2395 A Gayoso

1.9.1 2022-04-05

Bug fixes

normalize_total() works when Dask is not installed PR 2209 R Cannoodt
Fix embedding plots by bumping matplotlib dependency to version 3.4 PR 2212 I Virshup

1.9.0 2022-04-01

Tutorials

New tutorial on the usage of Pearson Residuals: → tutorial: tutorial_pearson_residuals J Lause, G Palla
Materials and recordings for Scanpy workshops by Maren Büttner

Experimental module

Added scanpy.experimental module! Currently contains functionality related to pearson residuals in scanpy.experimental.pp PR 1715 J Lause, G Palla, I Virshup. This includes:
- normalize_pearson_residuals() for Pearson Residuals normalization
- highly_variable_genes() for HVG selection with Pearson Residuals
- normalize_pearson_residuals_pca() for Pearson Residuals normalization and dimensionality reduction with PCA
- recipe_pearson_residuals() for Pearson Residuals normalization, HVG selection and dimensionality reduction with PCA
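
The idea behind Pearson residuals normalization can be sketched in a few lines of NumPy. This is an illustration only, not the code in scanpy.experimental.pp: the function name `pearson_residuals`, the default `theta=100.0`, and the clipping threshold `sqrt(n_cells)` are assumptions of this sketch, and it expects every cell and every gene to have at least one count.

```python
import numpy as np


def pearson_residuals(counts, theta=100.0, clip=None):
    """Pearson residuals of a cells x genes count matrix under a negative binomial null.

    Hypothetical helper for illustration; assumes no all-zero rows or columns.
    """
    counts = np.asarray(counts, dtype=np.float64)
    n_cells = counts.shape[0]
    if clip is None:
        clip = np.sqrt(n_cells)  # assumed default clipping threshold
    cell_totals = counts.sum(axis=1, keepdims=True)  # shape (n_cells, 1)
    gene_totals = counts.sum(axis=0, keepdims=True)  # shape (1, n_genes)
    # Expected count if each gene's share were the same in every cell.
    mu = cell_totals @ gene_totals / counts.sum()
    # Negative binomial variance: mu + mu^2 / theta.
    residuals = (counts - mu) / np.sqrt(mu + mu**2 / theta)
    return np.clip(residuals, -clip, clip)


rng = np.random.default_rng(0)
toy_counts = rng.poisson(lam=2.0, size=(50, 20))
residuals = pearson_residuals(toy_counts)
```

Genes whose residuals vary strongly across cells deviate most from the null model, which is why the same quantity can drive HVG selection.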

Features

filter_rank_genes_groups() now allows filtering by absolute values of log fold change PR 1649 S Rybakov
_choose_representation now subsets the provided representation to n_pcs, regardless of the name of the provided representation (should affect mostly neighbors()) PR 2179 I Virshup PG Majev
scanpy.external.pp.scrublet() (and related functions) can now be used on AnnData objects containing multiple batches PR 1965 J Manning
The number of variables plotted with pca_loadings() can now be controlled with the n_points argument. Additionally, variables are no longer repeated if the anndata has fewer than 30 variables PR 2075 Yves33
Dask arrays now work with scanpy.pp.normalize_total() PR 1663 G Buckley, I Virshup
embedding_density() now allows more than 10 groups PR 1936 A Wolf
Embedding plots can now pass colorbar_loc to specify the location of colorbar legend, or pass None to not show a colorbar PR 1821 A Schaar I Virshup
Embedding plots now have a dimensions argument, which lets users select which dimensions of their embedding to plot and uses the same broadcasting rules as other arguments PR 1538 I Virshup
print_versions() now uses session_info PR 2089 P Angerer I Virshup

Ecosystem

Multiple packages have been added to our ecosystem page, including:

decoupler, a tool for footprint analysis and pathway enrichment PR 2186 PB Mompel
dandelion for B-cell receptor analysis PR 1953 Z Tuong
CIARA, a feature selection tool for identifying rare cell types PR 2175 M Stock

Bug fixes

Fixed finding variables with use_raw=True and basis=None in scanpy.pl.scatter() PR 2027 E Rice
Fixed scanpy.external.pp.scrublet() to address issue 1957 FlMai and ensure raw counts are used for simulation
Functions in scanpy.datasets no longer throw OldFormatWarnings when using anndata 0.8 PR 2096 I Virshup
Fixed use of scanpy.pp.neighbors() with method='rapids': RAPIDS cuML no longer returns a squared Euclidean distance matrix, so we should not square-root the kNN distance matrix. PR 1828 M Zaslavsky
Removed pytables dependency by implementing read_10x_h5 with h5py due to installation errors on Windows PR 2064
Fixed bug in scanpy.external.pp.hashsolo() where default value was set improperly PR 2190 B Reiz
Fixed bug in scanpy.pl.embedding() functions where an error could be raised when there were missing values and large numbers of categories PR 2187 I Virshup

Version 1.8

1.8.2 2021-11-03

Docs

Update conda installation instructions PR 1974 L Heumos

Bug fixes

Fix plotting after scanpy.tl.filter_rank_genes_groups() PR 1942 S Rybakov
Fix use_raw=None using anndata.AnnData.var_names if anndata.AnnData.raw is present in scanpy.tl.score_genes() PR 1999 M Klein
Fix compatibility with UMAP 0.5.2 PR 2028 L Mcinnes
Fixed non-determinism in scanpy.pl.paga() node positions PR 1922 I Virshup

Ecosystem

Added PASTE (a tool to align and integrate spatial transcriptomics data) to scanpy ecosystem.

1.8.1 2021-07-07

Bug fixes

Fixed reproducibility of scanpy.tl.score_genes(). Calculation and output is now float64 type. PR 1890 I Kucinski
Workarounds for some changes/bugs in pandas 1.3 PR 1918 I Virshup
Fixed bug where sc.pl.paga_compare could mislabel nodes on the paga graph PR 1898 I Virshup
Fixed handling of use_raw with scanpy.tl.rank_genes_groups() PR 1934 I Virshup

1.8.0 2021-06-28

Metrics module

Added scanpy.metrics module!
- Added scanpy.metrics.gearys_c() for spatial autocorrelation PR 915 I Virshup
- Added scanpy.metrics.morans_i() for global spatial autocorrelation PR 1740 I Virshup, G Palla
- Added scanpy.metrics.confusion_matrix() for comparing labellings PR 915 I Virshup
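
For readers unfamiliar with the two autocorrelation statistics, the textbook definitions can be written out directly. The sketch below uses dense NumPy arrays and hypothetical function signatures `(weights, values)`. It is not the scanpy.metrics implementation and makes no claim about that module's signatures or performance.

```python
import numpy as np


def morans_i(weights, values):
    """Moran's I: positive when neighbouring nodes carry similar values."""
    weights = np.asarray(weights, dtype=np.float64)
    dev = np.asarray(values, dtype=np.float64)
    dev = dev - dev.mean()
    n = dev.size
    return (n / weights.sum()) * (dev @ weights @ dev) / (dev @ dev)


def gearys_c(weights, values):
    """Geary's C: below 1 when neighbouring nodes carry similar values."""
    weights = np.asarray(weights, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    dev = values - values.mean()
    n = values.size
    sq_diff = (values[:, None] - values[None, :]) ** 2
    return (n - 1) * (weights * sq_diff).sum() / (2 * weights.sum() * (dev @ dev))


# Toy graph: five nodes in a chain, each connected to its direct neighbours.
n_nodes = 5
chain = np.zeros((n_nodes, n_nodes))
idx = np.arange(n_nodes - 1)
chain[idx, idx + 1] = 1.0
chain[idx + 1, idx] = 1.0

# A smooth gradient along the chain is strongly autocorrelated.
gradient = np.arange(n_nodes, dtype=np.float64)
i_stat = morans_i(chain, gradient)  # 0.5
c_stat = gearys_c(chain, gradient)  # 0.2
```

In single-cell work the weight matrix would typically be a neighborhood graph over cells rather than a hand-built chain.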

Features

Added layer and copy kwargs to normalize_total() PR 1667 I Virshup
Added vcenter and norm arguments to the plotting functions PR 1551 G Eraslan
Standardized and expanded available arguments to the sc.pl.rank_genes_groups* family of functions. PR 1529 F Ramirez I Virshup - See examples sections of rank_genes_groups_dotplot() and rank_genes_groups_matrixplot() for demonstrations.
scanpy.tl.tsne() now supports the metric argument and records the passed parameters PR 1854 I Virshup
scanpy.external.pl.scrublet_score_distribution() now uses the same API as other scanpy functions for saving/showing plots PR 1741 J Manning

Ecosystem

Added Cubé to ecosystem page PR 1878 C Lambden
Added triku a feature selection method to the ecosystem page PR 1722 AM Ascensión
Added dorothea and progeny to the ecosystem page PR 1767 P Badia-i-Mompel

Documentation

Added Community page to docs PR 1856 I Virshup
Added rendered examples to many plotting functions issue 1664 A Schaar L Zappia bio-la L Hetzel L Dony M Buttner K Hrovatin F Ramirez I Virshup LouisK92 mayarali
Integrated DocSearch, a find-as-you-type documentation index search. PR 1754 P Angerer
Reorganized reference docs PR 1753 I Virshup
Clarified docs issues for neighbors(), diffmap(), calculate_qc_metrics() PR 1680 G Palla
Fixed typos in grouped plot doc-strings PR 1877 C Rands
Extended examples for differential expression plotting. PR 1529 F Ramirez - See rank_genes_groups_dotplot() or rank_genes_groups_matrixplot() for examples.

Bug fixes

Fix scanpy.pl.paga_path() TypeError with recent versions of anndata PR 1047 P Angerer
Fix detection of whether IPython is running PR 1844 I Virshup
Fixed reproducibility of scanpy.tl.diffmap() (added random_state) PR 1858 I Kucinski
Fixed errors and warnings from embedding plots with small numbers of categories after sns.set_palette was called PR 1886 I Virshup
Fixed handling of gene_symbols argument in a number of sc.pl.rank_genes_groups* functions PR 1529 F Ramirez I Virshup
Fixed handling of use_raw for sc.tl.rank_genes_groups when no .raw is present PR 1895 I Virshup
scanpy.pl.rank_genes_groups_violin() now works for raw=False PR 1669 M van den Beek
scanpy.pl.dotplot() now uses smallest_dot argument correctly PR 1771 S Flemming

Development processes

Switched to flit for building and deploying the package, a simple tool with an easy to understand command line interface and metadata PR 1527 P Angerer
Use pre-commit for style checks PR 1684 PR 1848 L Heumos I Virshup

Deprecations

Dropped support for Python 3.6. More details here. PR 1897 I Virshup
Deprecated layers and layers_norm kwargs to normalize_total() PR 1667 I Virshup
Deprecated MulticoreTSNE backend for scanpy.tl.tsne() PR 1854 I Virshup

Version 1.7

1.7.2 2021-04-07

Bug fixes

scanpy.logging.print_versions() now works when python<3.8 PR 1691 I Virshup
scanpy.pp.regress_out() now uses joblib as the parallel backend, and should stop oversubscribing threads PR 1694 I Virshup
scanpy.pp.highly_variable_genes() with flavor="seurat_v3" now returns correct gene means and -variances when used with batch_key PR 1732 J Lause
scanpy.pp.highly_variable_genes() now throws a warning instead of an error when non-integer values are passed for method "seurat_v3". The check can be skipped by passing check_values=False. PR 1679 G Palla

Ecosystem

Added triku a feature selection method to the ecosystem page PR 1722 AM Ascensión
Added dorothea and progeny to the ecosystem page PR 1767 P Badia-i-Mompel

1.7.1 2021-02-24

Documentation

More twitter handles for core devs PR 1676 G Eraslan

Bug fixes

dendrogram() use 1 - correlation as distance matrix to compute the dendrogram PR 1614 F Ramirez
Fixed obs_df()/var_df() erroring when keys not passed PR 1637 I Virshup
Fixed argument handling for scanpy.external.pp.scrublet() J Manning
Fixed passing of kwargs to scanpy.pl.violin() when stripplot was also used PR 1655 M van den Beek
Fixed colorbar creation in scanpy.pl.timeseries_as_heatmap PR 1654 M van den Beek
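
The dendrogram() entry above refers to using 1 - correlation as the distance between groups. A minimal NumPy sketch of that conversion follows. The helper name `correlation_distance` and the layout (one row per group, one column per feature) are assumptions for illustration, and the clustering step itself is left out.

```python
import numpy as np


def correlation_distance(group_means):
    """Turn Pearson correlations between rows into distances in [0, 2]."""
    corr = np.corrcoef(np.asarray(group_means, dtype=np.float64))
    # r = 1 maps to distance 0, r = 0 to 1, r = -1 to 2.
    return 1.0 - corr


# Three hypothetical groups described by four features each.
means = np.array(
    [
        [1.0, 2.0, 3.0, 4.0],
        [2.0, 4.0, 6.0, 8.5],  # nearly proportional to the first group
        [4.0, 3.0, 2.0, 1.0],  # exactly reversed relative to the first group
    ]
)
dist = correlation_distance(means)
```

With this distance, groups with similar expression profiles end up close together in the tree, whereas feeding raw correlations to a clustering routine would treat the most similar groups as the most distant.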

1.7.0 2021-02-03

Features

Add new 10x Visium datasets to visium_sge() PR 1473 G Palla
Enable download of source image for 10x visium datasets in visium_sge() PR 1506 H Spitzer
Refactor of scanpy.pl.spatial(). Better support for plotting without an image, as well as directly providing images PR 1512 G Palla
Dict input for scanpy.queries.enrich() PR 1488 G Eraslan
rank_genes_groups_df() can now return fraction of cells in a group expressing a gene, and allows retrieving values for multiple groups at once PR 1388 G Eraslan
Color annotations for gene sets in heatmap() are now matched to color for cluster PR 1511 L Sikkema
PCA plots can now annotate axes with variance explained PR 1470 bfurtwa
Plots with groupby arguments can now group by values in the index by passing the index’s name (like pd.DataFrame.groupby). PR 1583 F Ramirez
Added na_color and na_in_legend keyword arguments to embedding() plots. Allows specifying color for missing or filtered values in plots like umap() or spatial() PR 1356 I Virshup
embedding() plots now support passing dict of {cluster_name: cluster_color, ...} for palette argument PR 1392 I Virshup

External tools (new)

Add Scanorama integration to scanpy external API (scanorama_integrate()) [^cite_hie19] PR 1332 B Hie
Scrublet [^cite_wolock19] integration: scrublet(), scrublet_simulate_doublets(), and plotting method scrublet_score_distribution() PR 1476 J Manning
hashsolo() for HTO demultiplexing [^cite_bernstein20] PR 1432 NJ Bernstein
Added scirpy (sc-AIRR analysis) to ecosystem page PR 1453 G Sturm
Added scvi-tools to ecosystem page PR 1421 A Gayoso

External tools (changes)

Updates for palantir() and palantir_results() PR 1245 A Mousa
Fixes to harmony_timeseries() docs PR 1248 A Mousa
Support for leiden clustering by scanpy.external.tl.phenograph() PR 1080 A Mousa
Deprecate scanpy.external.pp.scvi PR 1554 G Xing
Updated default params of sam() to work with larger data PR 1540 A Tarashansky

Documentation

New contribution guide PR 1544 I Virshup
zsh installation instructions PR 1444 P Angerer

Performance

Speed up read_10x_h5() PR 1402 P Weiler
Speed ups for obs_df() PR 1499 F Ramirez

Bug fixes

Consistent fold-change, fractions calculation for filter_rank_genes_groups PR 1391 S Rybakov
Fixed bug where score_genes would error if one gene was passed PR 1398 I Virshup
Fixed log1p inplace on integer dense arrays PR 1400 I Virshup
Fix docstring formatting for rank_genes_groups() PR 1417 P Weiler
Removed PendingDeprecationWarnings from use of np.matrix PR 1424 P Weiler
Fixed indexing bug in scanpy.pp.highly_variable_genes() PR 1456 V Bergen
Fix default number of genes for marker_genes_overlap PR 1464 MD Luecken
Fixed passing groupby and dendrogram_key to dendrogram() PR 1465 M Varma
Fixed download path of pbmc3k_processed PR 1472 D Strobl
Better error message when computing DE with a group of size 1 PR 1490 J Manning
Update cugraph API usage for v0.16 PR 1494 R Ilango
Fixed marker_gene_overlap default value for top_n_markers PR 1464 MD Luecken
Pass random_state to RAPIDs UMAP PR 1474 C Nolet
Fixed anndata version requirement for concat() (re-exported from scanpy as sc.concat) PR 1491 I Virshup
Fixed the width of the progress bar when downloading data PR 1507 M Klein
Updated link for moignard15 dataset PR 1542 I Virshup
Fixed bug where calling set_figure_params could block if IPython was installed, but not used. PR 1547 I Virshup
violin() no longer fails if .raw not present PR 1548 I Virshup
spatial() refactoring and better handling of spatial data PR 1512 G Palla
pca() works with chunked=True again PR 1592 I Virshup
ingest() now works with umap-learn 0.5.0 PR 1601 S Rybakov

Version 1.6

1.6.0 2020-08-15

This release includes an overhaul of dotplot(), matrixplot(), and stacked_violin() (PR 1210 F Ramirez), and of the internals of rank_genes_groups() (PR 1156 S Rybakov).

Overhaul of dotplot(), matrixplot(), and stacked_violin() PR 1210 F Ramirez

An overhauled tutorial → tutorial: plotting/core.
New plotting classes can be accessed directly (e.g., DotPlot) or using the return_fig param.
It is possible to plot log fold change and p-values in the rank_genes_groups_dotplot() family of functions.
Added ax parameter which allows embedding the plot in other images.
Added option to include a bar plot instead of the dendrogram containing the cell/observation totals per category.
Return a dictionary of axes for further manipulation. This includes the main plot, the legend, and the dendrogram or totals plot.
Legends can be removed.
The groupby param can take a list of categories, e.g., groupby=['tissue', 'cell type'].
Added padding parameter to dotplot and stacked_violin. PR 1270
Added title for colorbar and positioned as in dotplot for matrixplot().
dotplot() changes:
- Improved the colorbar and size legend for dotplots. Now the colorbar and size have titles, which can be modified using the colorbar_title and size_title params. They also align at the bottom of the image and do not shrink if the dotplot image is smaller.
- Allow plotting genes in rows and categories in columns (swap_axes).
- Using DotPlot, the dot_edge_color and line width can be modified, a grid can be added, and other modifications are enabled.
- A new style was added in which the dots are replaced by an empty circle and the square behind the circle is colored (like in matrixplots).
stacked_violin() changes:
- Violin colors can be colored based on average gene expression as in dotplots.
- The linewidth of the violin plots is thinner.
- Removed the ticks for the y-axis as they tend to overlap with each other. Using the style method they can be displayed if needed.

Additions

concat() is now exported from scanpy, see Concatenation for more info. PR 1338 I Virshup
Added highly variable gene selection strategy from Seurat v3 PR 1204 A Gayoso
Added CellRank to scanpy ecosystem PR 1304 giovp
Added backup_url param to read_10x_h5() PR 1296 A Gayoso
Allow prefix for read_10x_mtx() PR 1250 G Sturm
Optional tie correction for the 'wilcoxon' method in rank_genes_groups() PR 1330 S Rybakov
Use sinfo for print_versions() and add print_header() to do what it previously did. PR 1338 I Virshup PR 1373

Bug fixes

Avoid warning in rank_genes_groups() if 't-test' is passed PR 1303 A Wolf
Restrict sphinx version to <3.1, >3.0 PR 1297 I Virshup
Clean up _ranks and fix dendrogram for scipy 1.5 PR 1290 S Rybakov
Use .raw to translate gene symbols if applicable PR 1278 E Rice
Fix diffmap (issue 1262) G Eraslan
Fix neighbors in spring_project issue 1260 S Rybakov
Fix default size of dot in spatial plots PR 1255 issue 1253 giovp
Bumped version requirement of scipy to scipy>1.4 to support rmatmat argument of LinearOperator issue 1246 I Virshup
Fix asymmetry of scores for the 'wilcoxon' method in rank_genes_groups() issue 754 S Rybakov
Avoid trimming of gene names in rank_genes_groups() issue 753 S Rybakov

Version 1.5

1.5.1 2020-05-21

Bug fixes

Fixed a bug in pca(), where random_state did not have an effect for sparse input PR 1240 I Virshup
Fixed docstring in pca() which included an unused argument PR 1240 I Virshup

1.5.0 2020-05-15

The 1.5.0 release adds a lot of new functionality, much of which takes advantage of anndata updates 0.7.0 - 0.7.2. Highlights of this release include support for spatial data, dedicated handling of graphs in AnnData, sparse PCA, an interface with scvi, and others.

Spatial data support

Basic analysis → tutorial: spatial/basic-analysis and integration with single cell data → tutorial: spatial/integration-scanorama G Palla
read_visium() read 10x Visium data PR 1034 G Palla, P Angerer, I Virshup
visium_sge() load Visium data directly from 10x Genomics PR 1013 M Mirkazemi, G Palla, P Angerer
spatial() plot spatial data PR 1012 G Palla, P Angerer

New functionality

Many functions, like neighbors() and umap(), now store cell-by-cell graphs in obsp PR 1118 S Rybakov
scale() and log1p() can be used on any element in layers or obsm PR 1173 I Virshup

External tools

scanpy.external.pp.scvi for preprocessing with scVI PR 1085 G Xing
Guide for using Scanpy in R PR 1186 L Zappia

Performance

pca() now uses efficient implicit centering for sparse matrices. This can lead to significantly improved performance for large datasets PR 1066 A Tarashansky
score_genes() now has an efficient implementation for sparse matrices with missing values PR 1196 redst4r.

Warning

The new pca() implementation can result in slightly different results for sparse matrices. See PR 1066 and the documentation for more info.

Code design

stacked_violin() can now be used as a subplot PR 1084 P Angerer
score_genes() has improved logging PR 1119 G Eraslan
scale() now saves mean and standard deviation in the var PR 1173 A Wolf
harmony_timeseries() PR 1091 A Mousa

Bug fixes

combat() now works when obs_names aren’t unique. PR 1215 I Virshup
scale() can now be used on dense arrays without centering PR 1160 simonwm
regress_out() now works when some features are constant PR 1194 simonwm
normalize_total() errored if the passed object was a view PR 1200 I Virshup
neighbors() sometimes ignored the n_pcs param PR 1124 V Bergen
ebi_expression_atlas() which contained some out-of-date URLs PR 1102 I Virshup
ingest() for UMAP 0.4 PR 1165 S Rybakov
louvain() for Louvain 0.6 PR 1197 I Virshup
highly_variable_genes() which could lead to incorrect results when the batch_key argument was used PR 1180 G Eraslan
ingest() where an inconsistent number of neighbors was used PR 1111 S Rybakov

Version 1.4

1.4.6 2020-03-17

Functionality in external

sam() self-assembling manifolds [^cite_tarashansky19] PR 903 A Tarashansky
harmony_timeseries() for trajectory inference on discrete time points PR 994 A Mousa
wishbone() for trajectory inference (bifurcations) PR 1063 A Mousa

Code design

violin now reads .uns['colors_...'] PR 1029 michalk8

Bug fixes

adapt ingest() for UMAP 0.4 PR 1038 PR 1106 S Rybakov
compat with matplotlib 3.1 and 3.2 PR 1090 I Virshup, P Angerer
fix PAGA for new igraph PR 1037 P Angerer
fix rapids compat of louvain PR 1079 LouisFaure

1.4.5 2019-12-30

Please install scanpy==1.4.5.post3 instead of scanpy==1.4.5.

New functionality

ingest() maps labels and embeddings of reference data to new data → tutorial: integrating-data-using-ingest PR 651 S Rybakov, A Wolf
queries received many updates including enrichment through gprofiler and more advanced biomart queries PR 467 I Virshup
set_figure_params() allows setting figsize and accepts facecolor='white', useful for working in dark mode A Wolf

Code design

downsample_counts now always preserves the dtype of its input, instead of converting floats to ints PR 865 I Virshup
allow specifying a base for log1p() PR 931 G Eraslan
run neighbors on a GPU using rapids PR 830 T White
param docs from typed params P Angerer
scanpy.tl.embedding_density() now only takes one positional argument; similar for scanpy.pl.embedding_density(), which gains a param groupby PR 965 A Wolf
webpage overhaul, ecosystem page, release notes, tutorials overhaul PR 960 PR 966 A Wolf
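
The log1p() entry above mentions an optional base. The arithmetic amounts to a change of base applied to the natural log1p, as in this standard-library sketch. The helper name `log1p_with_base` is hypothetical and operates on a single number rather than on an AnnData object.

```python
import math


def log1p_with_base(x, base=None):
    """Return log(1 + x) in the given base; natural logarithm when base is None."""
    natural = math.log1p(x)
    if base is None:
        return natural
    # Change of base: log_b(1 + x) = ln(1 + x) / ln(b).
    return natural / math.log(base)


two_fold = log1p_with_base(7, base=2)    # log2(8), approximately 3
ten_fold = log1p_with_base(9, base=10)   # log10(10), approximately 1
```

Base 2 is convenient when differences should read as fold-change doublings.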

Warning

changed default solver in pca() from auto to arpack
changed default use_raw in score_genes() from False to None

1.4.4 2019-07-20

New functionality

scanpy.get adds helper functions for extracting data in convenient formats PR 619 I Virshup

Bug fixes

Stopped deprecation warnings from AnnData 0.6.22 I Virshup

Code design

normalize_total() gains param exclude_highly_expressed, and fraction is renamed to max_fraction with better docs A Wolf
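
To make the interplay of exclude_highly_expressed and max_fraction concrete, here is a NumPy sketch under stated assumptions. A gene counts as highly expressed if it exceeds max_fraction of the total counts in at least one cell, and such genes are left out when computing each cell's size factor. The function name `normalize_total_sketch` and the default values shown are illustrative, not a copy of the scanpy implementation, and cells with no remaining counts are not handled.

```python
import numpy as np


def normalize_total_sketch(counts, target_sum=None,
                           exclude_highly_expressed=False, max_fraction=0.05):
    """Scale each cell (row) so that its non-excluded genes sum to target_sum."""
    counts = np.asarray(counts, dtype=np.float64)
    totals = counts.sum(axis=1)
    keep = np.ones(counts.shape[1], dtype=bool)
    if exclude_highly_expressed:
        # Drop genes that exceed max_fraction of the total in any single cell.
        keep = ~(counts > totals[:, None] * max_fraction).any(axis=0)
    size = counts[:, keep].sum(axis=1)  # assumed to be non-zero for every cell
    if target_sum is None:
        target_sum = np.median(size)
    return counts * (target_sum / size)[:, None]


toy = np.array(
    [
        [90.0, 5.0, 5.0],   # gene 0 dominates this cell
        [10.0, 45.0, 45.0],
        [20.0, 40.0, 40.0],
    ]
)
plain = normalize_total_sketch(toy, target_sum=100.0)
robust = normalize_total_sketch(
    toy, target_sum=100.0, exclude_highly_expressed=True, max_fraction=0.5
)
```

In `robust`, gene 0 is excluded from the size factors, so the remaining two genes sum to 100 in every cell, while gene 0 itself is still rescaled and kept in the output.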

1.4.3 2019-05-14

Bug fixes

neighbors() correctly infers n_neighbors again from params, which was temporarily broken in v1.4.2 I Virshup

Code design

calculate_qc_metrics() is single threaded by default for datasets under 300,000 cells – allowing cached compilation PR 615 I Virshup

1.4.2 2019-05-06

New functionality

combat() supports additional covariates which may include adjustment variables or biological condition PR 618 G Eraslan
highly_variable_genes() has a batch_key option which performs HVG selection in each batch separately to avoid selecting genes that vary strongly across batches PR 622 G Eraslan

Bug fixes

rank_genes_groups() t-test implementation doesn’t return NaN when variance is 0, also changed to scipy’s implementation PR 621 I Virshup
umap() with init_pos='paga' detects correct dtype A Wolf
louvain() and leiden() auto-generate key_added=louvain_R upon passing restrict_to, which was temporarily changed in 1.4.1 A Wolf

Code design

neighbors() and umap() got rid of UMAP legacy code and introduced UMAP as a dependency PR 576 S Rybakov

1.4.1 2019-04-26

New functionality

Scanpy has a command line interface again. Invoking it with scanpy somecommand [args] calls scanpy-somecommand [args], except for builtin commands (currently scanpy settings) PR 604 P Angerer
ebi_expression_atlas() allows convenient download of EBI expression atlas I Virshup
marker_gene_overlap() computes overlaps of marker genes M Luecken
filter_rank_genes_groups() filters out genes based on fold change and fraction of cells expressing genes F Ramirez
normalize_total() replaces normalize_per_cell(), is more efficient and provides a parameter to only normalize using a fraction of expressed genes S Rybakov
downsample_counts() has been sped up, changed default value of replace parameter to False PR 474 I Virshup
embedding_density() computes densities on embeddings PR 543 M Luecken
palantir() interfaces Palantir [^cite_setty18] PR 493 A Mousa
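
The downsample_counts() entry above changes the default of replace to False. What sampling without replacement means for a single cell can be sketched as follows. The helper `downsample_cell` is hypothetical, works on one count vector, and is far less efficient than a production implementation.

```python
import numpy as np


def downsample_cell(counts, target_total, rng, replace=False):
    """Randomly keep target_total counts from one cell's integer count vector."""
    counts = np.asarray(counts, dtype=np.int64)
    if not replace and target_total >= counts.sum():
        # Nothing to remove: the cell already has at most target_total counts.
        return counts.copy()
    # One entry per observed count, labelled with its gene index.
    molecules = np.repeat(np.arange(counts.size), counts)
    drawn = rng.choice(molecules, size=target_total, replace=replace)
    return np.bincount(drawn, minlength=counts.size)


rng = np.random.default_rng(0)
cell = np.array([5, 0, 3, 12])
down = downsample_cell(cell, 10, rng)
```

Without replacement, no gene can end up with more counts than it started with, which is the property that distinguishes the two settings.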

Code design

.layers support of scatter plots F Ramirez
fix double-logarithmization in compute of log fold change in rank_genes_groups() A Muñoz-Rojas
fix return sections of docs P Angerer

Version 1.3

1.3.6 2018-12-11

Major updates

a new plotting gallery for visualizing-marker-genes F Ramirez
tutorials are integrated on ReadTheDocs, pbmc3k and paga-paul15 A Wolf

Interactive exploration of analysis results through manifold viewers

CZI’s cellxgene directly reads .h5ad files the cellxgene developers
the UCSC Single Cell Browser requires exporting via cellbrowser() M Haeussler

Code design

highly_variable_genes() supersedes filter_genes_dispersion(), it gives the same results but, by default, expects logarithmized data and doesn’t subset A Wolf

1.3.5 2018-12-09

uncountable figure improvements PR 369 F Ramirez

1.3.4 2018-11-24

leiden() wraps the recent graph clustering package by [^cite_traag18] K Polanski
bbknn() wraps the recent batch correction package [^cite_polanski19] K Polanski
calculate_qc_metrics() calculates a number of quality control metrics, similar to calculateQCMetrics from Scater [^cite_mccarthy17] I Virshup

1.3.3 2018-11-05

Major updates

a fully distributed preprocessing backend T White and the Laserson Lab

Code design

read_10x_h5() and read_10x_mtx() read Cell Ranger 3.0 outputs PR 334 Q Gong

Note

Also see changes in anndata 0.6.

changed default compression to None in write_h5ad() to speed up read and write, disk space use is usually less critical
performance gains in write_h5ad() due to better handling of strings and categories S Rybakov

1.3.1 2018-09-03

RNA velocity in single cells [^cite_manno18]

Scanpy and AnnData support loom’s layers so that computations for single-cell RNA velocity [^cite_manno18] become feasible S Rybakov and V Bergen
scvelo harmonizes with Scanpy and is able to process loom files with splicing information produced by Velocyto [^cite_manno18], it runs a lot faster than the count matrix analysis of Velocyto and provides several conceptual developments

Plotting (Generic)

dotplot() for visualizing genes across conditions and clusters, see here PR 199 F Ramirez
heatmap() for pretty heatmaps PR 175 F Ramirez
violin() produces very compact overview figures with many panels PR 175 F Ramirez

There now is a section on imputation in external:

magic() for imputation using data diffusion [^cite_vandijk18] PR 187 S Gigante
dca() for imputation and latent space construction using an autoencoder [^cite_eraslan18] PR 186 G Eraslan

Version 1.2

1.2.1 2018-06-08

Plotting of Generic marker genes and quality control.

highest_expr_genes() for quality control; plot genes with highest mean fraction of cells, similar to plotQC of Scater [^cite_mccarthy17] PR 169 F Ramirez

1.2.0 2018-06-08

paga() improved, see PAGA; the default model changed, restore the previous default model by passing model='v1.0'

Version 1.1

1.1.0 2018-06-01

set_figure_params() by default passes vector_friendly=True and allows you to produce reasonably sized pdfs by rasterizing large scatter plots A Wolf
draw_graph() defaults to the ForceAtlas2 layout [^cite_jacomy14] [^cite_chippada18], which is often more visually appealing and whose computation is much faster S Wollock
scatter() also plots along variables axis MD Luecken
pca() and log1p() support chunk processing S Rybakov
regress_out() is back to multiprocessing F Ramirez
read() reads compressed text files G Eraslan
mitochondrial_genes() for querying mito genes FG Brundu
mnn_correct() for batch correction [^cite_haghverdi18] [^cite_kang18]
phate() for low-dimensional embedding [^cite_moon17] S Gigante
sandbag(), cyclone() for scoring genes [^cite_scialdone15] [^cite_fechtner18]

Version 1.0

1.0.0 2018-03-30

Major updates

Scanpy is much faster and more memory efficient: preprocess, cluster and visualize 1.3M cells in 6h, 130K cells in 14min, and 68K cells in 3min A Wolf
the API gained a preprocessing function neighbors() and a class Neighbors() to which all basic graph computations are delegated A Wolf

Warning

Upgrading to 1.0 isn’t fully backwards compatible in the following changes

the graph-based tools louvain(), dpt(), draw_graph(), umap(), diffmap(), and paga() require prior computation of the graph: sc.pp.neighbors(adata, n_neighbors=5); sc.tl.louvain(adata) instead of the previous sc.tl.louvain(adata, n_neighbors=5)
install numba via conda install numba, which replaces cython
the default connectivity measure changed (dpt will look different using default settings). Setting method='gauss' in sc.pp.neighbors uses Gaussian kernel connectivities and reproduces the previous behavior; see, for instance, the paul15 example.
namings of returned annotation have changed for less bloated AnnData objects, which means that some of the unstructured annotation of old AnnData files is not recognized anymore
replace occurrences of group_by with groupby (consistency with pandas)
it is worth checking out the notebook examples to see changes, e.g. the seurat example.
upgrading scikit-learn from 0.18 to 0.19 changed the implementation of PCA, some results might therefore look slightly different


Further updates

UMAP [^cite_mcinnes18] can serve as a first visualization of the data just as tSNE, in contrast to tSNE, UMAP directly embeds the single-cell graph and is faster; UMAP is also used for measuring connectivities and computing neighbors, see neighbors() A Wolf
graph abstraction: AGA is renamed to PAGA: paga(); now, it only measures connectivities between partitions of the single-cell graph, pseudotime and clustering need to be computed separately via louvain() and dpt(), the connectivity measure has been improved A Wolf
logistic regression for finding marker genes rank_genes_groups() with parameter method='logreg' A Wolf
louvain() provides a better implementation for reclustering via restrict_to A Wolf
scanpy no longer modifies rcParams upon import, call settings.set_figure_params to set the ‘scanpy style’ A Wolf
default cache directory is ./cache/, set settings.cachedir to change this; nested directories in this are avoided A Wolf
show edges in scatter plots based on graph visualization draw_graph() and umap() by passing edges=True A Wolf
downsample_counts() for downsampling counts MD Luecken
default 'louvain_groups' are called 'louvain' A Wolf
'X_diffmap' contains the zero component, plotting remains unchanged A Wolf

Version 0.4

0.4.3 2018-02-09

clustermap(): heatmap from hierarchical clustering, based on seaborn.clustermap() [^cite_waskom16] A Wolf
only return matplotlib.axes.Axes in plotting functions of sc.pl when show=False, otherwise None A Wolf

0.4.2 2018-01-07

amendments in PAGA and its plotting functions A Wolf

0.4.0 2017-12-23

export to SPRING [^cite_weinreb17] for interactive visualization of data: spring tutorial S Wollock

Version 0.3

0.3.2 2017-11-29

finding marker genes via rank_genes_groups_violin() improved, see issue 51 F Ramirez

0.3.0 2017-11-16

AnnData gains method concatenate() A Wolf
AnnData is available as the separate anndata package P Angerer, A Wolf
results of PAGA simplified A Wolf

Version 0.2

0.2.9 2017-10-25

Initial release of the new trajectory inference method PAGA

paga() computes an abstracted, coarse-grained (PAGA) graph of the neighborhood graph A Wolf
paga_compare() plots this graph next to an embedding A Wolf
paga_path() plots a heatmap through a node sequence in the PAGA graph A Wolf

0.2.1 2017-07-24

Scanpy includes preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing and simulation of gene regulatory networks. The implementation efficiently deals with datasets of more than one million cells. A Wolf, P Angerer

Version 0.1

0.1.0 2017-05-17

Scanpy computationally outperforms and allows reproducing both the Cell Ranger R kit’s and most of Seurat’s clustering workflows. A Wolf, P Angerer