Release Notes
Note
Also see the release notes of anndata.
Version 1.6
1.6.1 2021-01-14
This is being released alongside 1.7.0rc1. Try that version with pip install "scanpy==1.7.0rc1".
Bug fixes
Pin umap-learn to a compatible version
1.6.0 2020-08-15
This release includes an overhaul of dotplot(), matrixplot(), and stacked_violin() (PR 1210 F Ramirez), and of the internals of rank_genes_groups() (PR 1156 S Rybakov).
Overhaul of dotplot(), matrixplot(), and stacked_violin() PR 1210 F Ramirez
An overhauled tutorial → tutorial: plotting/core.
New plotting classes can be accessed directly (e.g., DotPlot) or using the return_fig param.
It is possible to plot log fold change and p-values in the rank_genes_groups_dotplot() family of functions.
Added ax parameter which allows embedding the plot in other images.
Added option to include, instead of the dendrogram, a bar plot containing the cell/observation totals per category.
Return a dictionary of axes for further manipulation. This includes the main plot, the legend, and the dendrogram or totals plot.
Legends can be removed.
The groupby param can take a list of categories, e.g., groupby=['tissue', 'cell type'].
Added padding parameter to dotplot and stacked_violin. PR 1270
Added title for colorbar and positioned it as in dotplot for matrixplot().
dotplot() changes:
Improved the colorbar and size legend for dotplots. Now the colorbar and size legend have titles, which can be modified using the colorbar_title and size_title params. They also align at the bottom of the image and do not shrink if the dotplot image is smaller.
Allow plotting genes in rows and categories in columns (swap_axes).
Using DotPlot, the dot_edge_color and line width can be modified, a grid can be added, and other modifications are enabled.
A new style was added in which the dots are replaced by an empty circle and the square behind the circle is colored (like in matrixplots).
stacked_violin() changes:
Violins can be colored based on average gene expression as in dotplots.
The linewidth of the violin plots is thinner.
Removed the ticks for the y-axis as they tend to overlap with each other. Using the style method they can be displayed if needed.
Additions
concat() is now exported from scanpy, see Concatenation for more info. PR 1338 I Virshup
Added highly variable gene selection strategy from Seurat v3 PR 1204 A Gayoso
Added backup_url param to read_10x_h5() PR 1296 A Gayoso
Allow prefix for read_10x_mtx() PR 1250 G Sturm
Optional tie correction for the 'wilcoxon' method in rank_genes_groups() PR 1330 S Rybakov
Use sinfo for print_versions() and add print_header() to do what it previously did. PR 1338 I Virshup PR 1373
Bug fixes
Avoid warning in rank_genes_groups() if 't-test' is passed PR 1303 A Wolf
Restrict sphinx version to <3.1, >3.0 PR 1297 I Virshup
Clean up _ranks and fix dendrogram for scipy 1.5 PR 1290 S Rybakov
Use .raw to translate gene symbols if applicable PR 1278 E Rice
Fix diffmap (issue 1262) G Eraslan
Fix neighbors in spring_project issue 1260 S Rybakov
Fix default size of dot in spatial plots PR 1255 issue 1253 giovp
Bumped version requirement of scipy to scipy>1.4 to support rmatmat argument of LinearOperator issue 1246 I Virshup
Fix asymmetry of scores for the 'wilcoxon' method in rank_genes_groups() issue 754 S Rybakov
Avoid trimming of gene names in rank_genes_groups() issue 753 S Rybakov
Version 1.5
1.5.1 2020-05-21
Bug fixes
1.5.0 2020-05-15
The 1.5.0 release adds a lot of new functionality, much of which takes advantage of anndata updates 0.7.0 - 0.7.2. Highlights of this release include support for spatial data, dedicated handling of graphs in AnnData, sparse PCA, an interface with scvi, and others.
Spatial data support
Basic analysis → tutorial: spatial/basic-analysis and integration with single cell data → tutorial: spatial/integration-scanorama G Palla
read_visium() reads 10x Visium data PR 1034 G Palla, P Angerer, I Virshup
visium_sge() loads Visium data directly from 10x Genomics PR 1013 M Mirkazemi, G Palla, P Angerer
New functionality
Many functions, like neighbors() and umap(), now store cell-by-cell graphs in obsp PR 1118 S Rybakov
scale() and log1p() can be used on any element in layers or obsm PR 1173 I Virshup
External tools
Guide for using Scanpy in R PR 1186 L Zappia
Performance
pca() now uses efficient implicit centering for sparse matrices. This can lead to significantly improved performance for large datasets PR 1066 A Tarashansky
score_genes() now has an efficient implementation for sparse matrices with missing values PR 1196 redst4r.
Warning
The new pca() implementation can result in slightly different results for sparse matrices. See the PR (PR 1066) and documentation for more info.
Code design
stacked_violin() can now be used as a subplot PR 1084 P Angerer
score_genes() has improved logging PR 1119 G Eraslan
scale() now saves mean and standard deviation in var PR 1173 A Wolf
harmony_timeseries() PR 1091 A Mousa
Bug fixes
combat() now works when obs_names aren’t unique. PR 1215 I Virshup
scale() can now be used on dense arrays without centering PR 1160 simonwm
regress_out() now works when some features are constant PR 1194 simonwm
normalize_total() errored if the passed object was a view PR 1200 I Virshup
neighbors() sometimes ignored the n_pcs param PR 1124 V Bergen
Fixed ebi_expression_atlas(), which contained some out-of-date URLs PR 1102 I Virshup
Fixed highly_variable_genes(), which could lead to incorrect results when the batch_key argument was used PR 1180 G Eraslan
Fixed ingest(), where an inconsistent number of neighbors was used PR 1111 S Rybakov
Version 1.4
1.4.6 2020-03-17
Functionality in external
sam() self-assembling manifolds [Tarashansky19] PR 903 A Tarashansky
harmony_timeseries() for trajectory inference on discrete time points PR 994 A Mousa
wishbone() for trajectory inference (bifurcations) PR 1063 A Mousa
Code design
Bug fixes
1.4.5 2019-12-30
Please install scanpy==1.4.5.post3 instead of scanpy==1.4.5.
New functionality
ingest() maps labels and embeddings of reference data to new data → tutorial: integrating-data-using-ingest PR 651 S Rybakov, A Wolf
queries received many updates including enrichment through gprofiler and more advanced biomart queries PR 467 I Virshup
set_figure_params() allows setting figsize and accepts facecolor='white', useful for working in dark mode A Wolf
Code design
downsample_counts now always preserves the dtype of its input, instead of converting floats to ints PR 865 I Virshup
run neighbors on a GPU using rapids PR 850 T White
param docs from typed params P Angerer
embedding_density() now only takes one positional argument; similar for embedding_density(), which gains a param groupby PR 965 A Wolf
webpage overhaul, ecosystem page, release notes, tutorials overhaul PR 960 PR 966 A Wolf
Warning
changed default solver in pca() from auto to arpack
changed default use_raw in score_genes() from False to None
1.4.4 2019-07-20
New functionality
scanpy.get adds helper functions for extracting data in convenient formats PR 619 I Virshup
Bug fixes
Stopped deprecation warnings from AnnData 0.6.22 I Virshup
Code design
normalize_total() gains param exclude_highly_expressed, and fraction is renamed to max_fraction with better docs A Wolf
1.4.3 2019-05-14
Bug fixes
neighbors() correctly infers n_neighbors again from params, which was temporarily broken in v1.4.2 I Virshup
Code design
calculate_qc_metrics() is single threaded by default for datasets under 300,000 cells, allowing cached compilation PR 615 I Virshup
1.4.2 2019-05-06
New functionality
combat() supports additional covariates which may include adjustment variables or biological condition PR 618 G Eraslan
highly_variable_genes() has a batch_key option which performs HVG selection in each batch separately to avoid selecting genes that vary strongly across batches PR 622 G Eraslan
Bug fixes
rank_genes_groups() t-test implementation doesn’t return NaN when variance is 0, also changed to scipy’s implementation PR 621 I Virshup
umap() with init_pos='paga' detects correct dtype A Wolf
louvain() and leiden() auto-generate key_added=louvain_R upon passing restrict_to, which was temporarily changed in 1.4.1 A Wolf
Code design
neighbors() and umap() got rid of UMAP legacy code and introduced UMAP as a dependency PR 576 S Rybakov
1.4.1 2019-04-26
New functionality
Scanpy has a command line interface again. Invoking it with scanpy somecommand [args] calls scanpy-somecommand [args], except for builtin commands (currently scanpy settings) PR 604 P Angerer
ebi_expression_atlas() allows convenient download of EBI expression atlas I Virshup
marker_gene_overlap() computes overlaps of marker genes M Luecken
filter_rank_genes_groups() filters out genes based on fold change and fraction of cells expressing genes F Ramirez
normalize_total() replaces normalize_per_cell(), is more efficient and provides a parameter to only normalize using a fraction of expressed genes S Rybakov
downsample_counts() has been sped up, changed default value of replace parameter to False PR 474 I Virshup
embedding_density() computes densities on embeddings PR 543 M Luecken
palantir() interfaces Palantir [Setty18] PR 493 A Mousa
Code design
.layers support of scatter plots F Ramirez
fix double-logarithmization in the computation of log fold change in rank_genes_groups() A Muñoz-Rojas
fix return sections of docs P Angerer
Version 1.3
1.3.8 2019-02-05
read_10x_h5() throws more stringent errors and doesn’t require specifying default genomes anymore. PR 442 and PR 444 I Virshup
1.3.7 2019-01-02
Major updates
one can import scanpy as sc instead of import scanpy.api as sc, see scanpy
New functionality
combat() reimplements Combat for batch effect correction [Johnson07] [Leek12], heavily based on the Python implementation of [Pedersen12], but with performance improvements PR 398 M Lange
phenograph() wraps the graph clustering package Phenograph [Levine15] A Mousa
1.3.6 2018-12-11
Major updates
a new plotting gallery F Ramirez
tutorials are integrated on ReadTheDocs, Preprocessing and clustering 3k PBMCs and Trajectory inference for hematopoiesis in mouse A Wolf
Interactive exploration of analysis results through manifold viewers
CZI’s cellxgene directly reads .h5ad files the cellxgene developers
the UCSC Single Cell Browser requires exporting via cellbrowser() M Haeussler
Code design
highly_variable_genes() supersedes filter_genes_dispersion(); it gives the same results but, by default, expects logarithmized data and doesn’t subset A Wolf
1.3.4 2018-11-24
leiden() wraps the recent graph clustering package by [Traag18] K Polanski
bbknn() wraps the recent batch correction package [Polanski19] K Polanski
calculate_qc_metrics() calculates a number of quality control metrics, similar to calculateQCMetrics from Scater [McCarthy17] I Virshup
1.3.3 2018-11-05
Major updates
a fully distributed preprocessing backend T White and the Laserson Lab
Code design
read_10x_h5() and read_10x_mtx() read Cell Ranger 3.0 outputs PR 334 Q Gong
Note
Also see changes in anndata 0.6.
changed default compression to None in write_h5ad() to speed up read and write, disk space use is usually less critical
performance gains in write_h5ad() due to better handling of strings and categories S Rybakov
1.3.1 2018-09-03
RNA velocity in single cells [Manno18]
Scanpy and AnnData support loom’s layers so that computations for single-cell RNA velocity [Manno18] become feasible S Rybakov and V Bergen
scvelo harmonizes with Scanpy and is able to process loom files with splicing information produced by Velocyto [Manno18]; it runs a lot faster than the count matrix analysis of Velocyto and provides several conceptual developments
Plotting (Generic)
dotplot() for visualizing genes across conditions and clusters, see here PR 199 F Ramirez
violin() produces very compact overview figures with many panels PR 175 F Ramirez
There is now a section on imputation in external:
magic() for imputation using data diffusion [vanDijk18] PR 187 S Gigante
dca() for imputation and latent space construction using an autoencoder [Eraslan18] PR 186 G Eraslan
Version 1.2
1.2.1 2018-06-08
Plotting of Generic marker genes and quality control.
highest_expr_genes() for quality control; plot genes with highest mean fraction of cells, similar to plotQC of Scater [McCarthy17] PR 169 F Ramirez
Version 1.1
1.1.0 2018-06-01
set_figure_params() by default passes vector_friendly=True and allows you to produce reasonably sized pdfs by rasterizing large scatter plots A Wolf
draw_graph() defaults to the ForceAtlas2 layout [Jacomy14] [Chippada18], which is often more visually appealing and whose computation is much faster S Wollock
scatter() also plots along variables axis MD Luecken
regress_out() is back to multiprocessing F Ramirez
read() reads compressed text files G Eraslan
mitochondrial_genes() for querying mito genes FG Brundu
mnn_correct() for batch correction [Haghverdi18] [Kang18]
sandbag(), cyclone() for scoring genes [Scialdone15] [Fechtner18]
Version 1.0
1.0.0 2018-03-30
Major updates
Scanpy is much faster and more memory efficient: preprocess, cluster and visualize 1.3M cells in 6h, 130K cells in 14min, and 68K cells in 3min A Wolf
the API gained a preprocessing function neighbors() and a class Neighbors() to which all basic graph computations are delegated A Wolf
Warning
Upgrading to 1.0 isn’t fully backwards compatible due to the following changes
the graph-based tools louvain(), dpt(), draw_graph(), umap(), diffmap() and paga() require prior computation of the graph: sc.pp.neighbors(adata, n_neighbors=5); sc.tl.louvain(adata) instead of previously sc.tl.louvain(adata, n_neighbors=5)
install numba via conda install numba, which replaces cython
the default connectivity measure changed (dpt will look different using default settings). Setting method='gauss' in sc.pp.neighbors uses gauss kernel connectivities and reproduces the previous behavior, see, for instance, the example paul15.
namings of returned annotation have changed for less bloated AnnData objects, which means that some of the unstructured annotation of old AnnData files is not recognized anymore
replace occurrences of group_by with groupby (consistency with pandas)
it is worth checking out the notebook examples to see changes, e.g. the seurat example.
upgrading scikit-learn from 0.18 to 0.19 changed the implementation of PCA, some results might therefore look slightly different
Further updates
UMAP [McInnes18] can serve as a first visualization of the data just as tSNE; in contrast to tSNE, UMAP directly embeds the single-cell graph and is faster; UMAP is also used for measuring connectivities and computing neighbors, see neighbors() A Wolf
graph abstraction: AGA is renamed to PAGA: paga(); now, it only measures connectivities between partitions of the single-cell graph, pseudotime and clustering need to be computed separately via louvain() and dpt(), the connectivity measure has been improved A Wolf
logistic regression for finding marker genes rank_genes_groups() with parameter method='logreg' A Wolf
louvain() provides a better implementation for reclustering via restrict_to A Wolf
scanpy no longer modifies rcParams upon import, call settings.set_figure_params to set the 'scanpy style' A Wolf
default cache directory is ./cache/, set settings.cachedir to change this; nested directories in this are avoided A Wolf
show edges in scatter plots based on graph visualization draw_graph() and umap() by passing edges=True A Wolf
downsample_counts() for downsampling counts MD Luecken
default 'louvain_groups' are called 'louvain' A Wolf
'X_diffmap' contains the zero component, plotting remains unchanged A Wolf
Version 0.4
0.4.4 2018-02-26
embed cells using umap() [McInnes18] PR 92 G Eraslan
score sets of genes, e.g. for cell cycle, using score_genes() [Satija15] D Cittaro
0.4.3 2018-02-09
clustermap(): heatmap from hierarchical clustering, based on seaborn.clustermap() [Waskom16] A Wolf
only return matplotlib.axes.Axes in plotting functions of sc.pl when show=False, otherwise None A Wolf
0.4.0 2017-12-23
export to SPRING [Weinreb17] for interactive visualization of data: spring tutorial S Wollock
Version 0.3
0.3.2 2017-11-29
finding marker genes via rank_genes_groups_violin() improved, see issue 51 F Ramirez
Version 0.2
0.2.9 2017-10-25
Initial release of the new trajectory inference method PAGA
paga() computes an abstracted, coarse-grained (PAGA) graph of the neighborhood graph A Wolf
paga_compare() plots this graph next to an embedding A Wolf
paga_path() plots a heatmap through a node sequence in the PAGA graph A Wolf
0.2.1 2017-07-24
Scanpy includes preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing and simulation of gene regulatory networks. The implementation efficiently deals with datasets of more than one million cells. A Wolf, P Angerer
Version 0.1
0.1.0 2017-05-17
Scanpy computationally outperforms and allows reproducing both the Cell Ranger R kit’s and most of Seurat’s clustering workflows. A Wolf, P Angerer