scanpy.external.pp.hashsolo(adata, cell_hashing_columns, *, priors=(0.01, 0.8, 0.19), pre_existing_clusters=None, number_of_noise_barcodes=None, inplace=True)

Probabilistic demultiplexing of cell hashing data using HashSolo [Bernstein et al., 2020].



Parameters:

adata AnnData

The (annotated) data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.

cell_hashing_columns Sequence[str]

Names of the .obs columns that contain the cell hashing counts.

priors tuple[float, float, float] (default: (0.01, 0.8, 0.19))

Prior probabilities of each hypothesis, in the order (negative, singlet, doublet). The default, (0.01, 0.8, 0.19), assumes that the barcode counts come from cells that have already passed quality control in transcriptome space, e.g. filtering on UMI counts or the percentage of mitochondrial reads.
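
To make the role of the priors concrete, here is a minimal sketch of how three prior probabilities combine with per-hypothesis likelihoods under Bayes' rule. The `posterior` helper and the likelihood values are hypothetical and purely illustrative; they are not HashSolo's actual likelihood model, which is not described here.

```python
def posterior(priors, likelihoods):
    """Combine priors with per-hypothesis likelihoods via Bayes' rule.

    Hypothetical helper for illustration; not part of scanpy.
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1")
    # Unnormalized posterior: prior times likelihood for each hypothesis.
    unnormalized = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("all hypotheses have zero probability")
    return [u / total for u in unnormalized]


# Order matches the parameter: (negative, singlet, doublet).
priors = (0.01, 0.8, 0.19)
# Made-up likelihoods of one cell's barcode counts under each hypothesis.
likelihoods = (0.2, 0.5, 0.3)
post = posterior(priors, likelihoods)
```

With a strong singlet prior, a cell is only called a doublet or negative when its counts are considerably more likely under that hypothesis.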

pre_existing_clusters str | None (default: None)

The column in .obs containing pre-existing cluster assignments (e.g. Leiden clusters or cell types, but not batch assignments). If provided, demultiplexing will be performed separately for each cluster.
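
As a rough illustration of what "separately for each cluster" means, the sketch below partitions cell indices by cluster label so that each group could be processed on its own. The `group_by_cluster` helper is hypothetical and only shows the idea of the partition; it does not reproduce how hashsolo handles clusters internally.

```python
from collections import defaultdict


def group_by_cluster(cluster_labels):
    """Map each cluster label to the positions of the cells assigned to it.

    Hypothetical helper for illustration; not part of scanpy.
    """
    groups = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        groups[label].append(idx)
    return dict(groups)


# Made-up cluster labels for five cells, e.g. taken from a Leiden column.
labels = ["0", "1", "0", "2", "1"]
groups = group_by_cluster(labels)
```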

number_of_noise_barcodes int | None (default: None)

The number of barcodes used to create the noise distribution. Defaults to len(cell_hashing_columns) - 2.
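
The default stated above is simple arithmetic on the number of hashing columns. A small sketch, using a hypothetical helper name that is not part of scanpy:

```python
def resolve_noise_barcodes(cell_hashing_columns, number_of_noise_barcodes=None):
    """Return the documented default when no explicit value is given.

    Hypothetical helper for illustration; not part of scanpy.
    """
    if number_of_noise_barcodes is None:
        # Documented default: len(cell_hashing_columns) - 2.
        return len(cell_hashing_columns) - 2
    return number_of_noise_barcodes


default_noise = resolve_noise_barcodes(["Hash1", "Hash2", "Hash3"])
explicit_noise = resolve_noise_barcodes(["Hash1", "Hash2", "Hash3"], 2)
```

For three hashing columns, the documented default therefore works out to a single noise barcode.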

inplace bool (default: True)

Whether to update adata in place or return a modified copy.

Return type:

AnnData | None


Returns:

A copy of the input adata with the results added if inplace=False; otherwise the input adata is updated in place and None is returned. The following fields are added:


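.obs["most_likely_hypothesis"]

Index of the most likely hypothesis, where 0 corresponds to negative, 1 to singlet, and 2 to doublet.

.obs["cluster_feature"]

The cluster assignments used for demultiplexing.

.obs["negative_hypothesis_probability"]

Probability of the negative hypothesis.

.obs["singlet_hypothesis_probability"]

Probability of the singlet hypothesis.

.obs["doublet_hypothesis_probability"]

Probability of the doublet hypothesis.

.obs["Classification"]

Classification of the cell, one of the barcodes in cell_hashing_columns, "Negative", or "Doublet".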
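
The index convention for the most likely hypothesis (0 negative, 1 singlet, 2 doublet) can be read as an argmax over the three hypothesis probabilities. The sketch below shows that mapping on made-up values; the `most_likely` helper is hypothetical and is not how scanpy computes its output.

```python
# Order follows the documented convention: 0 negative, 1 singlet, 2 doublet.
HYPOTHESES = ("negative", "singlet", "doublet")


def most_likely(negative_p, singlet_p, doublet_p):
    """Return (index, name) of the hypothesis with the highest probability.

    Hypothetical helper for illustration; not part of scanpy.
    """
    probs = (negative_p, singlet_p, doublet_p)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return idx, HYPOTHESES[idx]


# Made-up probabilities for two cells.
call_a = most_likely(0.05, 0.90, 0.05)
call_b = most_likely(0.10, 0.20, 0.70)
```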


Examples:

>>> import anndata
>>> import scanpy.external as sce
>>> adata = anndata.read_h5ad("data.h5ad")
>>> sce.pp.hashsolo(adata, ["Hash1", "Hash2", "Hash3"])
>>> adata.obs.head()