scanpy.external.pp.hashsolo


scanpy.external.pp.hashsolo(adata, cell_hashing_columns, *, priors=(0.01, 0.8, 0.19), pre_existing_clusters=None, number_of_noise_barcodes=None, inplace=True)

Probabilistic demultiplexing of cell hashing data using HashSolo [Bernstein et al., 2020].

Note

More information and bug reports here.

Parameters:
adata AnnData

The (annotated) data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.

cell_hashing_columns Sequence[str]

.obs columns that contain cell hashing counts.

priors tuple[float, float, float] (default: (0.01, 0.8, 0.19))

Prior probabilities of each hypothesis, in the order (negative, singlet, doublet). The default, (0.01, 0.8, 0.19), assumes that the barcode counts come from cells that have already passed QC in transcriptome space (e.g. filtering on UMI counts or percentage of mitochondrial reads).

pre_existing_clusters str | None (default: None)

The column in .obs containing pre-existing cluster assignments (e.g. Leiden clusters or cell types, but not batch assignments). If provided, demultiplexing will be performed separately for each cluster.

number_of_noise_barcodes int | None (default: None)

The number of barcodes used to create the noise distribution. Defaults to len(cell_hashing_columns) - 2.

inplace bool (default: True)

Whether to update adata in-place or return a copy.
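
To make the role of priors more concrete, the following is a minimal sketch of how three prior probabilities could be combined with per-hypothesis log-likelihoods through Bayes' rule. It is not HashSolo's actual likelihood model: the helper names (validate_priors, posterior_probabilities, default_number_of_noise_barcodes) are hypothetical, and the requirement that priors be strictly positive and sum to 1 is an assumption of this sketch, not something the parameter description states.

```python
import math


def validate_priors(priors):
    """Check a (negative, singlet, doublet) prior tuple and return it as floats."""
    if len(priors) != 3:
        raise ValueError("priors must have three entries: negative, singlet, doublet")
    # Assumption of this sketch: each prior is strictly positive ...
    if any(p <= 0 for p in priors):
        raise ValueError("priors must be strictly positive")
    # ... and together they form a probability distribution.
    if not math.isclose(sum(priors), 1.0, abs_tol=1e-9):
        raise ValueError("priors must sum to 1")
    return tuple(float(p) for p in priors)


def posterior_probabilities(log_likelihoods, priors):
    """Combine per-hypothesis log-likelihoods with priors via Bayes' rule."""
    priors = validate_priors(priors)
    # Work in log space and subtract the maximum for numerical stability.
    log_joint = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    shift = max(log_joint)
    weights = [math.exp(value - shift) for value in log_joint]
    total = sum(weights)
    return tuple(w / total for w in weights)


def default_number_of_noise_barcodes(cell_hashing_columns):
    """Documented default for number_of_noise_barcodes."""
    return len(cell_hashing_columns) - 2
```

When all three likelihoods are equal, the posterior reduces to the priors themselves, which is why the default leans heavily toward the singlet hypothesis for cells that already passed QC.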

Return type:

AnnData | None

Returns:

If inplace=False, a copy of the input adata; otherwise the input adata is updated in place. In either case, the following fields are added:

.obs["most_likely_hypothesis"]

Index of the most likely hypothesis, where 0 corresponds to negative, 1 to singlet, and 2 to doublet.

.obs["cluster_feature"]

The cluster assignments used for demultiplexing.

.obs["negative_hypothesis_probability"]

Probability of the negative hypothesis.

.obs["singlet_hypothesis_probability"]

Probability of the singlet hypothesis.

.obs["doublet_hypothesis_probability"]

Probability of the doublet hypothesis.

.obs["Classification"]

Classification of the cell, one of the barcodes in cell_hashing_columns, "Negative", or "Doublet".
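
The output fields listed above can be read as in the following sketch. It assumes that most_likely_hypothesis is the index of the largest of the three hypothesis probabilities, and it uses made-up toy values instead of output from a real run. The helper names are hypothetical.

```python
from collections import Counter

# Hypothesis order documented for hashsolo: index 0, 1, 2.
HYPOTHESIS_LABELS = ("negative", "singlet", "doublet")


def most_likely_hypothesis(negative_p, singlet_p, doublet_p):
    """Return the index (0, 1, or 2) of the largest of the three probabilities."""
    probabilities = (negative_p, singlet_p, doublet_p)
    return max(range(3), key=lambda i: probabilities[i])


def summarize_classifications(classifications):
    """Count cells per value of the "Classification" column."""
    return Counter(classifications)


# Toy values for illustration only; not output from a real run.
toy_classification = ["Hash1", "Hash2", "Doublet", "Hash1", "Negative"]
counts = summarize_classifications(toy_classification)
index = most_likely_hypothesis(0.01, 0.95, 0.04)
print(HYPOTHESIS_LABELS[index], dict(counts))
```

On a real AnnData object, adata.obs["Classification"].value_counts() gives the same kind of per-label summary through pandas.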

Examples

>>> import anndata
>>> import scanpy.external as sce
>>> adata = anndata.read_h5ad("data.h5ad")
>>> sce.pp.hashsolo(adata, ["Hash1", "Hash2", "Hash3"])
>>> adata.obs.head()
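
A variant that exercises the keyword-only parameters might look like the sketch below. The wrapper run_hashsolo_copy and the helper missing_hashing_columns are hypothetical and not part of scanpy. They check that the requested columns exist in .obs before calling hashsolo with inplace=False, so that the original object is left untouched.

```python
def missing_hashing_columns(obs_columns, cell_hashing_columns):
    """Return the requested hashing columns absent from obs_columns, in order."""
    available = set(obs_columns)
    return [column for column in cell_hashing_columns if column not in available]


def run_hashsolo_copy(adata, cell_hashing_columns, cluster_key=None):
    """Hypothetical wrapper: validate columns, then call hashsolo on a copy."""
    missing = missing_hashing_columns(adata.obs.columns, cell_hashing_columns)
    if missing:
        raise KeyError(f"Missing .obs columns: {missing}")
    # Imported lazily so the column check above can be used without scanpy installed.
    import scanpy.external as sce

    return sce.pp.hashsolo(
        adata,
        cell_hashing_columns,
        pre_existing_clusters=cluster_key,  # e.g. a Leiden cluster column, or None
        inplace=False,  # return a copy instead of modifying adata
    )
```

For example, run_hashsolo_copy(adata, ["Hash1", "Hash2", "Hash3"], cluster_key="leiden") would demultiplex per cluster, assuming a "leiden" column exists in adata.obs.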