scanpy.external.pp.hashsolo

scanpy.external.pp.hashsolo(adata, cell_hashing_columns, *, priors=(0.01, 0.8, 0.19), pre_existing_clusters=None, number_of_noise_barcodes=None, inplace=True)

Probabilistic demultiplexing of cell hashing data using HashSolo [Bernstein et al., 2020].

Note

More information and bug reports here.

Parameters:

adata AnnData: The (annotated) data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.
cell_hashing_columns Sequence[str]: .obs columns that contain cell hashing counts.
priors tuple[float, float, float] (default: (0.01, 0.8, 0.19)): Prior probabilities of each hypothesis, in the order (negative, singlet, doublet). The default of (0.01, 0.8, 0.19) assumes that the barcode counts come from cells that have already passed QC in transcriptome space, e.g. filters on UMI counts or the percentage of mitochondrial reads.
pre_existing_clusters str | None (default: None): The column in .obs containing pre-existing cluster assignments (e.g. Leiden clusters or cell types, but not batch assignments). If provided, demultiplexing will be performed separately for each cluster.
number_of_noise_barcodes int | None (default: None): The number of barcodes used to create the noise distribution. Defaults to len(cell_hashing_columns) - 2.
inplace bool (default: True): Whether to update adata in-place or return a copy.
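
The constraints on these parameters can be checked before calling the function. The helper below is a minimal sketch and not part of scanpy: the name check_hashsolo_args is hypothetical, and the rules that priors sum to 1 and that the number of noise barcodes stays below the number of hashing columns are assumptions made for illustration. Only the default of len(cell_hashing_columns) - 2 comes from the parameter description above.

```python
import math


def check_hashsolo_args(
    cell_hashing_columns,
    priors=(0.01, 0.8, 0.19),
    number_of_noise_barcodes=None,
):
    """Hypothetical pre-flight check for hashsolo arguments (not a scanpy function)."""
    if len(priors) != 3:
        raise ValueError("priors needs three entries: (negative, singlet, doublet)")
    # Assumption: the three hypotheses are exhaustive, so the priors sum to 1.
    if any(p < 0 for p in priors) or not math.isclose(sum(priors), 1.0):
        raise ValueError("priors should be non-negative and sum to 1")
    if number_of_noise_barcodes is None:
        # Documented default: len(cell_hashing_columns) - 2.
        number_of_noise_barcodes = len(cell_hashing_columns) - 2
    # Assumption: the noise distribution uses fewer barcodes than are available.
    if number_of_noise_barcodes >= len(cell_hashing_columns):
        raise ValueError("number_of_noise_barcodes should be < len(cell_hashing_columns)")
    return number_of_noise_barcodes


n_noise = check_hashsolo_args(["Hash1", "Hash2", "Hash3"])
print(n_noise)  # 1
```

With three hashing columns and no explicit value, the documented default yields a single noise barcode.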

Return type:

AnnData | None

Returns:

A copy of the input adata with the results added if inplace=False; otherwise None, and the input adata is updated in place. The following fields are added:

.obs["most_likely_hypothesis"]: Index of the most likely hypothesis, where 0 corresponds to negative, 1 to singlet, and 2 to doublet.
.obs["cluster_feature"]: The cluster assignments used for demultiplexing.
.obs["negative_hypothesis_probability"]: Probability of the negative hypothesis.
.obs["singlet_hypothesis_probability"]: Probability of the singlet hypothesis.
.obs["doublet_hypothesis_probability"]: Probability of the doublet hypothesis.
.obs["Classification"]: Classification of the cell: one of the barcodes in cell_hashing_columns, "Negative", or "Doublet".
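
The index convention of most_likely_hypothesis can be illustrated with plain Python. This is a sketch, not scanpy's implementation: it assumes the index is the position of the largest of the three hypothesis probabilities, and the function name and example values are made up for illustration.

```python
# Order documented above: 0 = negative, 1 = singlet, 2 = doublet.
HYPOTHESES = ("negative", "singlet", "doublet")


def most_likely_hypothesis(negative, singlet, doublet):
    """Return the index of the largest probability (assumed argmax rule)."""
    probs = (negative, singlet, doublet)
    # On ties, max() keeps the first (lowest) index.
    return max(range(len(probs)), key=lambda i: probs[i])


# Illustrative values, as they might appear in one row of adata.obs.
idx = most_likely_hypothesis(negative=0.02, singlet=0.90, doublet=0.08)
print(idx, HYPOTHESES[idx])  # 1 singlet
```

For a cell whose most likely hypothesis is the singlet one, .obs["Classification"] holds the name of the assigned barcode rather than the word "singlet".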

Examples

>>> import anndata
>>> import scanpy.external as sce
>>> adata = anndata.read_h5ad("data.h5ad")
>>> sce.pp.hashsolo(adata, ["Hash1", "Hash2", "Hash3"])
>>> adata.obs.head()
