scanpy.external.pp.scrublet_simulate_doublets(adata, layer=None, sim_doublet_ratio=2.0, synthetic_doublet_umi_subsampling=1.0, random_seed=0)

Simulate doublets by adding the counts of random observed transcriptome pairs.

Parameters

adata : AnnData

The annotated data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes. Genes should have been filtered for expression and variability, and the object should contain raw expression of the same dimensions.


layer : str | None (default: None)

Layer of adata where raw values are stored, or ‘X’ if values are in .X.

sim_doublet_ratio : float (default: 2.0)

Number of doublets to simulate relative to the number of observed transcriptomes. If None, self.sim_doublet_ratio is used.

synthetic_doublet_umi_subsampling : float (default: 1.0)

Rate for sampling UMIs when creating synthetic doublets. If 1.0, each doublet is created by simply adding the UMIs from two randomly sampled observed transcriptomes. For values less than 1, the UMI counts are added and then randomly sampled at the specified rate.
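
The simulation described by the parameters above can be sketched in plain NumPy. This is an illustrative approximation only, not the code scanpy or Scrublet actually runs: the helper name `simulate_doublets_sketch` is hypothetical, and drawing parents with replacement and subsampling with a binomial draw are assumptions made for the sketch.

```python
import numpy as np


def simulate_doublets_sketch(counts, sim_doublet_ratio=2.0,
                             synthetic_doublet_umi_subsampling=1.0,
                             random_seed=0):
    """Hypothetical helper: sum random pairs of rows of a dense count matrix."""
    rng = np.random.default_rng(random_seed)
    n_obs = counts.shape[0]
    # Number of doublets is set relative to the number of observed cells.
    n_sim = int(n_obs * sim_doublet_ratio)
    # Each row holds the indices of the two parent cells of one doublet
    # (assumption: parents are drawn uniformly, with replacement).
    parents = rng.integers(0, n_obs, size=(n_sim, 2))
    doublets = counts[parents[:, 0]] + counts[parents[:, 1]]
    if synthetic_doublet_umi_subsampling < 1.0:
        # Assumption: keep each UMI independently with the given probability.
        doublets = rng.binomial(doublets, synthetic_doublet_umi_subsampling)
    return doublets, parents


# Toy data: 50 cells x 20 genes of Poisson-distributed integer counts.
counts = np.random.default_rng(1).poisson(2.0, size=(50, 20))
doublets, parents = simulate_doublets_sketch(counts)
subsampled, _ = simulate_doublets_sketch(
    counts, synthetic_doublet_umi_subsampling=0.5
)
```

With the default ratio of 2.0, the 50 toy cells yield 100 simulated doublets. Because both calls use the same seed, they draw the same parent pairs, so the subsampled counts never exceed the summed counts.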

Return type

AnnData

Returns



adata : anndata.AnnData with simulated doublets in .X. If copy=True, the object is returned; otherwise the following fields are added to adata:


Pairs of .obs_names used to generate each simulated doublet transcriptome


Dictionary of Scrublet parameters

See also


scrublet()

Main way of running Scrublet; it runs preprocessing, doublet simulation (this function), and doublet calling.


scrublet_score_distribution()

Plot histogram of doublet scores for observed transcriptomes and simulated doublets.