scanpy.pp.recipe_zheng17

scanpy.pp.recipe_zheng17(adata, *, n_top_genes=1000, log=True, plot=False, copy=False)

Normalize and filter as in Zheng et al. [2017].

Reproduces the preprocessing of Zheng et al. [2017] – the Cell Ranger R Kit of 10x Genomics.

Expects non-logarithmized data. If using logarithmized data, pass log=False.

The recipe runs the following steps:

sc.pp.filter_genes(adata, min_counts=1)         # keep only genes with at least 1 count
sc.pp.normalize_per_cell(                       # normalize with total UMI count per cell
    adata, key_n_counts='n_counts_all'
)
filter_result = sc.pp.filter_genes_dispersion(  # select highly-variable genes
    adata.X, flavor='cell_ranger', n_top_genes=n_top_genes, log=False
)
adata = adata[:, filter_result.gene_subset]     # subset the genes
sc.pp.normalize_per_cell(adata)                 # renormalize after filtering
if log: sc.pp.log1p(adata)                      # log transform: adata.X = log(adata.X + 1)
sc.pp.scale(adata)                              # scale to unit variance and shift to zero mean
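
To make the arithmetic behind these steps concrete, here is a minimal, dependency-free sketch in plain Python. It is not scanpy's implementation. The helper names are hypothetical, the dispersion-based gene selection is omitted, and two conventions are assumptions: that per-cell normalization rescales every cell to the median total count, and that scaling uses the sample standard deviation.

```python
import math
from statistics import mean, median, stdev


def filter_genes(counts, min_counts=1):
    """Drop gene columns whose total count is below min_counts (hypothetical helper)."""
    n_genes = len(counts[0])
    keep = [j for j in range(n_genes)
            if sum(row[j] for row in counts) >= min_counts]
    return [[row[j] for j in keep] for row in counts], keep


def normalize_per_cell(counts):
    """Rescale each cell (row) so its total equals the median total (assumed target)."""
    totals = [sum(row) for row in counts]
    target = median(totals)
    return [[v * target / t for v in row] for row, t in zip(counts, totals)]


def log1p(matrix):
    """Element-wise log(x + 1)."""
    return [[math.log1p(v) for v in row] for row in matrix]


def scale(matrix):
    """Shift each gene (column) to zero mean and scale it to unit variance.

    Uses the sample standard deviation; this convention is an assumption.
    """
    cols = list(zip(*matrix))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in matrix]


# Toy raw counts: 3 cells x 4 genes; the last gene has no counts at all.
counts = [
    [1, 0, 3, 0],
    [2, 2, 4, 0],
    [0, 6, 6, 0],
]

filtered, kept = filter_genes(counts, min_counts=1)  # removes the all-zero gene
normalized = normalize_per_cell(filtered)            # every cell now has the same total
scaled = scale(log1p(normalized))                    # log transform, then standardize genes
```

The order matters: the log transform is applied to normalized counts, and scaling comes last so that each retained gene ends up with zero mean and unit variance across cells.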

Parameters:

adata AnnData: Annotated data matrix.
n_top_genes int (default: 1000): Number of highly variable genes to keep.
log bool (default: True): Apply sc.pp.log1p after normalization. Pass False if the data is already logarithmized.
plot bool (default: False): Show a plot of the gene dispersion vs. mean relation.
copy bool (default: False): Return a copy of adata instead of updating it.

Return type:

AnnData | None

Returns:

Returns the processed AnnData if copy=True; otherwise updates adata and returns None.
