scanpy.pp.recipe_zheng17

scanpy.pp.recipe_zheng17(adata, *, n_top_genes=1000, log=True, plot=False, copy=False)

Normalization and filtering as described by Zheng et al. [2017].

Reproduces the preprocessing of Zheng et al. [2017] – the Cell Ranger R Kit of 10x Genomics.

Expects non-logarithmized data. If using logarithmized data, pass log=False.

The recipe runs the following steps:

sc.pp.filter_genes(adata, min_counts=1)         # keep only genes with at least 1 count
sc.pp.normalize_per_cell(                       # normalize with total UMI count per cell
    adata, key_n_counts='n_counts_all'
)
filter_result = sc.pp.filter_genes_dispersion(  # select highly-variable genes
    adata.X, flavor='cell_ranger', n_top_genes=n_top_genes, log=False
)
adata = adata[:, filter_result.gene_subset]     # subset the genes
sc.pp.normalize_per_cell(adata)                 # renormalize after filtering
if log: sc.pp.log1p(adata)                      # log transform: adata.X = log(adata.X + 1)
sc.pp.scale(adata)                              # scale to unit variance and shift to zero mean
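
To make the last three steps concrete, here is a minimal NumPy-only sketch of what renormalizing, log-transforming and scaling compute on a small dense matrix. It is an illustration, not Scanpy's implementation: the helper name normalize_log_scale is hypothetical, and it assumes that each cell is rescaled to the median total count and that scaling uses the sample standard deviation (ddof=1).

```python
import numpy as np


def normalize_log_scale(counts, log=True):
    """Sketch of per-cell normalization, log1p and per-gene scaling.

    `counts` is a dense cells x genes array. Assumes every cell has at
    least one count and every gene varies across cells.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)  # total counts per cell
    # Assumption: rescale every cell to the median total count.
    x = counts / totals * np.median(totals)
    if log:
        x = np.log1p(x)  # log(x + 1)
    # Assumption: sample standard deviation (ddof=1) for each gene.
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)


counts = np.array([[1, 2, 3], [4, 5, 9], [2, 8, 6]])
scaled = normalize_log_scale(counts)
```

After this, every gene (column) of scaled has zero mean and unit sample variance.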

Parameters:
adata AnnData

Annotated data matrix.

n_top_genes int (default: 1000)

Number of highly variable genes to keep.

log bool (default: True)

Log-transform the data (log1p) after normalization.

plot bool (default: False)

Show a plot of the gene dispersion vs. mean relation.

copy bool (default: False)

Return a copy of adata instead of updating it.

Return type:

AnnData | None

Returns:

Returns the processed AnnData if copy=True; otherwise updates adata in place and returns None.