Preprocessing: pp

Filtering of highly-variable genes, batch-effect correction, per-cell normalization, preprocessing recipes.

This module collects transformations of the data matrix that are not tools. Unlike tools, preprocessing steps usually don’t return an easily interpretable annotation; instead, they perform a basic transformation on the data matrix.

Basic Preprocessing

For visual quality control, see highest_expr_genes() and filter_genes_dispersion() in scanpy.pl.

pp.calculate_qc_metrics

Calculate quality control metrics.
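To illustrate the kind of per-cell metrics involved, here is a minimal NumPy sketch on a dense cells × genes count matrix. The output names mirror the `obs` columns scanpy writes (`total_counts`, `n_genes_by_counts`, `pct_counts_mt`). The helper `qc_metrics_sketch` and its `mt_mask` argument are hypothetical. In scanpy you would flag the genes in `adata.var["mt"]` and call `sc.pp.calculate_qc_metrics(adata, qc_vars=["mt"], inplace=True)`.

```python
import numpy as np

def qc_metrics_sketch(counts, mt_mask):
    """Per-cell QC metrics for a dense cells x genes count matrix (hypothetical helper)."""
    counts = np.asarray(counts, dtype=float)
    total_counts = counts.sum(axis=1)                 # library size per cell
    n_genes_by_counts = (counts > 0).sum(axis=1)      # genes with at least one count
    mt_counts = counts[:, mt_mask].sum(axis=1)        # counts from the flagged genes
    # Percentage of counts from flagged genes; empty cells get 0 instead of 0/0
    pct_counts_mt = np.divide(100 * mt_counts, total_counts,
                              out=np.zeros_like(total_counts), where=total_counts > 0)
    return {"total_counts": total_counts,
            "n_genes_by_counts": n_genes_by_counts,
            "pct_counts_mt": pct_counts_mt}

X = np.array([[5, 0, 5],
              [0, 2, 0]])
mt = np.array([False, False, True])   # pretend the third gene is mitochondrial
metrics = qc_metrics_sketch(X, mt)
```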

pp.filter_cells

Filter cell outliers based on counts and numbers of genes expressed.

pp.filter_genes

Filter genes based on number of cells or counts.
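The two filters above boil down to boolean masks over rows (cells) and columns (genes). The sketch below uses hypothetical helpers `cell_mask` and `gene_mask` and combines two thresholds per helper for brevity. The scanpy equivalents, such as `sc.pp.filter_cells(adata, min_genes=200)` and `sc.pp.filter_genes(adata, min_cells=3)`, accept only one threshold per call.

```python
import numpy as np

def cell_mask(counts, min_genes=0, min_counts=0):
    """Cells (rows) with >= min_genes detected genes and >= min_counts total counts."""
    counts = np.asarray(counts)
    return ((counts > 0).sum(axis=1) >= min_genes) & (counts.sum(axis=1) >= min_counts)

def gene_mask(counts, min_cells=0, min_counts=0):
    """Genes (columns) detected in >= min_cells cells with >= min_counts total counts."""
    counts = np.asarray(counts)
    return ((counts > 0).sum(axis=0) >= min_cells) & (counts.sum(axis=0) >= min_counts)

X = np.array([[3, 0, 1],
              [0, 0, 1],
              [2, 2, 2]])
X = X[cell_mask(X, min_genes=2)]      # drops the second cell (one detected gene)
X = X[:, gene_mask(X, min_cells=2)]   # drops genes seen in fewer than two remaining cells
```

The order matters. Filtering cells first changes how many cells detect each gene.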

pp.highly_variable_genes

Annotate highly variable genes [Satija et al., 2015, Stuart et al., 2019, Zheng et al., 2017].
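As a rough illustration of the idea, the sketch below ranks genes by dispersion (variance / mean) and flags the top `n_top`. It is much simpler than any of scanpy's flavors. For example, it does not normalize dispersions within bins of mean expression. The function name `top_dispersion_genes` is hypothetical. In scanpy, `sc.pp.highly_variable_genes(adata, n_top_genes=2000)` stores the result in `adata.var["highly_variable"]`.

```python
import numpy as np

def top_dispersion_genes(X, n_top):
    """Flag the n_top genes with the largest variance/mean ratio (simplified)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    var = X.var(axis=0, ddof=1)
    # Genes with zero mean get dispersion 0 instead of nan
    dispersion = np.divide(var, mean, out=np.zeros_like(mean), where=mean > 0)
    top = np.argsort(dispersion)[::-1][:n_top]   # indices of the highest dispersions
    flags = np.zeros(X.shape[1], dtype=bool)
    flags[top] = True
    return flags, dispersion

X = np.array([[1.0, 0.0, 5.0],
              [1.0, 4.0, 5.0],
              [1.0, 8.0, 6.0]])
flags, dispersion = top_dispersion_genes(X, n_top=1)
```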

pp.log1p

Logarithmize the data matrix.
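The transformation itself is an elementwise log(1 + x), so zeros stay zero and sparsity is preserved. In this sketch (`log1p_sketch` is a hypothetical name), the optional `base` mimics the `base` parameter of `sc.pp.log1p`, which defaults to the natural logarithm. The change of base is done by dividing by ln(base).

```python
import numpy as np

def log1p_sketch(X, base=None):
    """Elementwise log(1 + x); natural log unless a base is given."""
    out = np.log1p(np.asarray(X, dtype=float))
    if base is not None:
        out /= np.log(base)   # change of base: log_b(1 + x) = ln(1 + x) / ln(b)
    return out

X = np.array([[0.0, 1.0],
              [3.0, 9.0]])
X_log2 = log1p_sketch(X, base=2)
```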

pp.pca

Principal component analysis [Pedregosa et al., 2011].
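The underlying computation can be sketched with a singular value decomposition of the gene-centered matrix. This is a NumPy illustration, not scanpy's implementation. The name `pca_sketch` is hypothetical. The returned arrays correspond loosely to what scanpy stores in `adata.obsm["X_pca"]`, `adata.varm["PCs"]` and `adata.uns["pca"]["variance_ratio"]`. Component signs are arbitrary and may differ from scanpy's output.

```python
import numpy as np

def pca_sketch(X, n_comps):
    """PCA via SVD of the column-centered matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                        # center each gene
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_comps] * S[:n_comps]          # cell coordinates
    loadings = Vt[:n_comps].T                      # gene loadings
    var_explained = S ** 2 / (X.shape[0] - 1)
    ratio = var_explained[:n_comps] / var_explained.sum()
    return scores, loadings, ratio

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
scores, loadings, ratio = pca_sketch(X, n_comps=2)
```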

pp.normalize_total

Normalize counts per cell.
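Per-cell normalization rescales each cell so that its counts sum to a common target. In this sketch, the hypothetical `normalize_total_sketch` falls back to the median total count of non-empty cells when no target is given, which is intended to resemble the default of `sc.pp.normalize_total`. A common explicit choice is `sc.pp.normalize_total(adata, target_sum=1e4)`.

```python
import numpy as np

def normalize_total_sketch(counts, target_sum=None):
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    if target_sum is None:
        target_sum = np.median(totals[totals > 0])   # median library size of non-empty cells
    # Scale each cell to sum to target_sum; empty cells stay all-zero
    factors = np.divide(target_sum, totals, out=np.zeros_like(totals), where=totals > 0)
    return counts * factors

X = np.array([[1.0, 3.0],
              [10.0, 10.0]])
X_norm = normalize_total_sketch(X, target_sum=100)
X_median = normalize_total_sketch(X)   # target = median(4, 20) = 12
```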

pp.regress_out

Regress out (mostly) unwanted sources of variation.
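Regressing out a covariate means fitting each gene against it and keeping the residuals. The sketch below uses ordinary least squares with an intercept. It is a simplification, and `regress_out_sketch` is a hypothetical name. In scanpy, the call looks like `sc.pp.regress_out(adata, keys=["total_counts", "pct_counts_mt"])`, with the covariates taken from `adata.obs`.

```python
import numpy as np

def regress_out_sketch(X, covariates):
    X = np.asarray(X, dtype=float)
    C = np.asarray(covariates, dtype=float).reshape(X.shape[0], -1)
    design = np.column_stack([np.ones(X.shape[0]), C])    # intercept + covariates
    coef, *_ = np.linalg.lstsq(design, X, rcond=None)      # one least-squares fit per gene
    return X - design @ coef                               # residuals

rng = np.random.default_rng(1)
n_counts = rng.uniform(1, 10, size=30)
# Gene 0 depends linearly on n_counts, gene 1 is pure noise
X = np.column_stack([2.0 * n_counts + 1.0, rng.normal(size=30)])
residuals = regress_out_sketch(X, n_counts)
```

Because the design includes an intercept, the residuals of every gene have zero mean and zero correlation with the regressed covariate.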

pp.scale

Scale data to unit variance and zero mean.
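Scaling is a per-gene z-score. This sketch (`scale_sketch` is a hypothetical name) uses the sample standard deviation (`ddof=1`) and maps constant genes to zero instead of dividing by zero. It leaves out options of `sc.pp.scale` such as `max_value`.

```python
import numpy as np

def scale_sketch(X):
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0            # constant genes become all zeros after centering
    return (X - mean) / std

X = np.array([[1.0, 2.0],
              [3.0, 2.0],
              [5.0, 2.0]])
Z = scale_sketch(X)
```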

pp.sample

Sample observations or variables with or without replacement.
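The core of subsampling observations is drawing row indices with or without replacement. The NumPy sketch below shows only that idea. The name `sample_obs_sketch` and its parameters are hypothetical and are not the signature of `pp.sample`.

```python
import numpy as np

def sample_obs_sketch(X, n, replace=False, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=n, replace=replace)   # row indices to keep
    return X[idx], idx

X = np.arange(20).reshape(10, 2)
subset, idx = sample_obs_sketch(X, n=4)                     # distinct rows
boot, boot_idx = sample_obs_sketch(X, n=10, replace=True)   # bootstrap-style sample
```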

pp.downsample_counts

Downsample counts from count matrix.
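Downsampling a cell's counts without replacement can be pictured as drawing individual molecules from an urn. Each molecule is labeled with its gene. The sketch below uses a hypothetical `downsample_cell` helper and leaves cells at or below the target unchanged. That behavior matches how the scanpy docs describe the `counts_per_cell` option of `sc.pp.downsample_counts`.

```python
import numpy as np

def downsample_cell(counts, target, rng):
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() <= target:
        return counts.copy()                                 # already at or below target
    molecules = np.repeat(np.arange(counts.size), counts)   # one gene index per molecule
    kept = rng.choice(molecules, size=target, replace=False)
    return np.bincount(kept, minlength=counts.size)

rng = np.random.default_rng(0)
X = np.array([[10, 0, 5, 5],
              [1, 2, 0, 0]])
X_down = np.vstack([downsample_cell(row, 8, rng) for row in X])
```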

Recipes

pp.recipe_zheng17

Normalization and filtering as in Zheng et al. [2017].

pp.recipe_weinreb17

Normalization and filtering as in Weinreb et al. [2017].

pp.recipe_seurat

Normalization and filtering as in Seurat [Satija et al., 2015].
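A recipe is a fixed sequence of the basic steps above. The sketch below chains gene filtering, per-cell normalization, log1p, selection of variable genes and scaling, all in NumPy. It is a generic illustration with a hypothetical name (`simple_recipe_sketch`). It does not reproduce the order, thresholds or gene-selection method of any of the three recipes.

```python
import numpy as np

def simple_recipe_sketch(counts, n_top_genes, target_sum=1e4):
    """A generic preprocessing chain, not a reproduction of any scanpy recipe."""
    X = np.asarray(counts, dtype=float)
    X = X[:, (X > 0).sum(axis=0) >= 1]                       # drop genes never detected
    totals = X.sum(axis=1, keepdims=True)
    X = X / np.where(totals > 0, totals, 1.0) * target_sum    # per-cell normalization
    X = np.log1p(X)                                           # log transform
    var = X.var(axis=0, ddof=1)
    X = X[:, np.argsort(var)[::-1][:n_top_genes]]             # keep the most variable genes
    std = X.std(axis=0, ddof=1)
    return (X - X.mean(axis=0)) / np.where(std > 0, std, 1.0) # z-score each gene

rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(20, 50))
X = simple_recipe_sketch(counts, n_top_genes=10)
```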

Batch effect correction

Also see Data integration. Note that a simple batch correction method is available via pp.regress_out(). Check out scanpy.external for more methods.

pp.combat

ComBat function for batch effect correction [Johnson et al., 2006, Leek et al., 2017, Pedersen, 2012].
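ComBat models each batch with a per-gene location (mean) and scale (variance) effect and shrinks those estimates with empirical Bayes. The sketch below shows only the location/scale part, without shrinkage or covariates, so it is a simplified stand-in and not ComBat itself. The name `location_scale_adjust` is hypothetical. The scanpy call is `sc.pp.combat(adata, key="batch")`.

```python
import numpy as np

def location_scale_adjust(X, batches):
    """Align per-batch gene means and spreads; no empirical Bayes shrinkage (unlike ComBat)."""
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    levels = np.unique(batches)
    grand_mean = X.mean(axis=0)
    # Pooled within-batch standard deviation: remove each batch's mean first
    resid = X.copy()
    for b in levels:
        rows = batches == b
        resid[rows] -= X[rows].mean(axis=0)
    pooled_sd = resid.std(axis=0, ddof=1)
    pooled_sd[pooled_sd == 0] = 1.0
    out = np.empty_like(X)
    for b in levels:
        rows = batches == b
        mu = X[rows].mean(axis=0)
        sd = X[rows].std(axis=0)
        sd[sd == 0] = 1.0               # genes constant within a batch map to the grand mean
        out[rows] = (X[rows] - mu) / sd * pooled_sd + grand_mean
    return out

X = np.array([[0.0], [2.0], [10.0], [14.0]])
batches = np.array(["a", "a", "b", "b"])
adjusted = location_scale_adjust(X, batches)
```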

Doublet detection

pp.scrublet

Predict doublets using Scrublet [Wolock et al., 2019].

pp.scrublet_simulate_doublets

Simulate doublets by adding the counts of random observed transcriptome pairs.
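The simulation step described above can be sketched directly: pick random pairs of observed cells and add their count vectors. The scoring function below is a naive stand-in for Scrublet's doublet score. It returns the fraction of each cell's nearest neighbors that are simulated doublets, computed on raw counts. Scrublet itself works on a normalized PCA embedding and adjusts for the expected doublet rate. Both function names are hypothetical. In this sketch, a pair may repeat the same cell.

```python
import numpy as np

def simulate_doublets(counts, n_sim, rng):
    """Sum the counts of randomly chosen pairs of observed cells."""
    counts = np.asarray(counts)
    pairs = rng.integers(0, counts.shape[0], size=(n_sim, 2))
    return counts[pairs[:, 0]] + counts[pairs[:, 1]], pairs

def simulated_neighbor_fraction(observed, simulated, k):
    """Fraction of each observed cell's k nearest neighbors that are simulated doublets."""
    observed = np.asarray(observed, dtype=float)
    combined = np.vstack([observed, simulated]).astype(float)
    is_sim = np.r_[np.zeros(len(observed), bool), np.ones(len(simulated), bool)]
    d = np.linalg.norm(observed[:, None, :] - combined[None, :, :], axis=-1)
    d[np.arange(len(observed)), np.arange(len(observed))] = np.inf   # exclude self
    nn = np.argsort(d, axis=1)[:, :k]
    return is_sim[nn].mean(axis=1)

rng = np.random.default_rng(0)
X = rng.poisson(3.0, size=(30, 8))
sim, pairs = simulate_doublets(X, 15, rng)
score = simulated_neighbor_fraction(X, sim, k=5)
```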

Neighbors

pp.neighbors

Compute the nearest-neighbor distance matrix and a neighborhood graph of observations [McInnes et al., 2018].
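The two outputs can be illustrated with a brute-force k-nearest-neighbor search in NumPy. The sketch keeps each cell's k smallest Euclidean distances and builds a symmetric, unweighted graph. By default, scanpy weights the graph with UMAP's method [McInnes et al., 2018] and stores the results in `adata.obsp["distances"]` and `adata.obsp["connectivities"]`. The unweighted graph here is a simplification, and `knn_graph_sketch` is a hypothetical name.

```python
import numpy as np

def knn_graph_sketch(X, n_neighbors):
    X = np.asarray(X, dtype=float)
    sq = (X ** 2).sum(axis=1)
    d = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0))  # Euclidean distances
    np.fill_diagonal(d, np.inf)                       # a cell is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :n_neighbors]       # indices of the k nearest cells
    rows = np.arange(X.shape[0])[:, None]
    distances = np.zeros_like(d)
    distances[rows, nn] = d[rows, nn]                 # keep only the k nearest distances
    adjacency = np.zeros(d.shape, dtype=bool)
    adjacency[rows, nn] = True
    connectivities = (adjacency | adjacency.T).astype(float)   # symmetric, unweighted
    return distances, connectivities

X = np.array([[0.0], [1.0], [10.0], [11.0]])
distances, connectivities = knn_graph_sketch(X, n_neighbors=1)
```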