Preprocessing: pp

Filtering of highly-variable genes, batch-effect correction, per-cell normalization, preprocessing recipes.

This module covers any transformation of the data matrix that is not a tool. Unlike tools, preprocessing steps usually do not return an easily interpretable annotation; instead, they perform a basic transformation of the data matrix.

Basic Preprocessing

For visual quality control, see highest_expr_genes() and highly_variable_genes() in scanpy.pl.

pp.calculate_qc_metrics

Calculate quality control metrics.

pp.filter_cells

Filter cell outliers based on counts and numbers of genes expressed.

pp.filter_genes

Filter genes based on number of cells or counts.

pp.highly_variable_genes

Annotate highly variable genes [Satija et al., 2015, Stuart et al., 2019, Zheng et al., 2017].

pp.log1p

Logarithmize the data matrix.

pp.pca

Principal component analysis [Pedregosa et al., 2011].

pp.normalize_total

Normalize counts per cell.

pp.regress_out

Regress out (mostly) unwanted sources of variation.

pp.scale

Scale data to unit variance and zero mean.

pp.sample

Sample observations or variables with or without replacement.

pp.downsample_counts

Downsample counts from count matrix.
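To make the basic steps above concrete, here is a rough NumPy sketch of what QC metrics, filtering, per-cell normalization, log transformation, and scaling do to a count matrix. It is not scanpy's implementation: the toy matrix, the thresholds, the `target_sum` value, and the choice of the sample standard deviation are all assumptions made for illustration.

```python
import numpy as np

# Toy count matrix: 6 cells (rows) x 5 genes (columns); values are made up.
counts = np.array(
    [
        [3, 0, 1, 0, 4],
        [0, 0, 0, 0, 1],
        [5, 2, 0, 0, 3],
        [1, 1, 1, 0, 1],
        [0, 3, 2, 0, 6],
        [2, 0, 4, 0, 2],
    ],
    dtype=float,
)

# Per-cell QC metrics: total counts and number of genes detected.
total_counts = counts.sum(axis=1)
n_genes_by_counts = (counts > 0).sum(axis=1)

# Filtering: keep cells with at least 2 detected genes,
# then keep genes detected in at least 1 remaining cell.
keep_cells = n_genes_by_counts >= 2
keep_genes = (counts[keep_cells] > 0).sum(axis=0) >= 1
filtered = counts[keep_cells][:, keep_genes]

# Per-cell normalization to a common total, followed by log(1 + x).
target_sum = 10.0
normalized = filtered / filtered.sum(axis=1, keepdims=True) * target_sum
logged = np.log1p(normalized)

# Scaling: zero mean and unit variance per gene.
scaled = (logged - logged.mean(axis=0)) / logged.std(axis=0, ddof=1)

print(filtered.shape)  # one cell and one gene were removed
```

In this toy example the second cell (one detected gene) and the fourth gene (never detected) are dropped before normalization.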

Recipes

pp.recipe_zheng17

Normalize and filter as in Zheng et al. [2017].

pp.recipe_weinreb17

Normalize and filter as in [Weinreb et al., 2017].

pp.recipe_seurat

Normalize and filter as in Seurat [Satija et al., 2015].

Data integration

Batch effect correction and other data integration. Note that a simple batch correction method is available via pp.regress_out().
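The idea behind regression-based batch correction can be sketched with ordinary least squares in NumPy: fit each gene against a batch indicator and keep the residuals. This is a conceptual sketch only, not scanpy's implementation; the two-batch layout, the `+2.0` shift, and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes = 100, 3

# Hypothetical batch labels: first half batch 0, second half batch 1.
batch = np.repeat([0, 1], n_cells // 2)

# Simulated expression with an additive shift for batch 1.
expr = rng.normal(size=(n_cells, n_genes)) + 2.0 * batch[:, None]

# Design matrix: intercept plus batch indicator.
design = np.column_stack([np.ones(n_cells), batch])

# Fit all genes at once and keep the residuals.
coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
residuals = expr - design @ coef

# Difference between batch means before and after regression.
gap_before = expr[batch == 1].mean(axis=0) - expr[batch == 0].mean(axis=0)
gap_after = residuals[batch == 1].mean(axis=0) - residuals[batch == 0].mean(axis=0)
print(gap_before)
print(gap_after)
```

Because the residuals are orthogonal to the design columns, the per-batch means coincide after regression. Any biological signal confounded with batch is removed along with the shift, which is one reason this counts as a simple method.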

pp.combat

ComBat function for batch effect correction [Johnson et al., 2006, Leek et al., 2017, Pedersen, 2012].

pp.harmony_integrate

Integrate different experiments using the Harmony algorithm [Korsunsky et al., 2019, Patikas et al., 2026].

Also see the data integration tools and the external data integration tools.

Doublet detection

pp.scrublet

Predict doublets using Scrublet [Wolock et al., 2019].

pp.scrublet_simulate_doublets

Simulate doublets by adding the counts of random observed transcriptome pairs.
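The doublet simulation described above, adding the counts of randomly paired observed transcriptomes, can be sketched in a few lines of NumPy. This is a conceptual sketch rather than the Scrublet code: the synthetic counts, `n_doublets`, and sampling pairs with replacement are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic observed counts: 20 cells x 8 genes.
counts = rng.poisson(1.0, size=(20, 8))

# Draw random pairs of observed cells (with replacement, for simplicity).
n_doublets = 10
pairs = rng.integers(0, counts.shape[0], size=(n_doublets, 2))

# A simulated doublet is the sum of the two parent count vectors.
doublets = counts[pairs[:, 0]] + counts[pairs[:, 1]]

print(doublets.shape)
```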

Neighbors

pp.neighbors

Compute the nearest neighbors distance matrix and a neighborhood graph of observations [McInnes et al., 2018].
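To illustrate what a nearest-neighbor distance matrix and neighborhood graph contain, the following NumPy sketch does an exact brute-force search with Euclidean distances and builds an unweighted adjacency matrix. It is a simplification, not what `pp.neighbors` does internally: the random points, `k`, the distance metric, and the unweighted graph are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic observations, e.g. 30 cells in a 5-dimensional embedding.
n_obs, k = 30, 4
points = rng.normal(size=(n_obs, 5))

# Pairwise Euclidean distances.
diff = points[:, None, :] - points[None, :, :]
dist = np.sqrt((diff**2).sum(axis=-1))

# Exclude each observation from its own neighbor list.
np.fill_diagonal(dist, np.inf)

# Indices and distances of the k nearest neighbors per observation.
neighbor_idx = np.argsort(dist, axis=1)[:, :k]
neighbor_dist = np.take_along_axis(dist, neighbor_idx, axis=1)

# Unweighted, directed k-nearest-neighbor graph as a boolean adjacency matrix.
adjacency = np.zeros((n_obs, n_obs), dtype=bool)
rows = np.repeat(np.arange(n_obs), k)
adjacency[rows, neighbor_idx.ravel()] = True

print(neighbor_idx[0], neighbor_dist[0])
```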