Preprocessing: pp#

Filtering of highly variable genes, batch-effect correction, per-cell normalization, and preprocessing recipes.

Preprocessing covers any transformation of the data matrix that is not a tool. Unlike tools, preprocessing steps usually don’t return an easily interpretable annotation; they perform a basic transformation on the data matrix.

Basic Preprocessing#

For visual quality control, see highest_expr_genes() and filter_genes_dispersion() in scanpy.pl.

pp.calculate_qc_metrics

Calculate quality control metrics.

pp.filter_cells

Filter cell outliers based on counts and numbers of genes expressed.

pp.filter_genes

Filter genes based on number of cells or counts.

pp.highly_variable_genes

Annotate highly variable genes [Satija15] [Zheng17] [Stuart19].

pp.log1p

Logarithmize the data matrix.

pp.pca

Principal component analysis [Pedregosa11].

pp.normalize_total

Normalize counts per cell.

pp.regress_out

Regress out (mostly) unwanted sources of variation.

pp.scale

Scale data to unit variance and zero mean.

pp.subsample

Subsample to a fraction of the number of observations.

pp.downsample_counts

Downsample counts from count matrix.
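
Several entries in this section (pp.normalize_total, pp.log1p, pp.scale) are simple per-cell or per-gene transformations of the data matrix. The following is a minimal pure-Python sketch of the arithmetic behind them, not scanpy's implementation. The function names mirror the entries above for readability, but the signatures, the list-of-lists matrix (rows are cells, columns are genes), and the use of the sample variance (n - 1) are assumptions of this sketch.

```python
import math


def normalize_total(counts, target_sum):
    """Rescale each cell (row) so that its counts sum to target_sum."""
    normalized = []
    for row in counts:
        total = sum(row)
        # Assumption: cells with zero total counts are left as all zeros.
        factor = target_sum / total if total > 0 else 0.0
        normalized.append([value * factor for value in row])
    return normalized


def log1p(matrix):
    """Apply log(1 + x) element-wise."""
    return [[math.log1p(value) for value in row] for row in matrix]


def scale(matrix):
    """Center each gene (column) to zero mean and scale it to unit variance."""
    n_cells = len(matrix)
    scaled_columns = []
    for column in zip(*matrix):
        mean = sum(column) / n_cells
        # Assumption: sample variance with n - 1 in the denominator.
        variance = sum((v - mean) ** 2 for v in column) / (n_cells - 1)
        std = math.sqrt(variance)
        if std == 0:
            # Constant genes carry no variation; map them to zeros.
            scaled_columns.append([0.0] * n_cells)
        else:
            scaled_columns.append([(v - mean) / std for v in column])
    # Transpose back to cells x genes.
    return [list(row) for row in zip(*scaled_columns)]


# Toy count matrix: 3 cells x 3 genes.
counts = [
    [1, 3, 0],
    [10, 30, 0],
    [2, 0, 2],
]

normalized = normalize_total(counts, target_sum=100.0)
logged = log1p(normalized)
scaled = scale(logged)
```

In this toy matrix the first two cells have the same relative expression at different sequencing depths, so they become identical after per-cell normalization. That is the depth effect the normalization step is meant to remove.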

Recipes#

pp.recipe_zheng17

Normalization and filtering as in [Zheng17].

pp.recipe_weinreb17

Normalization and filtering as in [Weinreb17].

pp.recipe_seurat

Normalization and filtering as in Seurat [Satija15].

Batch effect correction#

Also see [Data integration]. Note that a simple batch correction method is available via pp.regress_out(). Check out scanpy.external for more.

pp.combat

ComBat function for batch effect correction [Johnson07] [Leek12] [Pedersen12].

Doublet detection#

pp.scrublet

Predict doublets using Scrublet [Wolock19].

pp.scrublet_simulate_doublets

Simulate doublets by adding the counts of random observed transcriptome pairs.
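
The idea of simulating doublets by adding the counts of random pairs of observed transcriptomes can be illustrated with a small pure-Python sketch. This is not the Scrublet or scanpy implementation. The function name, its signature, and the choice to sample parent cells with replacement (so a cell may occasionally be paired with itself) are assumptions made for illustration.

```python
import random


def simulate_doublets(counts, n_doublets, seed=0):
    """Build synthetic doublets by summing the counts of random cell pairs.

    counts is a list of rows (cells), each a list of per-gene counts.
    Returns the simulated doublets and the parent indices of each one.
    """
    rng = random.Random(seed)
    n_cells = len(counts)
    doublets = []
    parents = []
    for _ in range(n_doublets):
        # Assumption: parents are drawn independently, with replacement.
        i = rng.randrange(n_cells)
        j = rng.randrange(n_cells)
        parents.append((i, j))
        # A doublet's profile is the gene-wise sum of its two parents.
        doublets.append([a + b for a, b in zip(counts[i], counts[j])])
    return doublets, parents


# Toy observed matrix: 3 cells x 3 genes.
observed = [
    [5, 0, 1],
    [0, 4, 2],
    [3, 3, 0],
]

doublets, parents = simulate_doublets(observed, n_doublets=4, seed=0)
```

Each simulated profile has a total count equal to the sum of its parents' totals, which is one reason real doublets tend to show unusually high counts.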

Neighbors#

pp.neighbors

Compute the nearest-neighbor distance matrix and a neighborhood graph of observations [McInnes18].
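
To make the two outputs named above concrete, here is a minimal brute-force sketch of k-nearest-neighbor distances and an undirected neighborhood graph. It is a simplification and not what pp.neighbors does internally: the exact Euclidean search, the unweighted edge set, and the helper names `knn` and `neighborhood_graph` are all assumptions of this sketch.

```python
import math


def knn(points, k):
    """For each point, return its k nearest neighbors as (distance, index)."""
    neighbors = []
    for i, p in enumerate(points):
        # Exact Euclidean distances to every other point (brute force).
        distances = [
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        ]
        distances.sort()
        neighbors.append(distances[:k])
    return neighbors


def neighborhood_graph(neighbors):
    """Symmetrize the kNN relation into a set of undirected edges."""
    edges = set()
    for i, nearest in enumerate(neighbors):
        for _, j in nearest:
            # Store each edge once, with the smaller index first.
            edges.add((min(i, j), max(i, j)))
    return edges


# Four observations forming two well-separated pairs.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]

neighbors = knn(points, k=1)
graph = neighborhood_graph(neighbors)
```

With k=1, each observation links only to its partner, so the graph splits into two disconnected pairs. Brute-force search costs quadratic time in the number of observations, which is why approximate methods are commonly used on large datasets.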