Preprocessing: `pp`

Filtering of highly-variable genes, batch-effect correction, per-cell normalization, preprocessing recipes.

Any transformation of the data matrix that is not a tool. Unlike tools, preprocessing steps usually don’t return an easily interpretable annotation; instead, they perform a basic transformation on the data matrix.
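To make "a basic transformation on the data matrix" concrete, the following plain-NumPy sketch mimics what per-cell normalization followed by a log transform does to a small count matrix. It is only an illustration of the arithmetic, not scanpy's implementation; the toy matrix and the target sum of 10 are assumptions chosen for readability.

```python
import numpy as np

# Toy count matrix (made-up values): 3 cells (rows) x 4 genes (columns).
counts = np.array(
    [[1, 0, 3, 6],
     [0, 2, 2, 0],
     [5, 5, 0, 10]],
    dtype=float,
)

target_sum = 10.0  # illustrative choice, not a recommended default

# Per-cell normalization: rescale every row so its counts sum to target_sum.
totals = counts.sum(axis=1, keepdims=True)
normalized = counts / totals * target_sum

# Log transform: natural log of (1 + x), which keeps zeros at zero.
logged = np.log1p(normalized)

print(normalized.sum(axis=1))
print(logged.round(3))
```

The matrix keeps its shape throughout; only the values change, which is what distinguishes these steps from tools that add annotations.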

Basic Preprocessing

For visual quality control, see `highest_expr_genes()` and `filter_genes_dispersion()` in `scanpy.pl`.

`pp.calculate_qc_metrics`	Calculate quality control metrics.
`pp.filter_cells`	Filter cell outliers based on counts and numbers of genes expressed.
`pp.filter_genes`	Filter genes based on number of cells or counts.
`pp.highly_variable_genes`	Annotate highly variable genes [Satija et al., 2015, Stuart et al., 2019, Zheng et al., 2017].
`pp.log1p`	Logarithmize the data matrix.
`pp.pca`	Principal component analysis [Pedregosa et al., 2011].
`pp.normalize_total`	Normalize counts per cell.
`pp.regress_out`	Regress out (mostly) unwanted sources of variation.
`pp.scale`	Scale data to unit variance and zero mean.
`pp.sample`	Sample observations or variables with or without replacement.
`pp.downsample_counts`	Downsample counts from count matrix.

Recipes

`pp.recipe_zheng17`	Normalize and filter as in Zheng et al. [2017].
`pp.recipe_weinreb17`	Normalize and filter as in Weinreb et al. [2017].
`pp.recipe_seurat`	Normalize and filter as in Seurat [Satija et al., 2015].

Batch effect correction

Also see Data integration. Note that a simple batch correction method is available via `pp.regress_out()`. Check out `scanpy.external` for more.

`pp.combat`	ComBat function for batch effect correction [Johnson et al., 2006, Leek et al., 2017, Pedersen, 2012].

Doublet detection

`pp.scrublet`	Predict doublets using Scrublet [Wolock et al., 2019].
`pp.scrublet_simulate_doublets`	Simulate doublets by adding the counts of random observed transcriptome pairs.

Neighbors

`pp.neighbors`	Compute the nearest neighbors distance matrix and a neighborhood graph of observations [McInnes et al., 2018].