Preprocessing: pp
Filtering of highly-variable genes, batch-effect correction, per-cell normalization, preprocessing recipes.
Any transformation of the data matrix that is not a tool. Unlike tools, preprocessing steps usually don’t return an easily interpretable annotation; instead, they perform a basic transformation of the data matrix.
Basic Preprocessing
For visual quality control, see highest_expr_genes() and filter_genes_dispersion() in scanpy.pl.
Calculate quality control metrics.
Filter cell outliers based on counts and numbers of genes expressed.
Filter genes based on number of cells or counts.
Annotate highly variable genes [Satija et al., 2015, Stuart et al., 2019, Zheng et al., 2017].
Logarithmize the data matrix.
Principal component analysis [Pedregosa et al., 2011].
Normalize counts per cell.
Regress out (mostly) unwanted sources of variation.
Scale data to unit variance and zero mean.
Subsample to a fraction of the number of observations.
Downsample counts from count matrix.
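Several of the basic steps listed above are simple matrix transformations. The sketch below illustrates cell filtering, per-cell normalization, log transformation, and gene scaling on a small dense cells × genes matrix stored as a list of lists. The helper names (`filter_cells`, `normalize_total`, `log1p`, `scale`) and their signatures are hypothetical illustrations, not scanpy's API. The sketch follows two assumptions: a missing `target_sum` falls back to the median of per-cell totals, and scaling divides by the sample standard deviation.

```python
import math
from statistics import median


def filter_cells(X, min_genes):
    """Keep cells (rows) with at least min_genes non-zero entries (hypothetical helper)."""
    return [row for row in X if sum(1 for v in row if v > 0) >= min_genes]


def normalize_total(X, target_sum=None):
    """Rescale each cell so its counts sum to target_sum (hypothetical helper).

    If target_sum is None, this sketch uses the median of the per-cell totals.
    Cells with a total of zero are returned as all zeros.
    """
    totals = [sum(row) for row in X]
    if target_sum is None:
        target_sum = median(totals)
    return [
        [v * target_sum / t if t > 0 else 0.0 for v in row]
        for row, t in zip(X, totals)
    ]


def log1p(X):
    """Element-wise natural log(1 + x)."""
    return [[math.log1p(v) for v in row] for row in X]


def scale(X):
    """Center each gene (column) to zero mean and unit sample variance.

    Requires at least two cells. Genes with zero variance are only centered.
    """
    n = len(X)
    stats = []
    for col in zip(*X):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        stats.append((mean, sd if sd > 0 else 1.0))
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in X]
```

For example, `scale(log1p(normalize_total(counts)))` chains the steps in the order that standard workflows typically use: normalize, then log-transform, then scale.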
Recipes
Normalization and filtering following Zheng et al. [2017].
Normalization and filtering following Weinreb et al. [2017].
Normalization and filtering following Seurat [Satija et al., 2015].
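The recipes chain the basic steps. One step shared by the Zheng et al. [2017] recipe is keeping only highly variable genes, ranked by dispersion. The sketch below shows a simplified version of that selection. It ranks genes by their raw variance-to-mean ratio and omits the normalization of dispersions within bins of mean expression that the published flavors apply. The function name `top_dispersed_genes` is hypothetical.

```python
from statistics import mean, variance


def top_dispersed_genes(X, n_top):
    """Return sorted column indices of the n_top genes with the highest dispersion.

    X is a dense cells x genes list of lists with at least two cells.
    Dispersion here is the raw variance / mean per gene; genes with zero
    mean get dispersion 0. Binned normalization is intentionally omitted.
    """
    scored = []
    for j, col in enumerate(zip(*X)):
        m = mean(col)
        d = variance(col) / m if m > 0 else 0.0
        scored.append((d, j))
    scored.sort(reverse=True)  # highest dispersion first
    return sorted(j for _, j in scored[:n_top])
```

A recipe would then subset the matrix to these columns before renormalizing, log-transforming, and scaling.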
Batch effect correction
Also see [Data integration]. Note that a simple batch correction method is available via pp.regress_out(). Check out scanpy.external for more methods.
ComBat function for batch effect correction [Johnson et al., 2006, Leek et al., 2017, Pedersen, 2012].
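The simplest linear batch correction regresses each gene on one-hot batch indicators and keeps the residuals. This is equivalent to subtracting each batch's per-gene mean. The sketch below is that plain least-squares centering, not ComBat. ComBat additionally adjusts per-batch scale and shrinks the batch estimates with an empirical Bayes prior. The helper name `center_by_batch` is hypothetical.

```python
from collections import defaultdict


def center_by_batch(X, batches):
    """Subtract each batch's per-gene mean from the cells in that batch.

    X: dense cells x genes list of lists; batches: one label per cell.
    Equals the least-squares residuals after regressing each gene on
    one-hot batch indicators. Only additive (location) shifts are removed.
    """
    n_genes = len(X[0])
    sums = defaultdict(lambda: [0.0] * n_genes)
    counts = defaultdict(int)
    for row, b in zip(X, batches):
        counts[b] += 1
        acc = sums[b]
        for j, v in enumerate(row):
            acc[j] += v
    means = {b: [s / counts[b] for s in acc] for b, acc in sums.items()}
    return [[v - m for v, m in zip(row, means[b])] for row, b in zip(X, batches)]
```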
Doublet detection
Predict doublets using Scrublet [Wolock et al., 2019].
Simulate doublets by adding the counts of random observed transcriptome pairs.
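Doublet simulation, as described above, sums the count vectors of randomly paired observed cells. The sketch below shows the idea on a dense list-of-lists count matrix. The name `simulate_doublets` and its arguments are hypothetical. Drawing two distinct cells per pair is a choice made here; the sketch does not claim to match Scrublet's exact sampling scheme.

```python
import random


def simulate_doublets(X, n_doublets, seed=0):
    """Create synthetic doublets from pairs of observed cells.

    Each doublet is the element-wise sum of the count vectors of two
    distinct, randomly chosen cells. Returns (doublets, parent_pairs),
    so each synthetic profile can be traced back to its parents.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    doublets, parents = [], []
    for _ in range(n_doublets):
        i, j = rng.sample(range(len(X)), 2)
        doublets.append([a + b for a, b in zip(X[i], X[j])])
        parents.append((i, j))
    return doublets, parents
```

A detector can then embed observed and simulated profiles together and score each observed cell by how many simulated doublets sit among its neighbors.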
Neighbors
Computes the nearest neighbors distance matrix and a neighborhood graph of observations [McInnes et al., 2018].
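A neighborhood graph connects each observation to its nearest neighbors in some representation, such as a PCA embedding. The sketch below builds an unweighted k-nearest-neighbor graph by brute force with Euclidean distances and symmetrizes it by taking the union of the kNN relations. This is a deliberate simplification: the weighted connectivities described by McInnes et al. [2018] are not reproduced here. The function name `knn_graph` is hypothetical.

```python
import math


def knn_graph(points, k):
    """Brute-force k-nearest-neighbor graph with Euclidean distance.

    Returns (neighbors, edges): neighbors[i] lists the k nearest other
    points to i in ascending distance; edges is the symmetric union of
    those relations, stored as (i, j) pairs with i < j.
    """
    n = len(points)
    neighbors = []
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    edges = {(min(i, j), max(i, j)) for i in range(n) for j in neighbors[i]}
    return neighbors, edges
```

Brute force costs O(n²) distance evaluations, so implementations for large datasets typically rely on approximate nearest-neighbor search instead.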