scanpy.api.pp.highly_variable_genes

scanpy.api.pp.highly_variable_genes(adata, min_disp=None, max_disp=None, min_mean=None, max_mean=None, n_top_genes=None, n_bins=20, flavor='seurat', subset=False, inplace=True)

Annotate highly variable genes [Satija15] [Zheng17].

Expects logarithmized data.

Depending on flavor, this reproduces the R implementations of Seurat [Satija15] and Cell Ranger [Zheng17].

The normalized dispersion is obtained by binning genes by their mean expression and scaling each gene's dispersion with the mean and standard deviation of the dispersions of all genes in the same bin. Highly variable genes are therefore selected within each bin of mean expression, that is, relative to genes with a similar expression level.
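The per-bin scaling described above can be sketched in plain Python. This is an illustration only, not scanpy's implementation: the helper name normalized_dispersions, the equal-width binning, and the use of the sample standard deviation are assumptions made for the example. The rule that a bin containing a single gene yields a normalized dispersion of 1 follows the n_bins description on this page.

```python
from statistics import mean, stdev


def normalized_dispersions(means, dispersions, n_bins=20):
    """Z-score each gene's dispersion within its bin of mean expression.

    Hypothetical helper for illustration; equal-width bins are an assumption.
    """
    lo, hi = min(means), max(means)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # all genes share one mean, so they all land in bin 0

    # Assign every gene to a bin according to its mean expression.
    bin_ids = [min(int((m - lo) / width), n_bins - 1) for m in means]

    # Collect the dispersions of the genes in each bin.
    by_bin = {}
    for b, d in zip(bin_ids, dispersions):
        by_bin.setdefault(b, []).append(d)

    result = []
    for b, d in zip(bin_ids, dispersions):
        members = by_bin[b]
        if len(members) == 1:
            # A single gene in a bin: normalized dispersion is set to 1.
            result.append(1.0)
            continue
        sd = stdev(members)
        # Guard against a zero spread (assumption: report 0 in that case).
        result.append((d - mean(members)) / sd if sd > 0 else 0.0)
    return result


example = normalized_dispersions(
    means=[0.1, 0.2, 0.3, 5.0],
    dispersions=[1.0, 2.0, 3.0, 7.0],
    n_bins=2,
)
print(example)
```

In this toy input the first three genes share a bin and are scaled against each other, while the fourth gene sits alone in its bin and receives the value 1.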

Parameters:
adata : AnnData

The annotated data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.

min_mean : float, optional (default: 0.0125)

Lower cutoff for the mean expression. If n_top_genes is not None, this and all other cutoffs for the means and the normalized dispersions are ignored.

max_mean : float, optional (default: 3)

Upper cutoff for the mean expression. If n_top_genes is not None, this and all other cutoffs for the means and the normalized dispersions are ignored.

min_disp : float, optional (default: 0.5)

Lower cutoff for the normalized dispersion. If n_top_genes is not None, this and all other cutoffs for the means and the normalized dispersions are ignored.

max_disp : float, optional (default: None)

Upper cutoff for the normalized dispersion. If n_top_genes is not None, this and all other cutoffs for the means and the normalized dispersions are ignored.

n_top_genes : int or None, optional (default: None)

Number of highly-variable genes to keep.
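How n_top_genes interacts with the cutoffs can be sketched as follows. This is a simplified illustration rather than the library's code: the helper name select_highly_variable is hypothetical, the strict inequalities are an assumption, and the default values are the ones listed in the parameter descriptions on this page.

```python
import math


def select_highly_variable(means, dispersions_norm, n_top_genes=None,
                           min_mean=0.0125, max_mean=3.0,
                           min_disp=0.5, max_disp=None):
    """Return one boolean flag per gene (hypothetical helper)."""
    n_genes = len(means)
    if n_top_genes is not None:
        # All cutoffs are ignored: keep the genes with the largest
        # normalized dispersions.
        ranked = sorted(range(n_genes),
                        key=lambda i: dispersions_norm[i],
                        reverse=True)
        top = set(ranked[:n_top_genes])
        return [i in top for i in range(n_genes)]

    # No n_top_genes given: apply the mean and dispersion cutoffs.
    upper = math.inf if max_disp is None else max_disp
    return [
        min_mean < m < max_mean and min_disp < d < upper
        for m, d in zip(means, dispersions_norm)
    ]


means = [0.01, 0.5, 1.0, 4.0]
disp_norm = [2.0, 0.6, 0.1, 3.0]
print(select_highly_variable(means, disp_norm))
print(select_highly_variable(means, disp_norm, n_top_genes=2))
```

With the cutoffs, only the second gene passes both the mean and the dispersion filters. With n_top_genes=2, the two genes with the largest normalized dispersions are flagged even though their means fall outside the cutoff range.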

n_bins : int, optional (default: 20)

Number of bins for binning the mean gene expression. Normalization is done with respect to each bin. If just a single gene falls into a bin, the normalized dispersion is artificially set to 1. You’ll be informed about this if you set settings.verbosity = 4.

flavor : {'seurat', 'cell_ranger'}, optional (default: 'seurat')

Choose the flavor for computing normalized dispersion. In their default workflows, Seurat passes the cutoffs whereas Cell Ranger passes n_top_genes.

subset : bool, optional (default: False)

If True, subset the data in place to the highly variable genes; otherwise, only annotate which genes are highly variable.

inplace : bool, optional (default: True)

Whether to place calculated metrics in .var or return them.

Returns:

Depending on inplace, either returns the calculated metrics (recarray) or updates .var with the following fields:

  • highly_variable - boolean indicator of highly-variable genes
  • means - means per gene
  • dispersions - dispersions per gene
  • dispersions_norm - normalized dispersions per gene

Return type:

recarray, None

Notes

This function replaces filter_genes_dispersion().