scanpy.api.pp.filter_genes_dispersion

scanpy.api.pp.filter_genes_dispersion(data, flavor='seurat', min_disp=None, max_disp=None, min_mean=None, max_mean=None, n_bins=20, n_top_genes=None, log=True, subset=True, copy=False)

Extract highly variable genes [Satija15] [Zheng17].

This function is deprecated; use highly_variable_genes() instead.

If trying out parameters, pass the data matrix instead of AnnData.

Depending on flavor, this reproduces the R-implementations of Seurat [Satija15] and Cell Ranger [Zheng17].

Genes are first binned by their mean expression. The normalized dispersion of a gene is then obtained by scaling its dispersion with the mean and standard deviation of the dispersions of all genes in the same bin. Highly variable genes are therefore selected relative to other genes of similar mean expression, that is, within each bin of mean expression.
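
The per-bin scaling can be sketched in NumPy as follows. This is a minimal sketch, not scanpy's implementation: the function name normalize_dispersion_seurat is hypothetical, the bins are assumed to be n_bins equal-width intervals over the observed range of means, and a bin holding a single gene gets a normalized dispersion of 1, as described under n_bins.

```python
import numpy as np


def normalize_dispersion_seurat(mean, dispersion, n_bins=20):
    """Z-score each gene's dispersion against genes in the same mean bin.

    Hypothetical helper for illustration; assumes equal-width bins over
    the observed range of `mean`.
    """
    mean = np.asarray(mean, dtype=float)
    dispersion = np.asarray(dispersion, dtype=float)
    # n_bins equal-width intervals spanning the observed means.
    edges = np.linspace(mean.min(), mean.max(), n_bins + 1)
    # Bin index 0 .. n_bins - 1 for every gene (intervals closed on the right).
    bins = np.digitize(mean, edges[1:-1], right=True)
    norm = np.full(dispersion.shape, np.nan)
    for b in np.unique(bins):
        in_bin = bins == b
        d = dispersion[in_bin]
        n_valid = np.count_nonzero(~np.isnan(d))
        if n_valid == 0:
            continue  # nothing to normalize against; leave NaN
        if n_valid == 1:
            # A single gene in the bin: its normalized dispersion is set to 1.
            norm[in_bin] = np.where(np.isnan(d), np.nan, 1.0)
            continue
        mu = np.nanmean(d)
        sd = np.nanstd(d, ddof=1)  # sample standard deviation within the bin
        norm[in_bin] = (d - mu) / sd
    return norm


mean = np.array([0.1, 0.2, 0.3, 5.0])
dispersion = np.array([1.0, 2.0, 3.0, 4.0])
# The first three genes share a bin (z-scores -1, 0, 1); the last gene is alone (1).
print(normalize_dispersion_seurat(mean, dispersion, n_bins=2))
```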

Use flavor='cell_ranger' with care and in the same way as in recipe_zheng17().

Parameters:
data : AnnData, np.ndarray, sp.sparse

The (annotated) data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.

flavor : {'seurat', 'cell_ranger'}, optional (default: 'seurat')

Choose the flavor for computing normalized dispersion. If choosing ‘seurat’, this expects non-logarithmized data; the logarithm of mean and dispersion is taken internally when log is at its default value True. For ‘cell_ranger’, this is usually applied to logarithmized data, in which case you should set log to False. In their default workflows, Seurat passes the cutoffs whereas Cell Ranger passes n_top_genes.
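
The text above does not spell out how the ‘cell_ranger’ flavor normalizes dispersions. The sketch below assumes a robust variant of the same binning idea: genes are binned by percentiles of the mean (here the 10th, 15th, ..., 100th), and each dispersion is centred on its bin's median and scaled by the bin's median absolute deviation (MAD). The function name is hypothetical, using the raw (unscaled) MAD is an assumption, and this should not be read as Cell Ranger's or scanpy's exact procedure.

```python
import numpy as np


def normalize_dispersion_robust(mean, dispersion):
    """Median/MAD normalization per percentile bin of the mean.

    Hypothetical helper; an assumed 'cell_ranger'-like variant, not the
    library's code.
    """
    mean = np.asarray(mean, dtype=float)
    dispersion = np.asarray(dispersion, dtype=float)
    # Inner bin edges at the 10th, 15th, ..., 100th percentiles of the means.
    edges = np.percentile(mean, np.arange(10, 105, 5))
    # Bin index per gene (intervals closed on the right).
    bins = np.digitize(mean, edges, right=True)
    norm = np.full(dispersion.shape, np.nan)
    for b in np.unique(bins):
        in_bin = bins == b
        d = dispersion[in_bin]
        if np.all(np.isnan(d)):
            continue  # no usable dispersions in this bin
        med = np.nanmedian(d)
        mad = np.nanmedian(np.abs(d - med))  # unscaled median absolute deviation
        if mad > 0:
            norm[in_bin] = (d - med) / mad
        # Bins with MAD == 0 (e.g. a single gene) stay NaN in this sketch.
    return norm


# All genes share one bin here: median 3, MAD 1.
print(normalize_dispersion_robust(np.ones(5), np.array([1.0, 2.0, 3.0, 4.0, 10.0])))
```

Median and MAD are used instead of mean and standard deviation so that a few extreme dispersions in a bin do not dominate the scaling.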

min_mean=0.0125, max_mean=3, min_disp=0.5, max_disp=`None` : float, optional

Cutoffs for the means and the normalized dispersions of the genes. If n_top_genes is not None, these cutoffs are ignored.
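
How the cutoffs and n_top_genes interact can be sketched as below. The function name select_genes is hypothetical; strict inequalities for the cutoffs are an assumption, max_disp=None is treated as no upper bound, and with n_top_genes set, every gene whose normalized dispersion reaches the n_top_genes-th largest value is kept, so ties can admit a few extra genes.

```python
import numpy as np


def select_genes(mean, dispersion_norm, min_mean=0.0125, max_mean=3.0,
                 min_disp=0.5, max_disp=None, n_top_genes=None):
    """Boolean mask of highly variable genes (hypothetical helper)."""
    mean = np.asarray(mean, dtype=float)
    disp = np.asarray(dispersion_norm, dtype=float)
    if n_top_genes is not None:
        # The mean and dispersion cutoffs are ignored in this mode.
        ranked = np.sort(disp[~np.isnan(disp)])[::-1]  # descending, NaNs dropped
        if ranked.size == 0 or n_top_genes <= 0:
            return np.zeros(disp.shape, dtype=bool)
        cutoff = ranked[min(n_top_genes, ranked.size) - 1]
        return disp >= cutoff  # NaN never passes the comparison
    if max_disp is None:
        max_disp = np.inf  # no upper bound on the normalized dispersion
    return ((mean > min_mean) & (mean < max_mean)
            & (disp > min_disp) & (disp < max_disp))


mean = np.array([0.01, 0.5, 1.0, 4.0])
disp_norm = np.array([2.0, 0.6, 0.4, 3.0])
print(select_genes(mean, disp_norm))                 # cutoff mode
print(select_genes(mean, disp_norm, n_top_genes=2))  # top-n mode
```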

n_bins : int (default: 20)

Number of bins for binning the mean gene expression. Normalization is done with respect to each bin. If just a single gene falls into a bin, the normalized dispersion is artificially set to 1. You’ll be informed about this if you set settings.verbosity = 4.

n_top_genes : int or None (default: None)

Number of highly-variable genes to keep.

log : bool, optional (default: True)

Take the logarithm of the means and of the dispersions (variance-to-mean ratios).
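
As a sketch of what log controls, the per-gene statistics can be computed as below. It assumes cells in rows, dispersion defined as the sample variance divided by the mean, zero means replaced by a tiny constant to avoid dividing by zero, and log1p applied to the mean; the function name and these numerical details are assumptions for illustration.

```python
import numpy as np


def mean_and_dispersion(X, log=True):
    """Per-gene mean and dispersion (variance / mean); hypothetical helper."""
    X = np.asarray(X, dtype=float)            # cells in rows, genes in columns
    mean = X.mean(axis=0)
    var = X.var(axis=0, ddof=1)               # sample variance per gene
    mean = np.where(mean == 0, 1e-12, mean)   # avoid division by zero
    dispersion = var / mean
    if log:
        # log(0) is undefined, so zero dispersions become NaN first.
        dispersion = np.log(np.where(dispersion == 0, np.nan, dispersion))
        mean = np.log1p(mean)
    return mean, dispersion


X = np.array([[1.0, 0.0],
              [3.0, 0.0]])
print(mean_and_dispersion(X, log=False))
print(mean_and_dispersion(X))
```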

subset : bool, optional (default: True)

Keep only the highly variable genes (if True); otherwise, keep all genes and write a bool array marking the highly variable ones.

copy : bool, optional (default: False)

If an AnnData is passed, determines whether a copy is returned.

Returns:

  • If an AnnData adata is passed, returns or updates adata depending on copy. It filters adata (if subset is True) and adds the following annotations:
      • means (adata.var) – Means per gene. Logarithmized when log is True.
      • dispersions (adata.var) – Dispersions per gene. Logarithmized when log is True.
      • dispersions_norm (adata.var) – Normalized dispersions per gene. Logarithmized when log is True.
  • If a data matrix `X` is passed, the annotation is returned as an `np.recarray` with the same information stored in the fields gene_subset, means, dispersions and dispersions_norm.
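
When a data matrix is passed, the gene_subset field of the returned record array can be used to subset the matrix by hand. The sketch below does not call scanpy; it builds a stand-in np.recarray with toy values and only two of the fields, purely to show the access pattern.

```python
import numpy as np

# Toy expression matrix: 2 cells x 3 genes (illustrative values only).
X = np.array([[1.0, 0.0, 5.0],
              [2.0, 0.0, 7.0]])

# Stand-in for the returned annotation: two of its fields, toy values.
result = np.rec.fromarrays(
    [np.array([True, False, True]),
     np.array([1.5, 0.0, 6.0], dtype=np.float32)],
    names="gene_subset,means",
)

X_hvg = X[:, result.gene_subset]        # keep the columns flagged as highly variable
print(X_hvg.shape)                      # (2, 2)
print(result.means[result.gene_subset])  # means of the kept genes
```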