scanpy.pp.highly_variable_genes
scanpy.pp.highly_variable_genes(adata, min_disp=None, max_disp=None, min_mean=None, max_mean=None, n_top_genes=None, n_bins=20, flavor='seurat', subset=False, inplace=True)

Annotate highly variable genes [Satija15] [Zheng17].
Expects logarithmized data.

Depending on flavor, this reproduces the R implementations of Seurat [Satija15] and Cell Ranger [Zheng17].

The normalized dispersion is obtained by scaling each gene's dispersion with the mean and standard deviation of the dispersions of all genes that fall into the same bin of mean expression. Highly variable genes are therefore selected within each bin of mean expression.
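The per-bin scaling described above can be sketched in plain Python. This is a minimal illustration, not scanpy's implementation: the helper name `normalized_dispersions`, the equal-width binning, and the use of the sample standard deviation are assumptions made for the example. The rule that a bin holding a single gene gets a normalized dispersion of 1 follows the description of `n_bins`.

```python
from statistics import mean, stdev


def normalized_dispersions(means, dispersions, n_bins=20):
    """Z-score each gene's dispersion within its bin of mean expression.

    Hypothetical helper: equal-width bins and sample standard deviation
    are assumptions of this sketch.
    """
    lo, hi = min(means), max(means)
    # Fall back to a width of 1.0 when all means are identical.
    width = (hi - lo) / n_bins or 1.0
    # Assign each gene to a bin; the largest mean is clamped into the last bin.
    bin_of = [min(int((m - lo) / width), n_bins - 1) for m in means]

    # Collect the dispersions of all genes sharing a bin.
    by_bin = {}
    for b, d in zip(bin_of, dispersions):
        by_bin.setdefault(b, []).append(d)

    normed = []
    for b, d in zip(bin_of, dispersions):
        members = by_bin[b]
        if len(members) == 1:
            # A lone gene in a bin is artificially set to 1.
            normed.append(1.0)
            continue
        sd = stdev(members)
        # Guard against a bin whose dispersions are all equal.
        normed.append((d - mean(members)) / sd if sd > 0 else 0.0)
    return normed


# Toy input: three low-mean genes, two mid-mean genes, one high-mean gene.
means = [0.1, 0.2, 0.3, 2.0, 2.1, 5.0]
dispersions = [1.0, 2.0, 3.0, 4.0, 6.0, 9.0]
normed = normalized_dispersions(means, dispersions, n_bins=3)
```

Because each gene is compared only with genes of similar mean expression, a gene with a modest raw dispersion can still stand out if its bin is otherwise quiet.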
Parameters:
- adata : AnnData
    The annotated data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.
- min_mean : float, optional (default: 0.0125)
    If n_top_genes is not None, this and all other cutoffs for the means and the normalized dispersions are ignored.
- max_mean : float, optional (default: 3)
    If n_top_genes is not None, this and all other cutoffs for the means and the normalized dispersions are ignored.
- min_disp : float, optional (default: 0.5)
    If n_top_genes is not None, this and all other cutoffs for the means and the normalized dispersions are ignored.
- max_disp : float, optional (default: None)
    If n_top_genes is not None, this and all other cutoffs for the means and the normalized dispersions are ignored.
- n_top_genes : int or None, optional (default: None)
    Number of highly variable genes to keep.
- n_bins : int, optional (default: 20)
    Number of bins for binning the mean gene expression. Normalization is done with respect to each bin. If just a single gene falls into a bin, the normalized dispersion is artificially set to 1. You will be informed about this if you set settings.verbosity = 4.
- flavor : {'seurat', 'cell_ranger'}, optional (default: 'seurat')
    Choose the flavor for computing the normalized dispersion. In their default workflows, Seurat passes the cutoffs whereas Cell Ranger passes n_top_genes.
- subset : bool, optional (default: False)
    If True, subset in place to highly variable genes; otherwise merely flag highly variable genes.
- inplace : bool, optional (default: True)
    Whether to place the calculated metrics in .var or return them.
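The interplay between n_top_genes and the cutoff parameters can be illustrated with a small sketch. It is not taken from scanpy's source: the helper name `select_highly_variable`, ranking by normalized dispersion, and the strict inequalities are assumptions. It only shows the documented behavior that the cutoffs are ignored as soon as n_top_genes is given.

```python
def select_highly_variable(means, disp_norm, n_top_genes=None,
                           min_mean=0.0125, max_mean=3,
                           min_disp=0.5, max_disp=None):
    """Return one boolean flag per gene (hypothetical helper)."""
    if n_top_genes is not None:
        # All cutoffs are ignored; keep the genes with the largest
        # normalized dispersion (ranking criterion is an assumption).
        order = sorted(range(len(disp_norm)),
                       key=lambda i: disp_norm[i], reverse=True)
        top = set(order[:n_top_genes])
        return [i in top for i in range(len(disp_norm))]

    # No upper bound on the dispersion when max_disp is None.
    upper = float("inf") if max_disp is None else max_disp
    return [
        min_mean < m < max_mean and min_disp < d < upper
        for m, d in zip(means, disp_norm)
    ]


means = [0.01, 0.5, 1.0, 4.0]
disp_norm = [2.0, 0.6, 0.1, 3.0]

by_cutoffs = select_highly_variable(means, disp_norm)
by_rank = select_highly_variable(means, disp_norm, n_top_genes=2)
```

With the cutoffs, only the second gene passes both the mean and the dispersion filter. With n_top_genes=2, the first and last genes are kept even though their means lie outside the cutoff range.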
Returns: Depending on inplace, returns the calculated metrics (recarray) or updates .var with the following fields:
- highly_variable : boolean indicator of highly variable genes
- means : means per gene
- dispersions : dispersions per gene
- dispersions_norm : normalized dispersions per gene

Return type: recarray, None
Notes
This function replaces filter_genes_dispersion().