scanpy.tl.rank_genes_groups

scanpy.tl.rank_genes_groups(adata, groupby, use_raw=True, groups='all', reference='rest', n_genes=100, rankby_abs=False, key_added=None, copy=False, method='t-test_overestim_var', corr_method='benjamini-hochberg', **kwds)

Rank genes for characterizing groups.

Parameters
adata : AnnData

Annotated data matrix.

groupby : str

The key of the observations grouping to consider.

use_raw : bool, optional (default: True)

Use raw attribute of adata if present.

groups : Union[str, Iterable[str]], optional (default: 'all')

Subset of groups, e.g. ['g1', 'g2', 'g3'], to which comparison shall be restricted, or 'all' (default), for all groups.

reference : str, optional (default: 'rest')

If 'rest', compare each group to the union of all other groups. If a group identifier, compare each group with respect to this group.

n_genes : int, optional (default: 100)

The number of genes that appear in the returned tables.

method : {'logreg', 't-test', 'wilcoxon', 't-test_overestim_var'}, optional (default: 't-test_overestim_var')

If 't-test', uses a t-test. If 'wilcoxon', uses the Wilcoxon rank-sum test. If 't-test_overestim_var', uses a t-test that overestimates the variance of each group. If 'logreg', uses logistic regression; see [Ntranos18], here and here, for why this is meaningful.

corr_method : {'benjamini-hochberg', 'bonferroni'}, optional (default: 'benjamini-hochberg')

p-value correction method. Used only for the 't-test', 't-test_overestim_var', and 'wilcoxon' methods.
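To make the two corr_method options concrete, here is a minimal pure-Python sketch of the textbook Bonferroni and Benjamini-Hochberg adjustments. The function names and the example p-values are illustrative assumptions, not part of scanpy, and the library's own implementation may differ in detail.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    n = len(pvals)
    return [min(p * n, 1.0) for p in pvals]


def benjamini_hochberg(pvals):
    """Textbook Benjamini-Hochberg adjusted p-values, in input order."""
    n = len(pvals)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four genes in one group.
pvals = [0.01, 0.04, 0.03, 0.20]
bh = benjamini_hochberg(pvals)
bonf = bonferroni(pvals)
print(bh)
print(bonf)
```

Bonferroni controls the family-wise error rate and is the more conservative of the two; Benjamini-Hochberg controls the false discovery rate, so its adjusted values are never larger than the Bonferroni ones.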

rankby_abs : bool, optional (default: False)

Rank genes by the absolute value of the score, not by the score. The returned scores are never the absolute values.
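The effect of rankby_abs can be sketched without scanpy. The helper rank_genes and the gene scores below are hypothetical and only illustrate the described behaviour: the ordering uses the absolute score, while the reported score keeps its sign.

```python
# Hypothetical per-gene scores for a single group.
scores = {"geneA": 2.5, "geneB": -4.0, "geneC": 0.5, "geneD": -1.0}


def rank_genes(scores, n_genes, rankby_abs=False):
    """Return the top n_genes as (name, signed score) pairs."""
    if rankby_abs:
        key = lambda gene: abs(scores[gene])
    else:
        key = lambda gene: scores[gene]
    top = sorted(scores, key=key, reverse=True)[:n_genes]
    # The signed score is returned in both cases.
    return [(gene, scores[gene]) for gene in top]


by_score = rank_genes(scores, n_genes=2)
by_abs = rank_genes(scores, n_genes=2, rankby_abs=True)
print(by_score)
print(by_abs)
```

With rankby_abs=True, strongly negative scores (genes lower in the group than in the reference) can enter the top of the table alongside strongly positive ones.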

**kwds : keyword parameters

Are passed to test methods. Currently this affects only parameters that are passed to sklearn.linear_model.LogisticRegression. For instance, you can pass penalty='l1' to try to come up with a minimal set of genes that are good predictors (sparse solution meaning few non-zero fitted coefficients).

Returns

names : structured np.ndarray (.uns['rank_genes_groups'])

Structured array to be indexed by group id storing the gene names. Ordered according to scores.

scores : structured np.ndarray (.uns['rank_genes_groups'])

Structured array to be indexed by group id storing the z-score underlying the computation of a p-value for each gene for each group. Ordered according to scores.

logfoldchanges : structured np.ndarray (.uns['rank_genes_groups'])

Structured array to be indexed by group id storing the log2 fold change for each gene for each group. Ordered according to scores. Only provided if method is 't-test'-like. Note: this is an approximation calculated from mean-log values.
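The note about mean-log values can be illustrated with a small sketch. It assumes the expression values are log1p-transformed and that the group means of those values are back-transformed with expm1 before the ratio is taken; both the formula and the eps guard are assumptions made for illustration, not a statement of scanpy's exact code.

```python
import math

# Hypothetical log1p-transformed expression of one gene.
group = [0.0, 1.2, 2.3, 1.9]   # cells in the group of interest
rest = [0.0, 0.4, 0.0, 0.7]    # cells in the reference


def mean(values):
    return sum(values) / len(values)


def approx_log2fc(group, rest, eps=1e-9):
    """Fold change from the means of the log values (the approximation)."""
    g = math.expm1(mean(group)) + eps
    r = math.expm1(mean(rest)) + eps
    return math.log2(g / r)


def exact_log2fc(group, rest, eps=1e-9):
    """Fold change from the means of the back-transformed values."""
    g = mean([math.expm1(x) for x in group]) + eps
    r = mean([math.expm1(x) for x in rest]) + eps
    return math.log2(g / r)


approx = approx_log2fc(group, rest)
exact = exact_log2fc(group, rest)
print(approx, exact)
```

The two numbers differ because the mean of log values is not the log of the mean, which is why the reported fold changes are best read as approximate.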

pvals : structured np.ndarray (.uns['rank_genes_groups'])

p-values.

pvals_adj : structured np.ndarray (.uns['rank_genes_groups'])

Corrected p-values.
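The returned fields are structured arrays indexed by group id. The sketch below builds a small stand-in with NumPy alone to show the indexing pattern; the group ids 'g1' and 'g2', the gene names, and the scores are made up, and the dictionary result only mimics the layout of adata.uns['rank_genes_groups'].

```python
import numpy as np

# Stand-in for adata.uns['rank_genes_groups']: one field per group id,
# one row per rank (row 0 is the top-ranked gene of each group).
names = np.array(
    [("gene_a", "gene_d"), ("gene_b", "gene_e"), ("gene_c", "gene_f")],
    dtype=[("g1", "U10"), ("g2", "U10")],
)
scores = np.array(
    [(12.1, 15.3), (9.8, 11.0), (7.4, 8.6)],
    dtype=[("g1", "f8"), ("g2", "f8")],
)
result = {"names": names, "scores": scores}

# Group ids are the field names of the structured dtype.
groups = result["names"].dtype.names

# Indexing by group id gives that group's column, ordered by score.
top_g2 = result["names"]["g2"][:2]
top_g2_scores = result["scores"]["g2"][:2]
print(groups, top_g2, top_g2_scores)
```

The same pattern, applied to the real result, would read adata.uns['rank_genes_groups']['names']['g2'] for a group with id 'g2'.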

Notes

There are slight inconsistencies depending on whether sparse or dense data are passed. See here.

Examples

>>> import scanpy as sc
>>> adata = sc.datasets.pbmc68k_reduced()
>>> sc.tl.rank_genes_groups(adata, 'bulk_labels', method='wilcoxon')

>>> # to visualize the results
>>> sc.pl.rank_genes_groups(adata)