scanpy.tl.rank_genes_groups

scanpy.tl.rank_genes_groups(adata, groupby, use_raw=True, groups='all', reference='rest', n_genes=100, rankby_abs=False, key_added=None, copy=False, method='t-test_overestim_var', corr_method='benjamini-hochberg', layer=None, **kwds)

Rank genes for characterizing groups.

Parameters
adata : AnnData

Annotated data matrix.

groupby : str

The key of the observations grouping to consider.

use_raw : bool (default: True)

Use the raw attribute of adata if present.

layer : Optional[str] (default: None)

Key from adata.layers whose values will be used to perform the tests.

groups : Union[Literal['all'], Iterable[str]] (default: 'all')

Subset of groups, e.g. ['g1', 'g2', 'g3'], to which comparison shall be restricted, or 'all' (default), for all groups.

reference : str (default: 'rest')

If 'rest', compare each group to the union of the rest of the groups. If a group identifier, compare with respect to this group.

n_genes : int (default: 100)

The number of genes that appear in the returned tables.

method : Literal['logreg', 't-test', 'wilcoxon', 't-test_overestim_var'] (default: 't-test_overestim_var')

The default 't-test_overestim_var' overestimates the variance of each group, 't-test' uses a t-test, 'wilcoxon' uses the Wilcoxon rank-sum test, and 'logreg' uses logistic regression. See [Ntranos18] for why logistic regression is meaningful here.

corr_method : Literal['benjamini-hochberg', 'bonferroni'] (default: 'benjamini-hochberg')

p-value correction method. Used only for 't-test', 't-test_overestim_var', and 'wilcoxon'.

rankby_abs : bool (default: False)

Rank genes by the absolute value of the score, not by the score. The returned scores are never the absolute values.

key_added : Optional[str] (default: None)

The key in adata.uns under which the results are saved. If None, 'rank_genes_groups' is used.

**kwds

Passed to the test methods. Currently, this affects only parameters that are passed to sklearn.linear_model.LogisticRegression. For instance, you can pass penalty='l1' to try to find a minimal set of genes that are good predictors (a sparse solution, meaning few non-zero fitted coefficients).
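To illustrate what penalty='l1' does inside sklearn.linear_model.LogisticRegression, here is a minimal standalone sketch on synthetic data. The toy data, the value of C, and the variable names are assumptions for illustration, not scanpy internals. Note that in scikit-learn the 'l1' penalty requires a solver that supports it, such as 'liblinear' or 'saga'; the default 'lbfgs' solver does not.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_cells, n_genes = 200, 20
X = rng.normal(size=(n_cells, n_genes))        # toy "expression" matrix
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # only genes 0 and 1 are informative

# The L1 penalty drives uninformative coefficients to exactly zero.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.05)
clf.fit(X, y)
coef = clf.coef_[0]
selected = np.flatnonzero(coef)                # indices of non-zero coefficients
print("genes with non-zero coefficients:", selected)
```

Smaller values of C strengthen the penalty and typically leave fewer genes with non-zero coefficients.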
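The `reference` parameter only changes which cells form the comparison set. A minimal numpy sketch of that choice, assuming each cell carries exactly one label in the groupby column; the helper comparison_masks is hypothetical and not part of scanpy:

```python
import numpy as np

def comparison_masks(labels, group, reference="rest"):
    """Boolean masks for the cells of `group` and for the cells it is compared to."""
    labels = np.asarray(labels)
    in_group = labels == group
    if reference == "rest":
        in_reference = ~in_group            # every cell not in the group
    else:
        in_reference = labels == reference  # only the cells of one named group
    return in_group, in_reference

labels = np.array(["A", "B", "C", "A", "B", "C"])
group_vs_rest = comparison_masks(labels, "A")
group_vs_b = comparison_masks(labels, "A", reference="B")
print(group_vs_rest)
print(group_vs_b)
```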
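For the t-test variants listed under `method`, the score for one gene can be sketched as a Welch-style t statistic. This is an illustration, not scanpy's code. In particular, it assumes the 'overestimated variance' variant can be mimicked by dividing the rest variance by the group size instead of the (usually much larger) rest size, which inflates the standard error when the group is small. The function name t_score is hypothetical.

```python
import numpy as np

def t_score(group, rest, overestim_var=False):
    """Welch-style t statistic for one gene: group vs. rest (illustrative sketch)."""
    group = np.asarray(group, dtype=float)
    rest = np.asarray(rest, dtype=float)
    n_group, n_rest = group.size, rest.size
    var_group = group.var(ddof=1)
    var_rest = rest.var(ddof=1)
    if overestim_var:
        # Assumption: treat the rest as if it were as small as the group,
        # so its variance contributes more to the standard error.
        n_rest = n_group
    std_err = np.sqrt(var_group / n_group + var_rest / n_rest)
    return (group.mean() - rest.mean()) / std_err

rng = np.random.default_rng(0)
group = rng.normal(loc=1.0, scale=1.0, size=20)   # small group with a shifted mean
rest = rng.normal(loc=0.0, scale=1.0, size=500)   # large "rest"

t_plain = t_score(group, rest)
t_over = t_score(group, rest, overestim_var=True)
print(t_plain, t_over)
```

Because the numerator is unchanged and the denominator only grows, the overestimated-variance score has the same sign but a smaller magnitude, which makes small groups less likely to produce extreme scores.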
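The two options for `corr_method` are standard multiple-testing corrections. A minimal numpy sketch of both, where the function names are hypothetical and scanpy may delegate this step to another library:

```python
import numpy as np

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjusted p-values (false discovery rate)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: the value at rank k is the minimum over ranks >= k.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

def bonferroni(pvals):
    """Bonferroni adjusted p-values (family-wise error rate)."""
    p = np.asarray(pvals, dtype=float)
    return np.minimum(p * p.size, 1.0)

pvals = [0.01, 0.04, 0.03, 0.20]
bh = benjamini_hochberg(pvals)
bonf = bonferroni(pvals)
print(bh, bonf)
```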
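The interplay of `n_genes` and `rankby_abs` can be sketched as follows: the absolute value is used only as the sorting key, while the returned scores keep their sign. The helper top_genes and the toy scores are hypothetical, not scanpy's API.

```python
import numpy as np

def top_genes(gene_names, scores, n_genes, rankby_abs=False):
    """Return the n_genes best-ranked gene names and their signed scores."""
    scores = np.asarray(scores, dtype=float)
    key = np.abs(scores) if rankby_abs else scores   # used for sorting only
    order = np.argsort(key)[::-1][:n_genes]          # descending order
    return [gene_names[i] for i in order], scores[order]

gene_names = ["a", "b", "c", "d"]
scores = [0.5, -3.0, 2.0, -0.1]
by_score = top_genes(gene_names, scores, n_genes=3)
by_abs = top_genes(gene_names, scores, n_genes=3, rankby_abs=True)
print(by_score)
print(by_abs)
```

With rankby_abs=True the strongly negative gene "b" moves to the top, but its score is still reported as -3.0.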

Return type

Optional[AnnData]

Returns

names : structured np.ndarray (.uns['rank_genes_groups'])

Structured array to be indexed by group id storing the gene names. Ordered according to scores.

scores : structured np.ndarray (.uns['rank_genes_groups'])

Structured array to be indexed by group id storing the z-score underlying the computation of a p-value for each gene for each group. Ordered according to scores.

logfoldchanges : structured np.ndarray (.uns['rank_genes_groups'])

Structured array to be indexed by group id storing the log2 fold change for each gene for each group. Ordered according to scores. Only provided if the method is t-test-like. Note: this is an approximation calculated from mean-log values.

pvals : structured np.ndarray (.uns['rank_genes_groups'])

p-values.

pvals_adj : structured np.ndarray (.uns['rank_genes_groups'])

Corrected p-values.
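"Structured array to be indexed by group id" means there is one field per group and one row per rank. A minimal numpy sketch of that layout with hypothetical groups and gene lists; on a real result the analogous access would presumably be adata.uns['rank_genes_groups']['names'][group]:

```python
import numpy as np

# Toy ranked gene lists per group (hypothetical values, best-scoring gene first).
groups = ["g1", "g2"]
ranked = {
    "g1": ["CD3E", "IL7R", "LTB"],
    "g2": ["MS4A1", "CD79A", "CD74"],
}

# One field per group id; row i holds the i-th ranked gene of every group.
dtype = [(g, "U50") for g in groups]
names = np.array(list(zip(*(ranked[g] for g in groups))), dtype=dtype)

print(names["g1"])   # ranked genes of group g1
print(names[0])      # top-ranked gene of every group
```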
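For method='wilcoxon', the score described above can be understood as a z-score of the Wilcoxon rank-sum statistic. A minimal sketch using the large-sample normal approximation, assuming no tie correction of the variance (tied values still receive average ranks). The function names are hypothetical, and this is not scanpy's implementation.

```python
import math
import numpy as np

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(values.size, dtype=float)
    i = 0
    while i < values.size:
        j = i
        while j + 1 < values.size and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1   # mean of positions i+1 .. j+1
        i = j + 1
    return ranks

def wilcoxon_z(group, rest):
    """Rank-sum z-score of `group` vs. `rest` and its two-sided normal p-value."""
    n1, n2 = len(group), len(rest)
    ranks = average_ranks(np.concatenate([group, rest]))
    rank_sum = ranks[:n1].sum()
    mean = n1 * (n1 + n2 + 1) / 2
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # no tie correction
    z = (rank_sum - mean) / std
    p = math.erfc(abs(z) / math.sqrt(2))            # 2 * (1 - Phi(|z|))
    return z, p

z, p = wilcoxon_z([5, 6, 7], [1, 2, 3, 4])
print(z, p)
```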
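The note under logfoldchanges says the value is an approximation calculated from mean-log values. A minimal sketch of one such approximation, assuming the data were transformed with log1p: average the log-transformed values, undo log1p on each mean with expm1, and take the log2 ratio. The eps pseudocount and the function names are assumptions for illustration.

```python
import numpy as np

def approx_log2fc(logged_group, logged_rest, eps=1e-9):
    """log2 fold change from means of log1p-transformed values."""
    mean_group = np.mean(logged_group)
    mean_rest = np.mean(logged_rest)
    return np.log2((np.expm1(mean_group) + eps) / (np.expm1(mean_rest) + eps))

def exact_log2fc(raw_group, raw_rest, eps=1e-9):
    """log2 fold change from means of the untransformed values."""
    return np.log2((np.mean(raw_group) + eps) / (np.mean(raw_rest) + eps))

raw_group = np.array([0.0, 0.0, 9.0])   # heterogeneous expression in the group
raw_rest = np.array([1.0, 1.0, 1.0])
approx = approx_log2fc(np.log1p(raw_group), np.log1p(raw_rest))
exact = exact_log2fc(raw_group, raw_rest)
print(approx, exact)
```

Because the mean of logs is not the log of the mean, the approximation can differ substantially from the fold change of raw means when expression within a group is heterogeneous.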

Notes

There are slight inconsistencies depending on whether sparse or dense data are passed.

Examples

>>> import scanpy as sc
>>> adata = sc.datasets.pbmc68k_reduced()
>>> sc.tl.rank_genes_groups(adata, 'bulk_labels', method='wilcoxon')

# to visualize the results
>>> sc.pl.rank_genes_groups(adata)