scanpy.tl.rank_genes_groups
- scanpy.tl.rank_genes_groups(adata, groupby, use_raw=None, groups='all', reference='rest', n_genes=None, rankby_abs=False, pts=False, key_added=None, copy=False, method=None, corr_method='benjamini-hochberg', tie_correct=False, layer=None, **kwds)
Rank genes for characterizing groups.
Expects logarithmized data.
- Parameters
- adata : AnnData
Annotated data matrix.
- groupby : str
The key of the observations grouping to consider.
- use_raw : bool | None (default: None)
Use the raw attribute of adata if present.
- layer : str | None (default: None)
Key from adata.layers whose value will be used to perform tests on.
- groups : Literal['all'] | Iterable[str] (default: 'all')
Subset of groups, e.g. ['g1', 'g2', 'g3'], to which comparison shall be restricted, or 'all' (default), for all groups.
- reference : str (default: 'rest')
If 'rest', compare each group to the union of the rest of the groups. If a group identifier, compare with respect to this group.
- n_genes : int | None (default: None)
The number of genes that appear in the returned tables. Defaults to all genes.
- method : Literal['logreg', 't-test', 'wilcoxon', 't-test_overestim_var'] | None (default: None)
The default method is 't-test'. 't-test_overestim_var' overestimates the variance of each group, 'wilcoxon' uses the Wilcoxon rank-sum test, and 'logreg' uses logistic regression. See [Ntranos18], here and here, for why this is meaningful.
- corr_method : Literal['benjamini-hochberg', 'bonferroni'] (default: 'benjamini-hochberg')
p-value correction method. Used only for 't-test', 't-test_overestim_var', and 'wilcoxon'.
- tie_correct : bool (default: False)
Use tie correction for 'wilcoxon' scores. Used only for 'wilcoxon'.
- rankby_abs : bool (default: False)
Rank genes by the absolute value of the score, not by the score. The returned scores are never the absolute values.
- pts : bool (default: False)
Compute the fraction of cells expressing the genes.
- key_added : str | None (default: None)
The key in adata.uns that the information is saved to.
- **kwds
Are passed to test methods. Currently this affects only parameters that are passed to sklearn.linear_model.LogisticRegression. For instance, you can pass penalty='l1' to try to come up with a minimal set of genes that are good predictors (sparse solution meaning few non-zero fitted coefficients).
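To make the two corr_method options concrete, here is a minimal pure-Python sketch of Bonferroni and Benjamini-Hochberg adjustment for a short list of p-values. It illustrates the standard textbook procedures only; the function names are hypothetical and it is not scanpy's own code.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(p * m, 1.0) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four genes.
pvals = [0.01, 0.04, 0.03, 0.005]
bh = benjamini_hochberg(pvals)
bonf = bonferroni(pvals)
```

In this toy input every Benjamini-Hochberg value is no larger than its Bonferroni counterpart, which reflects that Bonferroni is the more conservative of the two corrections.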
- Return type
- Returns
- names : structured np.ndarray (.uns['rank_genes_groups'])
Structured array to be indexed by group id storing the gene names. Ordered according to scores.
- scores : structured np.ndarray (.uns['rank_genes_groups'])
Structured array to be indexed by group id storing the z-score underlying the computation of a p-value for each gene for each group. Ordered according to scores.
- logfoldchanges : structured np.ndarray (.uns['rank_genes_groups'])
Structured array to be indexed by group id storing the log2 fold change for each gene for each group. Ordered according to scores. Only provided if method is 't-test' like. Note: this is an approximation calculated from mean-log values.
- pvals : structured np.ndarray (.uns['rank_genes_groups'])
p-values.
- pvals_adj : structured np.ndarray (.uns['rank_genes_groups'])
Corrected p-values.
- pts : pandas.DataFrame (.uns['rank_genes_groups'])
Fraction of cells expressing the genes for each group.
- pts_rest : pandas.DataFrame (.uns['rank_genes_groups'])
Only if reference is set to 'rest'. Fraction of cells from the union of the rest of each group expressing the genes.
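The note that logfoldchanges is "an approximation calculated from mean-log values" can be illustrated with a small sketch. It assumes the data were transformed with the natural-log log1p and that a small constant guards against division by zero; the function name and the eps value are hypothetical, and the exact computation inside scanpy may differ.

```python
import math


def approx_log2_fold_change(mean_log_group, mean_log_reference, eps=1e-9):
    """Approximate log2 fold change from means of log1p-transformed values.

    Assumes natural-log log1p data, so expm1 undoes the transform on the
    mean of the logs. That differs from the mean of the raw values, which
    is why the result is only an approximation. eps is a hypothetical
    constant that avoids division by zero.
    """
    group = math.expm1(mean_log_group) + eps
    reference = math.expm1(mean_log_reference) + eps
    return math.log2(group / reference)


# One gene with constant expression inside each group: the approximation
# recovers log2(8 / 2).
group_cells = [math.log1p(x) for x in [8.0, 8.0, 8.0]]
rest_cells = [math.log1p(x) for x in [2.0, 2.0, 2.0]]
mean_group = sum(group_cells) / len(group_cells)
mean_rest = sum(rest_cells) / len(rest_cells)
lfc = approx_log2_fold_change(mean_group, mean_rest)

# With uneven expression, back-transforming the mean of the logs no longer
# matches the raw mean (which is 8.0 for these two cells).
uneven = [math.log1p(x) for x in [0.0, 16.0]]
back_transformed = math.expm1(sum(uneven) / len(uneven))
```

The second case shows where the approximation comes from: the back-transformed mean of the logs is well below the raw mean of 8.0, so fold changes computed this way can differ from those computed on untransformed counts.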
Notes
There are slight inconsistencies depending on whether sparse or dense data are passed. See here.
Examples
>>> import scanpy as sc
>>> adata = sc.datasets.pbmc68k_reduced()
>>> sc.tl.rank_genes_groups(adata, 'bulk_labels', method='wilcoxon')
>>> # to visualize the results
>>> sc.pl.rank_genes_groups(adata)
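
The example above selects method='wilcoxon'. As a rough illustration of the kind of z-score such a test yields for a single gene, and of what tie_correct changes, the following pure-Python sketch computes the rank-sum statistic under the usual normal approximation. The helper names are hypothetical and the code is a textbook sketch, not scanpy's implementation; it does not handle the degenerate case where all values are tied.

```python
import math


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def rank_sum_z(group, rest, tie_correct=False):
    """z-score of the rank sum of `group` against `rest` for one gene."""
    n1, n2 = len(group), len(rest)
    n = n1 + n2
    values = list(group) + list(rest)
    ranks = average_ranks(values)
    rank_sum = sum(ranks[:n1])
    expected = n1 * (n + 1) / 2
    variance = n1 * n2 * (n + 1) / 12
    if tie_correct:
        # Shrink the variance by the standard tie-correction factor.
        counts = {}
        for v in values:
            counts[v] = counts.get(v, 0) + 1
        tie_term = sum(t ** 3 - t for t in counts.values())
        variance *= 1 - tie_term / (n ** 3 - n)
    return (rank_sum - expected) / math.sqrt(variance)


# Hypothetical expression of one gene in a group of 3 cells vs. 4 other cells.
group = [3.0, 2.0, 2.0]
rest = [0.0, 0.0, 1.0, 0.0]
z_plain = rank_sum_z(group, rest)
z_tie = rank_sum_z(group, rest, tie_correct=True)
```

Because ties reduce the variance of the rank sum, the tie-corrected score is larger in magnitude than the uncorrected one for the same data.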