scanpy.tl.rank_genes_groups
- scanpy.tl.rank_genes_groups(adata, groupby, *, mask_var=None, use_raw=None, groups='all', reference='rest', n_genes=None, rankby_abs=False, pts=False, key_added=None, copy=False, method=None, corr_method='benjamini-hochberg', tie_correct=False, layer=None, **kwds)
Rank genes for characterizing groups.
Expects logarithmized data.
- Parameters:
- adata
AnnData
Annotated data matrix.
- groupby
str
The key of the observations grouping to consider.
- mask_var
ndarray[Any, dtype[bool]] | str | None (default: None)
Select subset of genes to use in statistical tests.
- use_raw
bool | None (default: None)
Use raw attribute of adata if present.
- layer
str | None (default: None)
Key from adata.layers whose value will be used to perform tests on.
- groups
Union[Literal['all'], Iterable[str]] (default: 'all')
Subset of groups, e.g. ['g1', 'g2', 'g3'], to which comparison shall be restricted, or 'all' (default), for all groups. Note that if reference='rest', all groups will still be used as the reference, not just those specified in groups.
- reference
str (default: 'rest')
If 'rest', compare each group to the union of the rest of the groups. If a group identifier, compare with respect to this group.
- n_genes
int | None (default: None)
The number of genes that appear in the returned tables. Defaults to all genes.
- method
Optional[Literal['logreg', 't-test', 'wilcoxon', 't-test_overestim_var']] (default: None)
The default method is 't-test'. 't-test_overestim_var' overestimates the variance of each group, 'wilcoxon' uses the Wilcoxon rank-sum test, and 'logreg' uses logistic regression. See Ntranos et al. [2019] for why this is meaningful.
- corr_method
Literal['benjamini-hochberg', 'bonferroni'] (default: 'benjamini-hochberg')
p-value correction method. Used only for 't-test', 't-test_overestim_var', and 'wilcoxon'.
- tie_correct
bool (default: False)
Use tie correction for 'wilcoxon' scores. Used only for 'wilcoxon'.
- rankby_abs
bool (default: False)
Rank genes by the absolute value of the score, not by the score. The returned scores are never the absolute values.
- pts
bool (default: False)
Compute the fraction of cells expressing the genes.
- key_added
str | None (default: None)
The key in adata.uns that the information is saved to.
- copy
bool (default: False)
Whether to copy adata or modify it inplace.
- kwds
Are passed to test methods. Currently this affects only parameters that are passed to sklearn.linear_model.LogisticRegression. For instance, you can pass penalty='l1' to try to come up with a minimal set of genes that are good predictors (sparse solution meaning few non-zero fitted coefficients).
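The interaction between groups, reference and pts described above can be illustrated with a small NumPy-only sketch. It does not call scanpy and is not scanpy's implementation: the matrix X, the labels, and the helpers comparison_masks and fraction_expressing are hypothetical, and "expressing" is assumed here to mean a non-zero value.

```python
import numpy as np

# Hypothetical data: 8 cells x 2 genes, with one group label per cell.
X = np.array([
    [1.0, 0.0],
    [2.0, 0.0],
    [0.0, 0.5],
    [0.0, 1.5],
    [0.0, 0.0],
    [3.0, 0.0],
    [0.0, 2.0],
    [0.0, 0.0],
])
labels = np.array(["g1", "g1", "g2", "g2", "g3", "g3", "g4", "g4"])


def comparison_masks(labels, group, reference="rest"):
    """Return boolean masks (cells in group, cells in reference)."""
    in_group = labels == group
    if reference == "rest":
        # All cells outside the group, including groups that are not tested.
        in_reference = ~in_group
    else:
        in_reference = labels == reference
    return in_group, in_reference


def fraction_expressing(X, mask):
    """Fraction of the masked cells with a non-zero value, per gene."""
    return (X[mask] != 0).mean(axis=0)


# Test only g1 and g2, as with groups=['g1', 'g2'] and reference='rest'.
rest_sizes = {}
for group in ["g1", "g2"]:
    _, in_reference = comparison_masks(labels, group, reference="rest")
    rest_sizes[group] = int(in_reference.sum())

# Per-gene fractions for g1 and for its 'rest' reference.
in_g1, in_rest = comparison_masks(labels, "g1")
pts_g1 = fraction_expressing(X, in_g1)
pts_rest_g1 = fraction_expressing(X, in_rest)

# With a named reference, only that group's cells are used.
_, in_g3 = comparison_masks(labels, "g1", reference="g3")
```

In this toy setup each tested group is compared against six reference cells, even though only two groups are tested, which is the behaviour the note on groups describes.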
- Return type:
AnnData | None
- Returns:
Returns None if copy=False, else returns an AnnData object. Sets the following fields:
adata.uns['rank_genes_groups' | key_added]['names']
structured numpy.ndarray (dtype object)
Structured array to be indexed by group id storing the gene names. Ordered according to scores.
adata.uns['rank_genes_groups' | key_added]['scores']
structured numpy.ndarray (dtype object)
Structured array to be indexed by group id storing the z-score underlying the computation of a p-value for each gene for each group. Ordered according to scores.
adata.uns['rank_genes_groups' | key_added]['logfoldchanges']
structured numpy.ndarray (dtype object)
Structured array to be indexed by group id storing the log2 fold change for each gene for each group. Ordered according to scores. Only provided if method is ‘t-test’ like. Note: this is an approximation calculated from mean-log values.
adata.uns['rank_genes_groups' | key_added]['pvals']
structured numpy.ndarray (dtype float)
p-values.
adata.uns['rank_genes_groups' | key_added]['pvals_adj']
structured numpy.ndarray (dtype float)
Corrected p-values.
adata.uns['rank_genes_groups' | key_added]['pts']
pandas.DataFrame (dtype float)
Fraction of cells expressing the genes for each group.
adata.uns['rank_genes_groups' | key_added]['pts_rest']
pandas.DataFrame (dtype float)
Only if reference is set to 'rest'. Fraction of cells from the union of the rest of each group expressing the genes.
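To make the relation between pvals and pvals_adj concrete, the following NumPy-only sketch shows the two textbook corrections that corr_method names. The functions bonferroni and benjamini_hochberg and the example p-values are hypothetical and written for illustration; they are not taken from scanpy's source, and scanpy's own computation may differ in detail.

```python
import numpy as np


def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    pvals = np.asarray(pvals, dtype=float)
    return np.minimum(pvals * pvals.size, 1.0)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    pvals = np.asarray(pvals, dtype=float)
    n = pvals.size
    order = np.argsort(pvals)
    # Scale the i-th smallest p-value by n / i.
    scaled = pvals[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working from the largest p-value downwards.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


# Hypothetical raw p-values for four genes in one group.
pvals = [0.01, 0.04, 0.03, 0.005]
adj_bh = benjamini_hochberg(pvals)
adj_bonf = bonferroni(pvals)
```

For these four values, Bonferroni is the more conservative of the two: it never yields a smaller adjusted p-value than Benjamini-Hochberg.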
Notes
There are slight inconsistencies depending on whether sparse or dense data are passed.
Examples
>>> import scanpy as sc
>>> adata = sc.datasets.pbmc68k_reduced()
>>> sc.tl.rank_genes_groups(adata, 'bulk_labels', method='wilcoxon')
>>> # to visualize the results
>>> sc.pl.rank_genes_groups(adata)
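The result fields are described as structured arrays indexed by group id. The sketch below shows what that indexing looks like using NumPy alone. The arrays, group ids ("T", "B") and gene names are made up for illustration and merely stand in for entries such as adata.uns['rank_genes_groups']['names']. The dtypes are also an assumption of this sketch.

```python
import numpy as np

# Hypothetical results: two groups ("T" and "B"), three ranked genes each.
names = np.array(
    [("gene_a", "gene_x"), ("gene_b", "gene_y"), ("gene_c", "gene_z")],
    dtype=[("T", object), ("B", object)],
)
scores = np.array(
    [(9.1, 12.4), (7.3, 10.0), (5.8, 6.2)],
    dtype=[("T", float), ("B", float)],
)

# Indexing by group id returns that group's column, ordered by score.
top_b_genes = list(names["B"])
top_b_scores = scores["B"]

# The group ids are the field names of the structured dtype.
group_ids = names.dtype.names

# Pair each gene with its score for one group.
ranked_b = list(zip(top_b_genes, top_b_scores.tolist()))
```

Each row of a structured array corresponds to a rank position and each field to a group, so the same row index across names and scores refers to the same gene within a given group.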