scanpy.tl.dendrogram

scanpy.tl.dendrogram#

scanpy.tl.dendrogram(adata, groupby, *, n_pcs=None, use_rep=None, var_names=None, use_raw=None, cor_method='pearson', linkage_method='complete', optimal_ordering=False, key_added=None, inplace=True)[source]#

Compute a hierarchical clustering for the given groupby categories.

By default, the PCA representation is used unless .X has less than 50 variables.

Alternatively, a list of var_names (e.g. genes) can be given.

Average values of either var_names or components are used to compute a correlation matrix.

The hierarchical clustering can be visualized using scanpy.pl.dendrogram() or multiple other visualizations that can include a dendrogram: matrixplot(), heatmap(), dotplot(), and stacked_violin().

Note

The computation of the hierarchical clustering is based on predefined groups and not per cell. The correlation matrix is computed using by default pearson but other methods are available.

Array type support#
Array type	supported	… experimentally in dask `Array`
`numpy.ndarray`	✅	❌
`scipy.sparse.{csr,csc}_{array,matrix}`	✅	❌

Parameters:

adata AnnData: Annotated data matrix
n_pcs int | None (default: None): Use this many PCs. If n_pcs==0 use .X if use_rep is None.
use_rep str | None (default: None): Use the indicated representation. 'X' or any key for .obsm is valid. If None, the representation is chosen automatically: For .n_vars < N_PCS (default: 50), .X is used, otherwise ‘X_pca’ is used. If ‘X_pca’ is not present, it’s computed with default parameters or n_pcs if present.
var_names Sequence[str] | None (default: None): List of var_names to use for computing the hierarchical clustering. If var_names is given, then use_rep and n_pcs are ignored.
use_raw bool | None (default: None): Only when var_names is not None. Use raw attribute of adata if present.
cor_method str (default: 'pearson'): Correlation method to use. Options are ‘pearson’, ‘kendall’, and ‘spearman’
linkage_method str (default: 'complete'): Linkage method to use. See scipy.cluster.hierarchy.linkage() for more information.
optimal_ordering bool (default: False): Same as the optimal_ordering argument of scipy.cluster.hierarchy.linkage() which reorders the linkage matrix so that the distance between successive leaves is minimal.
key_added str | None (default: None): By default, the dendrogram information is added to .uns[f'dendrogram_{groupby}']. Notice that the groupby information is added to the dendrogram.
inplace bool (default: True): If True, adds dendrogram information to adata.uns[key_added], else this function returns the information.

Return type:

dict[str, Any] | None

Returns:

Returns None if inplace=True, else returns a dict with dendrogram information. Sets the following field if inplace=True:

adata.uns[f'dendrogram_{group_by}' | key_added]dict: Dendrogram information.

Examples

>>> import scanpy as sc
>>> adata = sc.datasets.pbmc68k_reduced()
>>> sc.tl.dendrogram(adata, groupby="bulk_labels")
>>> sc.pl.dendrogram(adata, groupby="bulk_labels")
<Axes: >
>>> markers = ["C1QA", "PSAP", "CD79A", "CD79B", "CST3", "LYZ"]
>>> sc.pl.dotplot(adata, markers, groupby="bulk_labels", dendrogram=True)

scanpy.tl.dendrogram

Contents

scanpy.tl.dendrogram#