, groupby, n_pcs=None, use_rep=None, var_names=None, use_raw=None, cor_method='pearson', linkage_method='complete', optimal_ordering=False, key_added=None, inplace=True)

Computes a hierarchical clustering for the given groupby categories.

By default, the PCA representation is used unless .X has fewer than 50 variables.

Alternatively, a list of var_names (e.g. genes) can be given.

The per-group averages of either the var_names or the components of the chosen representation are used to compute a correlation matrix between groups.

The hierarchical clustering can be visualized using or multiple other visualizations that can include a dendrogram: matrixplot(), heatmap(), dotplot(), and stacked_violin().


The hierarchical clustering is computed over the predefined groups, not over individual cells. By default, the correlation matrix uses Pearson correlation, but other methods are available (see cor_method).
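The group-level computation can be sketched with pandas: average every feature within each group, then correlate the groups with one another. This is a minimal illustration on made-up data, assuming a dense cells × features array and one categorical label per cell; it is not scanpy's own implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 6 cells x 3 features, two cells per group.
X = np.array([
    [1.0, 0.0, 2.0],
    [1.2, 0.1, 2.1],
    [0.0, 3.0, 1.0],
    [0.1, 2.8, 0.9],
    [2.0, 1.0, 0.0],
    [2.2, 1.1, 0.1],
])
groups = pd.Categorical(["a", "a", "b", "b", "c", "c"])

# One row per group: the mean of every feature over the cells in that group.
mean_df = pd.DataFrame(X).groupby(groups, observed=True).mean()

# Transpose so groups become columns, then correlate the groups pairwise.
corr_matrix = mean_df.T.corr(method="pearson")
print(corr_matrix.round(2))
```

The result is a small groups × groups matrix, which is why the clustering cost depends on the number of categories rather than the number of cells.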

adata : AnnData

Annotated data matrix.

n_pcs : Optional[int] (default: None)

Use this many PCs. If n_pcs == 0 and use_rep is None, .X is used.

use_rep : Optional[str] (default: None)

Use the indicated representation. 'X' or any key in .obsm is valid. If None, the representation is chosen automatically: for .n_vars < 50, .X is used, otherwise 'X_pca' is used. If 'X_pca' is not present, it is computed with default parameters.

var_names : Optional[Sequence[str]] (default: None)

List of var_names to use for computing the hierarchical clustering. If var_names is given, use_rep and n_pcs are ignored.
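The selection rules described for n_pcs, use_rep and var_names can be summarized as a small decision function. The helper choose_representation below is hypothetical (it is not part of scanpy); it only encodes the precedence stated on this page and returns the chosen representation together with a flag saying whether 'X_pca' would first have to be computed.

```python
from typing import Optional, Sequence, Tuple

N_VARS_THRESHOLD = 50  # threshold stated on this page


def choose_representation(
    n_vars: int,
    obsm_keys: Sequence[str],
    use_rep: Optional[str] = None,
    n_pcs: Optional[int] = None,
    var_names: Optional[Sequence[str]] = None,
) -> Tuple[str, bool]:
    """Hypothetical helper: return (representation, needs_pca_computation)."""
    if var_names is not None:
        # Explicit var_names take precedence; use_rep and n_pcs are ignored.
        return "var_names", False
    if use_rep is not None:
        # 'X' or any .obsm key is used as given.
        return use_rep, False
    if n_pcs == 0 or n_vars < N_VARS_THRESHOLD:
        # Small feature spaces (or n_pcs == 0) fall back to .X.
        return "X", False
    # Otherwise use PCA, computing it first if 'X_pca' is missing.
    return "X_pca", "X_pca" not in obsm_keys


print(choose_representation(2000, obsm_keys=[]))
```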

use_raw : Optional[bool] (default: None)

Only applies when var_names is not None. Use the raw attribute of adata if present.

cor_method : str (default: 'pearson')

Correlation method to use. Options are 'pearson', 'kendall', and 'spearman'.
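The three options measure different things: 'pearson' captures linear agreement, while 'kendall' and 'spearman' are rank-based and reach 1 for any strictly increasing relationship. The sketch below uses pandas.DataFrame.corr, which accepts the same three method names, on hypothetical group means laid out with one column per group; whether scanpy calls this exact function internally is an assumption here.

```python
import numpy as np
import pandas as pd

# Hypothetical per-group means: one column per group, one row per feature.
# Group "b" is a strictly increasing but non-linear function of group "a".
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
mean_df = pd.DataFrame({"a": x, "b": x ** 3, "c": -x})

# Correlation between groups "a" and "b" under each documented method.
results = {
    method: mean_df.corr(method=method).loc["a", "b"]
    for method in ("pearson", "kendall", "spearman")
}
for method, r in results.items():
    print(f"{method}: {r:.3f}")
```

The rank-based methods report a perfect correlation here while Pearson does not, so the choice can change which groups end up closest in the dendrogram.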

linkage_method : str (default: 'complete')

Linkage method to use. See scipy.cluster.hierarchy.linkage() for more information.

optimal_ordering : bool (default: False)

Same as the optimal_ordering argument of scipy.cluster.hierarchy.linkage(), which reorders the linkage matrix so that the distance between successive leaves is minimal.
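A minimal sketch of the clustering step with SciPy, assuming the group-by-group correlation matrix is turned into a distance as 1 - r (an assumption made for this illustration, not necessarily how scanpy converts it). scipy.cluster.hierarchy.linkage receives the condensed distances, the documented default method 'complete', and optimal_ordering; scipy.cluster.hierarchy.dendrogram with no_plot=True returns the leaf order without drawing anything. The correlation values and group labels are made up.

```python
import pandas as pd
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

# Hypothetical group-by-group correlation matrix (symmetric, ones on the diagonal).
labels = ["B cells", "T cells", "NK cells", "Monocytes"]
corr = pd.DataFrame(
    [[1.0, 0.2, 0.1, 0.6],
     [0.2, 1.0, 0.8, 0.1],
     [0.1, 0.8, 1.0, 0.0],
     [0.6, 0.1, 0.0, 1.0]],
    index=labels,
    columns=labels,
)

# Assumption: distance = 1 - correlation, converted to condensed form.
dist = squareform(1.0 - corr.to_numpy())

# 'complete' matches the documented default of linkage_method.
Z = linkage(dist, method="complete", optimal_ordering=True)

# Leaf order only; no_plot=True skips any drawing.
info = dendrogram(Z, labels=labels, no_plot=True)
print(info["ivl"])
```

Because T cells and NK cells are the most correlated pair, complete linkage merges them first, so they always end up as adjacent leaves; optimal_ordering only decides how the remaining subtrees are flipped.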

key_added : Optional[str] (default: None)

Key in .uns under which the dendrogram information is stored. If None, .uns[f'dendrogram_{groupby}'] is used. Note that the groupby information is also stored with the dendrogram.

inplace : bool (default: True)

If True, adds dendrogram information to adata.uns[key_added], else this function returns the information.
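The interplay of key_added and inplace can be sketched with a plain dictionary standing in for adata.uns. The function store_dendrogram and its 'groupby' entry are hypothetical; they only illustrate the documented default key f'dendrogram_{groupby}' and the store-versus-return behavior, not scanpy's actual code.

```python
from typing import Any, Dict, Optional


def store_dendrogram(
    uns: Dict[str, Any],
    groupby: str,
    dendro_info: Dict[str, Any],
    key_added: Optional[str] = None,
    inplace: bool = True,
) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: store or return dendrogram info as documented."""
    # Record which grouping the dendrogram was computed for.
    info = {**dendro_info, "groupby": groupby}
    if not inplace:
        # inplace=False: hand the information back and leave `uns` untouched.
        return info
    # The default key is f"dendrogram_{groupby}" unless key_added is given.
    key = key_added if key_added is not None else f"dendrogram_{groupby}"
    uns[key] = info
    return None


# Usage with a plain dict standing in for adata.uns.
uns: Dict[str, Any] = {}
store_dendrogram(uns, "bulk_labels", {"ivl": ["B cells", "T cells"]})
print(sorted(uns))
```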

Return type

Optional[Dict[str, Any]]


If inplace=False, returns the dendrogram information; otherwise, adata.uns[key_added] is updated with it and None is returned.


>>> import scanpy as sc
>>> adata = sc.datasets.pbmc68k_reduced()
>>>, groupby='bulk_labels')
>>> markers = ['C1QA', 'PSAP', 'CD79A', 'CD79B', 'CST3', 'LYZ']
>>>, markers, groupby='bulk_labels', dendrogram=True)