scanpy.pp.harmony_integrate
- scanpy.pp.harmony_integrate(adata, key, *, basis='X_pca', adjusted_basis='X_pca_harmony', dtype=<class 'numpy.float64'>, flavor='harmony2', n_clusters=None, max_iter_harmony=10, max_iter_clustering=200, tol_harmony=0.0001, tol_clustering=1e-05, sigma=0.1, theta=2.0, tau=0, ridge_lambda=1.0, alpha=0.2, batch_prune_threshold=1e-05, correction_method='fast', block_proportion=0.05, rng=None)
Integrate different experiments using the Harmony algorithm [Korsunsky et al., 2019, Patikas et al., 2026].
This CPU implementation is based on the harmony-pytorch and rapids_singlecell versions and uses NumPy for computation. Because Harmony works by adjusting the principal components, run this function after performing PCA but before computing the neighbor graph.
By default, the Harmony2 algorithm is used, which includes a stabilized diversity penalty, dynamic per-cluster-per-batch ridge regularization, and automatic batch pruning. To revert to the original Harmony behavior:
sc.pp.harmony_integrate(adata, key, flavor="harmony1")
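The ordering described above (PCA first, then Harmony, then the neighbor graph) can be sketched as a small helper. This is a minimal sketch, not an official recipe: the helper name `integrate_batches` and its `n_pcs` argument are hypothetical, and it assumes a scanpy release that provides `sc.pp.harmony_integrate` with the signature documented here.

```python
def integrate_batches(adata, batch_key, n_pcs=50):
    """Run PCA, then Harmony, then build the neighbor graph on the corrected PCs.

    Hypothetical helper for illustration; `n_pcs` is an assumed argument.
    """
    # Imported lazily so the helper can be defined without scanpy installed.
    import scanpy as sc

    sc.pp.pca(adata, n_comps=n_pcs)            # writes adata.obsm["X_pca"]
    sc.pp.harmony_integrate(adata, batch_key)  # writes adata.obsm["X_pca_harmony"]
    # Point the neighbor graph at the adjusted embedding rather than the raw PCs.
    sc.pp.neighbors(adata, use_rep="X_pca_harmony")
    return adata
```

Passing `use_rep="X_pca_harmony"` matters here: without it, `sc.pp.neighbors` would build the graph from the uncorrected `X_pca`.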
Array type support
Columns: Array type, supported, … experimentally in dask Array
Values (array-type labels were not preserved): ✅ ❌ ❌ ❌
- Parameters:
- adata
AnnData The annotated data matrix.
- key
str | Sequence[str] The key(s) of the column(s) in adata.obs that differentiate(s) among experiments/batches. When multiple keys are provided, a combined batch variable is created from all columns.
- basis
str (default: 'X_pca') The name of the field in adata.obsm where the PCA table is stored.
- adjusted_basis
str (default: 'X_pca_harmony') The name of the field in adata.obsm where the adjusted PCA table will be stored.
- dtype
numpy.typing.DTypeLike (default: <class 'numpy.float64'>) The data type to use for Harmony computation. Using a 32-bit type may cause numerical instability.
- flavor
Literal['harmony2', 'harmony1'] (default: 'harmony2') Which version of the Harmony algorithm to use. "harmony2" (default) enables the stabilized diversity penalty, dynamic per-cluster-per-batch ridge regularization, and automatic batch pruning from [Patikas et al., 2026]. "harmony1" uses the original algorithm from [Korsunsky et al., 2019].
- n_clusters
int | None (default: None) Number of clusters used for soft k-means in the Harmony algorithm. If None, uses min(100, N / 30), where N is the number of cells. More clusters capture finer-grained structure but increase computation time.
- max_iter_harmony
int (default: 10) Maximum number of outer Harmony iterations (each consisting of a clustering step followed by a correction step).
- max_iter_clustering
int (default: 200) Maximum iterations for the clustering step within each Harmony iteration.
- tol_harmony
float (default: 0.0001) Convergence tolerance for the Harmony objective function. The algorithm stops when the relative change in objective falls below this value.
- tol_clustering
float (default: 1e-05) Convergence tolerance for the clustering step within each Harmony iteration.
- sigma
float (default: 0.1) Width of the soft-clustering kernel. Controls the entropy of cluster assignments: smaller values produce harder assignments (cells assigned to fewer clusters), while larger values produce softer assignments (cells spread across more clusters).
- theta
float | Sequence[float] (default: 2.0) Diversity penalty weight per batch variable. Controls how strongly Harmony encourages each cluster to contain a balanced representation of all batches. Higher values (e.g. 4) produce more aggressive mixing; lower values (e.g. 0.5) allow more batch-specific clusters. Set to 0 to disable batch correction entirely. A list can be provided to set different weights per batch variable.
- tau
int (default: 0) Discounting factor on theta. When tau > 0, the diversity penalty is down-weighted for batches with fewer cells, preventing over-correction of small batches. By default (0), there is no discounting.
- ridge_lambda
float (default: 1.0) Ridge regression regularization for the correction step. Larger values produce more conservative (smaller) corrections, preventing over-fitting. Only used with flavor="harmony1".
- alpha
float (default: 0.2) Scaling factor for the dynamic per-cluster-per-batch ridge regularization. The effective regularization for each cluster-batch pair is alpha * E_kb, where E_kb is the expected number of cells. Larger values produce more conservative corrections. Only used with flavor="harmony2".
- batch_prune_threshold
float | None (default: 1e-05) Fraction threshold below which a batch-cluster pair is pruned (correction suppressed). When the fraction of a batch’s cells assigned to a cluster (O_kb / N_b) falls below this threshold, that batch-cluster pair receives no correction, preventing spurious adjustments. Only used with flavor="harmony2". Set to None to disable pruning.
- correction_method
Literal['fast', 'original'] (default: 'fast') Method for the correction step. "original" uses per-cluster ridge regression with explicit matrix inversion. "fast" uses a precomputed factorization that avoids the full inversion, which can be faster for datasets with many batches.
- block_proportion
float (default: 0.05) Proportion of cells updated per clustering sub-iteration. Smaller values produce more stochastic updates. Larger values are faster but may converge to different solutions.
- rng
int | integer | Sequence[int] | SeedSequence | Generator | BitGenerator | None (default: None) Random number generator or seed for deterministic behavior.
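The arithmetic behind three of the parameters above (the default n_clusters, the alpha-scaled ridge term, and the batch pruning test) can be written out in a few lines of plain Python. This is an illustrative sketch and not scanpy's code: the function names are hypothetical, the rounding of N / 30 to an integer and the floor of one cluster are assumptions, and E_kb, O_kb, and N_b are taken as given inputs because this page does not say how they are computed.

```python
def default_n_clusters(n_cells):
    """Documented default: min(100, N / 30).

    Rounding to an int and the lower bound of 1 are assumptions.
    """
    return max(1, min(100, round(n_cells / 30)))


def effective_ridge(alpha, expected_cells_kb):
    """Per cluster-batch regularization: alpha * E_kb (harmony2 flavor)."""
    return alpha * expected_cells_kb


def is_pruned(o_kb, n_b, threshold=1e-5):
    """True when the batch-cluster pair would receive no correction.

    o_kb / n_b is the fraction of batch b's cells assigned to cluster k.
    threshold=None disables pruning, as described for batch_prune_threshold.
    """
    if threshold is None:
        return False
    return (o_kb / n_b) < threshold


print(default_n_clusters(900))    # small dataset: fewer than 100 clusters
print(effective_ridge(0.2, 50))   # larger E_kb means stronger regularization
print(is_pruned(0.0, 1000))       # a batch absent from a cluster is pruned
```

Under these assumptions, datasets with 3000 or more cells hit the cap of 100 clusters, and with the default threshold only batch-cluster pairs that hold almost none of a batch's cells are pruned.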
- Return type:
- Returns:
Updates adata with the field adata.obsm[adjusted_basis], containing principal components adjusted by Harmony such that different experiments are integrated.
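The effect of sigma described in the parameter list (smaller values give harder assignments, larger values give softer ones) can be illustrated with a toy soft assignment. This sketch assumes a kernel of the form exp(-d / sigma) over cell-to-centroid distances, normalized across clusters; this page does not state the kernel, so treat the formula, the helper names, and the distances as illustrative assumptions rather than the library's implementation.

```python
import math


def soft_assign(dists, sigma):
    """Toy soft assignment: weights proportional to exp(-d / sigma) (assumed kernel)."""
    d_min = min(dists)  # shift by the minimum for numerical stability
    weights = [math.exp(-(d - d_min) / sigma) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


def entropy(probs):
    """Shannon entropy in nats; higher means the cell is spread over more clusters."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical distances from one cell to three cluster centroids.
dists = [0.10, 0.15, 0.40]
hard = soft_assign(dists, sigma=0.05)  # small sigma: mass concentrates on the nearest cluster
soft = soft_assign(dists, sigma=0.5)   # large sigma: mass spreads across clusters

print([round(p, 3) for p in hard], round(entropy(hard), 3))
print([round(p, 3) for p in soft], round(entropy(soft), 3))
```

The smaller sigma yields the lower-entropy assignment, which matches the qualitative description of the parameter.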