scanpy.external.tl.harmony_timeseries

scanpy.external.tl.harmony_timeseries(adata, tp, n_neighbors=30, n_components=1000, n_jobs=- 2)

Harmony time series for data visualization with augmented affinity matrix at discrete time points [Nowotschin18i].

Harmony time series is a framework for data visualization, trajectory detection and interpretation for scRNA-seq data measured at discrete time points. Harmony constructs an augmented affinity matrix by augmenting the kNN graph affinity matrix with mutually nearest neighbors between successive time points. This augmented affinity matrix forms the basis for generated a force directed layout for visualization and also serves as input for computing the diffusion operator which can be used for trajectory detection using Palantir.

Note

More information and bug reports here.

Parameters
adata : AnnDataAnnData

Annotated data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes. Rows represent two or more time points, where replicates of the same time point are consecutive in order.

tp : strstr

key name of observation annotation .obs representing time points.

n_neighbors : intint (default: 30)

Number of nearest neighbors for graph construction.

n_components : int, NoneOptional[int] (default: 1000)

Minimum number of principal components to use. Specify None to use pre-computed components.

n_jobs : intint (default: -2)

Nearest Neighbors will be computed in parallel using n_jobs.

Returns

Updates .obsm, .obsp and .uns with the following:

.obsm['X_harmony']

force directed layout

.obsp['harmony_aff']

affinity matrix

.obsp['harmony_aff_aug']

augmented affinity matrix

.uns['harmony_timepoint_var']

The name of the variable passed as tp

.uns['harmony_timepoint_connections']

The links between time points

Example

>>> from itertools import product
>>> from anndata import AnnData
>>> import scanpy as sc
>>> import scanpy.external as sce

Load AnnData

A sample with real data is available here.

Random data sets of three time points with two replicates each:

>>> adata_ref = sc.datasets.pbmc3k()
>>> start = [596, 615, 1682, 1663, 1409, 1432]
>>> adata = AnnData.concatenate(
...     *(adata_ref[i : i + 1000] for i in start),
...     join="outer",
...     batch_key="sample",
...     batch_categories=[f"sa{i}_Rep{j}" for i, j in product((1, 2, 3), (1, 2))],
... )
>>> adata.obs["time_points"] = adata.obs["sample"].str.split("_", expand=True)[0]

Normalize and filter for highly expressed genes

>>> sc.pp.normalize_total(adata, target_sum=10000)
>>> sc.pp.log1p(adata)
>>> sc.pp.highly_variable_genes(adata, n_top_genes=1000, subset=True)

Run harmony_timeseries

>>> sce.tl.harmony_timeseries(adata, tp="time_points", n_components=None)

Plot time points:

>>> sce.pl.harmony_timeseries(adata)

For further demonstration of Harmony visualizations please follow the notebook Harmony_sample_notebook.ipynb. It provides a comprehensive guide to draw gene expression trends, amongst other things.