scanpy.external.tl.harmony_timeseries
- scanpy.external.tl.harmony_timeseries(adata, tp, n_neighbors=30, n_components=1000, n_jobs=-2, copy=False)
Harmony time series for data visualization with augmented affinity matrix at discrete time points [Nowotschin18i].
Harmony time series is a framework for data visualization, trajectory detection and interpretation for scRNA-seq data measured at discrete time points. Harmony constructs an augmented affinity matrix by augmenting the kNN graph affinity matrix with mutually nearest neighbors between successive time points. This augmented affinity matrix forms the basis for generated a force directed layout for visualization and also serves as input for computing the diffusion operator which can be used for trajectory detection using Palantir.
Note
More information and bug reports here.
- Parameters:
- adata :
AnnData
Annotated data matrix of shape n_obs
×
n_vars. Rows correspond to cells and columns to genes. Rows represent two or more time points, where replicates of the same time point are consecutive in order.- tp :
str
key name of observation annotation
.obs
representing time points. Time points should be categorical ofdtype=category
. The unique categories for the categorical will be used as the time points to construct the timepoint connections.- n_neighbors :
int
(default:30
) Number of nearest neighbors for graph construction.
- n_components :
Optional
[int
] (default:1000
) Minimum number of principal components to use. Specify
None
to use pre-computed components. The higher the value the better to capture 85% of the variance.- n_jobs :
int
(default:-2
) Nearest Neighbors will be computed in parallel using n_jobs.
- copy :
bool
(default:False
) Return a copy instead of writing to
adata
.
- adata :
- Return type:
- Returns:
: Depending on
copy
, returns or updates.obsm
,.obsp
and.uns
with the following:- X_harmony -
ndarray
(obsm
, dtypefloat
) force directed layout
- harmony_aff -
spmatrix
(obsp
, dtypefloat
) affinity matrix
- harmony_aff_aug -
spmatrix
(obsp
, dtypefloat
) augmented affinity matrix
- harmony_timepoint_var -
str
(uns
) The name of the variable passed as
tp
- harmony_timepoint_connections -
ndarray
(uns
, dtypestr
) The links between time points
- X_harmony -
Example
>>> from itertools import product >>> import pandas as pd >>> from anndata import AnnData >>> import scanpy as sc >>> import scanpy.external as sce
Load
AnnData
A sample with real data is available here.
Random data sets of three time points with two replicates each:
>>> adata_ref = sc.datasets.pbmc3k() >>> start = [596, 615, 1682, 1663, 1409, 1432] >>> adata = AnnData.concatenate( ... *(adata_ref[i : i + 1000] for i in start), ... join="outer", ... batch_key="sample", ... batch_categories=[f"sa{i}_Rep{j}" for i, j in product((1, 2, 3), (1, 2))], ... ) >>> time_points = adata.obs["sample"].str.split("_", expand=True)[0] >>> adata.obs["time_points"] = pd.Categorical( .... time_points, categories=['sa1', 'sa2', 'sa3'] ... )
Normalize and filter for highly expressed genes
>>> sc.pp.normalize_total(adata, target_sum=10000) >>> sc.pp.log1p(adata) >>> sc.pp.highly_variable_genes(adata, n_top_genes=1000, subset=True)
Run harmony_timeseries
>>> sce.tl.harmony_timeseries(adata, tp="time_points", n_components=500)
Plot time points:
>>> sce.pl.harmony_timeseries(adata)
For further demonstration of Harmony visualizations please follow the notebook Harmony_sample_notebook.ipynb. It provides a comprehensive guide to draw gene expression trends, amongst other things.