scanpy.external.tl.palantir

scanpy.external.tl.palantir(adata, n_components=10, knn=30, alpha=0, use_adjacency_matrix=False, distances_key=None, n_eigs=None, impute_data=True, n_steps=3, copy=False)

Run Diffusion maps using the adaptive anisotropic kernel [Setty18].

Palantir is an algorithm to align cells along differentiation trajectories. Palantir models differentiation as a stochastic process in which stem cells differentiate to terminally differentiated cells by a series of steps through a low-dimensional phenotypic manifold. Palantir captures both the continuity in cell states and the stochasticity in cell fate determination. Palantir has been designed to work with multidimensional single-cell data from diverse technologies such as mass cytometry and single-cell RNA-seq.

Note

More information and bug reports here.

Parameters:

adata : AnnData: An AnnData object.
n_components : int (default: 10): Number of diffusion components.
knn : int (default: 30): Number of nearest neighbors for graph construction.
alpha : float (default: 0): Normalization parameter for the diffusion operator.
use_adjacency_matrix : bool (default: False): Use the adaptive anisotropic adjacency matrix instead of PCA projections (the default) to compute diffusion components.
distances_key : Optional[str] (default: None): With use_adjacency_matrix=True, use the indicated distances key for .obsp. If None, 'distances' is used.
n_eigs : Optional[int] (default: None): Number of eigenvectors to use. If None, the number of eigenvectors is determined using the eigengap. Passed to palantir.utils.determine_multiscale_space.
impute_data : bool (default: True): Impute data using MAGIC.
n_steps : int (default: 3): Number of steps in the diffusion operator. Passed to palantir.utils.run_magic_imputation.
copy : bool (default: False): Return a copy instead of writing to adata.

Return type:

Optional[AnnData]

Returns:

Depending on copy, returns or updates adata with the following fields:

Diffusion maps, used for MAGIC imputation and to generate the multi-scale data matrix:

X_palantir_diff_comp - ndarray (obsm, dtype float)
Array of diffusion components.
palantir_EigenValues - ndarray (uns, dtype float)
Array of corresponding eigenvalues.
palantir_diff_op - spmatrix (obsp, dtype float)
The diffusion operator matrix.

Multi-scale space results, used to build tSNE on diffusion components and to compute branch probabilities and waypoints:

X_palantir_multiscale - ndarray (obsm, dtype float)
Multi scale data matrix.
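
The eigengap selection mentioned for n_eigs, and the scaling that turns diffusion components into a multi-scale matrix, can be sketched in NumPy as follows. This is a simplified illustration, not the code behind palantir.utils.determine_multiscale_space: the helper name multiscale_space_sketch is hypothetical, and the eigenvalue / (1 - eigenvalue) weighting and the fallback to the second-largest gap are assumptions about the general approach.

```python
import numpy as np

def multiscale_space_sketch(eigenvalues, eigenvectors, n_eigs=None):
    """Scale diffusion components by eigenvalue / (1 - eigenvalue).

    Eigenvalues are assumed to be sorted in decreasing order, with the
    trivial eigenvalue 1 in position 0.
    """
    vals = np.asarray(eigenvalues, dtype=float)
    vecs = np.asarray(eigenvectors, dtype=float)
    if n_eigs is None:
        # Eigengap heuristic: cut where consecutive eigenvalues drop the most.
        gaps = vals[:-1] - vals[1:]
        order = np.argsort(gaps)
        n_eigs = int(order[-1]) + 1
        if n_eigs < 3:
            # Assumed fallback: use the second-largest gap instead.
            n_eigs = int(order[-2]) + 1
    # Skip the trivial first component.
    use = np.arange(1, n_eigs)
    weights = vals[use] / (1.0 - vals[use])
    return vecs[:, use] * weights

# Toy input: the largest gap sits between the 3rd and 4th eigenvalues.
toy_vals = np.array([1.0, 0.95, 0.9, 0.5, 0.45])
toy_vecs = np.ones((4, 5))
ms = multiscale_space_sketch(toy_vals, toy_vecs)
```

With the toy eigenvalues above, two non-trivial components are kept, and the one with the eigenvalue closer to 1 receives the larger weight.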

MAGIC imputation, used for plotting gene expression on tSNE and for gene expression trends:

palantir_imp - ndarray (layers, dtype float)
Imputed data matrix (MAGIC imputation).
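
To illustrate how n_steps and the diffusion operator interact during imputation, here is a minimal dense NumPy sketch. It assumes that the imputation amounts to multiplying the expression matrix by the diffusion operator raised to the power n_steps; magic_impute_sketch is a hypothetical helper, not the function in palantir.utils, and the 3-cell operator is made up for the example.

```python
import numpy as np

def magic_impute_sketch(T, data, n_steps=3):
    """Smooth expression values with n_steps of diffusion: T^n_steps @ data."""
    T = np.asarray(T, dtype=float)
    data = np.asarray(data, dtype=float)
    T_steps = np.linalg.matrix_power(T, n_steps)
    return T_steps @ data

# Three cells: the first two are close neighbours, the third is nearly isolated.
T = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.5, 0.0],
              [0.0, 0.1, 0.9]])
expr = np.array([[0.0], [4.0], [10.0]])  # one gene, with a dropout in cell 0
imputed = magic_impute_sketch(T, expr, n_steps=3)
```

In this toy case the zero in the first cell is replaced by the average of the two neighbouring cells, while the third cell keeps most of its own value. Larger n_steps smooths more aggressively.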

Example

>>> import scanpy.external as sce
>>> import scanpy as sc

Sample data is available here.

Load sample data

>>> adata = sc.read_csv(filename="Palantir/data/marrow_sample_scseq_counts.csv.gz")

Cleanup and normalize

>>> sc.pp.filter_cells(adata, min_counts=1000)
>>> sc.pp.filter_genes(adata, min_counts=10)
>>> sc.pp.normalize_per_cell(adata)
>>> sc.pp.log1p(adata)
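
For intuition, the two normalization calls above amount to roughly the following NumPy operations. This is a simplified sketch on a toy count matrix, assuming the default behaviour of normalize_per_cell (every cell is scaled to the median total count); normalize_and_log is a hypothetical helper, not a scanpy function.

```python
import numpy as np

def normalize_and_log(counts):
    """Scale each cell (row) to the median total count, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1)       # total counts per cell
    target = np.median(totals)        # assumed default target: median total
    # Cells with zero total counts are assumed to have been filtered out already.
    scaled = counts / totals[:, None] * target
    return np.log1p(scaled)

# Toy matrix: 3 cells x 3 genes with very different sequencing depths.
toy = np.array([[10, 0, 30],
                [1, 1, 2],
                [5, 5, 10]])
normalized = normalize_and_log(toy)
```

After this step, cells with different depths but the same relative composition (the second and third rows here) end up with identical values.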

Data preprocessing

Palantir builds diffusion maps using one of two optional inputs:

Principal component analysis

>>> sc.tl.pca(adata, n_comps=300)

or,

Nearest neighbors graph

>>> sc.pp.neighbors(adata, n_neighbors=30)

Diffusion maps

Palantir computes diffusion maps of the data as an estimate of its low-dimensional phenotypic manifold.

>>> sce.tl.palantir(adata, n_components=5, knn=30)

If precomputed distances are to be used:

>>> sce.tl.palantir(
...     adata,
...     n_components=5,
...     knn=30,
...     use_adjacency_matrix=True,
...     distances_key="distances",
... )
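
To make the roles of n_components, knn and alpha more concrete, the following is a small dense NumPy sketch of a diffusion map built with an adaptive anisotropic kernel. It is an illustration of the general technique under stated assumptions, not the code that sce.tl.palantir runs: diffusion_map_sketch is a hypothetical helper, the bandwidth choice (distance to the (knn // 3)-th neighbour) is an assumption, and the dense distance matrix only suits toy inputs without duplicate points.

```python
import numpy as np

def diffusion_map_sketch(data, n_components=3, knn=6, alpha=0.0):
    """Toy dense diffusion map with an adaptive anisotropic kernel."""
    data = np.asarray(data, dtype=float)
    n = data.shape[0]

    # Pairwise Euclidean distances (dense; fine only for small inputs).
    diff = data[:, None, :] - data[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))

    # The knn nearest neighbours of each cell (position 0 is the cell itself).
    order = np.argsort(dists, axis=1)[:, 1:knn + 1]
    rows = np.repeat(np.arange(n), knn)
    cols = order.ravel()

    # Adaptive bandwidth: distance to the (knn // 3)-th neighbour of each cell.
    adaptive_k = max(knn // 3, 1)
    sigma = dists[np.arange(n), order[:, adaptive_k - 1]]

    # Affinities on the kNN graph with a per-cell bandwidth, then symmetrize.
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-dists[rows, cols] / sigma[rows])
    kernel = W + W.T

    # Optional density normalization controlled by alpha.
    if alpha > 0:
        d = kernel.sum(axis=1) ** (-alpha)
        kernel = d[:, None] * kernel * d[None, :]

    # Row-normalize to obtain a Markov transition matrix (diffusion operator).
    T = kernel / kernel.sum(axis=1, keepdims=True)

    # Leading eigenpairs of T, sorted by decreasing eigenvalue.
    vals, vecs = np.linalg.eig(T)
    idx = np.argsort(-vals.real)[:n_components]
    return T, vals.real[idx], vecs.real[:, idx]

rng = np.random.default_rng(0)
toy = rng.normal(size=(40, 5))
T, eigenvalues, components = diffusion_map_sketch(toy, n_components=3, knn=6)
```

The returned operator, eigenvalues and components play roles analogous to palantir_diff_op, palantir_EigenValues and X_palantir_diff_comp. A real implementation would use sparse matrices and an iterative eigensolver.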

Visualizing Palantir results

tSNE visualization

Important for Palantir!

Palantir constructs the tSNE map in the embedded space since these maps better represent the differentiation trajectories.

>>> sc.tl.tsne(adata, n_pcs=2, use_rep='X_palantir_multiscale', perplexity=150)

tSNE by cell size

>>> sc.pl.tsne(adata, color="n_counts")

Imputed gene expression visualized on tSNE maps

>>> sc.pl.tsne(
...     adata,
...     layer='palantir_imp',
...     color=['CD34', 'MPO', 'GATA1', 'IRF8'],
... )

Running Palantir

Palantir can be run by specifying an approximate early cell. While Palantir automatically determines the terminal states, they can also be specified using the terminal_states parameter.

>>> start_cell = 'Run5_164698952452459'
>>> pr_res = sce.tl.palantir_results(
...     adata,
...     early_cell=start_cell,
...     ms_data='X_palantir_multiscale',
...     num_waypoints=500,
... )

Note

A start_cell must be defined for every data set. The start cell for this dataset was chosen based on high expression of CD34.
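
One simple way to shortlist a start cell from a marker gene is sketched below. The text above states only that the start cell was chosen based on high CD34 expression; taking the single cell with the maximum value is an assumption made for illustration, and pick_start_cell, the cell names and the expression values are all hypothetical.

```python
import numpy as np

def pick_start_cell(cell_names, expression, gene_names, marker="CD34"):
    """Return the name of the cell with the highest expression of `marker`."""
    gene_idx = list(gene_names).index(marker)
    values = np.asarray(expression, dtype=float)[:, gene_idx]
    return cell_names[int(np.argmax(values))]

# Toy data: 3 cells x 3 genes (rows are cells, columns are genes).
cells = ["cell_a", "cell_b", "cell_c"]
genes = ["GATA1", "CD34", "MPO"]
expr = np.array([[2.0, 0.1, 0.0],
                 [0.0, 3.5, 0.2],
                 [0.1, 1.2, 4.0]])
start_cell = pick_start_cell(cells, expr, genes)
```

A single maximum can be noisy on real data, so it is worth checking that the candidate sits in the expected region of the embedding before using it as early_cell.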

At this point the returned Palantir object pr_res can be used for all downstream analysis and plotting. Please consult the notebook Palantir_sample_notebook.ipynb, which provides a comprehensive guide to drawing gene expression trends, among other things.