scanpy.external.tl.palantir
- scanpy.external.tl.palantir(adata, n_components=10, knn=30, alpha=0, use_adjacency_matrix=False, distances_key=None, n_eigs=None, impute_data=True, n_steps=3, copy=False)
Run diffusion maps using the adaptive anisotropic kernel [Setty18].
Palantir is an algorithm to align cells along differentiation trajectories. Palantir models differentiation as a stochastic process in which stem cells differentiate into terminally differentiated cells through a series of steps along a low-dimensional phenotypic manifold. Palantir effectively captures the continuity in cell states and the stochasticity in cell fate determination. It has been designed to work with multidimensional single-cell data from diverse technologies, such as mass cytometry and single-cell RNA-seq.
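To make the "adaptive anisotropic kernel" idea concrete, here is a minimal, dense NumPy sketch of a diffusion map with a per-cell bandwidth. It is not Palantir's implementation: the function name `adaptive_diffusion_map` is hypothetical, and the bandwidth choice (distance to roughly the `knn // 3`-th neighbor), the `exp(-d / sigma)` decay and the dense eigendecomposition are assumptions chosen for readability. Its `knn`, `n_components` and `alpha` arguments only loosely mirror the parameters documented below.

```python
import numpy as np


def adaptive_diffusion_map(X, n_components=10, knn=30, alpha=0.0):
    """Dense, simplified diffusion map with an adaptive (per-cell) bandwidth.

    Illustrative sketch, not Palantir's code. Returns the row-stochastic
    diffusion operator T plus its top eigenvalues and right eigenvectors.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    knn = min(knn, n - 1)
    # Dense pairwise Euclidean distances: only suitable for small n.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Each cell's knn nearest neighbors, skipping the cell itself.
    nbrs = np.argsort(dist, axis=1)[:, 1 : knn + 1]
    # Assumed adaptive bandwidth: distance to the (knn // 3)-th neighbor.
    adaptive_k = max(knn // 3, 1)
    sigma = dist[np.arange(n), nbrs[:, adaptive_k - 1]]
    sigma[sigma == 0] = 1.0  # guard against duplicated cells
    # Directed affinities: distance scaled by the source cell's bandwidth.
    rows = np.repeat(np.arange(n), knn)
    cols = nbrs.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-dist[rows, cols] / sigma[rows])
    K = W + W.T  # symmetrize the kNN kernel
    if alpha > 0:
        # Anisotropic normalization: K <- D^-alpha K D^-alpha.
        d = K.sum(axis=1) ** (-alpha)
        K = d[:, None] * K * d[None, :]
    deg = K.sum(axis=1)
    T = K / deg[:, None]  # row-stochastic diffusion operator
    # T = D^-1 K is similar to the symmetric S = D^-1/2 K D^-1/2.
    inv_sqrt = 1.0 / np.sqrt(deg)
    S = inv_sqrt[:, None] * K * inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(S)
    top = np.argsort(vals)[::-1][:n_components]  # largest eigenvalues first
    vals = vals[top]
    vecs = inv_sqrt[:, None] * vecs[:, top]  # map back to eigenvectors of T
    vecs /= np.linalg.norm(vecs, axis=0)
    return T, vals, vecs
```

Because `T = D^-1 K` is similar to the symmetric matrix `D^-1/2 K D^-1/2`, the sketch can use `numpy.linalg.eigh` and obtain real eigenvalues; the leading eigenvalue is 1, with a constant eigenvector.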
Note
More information and bug reports here.
- Parameters:
- adata : AnnData
  An AnnData object.
- n_components : int (default: 10)
  Number of diffusion components.
- knn : int (default: 30)
  Number of nearest neighbors for graph construction.
- alpha : float (default: 0)
  Normalization parameter for the diffusion operator.
- use_adjacency_matrix : bool (default: False)
  Use the adaptive anisotropic adjacency matrix, instead of PCA projections (the default), to compute diffusion components.
- distances_key : Optional[str] (default: None)
  With use_adjacency_matrix=True, use the indicated distances key in .obsp. If None, 'distances' is used.
- n_eigs : Optional[int] (default: None)
  Number of eigenvectors to use. If None, the number of eigenvectors is determined using the eigen gap. Passed to palantir.utils.determine_multiscale_space.
- impute_data : bool (default: True)
  Impute data using MAGIC.
- n_steps : int (default: 3)
  Number of steps in the diffusion operator. Passed to palantir.utils.run_magic_imputation.
- copy : bool (default: False)
  Return a copy instead of writing to adata.
- Return type:
  Optional[AnnData]
- Returns:
  Depending on copy, returns or updates adata with the following fields:
  - Diffusion maps, used for MAGIC imputation and to generate the multi-scale data matrix.
  - Multi-scale space results (for example .obsm['X_palantir_multiscale']), used to build a tSNE on the diffusion components and to compute branch probabilities and waypoints.
  - MAGIC imputation (for example .layers['palantir_imp']), used for plotting gene expression on tSNE maps and for gene expression trends.
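The multi-scale space and the eigen-gap rule used when `n_eigs=None` can be sketched as follows. This reflects our reading of `palantir.utils.determine_multiscale_space` and should be treated as an assumption: eigenvalues are sorted in descending order, `n_eigs` is placed at the largest gap between consecutive eigenvalues (falling back to the second-largest gap if that leaves fewer than 3), the trivial first component is dropped, and each kept component is scaled by `lambda / (1 - lambda)`. The helper name `multiscale_space` is hypothetical.

```python
import numpy as np


def multiscale_space(eig_vals, eig_vecs, n_eigs=None):
    """Sketch of a multi-scale diffusion space (assumed construction).

    eig_vals: eigenvalues sorted in descending order, the first one trivial (1).
    eig_vecs: matching eigenvectors as columns, shape (n_cells, n_components).
    """
    eig_vals = np.asarray(eig_vals, dtype=float)
    if n_eigs is None:
        # Eigen gap: drops between consecutive eigenvalues.
        gaps = eig_vals[:-1] - eig_vals[1:]
        order = np.argsort(gaps)
        n_eigs = int(order[-1]) + 1
        if n_eigs < 3 and len(order) > 1:
            # Too few components kept: fall back to the second-largest gap.
            n_eigs = int(order[-2]) + 1
    use = np.arange(1, n_eigs)  # skip the trivial first component
    lam = eig_vals[use]
    # Weight each component by lambda / (1 - lambda).
    return np.asarray(eig_vecs)[:, use] * (lam / (1.0 - lam))
```

The factor `lambda / (1 - lambda)` equals the sum of `lambda**t` over all diffusion times `t >= 1`, so components with eigenvalues close to 1 (slow, large-scale structure) receive the largest weights.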
Example
>>> import scanpy.external as sce
>>> import scanpy as sc
Sample data is available here.
Load sample data
>>> adata = sc.read_csv(filename="Palantir/data/marrow_sample_scseq_counts.csv.gz")
Clean up and normalize
>>> sc.pp.filter_cells(adata, min_counts=1000)
>>> sc.pp.filter_genes(adata, min_counts=10)
>>> sc.pp.normalize_per_cell(adata)
>>> sc.pp.log1p(adata)
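For readers unfamiliar with these preprocessing calls, the NumPy sketch below approximates what they do on a dense (cells x genes) count matrix, assuming `sc.pp.normalize_per_cell`'s default of scaling every cell to the median total count. The function name `filter_and_normalize` is hypothetical; scanpy's own functions also handle sparse matrices, record metadata such as `n_counts`, and support options not shown here.

```python
import numpy as np


def filter_and_normalize(counts, min_cell_counts=1000, min_gene_counts=10):
    """Approximate filter_cells / filter_genes / normalize_per_cell / log1p."""
    counts = np.asarray(counts, dtype=float)
    # Keep cells whose total count reaches the threshold.
    counts = counts[counts.sum(axis=1) >= min_cell_counts]
    # Keep genes with enough counts across the remaining cells.
    counts = counts[:, counts.sum(axis=0) >= min_gene_counts]
    totals = counts.sum(axis=1, keepdims=True)
    # Scale each cell to the median library size, then apply log(1 + x).
    scaled = counts / totals * np.median(totals)
    return np.log1p(scaled)
```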
Data preprocessing
Palantir builds diffusion maps using one of two optional inputs:
Principal component analysis
>>> sc.tl.pca(adata, n_comps=300)
or,
Nearest neighbors graph
>>> sc.pp.neighbors(adata, n_neighbors=30)
Diffusion maps
Palantir determines the diffusion maps of the data as an estimate of the low dimensional phenotypic manifold of the data.
>>> sce.tl.palantir(adata, n_components=5, knn=30)
If pre-computed distances are to be used:
>>> sce.tl.palantir(
...     adata,
...     n_components=5,
...     knn=30,
...     use_adjacency_matrix=True,
...     distances_key="distances",
... )
Visualizing Palantir results
tSNE visualization (important for Palantir)
Palantir constructs the tSNE map in the embedded (multi-scale diffusion) space, since such maps better represent the differentiation trajectories.
>>> sc.tl.tsne(adata, n_pcs=2, use_rep='X_palantir_multiscale', perplexity=150)
tSNE colored by cell size (total counts)
>>> sc.pl.tsne(adata, color="n_counts")
Imputed gene expression visualized on tSNE maps
>>> sc.pl.tsne(
...     adata,
...     layer='palantir_imp',
...     color=['CD34', 'MPO', 'GATA1', 'IRF8'],
... )
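The `palantir_imp` layer comes from MAGIC-style imputation. A minimal sketch of the idea, assuming (as we understand `palantir.utils.run_magic_imputation`) that the row-stochastic diffusion operator `T` is raised to the power `n_steps` and applied to the expression matrix, so each cell's imputed profile becomes a weighted average of other cells' profiles after `n_steps` random-walk steps. The function name `magic_impute` is hypothetical.

```python
import numpy as np


def magic_impute(T, X, n_steps=3):
    """Smooth expression X (cells x genes) with n_steps of diffusion T.

    T is assumed to be a dense, row-stochastic (cells x cells) matrix.
    """
    # Powering T lets information spread n_steps hops along the graph.
    T_steps = np.linalg.matrix_power(np.asarray(T, dtype=float), n_steps)
    return T_steps @ np.asarray(X, dtype=float)
```

Since every row of `T` sums to 1, a gene with identical expression in all cells stays unchanged, while noisy or dropped-out values are pulled toward their neighbors; larger `n_steps` smooths more aggressively.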
Running Palantir
Palantir can be run by specifying an approximate early cell. While Palantir automatically determines the terminal states, they can also be specified using the terminal_states parameter.
>>> start_cell = 'Run5_164698952452459'
>>> pr_res = sce.tl.palantir_results(
...     adata,
...     early_cell=start_cell,
...     ms_data='X_palantir_multiscale',
...     num_waypoints=500,
... )
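`num_waypoints` controls how many cells are sampled as waypoints to summarize the data. The sketch below shows generic greedy max-min (farthest-point) sampling, which spreads samples evenly over a space. As far as we know, Palantir's own waypoint selection is related but works per diffusion component and starts from a random cell, so treat this as an illustration of the technique rather than the library's code. The function name `max_min_sample` and the fixed `start` index are assumptions for the example.

```python
import numpy as np


def max_min_sample(data, n_points, start=0):
    """Greedy max-min (farthest-point) sampling of row indices of `data`."""
    data = np.asarray(data, dtype=float)
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    min_dist = np.linalg.norm(data - data[start], axis=1)
    for _ in range(1, min(n_points, len(data))):
        nxt = int(np.argmax(min_dist))  # farthest point from the current set
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(data - data[nxt], axis=1))
    return chosen
```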
Note
A start_cell must be defined for every data set. The start cell for this dataset was chosen based on high expression of CD34.
At this point, the returned Palantir object pr_res can be used for all downstream analysis and plotting. Please consult the notebook Palantir_sample_notebook.ipynb, which provides a comprehensive guide to drawing gene expression trends, amongst other things.
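The note above says the start cell was chosen for high CD34 expression. A hypothetical helper that automates that heuristic on a dense (cells x genes) matrix might look like the following; it is not part of scanpy or Palantir, and picking the single highest-expressing cell is only one reasonable choice (it can be sensitive to outliers).

```python
import numpy as np


def pick_start_cell(expr, cell_names, gene_names, marker="CD34"):
    """Return the name of the cell with the highest expression of `marker`.

    Hypothetical helper: expr is a dense (cells x genes) array whose rows
    follow cell_names and whose columns follow gene_names.
    """
    col = list(gene_names).index(marker)  # raises ValueError if missing
    return cell_names[int(np.argmax(np.asarray(expr)[:, col]))]
```

With an AnnData object one could pass, for example, `adata.layers['palantir_imp']`, `adata.obs_names` and `adata.var_names`, assuming the imputed layer is stored as a dense array.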