scanpy.external.tl.phenograph
- scanpy.external.tl.phenograph(adata, clustering_algo='louvain', k=30, directed=False, prune=False, min_cluster_size=10, jaccard=True, primary_metric='euclidean', n_jobs=-1, q_tol=0.001, louvain_time_limit=2000, nn_method='kdtree', partition_type=None, resolution_parameter=1, n_iterations=-1, use_weights=True, seed=None, copy=False, **kargs)
PhenoGraph clustering [Levine15].
PhenoGraph is a clustering method designed for high-dimensional single-cell data. It works by creating a graph (“network”) representing phenotypic similarities between cells and then identifying communities in this graph. It supports both Louvain and Leiden algorithms for community detection.
Note
More information and bug reports here.
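The graph-building idea described above can be illustrated with a small, self-contained sketch. It is not PhenoGraph's implementation: the helper names `knn_indices` and `jaccard_graph` are hypothetical, the neighbor search is brute force, and the Jaccard weight is assumed to be the plain ratio of shared to combined neighbors.

```python
import math
import random


def knn_indices(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors (Euclidean)."""
    neighborhoods = []
    for i, p in enumerate(points):
        # Sort all other points by distance to p and keep the k closest.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighborhoods.append(set(others[:k]))
    return neighborhoods


def jaccard_graph(neighborhoods):
    """Weight each directed edge i -> j by the Jaccard overlap of both neighborhoods."""
    edges = {}
    for i, nbrs_i in enumerate(neighborhoods):
        for j in nbrs_i:
            nbrs_j = neighborhoods[j]
            shared = len(nbrs_i & nbrs_j)
            union = len(nbrs_i | nbrs_j)
            edges[(i, j)] = shared / union
    return edges


random.seed(0)
points = [(random.random(), random.random()) for _ in range(50)]  # toy 2-D "cells"
neighborhoods = knn_indices(points, k=5)
edges = jaccard_graph(neighborhoods)
```

Cells whose neighborhoods overlap heavily end up connected by high-weight edges, which is what a community detection step such as Louvain or Leiden can then exploit.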
- Parameters:
- adata : Union[AnnData, ndarray, spmatrix] AnnData, or array of data to cluster, or sparse matrix of k-nearest neighbor graph. If ndarray, n-by-d array of n cells in d dimensions. If sparse matrix, n-by-n adjacency matrix.
- clustering_algo : Optional[Literal['louvain', 'leiden']] (default: 'louvain') Choose between the 'louvain' and 'leiden' algorithms for clustering.
- k : int (default: 30) Number of nearest neighbors to use in the first step of graph construction.
- directed : bool (default: False) Whether to use a symmetric (default) or asymmetric ('directed') graph. The graph construction process produces a directed graph, which is symmetrized by one of two methods (see prune below).
- prune : bool (default: False) If prune=False, symmetrize by taking the average between the graph and its transpose. If prune=True, symmetrize by taking the product between the graph and its transpose.
- min_cluster_size : int (default: 10) Cells that end up in a cluster smaller than min_cluster_size are considered outliers and are assigned to -1 in the cluster labels.
- jaccard : bool (default: True) If True, use the Jaccard metric between k-neighborhoods to build the graph. If False, use a Gaussian kernel.
- primary_metric : Literal['euclidean', 'manhattan', 'correlation', 'cosine'] (default: 'euclidean') Distance metric to define nearest neighbors. Note that performance will be slower for correlation and cosine.
- n_jobs : int (default: -1) Nearest neighbors and Jaccard coefficients will be computed in parallel using n_jobs. If 1 is given, no parallelism is used. If set to -1, all CPUs are used. For n_jobs below -1, n_cpus + 1 + n_jobs CPUs are used.
- q_tol : float (default: 0.001) Tolerance, i.e. precision, for monitoring modularity optimization.
- louvain_time_limit : int (default: 2000) Maximum number of seconds to run modularity optimization. If exceeded, the best result so far is returned.
- nn_method : Literal['kdtree', 'brute'] (default: 'kdtree') Whether to use brute force or kdtree for nearest neighbor search. For very large high-dimensional data sets, brute force, with parallel computation, performs faster than kdtree.
- partition_type : Optional[Type[MutableVertexPartition]] (default: None) Defaults to RBConfigurationVertexPartition. For the available options, consult the documentation for find_partition().
- resolution_parameter : float (default: 1) A parameter value controlling the coarseness of the clustering in Leiden. Higher values lead to more clusters. Set to None if overriding partition_type to one that does not accept a resolution_parameter.
- n_iterations : int (default: -1) Number of iterations to run the Leiden algorithm. If the number of iterations is negative, the Leiden algorithm is run until an iteration in which there was no improvement.
- use_weights : bool (default: True) Use edge weights in the Leiden computation.
- seed : Optional[int] (default: None) Seed used to initialize the Leiden optimization.
- copy : bool (default: False) Return a copy or write to adata.
- kargs : Any Additional arguments passed to find_partition() and the constructor of the partition_type.
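The two symmetrization modes described for directed and prune can be sketched on a tiny dense matrix. This is an illustration only, under the assumption that the "product" is taken element-wise; the function name `symmetrize` is hypothetical and the real implementation works on sparse matrices.

```python
def symmetrize(graph, prune=False):
    """Symmetrize a square weight matrix given as a list of lists.

    prune=False: average of the graph and its transpose.
    prune=True: element-wise product of the graph and its transpose (assumed).
    """
    n = len(graph)
    sym = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if prune:
                sym[i][j] = graph[i][j] * graph[j][i]
            else:
                sym[i][j] = (graph[i][j] + graph[j][i]) / 2
    return sym


# Toy directed graph: 0 <-> 1 is reciprocal, 1 -> 2 is one-way.
directed_graph = [
    [0.0, 0.8, 0.0],
    [0.4, 0.0, 0.5],
    [0.0, 0.0, 0.0],
]
averaged = symmetrize(directed_graph, prune=False)
pruned = symmetrize(directed_graph, prune=True)
```

With averaging, the one-way edge between nodes 1 and 2 survives at half its weight. With the product, it drops to zero, so only reciprocal edges remain.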
- Return type:
- Returns: Depending on copy, returns or updates adata with the following fields:
Example
>>> from anndata import AnnData
>>> import scanpy as sc
>>> import scanpy.external as sce
>>> import numpy as np
>>> import pandas as pd
With annotated data as input:
>>> adata = sc.datasets.pbmc3k()
>>> sc.pp.normalize_per_cell(adata)
Then do PCA:
>>> sc.tl.pca(adata, n_comps=100)
Compute phenograph clusters:
Louvain community detection
>>> sce.tl.phenograph(adata, clustering_algo="louvain", k=30)
Leiden community detection
>>> sce.tl.phenograph(adata, clustering_algo="leiden", k=30)
Return only Graph object
>>> sce.tl.phenograph(adata, clustering_algo=None, k=30)
Now to show phenograph on tSNE (for example):
Compute tSNE:
>>> sc.tl.tsne(adata, random_state=7)
Plot phenograph clusters on tSNE:
>>> sc.pl.tsne(
...     adata, color=["pheno_louvain", "pheno_leiden"], s=100,
...     palette=sc.pl.palettes.vega_20_scanpy, legend_fontsize=10
... )
Cluster and cluster centroids for input Numpy ndarray
>>> df = np.random.rand(1000, 40)
>>> dframe = pd.DataFrame(df)
>>> dframe.index, dframe.columns = (map(str, dframe.index), map(str, dframe.columns))
>>> adata = AnnData(dframe)
>>> sc.tl.pca(adata, n_comps=20)
>>> sce.tl.phenograph(adata, clustering_algo="leiden", k=50)
>>> sc.tl.tsne(adata, random_state=1)
>>> sc.pl.tsne(
...     adata, color=['pheno_leiden'], s=100,
...     palette=sc.pl.palettes.vega_20_scanpy, legend_fontsize=10
... )
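Because cells in clusters smaller than min_cluster_size are labeled -1, it can be useful to separate those outliers before downstream analysis. The following is a minimal sketch on a hypothetical list of labels; in practice the labels would come from a column such as adata.obs["pheno_leiden"], whose exact dtype is not assumed here.

```python
from collections import Counter

# Hypothetical cluster labels, standing in for the PhenoGraph output column.
labels = [0, 0, 1, -1, 2, 1, -1, 0]

# Count cells per real cluster, ignoring the outlier label -1.
cluster_sizes = Counter(label for label in labels if label != -1)

# Positions of outlier cells and of cells that belong to a real cluster.
outlier_idx = [i for i, label in enumerate(labels) if label == -1]
kept_idx = [i for i, label in enumerate(labels) if label != -1]
```

The index lists can then be used to subset the data, for example to plot or re-cluster only the cells that were assigned to a community.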