scanpy.external.tl.phenograph
- scanpy.external.tl.phenograph(adata, clustering_algo='louvain', k=30, directed=False, prune=False, min_cluster_size=10, jaccard=True, primary_metric='euclidean', n_jobs=-1, q_tol=0.001, louvain_time_limit=2000, nn_method='kdtree', partition_type=None, resolution_parameter=1, n_iterations=-1, use_weights=True, seed=None, copy=False, **kargs)
PhenoGraph clustering [Levine15].
PhenoGraph is a clustering method designed for high-dimensional single-cell data. It works by creating a graph (“network”) representing phenotypic similarities between cells and then identifying communities in this graph. It supports both Louvain and Leiden algorithms for community detection.
Note
More information and bug reports: see the PhenoGraph project repository.
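The graph-construction idea described above can be illustrated with a small, dependency-free sketch. It is not PhenoGraph's implementation: the helper names `knn_indices` and `jaccard_edges` are hypothetical, the neighbor search is plain brute force, each neighborhood is assumed to exclude the cell itself, and the community-detection step (Louvain or Leiden) is left out entirely.

```python
import math


def knn_indices(points, k):
    """Brute-force k nearest neighbors (Euclidean), excluding the point itself."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def jaccard_edges(neighbors):
    """Weight each directed kNN edge i -> j by the Jaccard index of the
    two cells' k-neighborhoods (shared neighbors / union of neighbors)."""
    neighbor_sets = [set(n) for n in neighbors]
    edges = {}
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            shared = len(neighbor_sets[i] & neighbor_sets[j])
            union = len(neighbor_sets[i] | neighbor_sets[j])
            edges[(i, j)] = shared / union
    return edges


# Two well-separated toy groups of three "cells" each (made-up coordinates).
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
]
neighbors = knn_indices(points, k=2)
edges = jaccard_edges(neighbors)
```

In this toy input every edge connects cells of the same group, which is the kind of structure a community-detection algorithm would then pick up as clusters.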
- Parameters
- adata : AnnData | ndarray | spmatrix
  AnnData, or array of data to cluster, or sparse matrix of k-nearest neighbor graph. If ndarray, n-by-d array of n cells in d dimensions. If sparse matrix, n-by-n adjacency matrix.
- clustering_algo : {'louvain', 'leiden'} | None (default: 'louvain')
  Choose between the 'louvain' and 'leiden' algorithms for clustering.
- k : int (default: 30)
  Number of nearest neighbors to use in the first step of graph construction.
- directed : bool (default: False)
  Whether to use a symmetric (default) or asymmetric ('directed') graph. The graph construction process produces a directed graph, which is symmetrized by one of two methods (see prune below).
- prune : bool (default: False)
  If prune=False, symmetrize by taking the average between the graph and its transpose. If prune=True, symmetrize by taking the product between the graph and its transpose.
- min_cluster_size : int (default: 10)
  Cells that end up in a cluster smaller than min_cluster_size are considered outliers and are assigned to -1 in the cluster labels.
- jaccard : bool (default: True)
  If True, use Jaccard metric between k-neighborhoods to build graph. If False, use a Gaussian kernel.
- primary_metric : {'euclidean', 'manhattan', 'correlation', 'cosine'} (default: 'euclidean')
  Distance metric to define nearest neighbors. Note that performance will be slower for correlation and cosine.
- n_jobs : int (default: -1)
  Nearest neighbors and Jaccard coefficients will be computed in parallel using n_jobs. If 1 is given, no parallelism is used. If set to -1, all CPUs are used. For n_jobs below -1, n_cpus + 1 + n_jobs CPUs are used.
- q_tol : float (default: 0.001)
  Tolerance, i.e. precision, for monitoring modularity optimization.
- louvain_time_limit : int (default: 2000)
  Maximum number of seconds to run modularity optimization. If exceeded, the best result so far is returned.
- nn_method : {'kdtree', 'brute'} (default: 'kdtree')
  Whether to use brute force or kdtree for nearest neighbor search. For very large high-dimensional data sets, brute force, with parallel computation, performs faster than kdtree.
- partition_type : Type[MutableVertexPartition] | None (default: None)
  Defaults to RBConfigurationVertexPartition. For the available options, consult the documentation for find_partition().
- resolution_parameter : float (default: 1)
  A parameter value controlling the coarseness of the clustering in Leiden. Higher values lead to more clusters. Set to None if overriding partition_type to one that does not accept a resolution_parameter.
- n_iterations : int (default: -1)
  Number of iterations to run the Leiden algorithm. If the number of iterations is negative, the Leiden algorithm is run until an iteration in which there was no improvement.
- use_weights : bool (default: True)
  Use edge weights in the Leiden computation.
- seed : int | None (default: None)
  Seed for the random initialization of the Leiden optimization.
- copy : bool (default: False)
  Return a copy or write to adata.
- kargs : Any
  Additional arguments passed to find_partition() and the constructor of the partition_type.
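Two of the parameter rules listed above, the `prune` symmetrization and the `n_jobs` convention, can be illustrated with a short NumPy sketch. The helper names `symmetrize` and `effective_n_jobs` are hypothetical and are not part of scanpy or PhenoGraph. The sketch assumes that "product" means the elementwise product of the graph with its transpose, and the lower clamp to one worker is an assumption of this sketch as well.

```python
import os

import numpy as np


def symmetrize(graph, prune=False):
    """Symmetrize a directed, weighted adjacency matrix."""
    graph = np.asarray(graph, dtype=float)
    if prune:
        # Assumption: elementwise product, so an edge survives only if it
        # is present in both directions.
        return graph * graph.T
    # Average of the graph and its transpose.
    return (graph + graph.T) / 2.0


def effective_n_jobs(n_jobs, n_cpus=None):
    """Translate a negative n_jobs into a worker count: n_cpus + 1 + n_jobs."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs < 0:
        # Clamping to at least one worker is an assumption of this sketch.
        return max(1, n_cpus + 1 + n_jobs)
    return n_jobs


# Toy directed graph: 0 <-> 1 in both directions, 1 -> 2 in one direction only.
directed_graph = np.array([
    [0.0, 0.8, 0.0],
    [0.4, 0.0, 0.6],
    [0.0, 0.0, 0.0],
])
averaged = symmetrize(directed_graph, prune=False)
pruned = symmetrize(directed_graph, prune=True)
```

Under these assumptions, averaging keeps the one-directional edge between cells 1 and 2 at half its weight, while the product removes it because the reverse edge has weight zero.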
- Return type
Tuple[Optional[ndarray], spmatrix, Optional[float]]
- Returns
Depending on copy, returns or updates adata with the following fields:
Example
>>> from anndata import AnnData
>>> import scanpy as sc
>>> import scanpy.external as sce
>>> import numpy as np
>>> import pandas as pd
With annotated data as input:
>>> adata = sc.datasets.pbmc3k()
>>> sc.pp.normalize_per_cell(adata)
Then do PCA:
>>> sc.tl.pca(adata, n_comps=100)
Compute phenograph clusters:
Louvain community detection
>>> sce.tl.phenograph(adata, clustering_algo="louvain", k=30)
Leiden community detection
>>> sce.tl.phenograph(adata, clustering_algo="leiden", k=30)
Return only Graph object
>>> sce.tl.phenograph(adata, clustering_algo=None, k=30)
Now to show phenograph on tSNE (for example):
Compute tSNE:
>>> sc.tl.tsne(adata, random_state=7)
Plot phenograph clusters on tSNE:
>>> sc.pl.tsne(
...     adata, color=["pheno_louvain", "pheno_leiden"], s=100,
...     palette=sc.pl.palettes.vega_20_scanpy, legend_fontsize=10
... )
Cluster and cluster centroids for input Numpy ndarray
>>> df = np.random.rand(1000, 40)
>>> dframe = pd.DataFrame(df)
>>> dframe.index, dframe.columns = (map(str, dframe.index), map(str, dframe.columns))
>>> adata = AnnData(dframe)
>>> sc.tl.pca(adata, n_comps=20)
>>> sce.tl.phenograph(adata, clustering_algo="leiden", k=50)
>>> sc.tl.tsne(adata, random_state=1)
>>> sc.pl.tsne(
...     adata, color=['pheno_leiden'], s=100,
...     palette=sc.pl.palettes.vega_20_scanpy, legend_fontsize=10
... )
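The heading above mentions cluster centroids, but the example itself stops at plotting. One possible way to compute centroids from a data array and a vector of cluster labels is sketched below. The function `cluster_centroids` is a hypothetical helper, not a scanpy or PhenoGraph function, and the labels are made-up stand-ins rather than PhenoGraph output. The sketch assumes integer labels in which -1 marks outliers, as described for min_cluster_size.

```python
import numpy as np


def cluster_centroids(data, labels, outlier_label=-1):
    """Return {cluster label: mean vector}, skipping cells labeled as outliers."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    centroids = {}
    for label in np.unique(labels):
        if label == outlier_label:
            continue  # outliers do not belong to any cluster
        centroids[int(label)] = data[labels == label].mean(axis=0)
    return centroids


rng = np.random.default_rng(0)
data = rng.random((6, 3))                # 6 cells in 3 dimensions
labels = np.array([0, 0, 1, 1, 1, -1])   # hypothetical labels; last cell is an outlier
centroids = cluster_centroids(data, labels)
```

With real results, `data` would be the matrix that was clustered and `labels` the community assignments for the same rows.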