scanpy.pp.neighbors

scanpy.pp.neighbors(adata, n_neighbors=15, n_pcs=None, *, distances=None, use_rep=None, knn=True, method='umap', transformer=None, metric=None, metric_kwds=mappingproxy({}), random_state=0, key_added=None, copy=False)

Compute the nearest neighbors distance matrix and a neighborhood graph of observations [McInnes et al., 2018].

The efficiency of the neighbor search relies heavily on UMAP [McInnes et al., 2018], which also provides a method for estimating the connectivities of data points, that is, the connectivity of the manifold (method=='umap'). If method=='gauss', connectivities are computed according to Coifman et al. [2005], in the adaptation of Haghverdi et al. [2016]. If method=='jaccard', connectivities are computed as in PhenoGraph [Levine et al., 2015].

Array type support
Array type	Supported	Supported experimentally in dask `Array`
`numpy.ndarray`	✅	❌
`scipy.sparse.{csr,csc}_{array,matrix}`	✅	❌

Parameters:

adata AnnData

Annotated data matrix.

n_neighbors int (default: 15)

The size of the local neighborhood (in terms of the number of neighboring data points) used for manifold approximation. Larger values result in more global views of the manifold, while smaller values preserve more local structure. In general, values should be in the range 2 to 100. If knn is True, this is the number of nearest neighbors to search for. If knn is False, the Gaussian kernel width is set to the distance to the n_neighbors-th neighbor.

Ignored if transformer is an instance.

n_pcs int | None (default: None)

Use this many PCs. If n_pcs==0 and use_rep is None, .X is used.

use_rep str | None (default: None)

Use the indicated representation. 'X' or any key for .obsm is valid. If None, the representation is chosen automatically: for .n_vars < N_PCS (default: 50), .X is used, otherwise 'X_pca' is used. If 'X_pca' is not present, it is computed with default parameters, or with n_pcs components if n_pcs is given.

knn bool (default: True)

If True, use a hard threshold to restrict the number of neighbors to n_neighbors, that is, consider a kNN graph. Otherwise, use a Gaussian kernel to assign low weights to neighbors more distant than the n_neighbors-th nearest neighbor.

method Literal['umap', 'gauss', 'jaccard'] (default: 'umap')

Use ‘umap’ [McInnes et al., 2018], ‘gauss’ (Gauss kernel following Coifman et al. [2005] with adaptive width Haghverdi et al. [2016]), or ‘jaccard’ (Jaccard kernel as in PhenoGraph, Levine et al. [2015]) for computing connectivities.

transformer KnnTransformerLike | Literal['pynndescent', 'sklearn', 'rapids'] | None (default: None)

Approximate kNN search implementation following the API of KNeighborsTransformer. See Using other kNN libraries in Scanpy for more details. Also accepts the following known options:

None (the default): Behavior depends on data size. For small data, we will calculate exact kNN, otherwise we use PyNNDescentTransformer
'pynndescent': PyNNDescentTransformer
'rapids': A transformer based on cuml.neighbors.NearestNeighbors.

Deprecated since version 1.10.0: Use rapids_singlecell.pp.neighbors() instead.

metric Literal['cityblock', 'cosine', 'euclidean', 'l1', 'l2', 'manhattan'] | Literal['braycurtis', 'canberra', 'chebyshev', 'correlation', 'dice', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'] | Callable[[ndarray, ndarray], float] | None (default: None)

A known metric’s name or a callable that returns a distance. If distances is given, this parameter is simply stored in .uns (see below), otherwise defaults to 'euclidean'.

Ignored if transformer is an instance.

metric_kwds Mapping[str, Any] (default: mappingproxy({}))

Options for the metric.

Ignored if transformer is an instance.

random_state int | RandomState | None (default: 0)

A numpy random seed.

Ignored if transformer is an instance.

key_added str | None (default: None)

If not specified, the neighbors data is stored in .uns['neighbors'], and distances and connectivities are stored in .obsp['distances'] and .obsp['connectivities'], respectively. If specified, the neighbors data is added to .uns[key_added], distances are stored in .obsp[f'{key_added}_distances'], and connectivities in .obsp[f'{key_added}_connectivities'].

copy bool (default: False)

Return a copy instead of writing to adata.

Return type:

AnnData | None

Returns:

Returns None if copy=False, else returns an AnnData object. Sets the following fields:

adata.obsp['distances' | f'{key_added}_distances'] : scipy.sparse.csr_matrix (dtype float): Distance matrix of the nearest neighbors search. Each row (cell) has n_neighbors-1 non-zero entries. These are the distances to its n_neighbors-1 nearest neighbors (excluding the cell itself).
adata.obsp['connectivities' | f'{key_added}_connectivities'] : scipy.sparse.csr_matrix (dtype float): Weighted adjacency matrix of the neighborhood graph of data points. Weights should be interpreted as connectivities.
adata.uns['neighbors' | key_added] : dict: neighbors parameters.
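
To make the layout of the distance matrix concrete, the sketch below builds a matrix of the same shape and sparsity pattern by brute force with NumPy and SciPy. It illustrates the data structure only and is not scanpy's implementation; the helper name knn_distance_matrix and the random input are hypothetical. Following the convention described above, each point counts as its own first neighbor, so every row stores n_neighbors-1 distances.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.spatial.distance import cdist


def knn_distance_matrix(X, n_neighbors):
    """Hypothetical helper: exact Euclidean kNN distances as a CSR matrix."""
    n_obs = X.shape[0]
    k = n_neighbors - 1  # the point itself is counted as the first neighbor
    dists = cdist(X, X)  # dense (n_obs, n_obs) pairwise distances
    np.fill_diagonal(dists, np.inf)  # exclude each point from its own search
    # Column indices of the k smallest distances in each row, nearest first.
    idx = np.argsort(dists, axis=1)[:, :k]
    rows = np.repeat(np.arange(n_obs), k)
    vals = np.take_along_axis(dists, idx, axis=1).ravel()
    return csr_matrix((vals, (rows, idx.ravel())), shape=(n_obs, n_obs))


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))  # synthetic data for illustration
D = knn_distance_matrix(X, n_neighbors=10)
nonzeros_per_row = np.diff(D.indptr)  # number of stored entries in each row
```

A matrix built this way is generally not symmetric: point j can be among the nearest neighbors of point i without the reverse being true.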

Examples

>>> import scanpy as sc
>>> adata = sc.datasets.pbmc68k_reduced()
>>> # Basic usage
>>> sc.pp.neighbors(adata, 20, metric="cosine")
>>> # Provide your own transformer for more control and flexibility
>>> from sklearn.neighbors import KNeighborsTransformer
>>> transformer = KNeighborsTransformer(
...     n_neighbors=10, metric="manhattan", algorithm="kd_tree"
... )
>>> sc.pp.neighbors(adata, transformer=transformer)
>>> # now you can e.g. access the index: `transformer._tree`
