scanpy.tl.tsne


scanpy.tl.tsne(adata, n_pcs=None, *, n_components=2, use_rep=None, perplexity=30, metric='euclidean', early_exaggeration=12, learning_rate=1000, rng=None, use_fast_tsne=False, n_jobs=None, key_added=None (sc.settings.preset='scanpy-v1' – changes in 2.0), copy=False)

t-SNE [Amir et al., 2013, Pedregosa et al., 2011, van der Maaten and Hinton, 2008].

t-distributed stochastic neighbor embedding (t-SNE, van der Maaten and Hinton [2008]) was proposed for visualizing single-cell data by Amir et al. [2013]. By default, Scanpy uses the scikit-learn implementation [Pedregosa et al., 2011]. Installing Multicore-tSNE by Ulyanov [2016] can give a large speedup and better convergence; Scanpy detects the package automatically and uses it when use_fast_tsne=True.

Array type support

Array type                                Supported   … experimentally in dask `Array`
`numpy.ndarray`                           ✅          ❌
`scipy.sparse.{csr,csc}_{array,matrix}`   ✅          ❌

Parameters:

adata AnnData

Annotated data matrix.

n_pcs int | None (default: None)

Use this many PCs. If n_pcs==0 and use_rep is None, .X is used.

use_rep str | None (default: None)

Use the indicated representation. 'X' or any key for .obsm is valid. If None, the representation is chosen automatically: if .n_vars < N_PCS (default: 50), .X is used; otherwise 'X_pca' is used. If 'X_pca' is not present, it is computed with default parameters, or with n_pcs components if n_pcs is given.
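The automatic choice described for use_rep can be restated as a small decision function. This is a hypothetical helper written in plain Python from the rule above (n_pcs_default stands in for N_PCS); it is not scanpy's internal code, which may differ in edge cases such as .n_vars == N_PCS.

```python
from typing import Optional, Sequence


def choose_representation(
    n_vars: int,
    obsm_keys: Sequence[str],
    use_rep: Optional[str] = None,
    n_pcs: Optional[int] = None,
    n_pcs_default: int = 50,  # stands in for N_PCS
) -> str:
    """Hypothetical helper: name the matrix that the documented rule selects."""
    if use_rep is not None:
        # An explicit choice wins: 'X' or any key of .obsm.
        return "X" if use_rep == "X" else f"obsm[{use_rep!r}]"
    if n_pcs == 0:
        # n_pcs == 0 without use_rep means "embed .X".
        return "X"
    if n_vars < n_pcs_default:
        # Fewer variables than N_PCS: embed .X directly.
        return "X"
    if "X_pca" in obsm_keys:
        return "obsm['X_pca']"
    # X_pca is missing: it would be computed first (n_pcs components if given).
    return "obsm['X_pca'] (computed first)"


print(choose_representation(20, []))                      # X
print(choose_representation(2000, ["X_pca"]))             # obsm['X_pca']
print(choose_representation(2000, [], use_rep="X_umap"))  # obsm['X_umap']
```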

n_components int (default: 2)

The number of dimensions of the embedding.

perplexity float (default: 30)

The perplexity is related to the number of nearest neighbors used in other manifold learning algorithms. Larger datasets usually require a larger perplexity. Consider selecting a value between 5 and 50. The choice is not very critical, since t-SNE is fairly insensitive to this parameter.
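In van der Maaten and Hinton [2008], the perplexity of point i is Perp(P_i) = 2^H(P_i), where H is the Shannon entropy (in bits) of the conditional distribution P_i over the other points, and each point's Gaussian bandwidth is tuned by binary search until that value matches the requested perplexity. The NumPy sketch below illustrates that search for a single point. It works in nats (exp(H) in nats equals 2^H in bits) and is not scikit-learn's implementation, which runs a comparable search over a limited set of nearest neighbors.

```python
import numpy as np


def entropy(p: np.ndarray) -> float:
    """Shannon entropy in nats, ignoring zero entries."""
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)))


def calibrate_row(sq_dists: np.ndarray, perplexity: float,
                  tol: float = 1e-5, max_iter: int = 200) -> np.ndarray:
    """Return p_{j|i} for one point i, with exp(entropy) close to `perplexity`.

    sq_dists holds squared distances from point i to every *other* point.
    The search runs over beta = 1 / (2 * sigma_i**2).
    """
    target = np.log(perplexity)
    beta, lo, hi = 1.0, 0.0, np.inf
    shifted = sq_dists - sq_dists.min()  # stabilises exp(); cancels after normalising
    for _ in range(max_iter):
        w = np.exp(-beta * shifted)
        p = w / w.sum()
        h = entropy(p)
        if abs(h - target) < tol:
            break
        if h > target:
            # Too flat: narrow the Gaussian by increasing beta.
            lo = beta
            beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
        else:
            # Too peaked: widen the Gaussian by decreasing beta.
            hi = beta
            beta = (beta + lo) / 2.0
    return p


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
d2 = np.sum((X[1:] - X[0]) ** 2, axis=1)  # point 0 versus the other 199 points
p = calibrate_row(d2, perplexity=30.0)
print(round(float(np.exp(entropy(p))), 2))  # approximately 30
```

In full t-SNE these conditional rows are then symmetrized into the joint affinities p_ij; a larger perplexity yields a flatter row, i.e. more effective neighbors per point.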

metric str (default: 'euclidean')

Distance metric used to calculate neighbors.

early_exaggeration float (default: 12)

Controls how tight natural clusters in the original space are in the embedded space and how much space will be between them. For larger values, the space between natural clusters will be larger in the embedded space. Again, the choice of this parameter is not very critical. If the cost function increases during initial optimization, the early exaggeration factor or the learning rate might be too high.
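For reference, the cost and gradient from van der Maaten and Hinton [2008], with early exaggeration written as a factor α (early_exaggeration) that multiplies the input affinities p_ij during the first optimization phase (250 iterations in scikit-learn) and α = 1 afterwards. A larger α makes the attractive term dominate early on, which pulls natural clusters together and leaves more room between them.

```latex
\begin{aligned}
q_{ij} &= \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}},
\qquad
C = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}} \\
\frac{\partial C}{\partial y_i} &= 4 \sum_{j \neq i} \left(\alpha\, p_{ij} - q_{ij}\right) \left(y_i - y_j\right) \left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}
\end{aligned}
```

With α = 1 the second line is the exact gradient of C.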

learning_rate float (default: 1000)

Note that the R package “Rtsne” uses a default of 200. The learning rate can be a critical parameter and should be between 100 and 1000. If the cost function increases during initial optimization, the early exaggeration factor or the learning rate might be too high. If the cost function gets stuck in a bad local minimum, increasing the learning rate sometimes helps.
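The t-SNE gradient, ∂C/∂y_i = 4 Σ_j (α p_ij − q_ij)(y_i − y_j)(1 + ‖y_i − y_j‖²)^-1 with α the early exaggeration factor, and a single learning-rate step can be sketched in NumPy. This is an illustrative exact O(n²) version with made-up affinities; scikit-learn's default Barnes-Hut optimizer approximates the gradient and adds momentum and per-parameter gains, so its learning_rate is not directly comparable to the plain step shown here.

```python
import numpy as np


def tsne_gradient(P: np.ndarray, Y: np.ndarray, exaggeration: float = 1.0) -> np.ndarray:
    """Exact t-SNE gradient for symmetric P (zero diagonal, entries summing to 1)."""
    diff = Y[:, None, :] - Y[None, :, :]           # (n, n, d): y_i - y_j
    W = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))   # Student-t kernel (1 + ||y_i - y_j||^2)^-1
    np.fill_diagonal(W, 0.0)
    Q = W / W.sum()                                # low-dimensional affinities q_ij
    coeff = (exaggeration * P - Q) * W             # (alpha * p_ij - q_ij) * kernel
    return 4.0 * np.einsum("ij,ijd->id", coeff, diff)


def gradient_step(P, Y, learning_rate, exaggeration=1.0):
    """One plain gradient-descent update (no momentum or gains)."""
    return Y - learning_rate * tsne_gradient(P, Y, exaggeration)


rng = np.random.default_rng(0)
n = 50
A = rng.random((n, n))
P = A + A.T                                # made-up symmetric input affinities
np.fill_diagonal(P, 0.0)
P /= P.sum()                               # p_ij sum to 1 over i != j
Y = rng.normal(scale=1e-2, size=(n, 2))    # small random initial layout

# Early phase: exaggerated attraction (later steps would use exaggeration=1.0).
Y_next = gradient_step(P, Y, learning_rate=200.0, exaggeration=12.0)
```

Because the pairwise terms are antisymmetric, the gradient rows sum to zero and each step keeps the centroid of the embedding fixed. The learning rate scales the whole step, which is one reason an overly large value can overshoot and make the cost increase.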

rng SeedLike | numpy.random.Generator | None

Random number generation to control stochasticity.

If a SeedLike value, it is used to seed a new random number generator; if a numpy.random.Generator, rng's state is advanced directly; if None, a non-reproducible random number generator is used. See numpy.random.default_rng() for more details.

The default value matches legacy scanpy behavior and will change to None in scanpy 2.0.
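The three cases listed above follow the semantics of numpy.random.default_rng(). This sketch shows them with NumPy alone; it does not call sc.tl.tsne.

```python
import numpy as np

# A seed creates a fresh generator, so the same seed reproduces the same draws.
a = np.random.default_rng(42).random(3)
b = np.random.default_rng(42).random(3)
print(np.array_equal(a, b))               # True

# A Generator is used as-is: default_rng() returns the very same object,
# and every draw advances its state.
gen = np.random.default_rng(42)
print(np.random.default_rng(gen) is gen)  # True
first = gen.random(3)
second = gen.random(3)
print(np.array_equal(first, second))      # False: the state moved on
```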

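use_fast_tsne bool (default: False)

Use the Multicore-tSNE package by Ulyanov [2016] if it is installed.

n_jobs int | None (default: None)

Number of jobs for parallel computation. None means that scanpy.settings.n_jobs is used.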

key_added str | None | Default (default: None (sc.settings.preset='scanpy-v1' – changes in 2.0))

If not specified, the embedding is stored as obsm['X_tsne'] and the parameters in uns['tsne']. If specified, the embedding is stored as obsm[key_added] and the parameters in uns[key_added].

copy bool (default: False)

Return a copy instead of writing to adata.

Return type:

AnnData | None

Returns:

Returns None if copy=False, else returns an AnnData object. Sets the following fields:

adata.obsm['X_tsne' | key_added] (numpy.ndarray, dtype float): tSNE coordinates of the data.
adata.uns['tsne' | key_added] (dict): tSNE parameters.
