scanpy.external.tl.sam¶

scanpy.external.tl.
sam
(adata, max_iter=10, num_norm_avg=50, k=20, distance='correlation', standardization='StandardScaler', weight_pcs=False, sparse_pca=False, n_pcs=150, n_genes=3000, projection='umap', inplace=True, verbose=True)¶ SelfAssembling Manifolds singlecell RNA sequencing analysis tool [Tarashansky19].
SAM iteratively rescales the input gene expression matrix to emphasize genes that are spatially variable along the intrinsic manifold of the data. It outputs the gene weights, nearest neighbor matrix, and a 2D projection.
The AnnData input should contain unstandardized, nonnegative values. Preferably, the data should be lognormalized and no genes should be filtered out.
 Parameters
 k :
int
int
(default:20
) The number of nearest neighbors to identify for each cell.
 distance :
str
str
(default:'correlation'
) The distance metric to use when identifying nearest neighbors. Can be any of the distance metrics supported by
pdist()
. max_iter :
int
int
(default:10
) The maximum number of iterations SAM will run.
 projection : {‘umap’, ‘tsne’, ‘None’}
Literal
[‘umap’, ‘tsne’, ‘None’] (default:'umap'
) If ‘tsne’, generates a tSNE embedding. If ‘umap’, generates a UMAP embedding. If ‘None’, no embedding will be generated.
 standardization : {‘Normalizer’, ‘StandardScaler’, ‘None’}
Literal
[‘Normalizer’, ‘StandardScaler’, ‘None’] (default:'StandardScaler'
) If ‘Normalizer’, use sklearn.preprocessing.Normalizer, which normalizes expression data prior to PCA such that each cell has unit L2 norm. If ‘StandardScaler’, use sklearn.preprocessing.StandardScaler, which normalizes expression data prior to PCA such that each gene has zero mean and unit variance. Otherwise, do not normalize the expression data. We recommend using ‘StandardScaler’ for large datasets with many expected cell types and ‘Normalizer’ otherwise. If ‘None’, no transformation is applied.
 num_norm_avg :
int
int
(default:50
) The top ‘num_norm_avg’ dispersions are averaged to determine the normalization factor when calculating the weights. This prevents genes with large spatial dispersions from skewing the distribution of weights.
 weight_pcs :
bool
bool
(default:False
) If True, scale the principal components by their eigenvalues. In datasets with many expected cell types, setting this to False might improve the resolution as these cell types might be encoded by lower variance principal components.
 sparse_pca :
bool
bool
(default:False
) If True, uses an implementation of PCA that accepts sparse inputs. This way, we no longer need a temporary dense copy of the sparse data. However, this implementation is slower and so is only worth using when memory constraints become noticeable.
 n_pcs :
int
None
Optional
[int
] (default:150
) Determines the number of top principal components selected at each iteration of the SAM algorithm. If None, this number is chosen automatically based on the size of the dataset. If weight_pcs is set to True, this parameter primarily affects the runtime of the SAM algorithm (more PCs = longer runtime).
 n_genes :
int
None
Optional
[int
] (default:3000
) Determines the number of top SAMweighted genes to use at each iteration of the SAM algorithm. If None, this number is chosen automatically based on the size of the dataset. This parameter primarily affects the runtime of the SAM algorithm (more genes = longer runtime). For extremely homogeneous datasets, decreasing
n_genes
may improve clustering resolution. inplace :
bool
bool
(default:True
) Set fields in
adata
if True. Otherwise, returns a copy. verbose :
bool
bool
(default:True
) If True, displays SAM log statements.
 k :
 Return type
 Returns
sam_obj if inplace is True or (sam_obj,AnnData) otherwise
 adata  AnnData
.var['weights']
SAM weights for each gene.
.var['spatial_dispersions']
Spatial dispersions for each gene (these are used to compute the SAM weights)
.uns['sam']
Dictionary of SAMspecific outputs, such as the parameters used for preprocessing (‘preprocess_args’) and running (‘run_args’) SAM.
.uns['neighbors']
A dictionary with key ‘connectivities’ containing the kNN adjacency matrix output by SAM. If builtin scanpy dimensionality reduction methods are to be used using the SAMoutput AnnData, users should recompute the neighbors using
.obs['X_pca']
withscanpy.pp.neighbors
..obsm['X_pca']
The principal components output by SAM.
.obsm['X_umap']
The UMAP projection output by SAM.
.layers['X_disp']
The expression matrix used for nearestneighbor averaging.
.layers['X_knn_avg']
The nearestneighboraveraged expression data used for computing the spatial dispersions of genes.
Example
>>> import scanpy.external as sce >>> import scanpy as sc
* Running SAM *
Assuming we are given an AnnData object called
adata
, we can run the SAM algorithm as follows:>>> sam_obj = sce.tl.sam(adata,inplace=True)
The input AnnData object should contain unstandardized, nonnegative expression values. Preferably, the data should be lognormalized and no genes should be filtered out.
Please see the documentation for a description of all available parameters.
For more detailed tutorials, please visit the original Github repository: https://github.com/atarashansky/selfassemblingmanifold/tree/master/tutorial
* Plotting *
To visualize the output, we can use:
>>> sce.pl.sam(adata,projection='X_umap')
sce.pl.sam
accepts all keyword arguments used in thematplotlib.pyplot.scatter
function.* SAMGUI *
SAM comes with the SAMGUI module, a graphicaluser interface written with
Plotly
andipythonwidgets
for interactively exploring and annotating the scRNAseq data and running SAM.Dependencies can be installed with Anaconda by following the instructions in the selfassemblingmanifold Github README: https://github.com/atarashansky/selfassemblingmanifold
In a Jupyter notebook, execute the following to launch the interface:
>>> from samalg.gui import SAMGUI >>> sam_gui = SAMGUI(sam_obj) # sam_obj is your SAM object >>> sam_gui.SamPlot
This can also be enabled in Jupyer Lab by following the instructions in the selfassemblingmanifold README.