scanpy.external.pp.scvi
scanpy.external.pp.scvi(adata, n_hidden=128, n_latent=10, n_layers=1, dispersion='gene', n_epochs=400, lr=0.001, train_size=1.0, batch_key=None, use_highly_variable_genes=True, subset_genes=None, linear_decoder=False, copy=False, use_cuda=True, return_posterior=True, trainer_kwargs={}, model_kwargs={})
SCVI [Lopez18].
Fits an scVI model to raw count data stored in an AnnData object.
scVI uses stochastic optimization and deep neural networks to aggregate information across similar cells and genes and to approximate the distributions that underlie observed expression values, while accounting for batch effects and limited sensitivity.
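The distributions mentioned above are negative binomial (NB) count distributions, as the dispersion and return-value descriptions on this page indicate. As a minimal sketch, assuming the common mean/inverse-dispersion parameterization (mean `mu`, inverse dispersion `theta`, variance `mu + mu**2 / theta`), the NB log-likelihood of observed counts can be written with NumPy and SciPy as follows. The function name `nb_log_likelihood` and the small `eps` guard against `log(0)` are assumptions for illustration; this is not scanpy's or scVI's internal code.

```python
import numpy as np
from scipy.special import gammaln


def nb_log_likelihood(x, mu, theta, eps=1e-8):
    """Log-probability of counts x under a negative binomial with mean mu and
    inverse dispersion theta (variance = mu + mu**2 / theta).

    Inputs broadcast, so theta can hold one value per gene (shape (n_genes,))
    while x and mu are (n_cells, n_genes).
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    theta = np.asarray(theta, dtype=float)
    log_theta_mu = np.log(theta + mu + eps)  # shared term log(theta + mu)
    return (
        gammaln(x + theta)
        - gammaln(theta)
        - gammaln(x + 1.0)                               # log binomial coefficient
        + theta * (np.log(theta + eps) - log_theta_mu)   # theta * log(theta / (theta + mu))
        + x * (np.log(mu + eps) - log_theta_mu)          # x * log(mu / (theta + mu))
    )


# Toy data: two cells x three genes, one inverse-dispersion value per gene.
counts = np.array([[0, 3, 10], [2, 0, 7]])
mean = np.array([[0.5, 2.0, 8.0], [1.5, 0.3, 9.0]])
inv_dispersion = np.array([1.0, 5.0, 20.0])
per_entry = nb_log_likelihood(counts, mean, inv_dispersion)
print(per_entry.sum())  # total log-likelihood of the toy matrix
```

Because `theta` broadcasts against the count matrix, the same function covers a per-gene dispersion and finer-grained layouts without changes.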
To use a linearly decoded variational autoencoder (LDVAE, an implementation of [Svensson20]), set linear_decoder=True. Compared to the standard VAE, this model is less powerful, but its decoder weights can be inspected to see which genes contribute to variation in the dataset. It can also be used for all scVI tasks, such as differential expression, batch correction, and imputation. However, batch correction may be less powerful because the model assumes a linear decoder.
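Why a linear decoder makes gene contributions inspectable can be seen in a simplified sketch. It assumes an scVI-style decoder that maps a latent code `z` to per-cell expression proportions through a softmax and scales them by each cell's library size; batch covariates and any normalization layers are omitted. Under that assumption the log-proportions are linear in `z` up to a per-cell normalizing constant, so the weight `W[g, k]` directly measures how latent dimension `k` shifts gene `g`. The names `linear_decode` and `softmax` here are hypothetical helpers, not library functions.

```python
import numpy as np


def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def linear_decode(z, W, b, library_size):
    """Simplified linear decoder: latent codes -> NB means.

    z:            (n_cells, n_latent) latent codes
    W:            (n_genes, n_latent) decoder weights, i.e. the loadings
    b:            (n_genes,) per-gene offsets
    library_size: (n_cells,) total counts per cell
    """
    logits = z @ W.T + b                 # linear in z: W[g, k] is the effect of dim k on gene g
    rho = softmax(logits, axis=1)        # per-cell expression proportions (rows sum to 1)
    return library_size[:, None] * rho   # NB mean for every cell and gene


rng = np.random.default_rng(0)
n_cells, n_genes, n_latent = 4, 6, 2
z = rng.normal(size=(n_cells, n_latent))
W = rng.normal(size=(n_genes, n_latent))
b = np.zeros(n_genes)
library_size = np.array([1000.0, 2500.0, 800.0, 1200.0])
mu = linear_decode(z, W, b, library_size)
print(mu.round(1))
```

A deep decoder would replace `z @ W.T + b` with a nonlinear network, which is more expressive but leaves no single weight matrix to read gene contributions from.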
Note
More information and bug reports here.
- Parameters
- adata : AnnData
An AnnData object whose X attribute holds unnormalized count data.
- n_hidden : int (default: 128)
Number of nodes per hidden layer.
- n_latent : int (default: 10)
Dimensionality of the latent space.
- n_layers : int (default: 1)
Number of hidden layers used for the encoder and decoder NNs.
- dispersion : str (default: 'gene')
One of the following:
* 'gene': dispersion parameter of NB is constant per gene across cells
* 'gene-batch': dispersion can differ between different batches
* 'gene-label': dispersion can differ between different labels
* 'gene-cell': dispersion can differ for every gene in every cell
- n_epochs : int (default: 400)
Number of epochs to train.
- lr : float (default: 0.001)
Learning rate.
- train_size : Union[float, int] (default: 1.0)
The train size: either a float between 0 and 1 or an integer giving the number of training samples to use.
- batch_key : Optional[str] (default: None)
Column name in adata.obs for batches. If None, no batch correction is performed; otherwise, batch correction is performed per batch category.
- use_highly_variable_genes : bool (default: True)
If true, uses only the genes flagged in adata.var['highly_variable'].
- subset_genes : Optional[Sequence[Union[str, int]]] (default: None)
Optional list of indices or gene names used to subset adata. If not None, use_highly_variable_genes is ignored.
- linear_decoder : bool (default: False)
If true, uses the LDVAE model, an implementation of [Svensson20].
- copy : bool (default: False)
If true, a copy of adata is returned.
- return_posterior : bool (default: True)
If true, the posterior object is returned.
- use_cuda : bool (default: True)
If true, uses CUDA.
- trainer_kwargs : dict (default: {})
Extra arguments for UnsupervisedTrainer.
- model_kwargs : dict (default: {})
Extra arguments for the VAE or LDVAE model.
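To make the `dispersion` options concrete, the sketch below lays out inverse-dispersion values on a cells × genes grid the way each setting describes, which also shows how many free values each one implies (n_genes, n_groups × n_genes, or n_cells × n_genes). It is an illustration under assumptions, not scVI code: the helper `expand_dispersion`, its `group_index` argument (a per-cell batch or label code), and the random placeholder values are all hypothetical.

```python
import numpy as np


def expand_dispersion(setting, n_cells, n_genes, group_index=None, n_groups=1, seed=0):
    """Hypothetical helper: return an (n_cells, n_genes) array of inverse-dispersion
    values laid out the way each `dispersion` setting implies.

    Values are random placeholders, not fitted estimates. `group_index` gives each
    cell's batch code ('gene-batch') or label code ('gene-label').
    """
    rng = np.random.default_rng(seed)
    if setting == "gene":
        theta = rng.uniform(1, 10, size=n_genes)               # n_genes values
        return np.broadcast_to(theta, (n_cells, n_genes))      # same row for every cell
    if setting in ("gene-batch", "gene-label"):
        theta = rng.uniform(1, 10, size=(n_groups, n_genes))   # n_groups * n_genes values
        return theta[np.asarray(group_index)]                  # each cell takes its group's row
    if setting == "gene-cell":
        return rng.uniform(1, 10, size=(n_cells, n_genes))     # n_cells * n_genes values
    raise ValueError(f"unknown dispersion setting: {setting!r}")


batch_codes = np.array([0, 1, 1, 0])  # four cells from two batches
for setting in ("gene", "gene-batch", "gene-cell"):
    theta = expand_dispersion(setting, 4, 3, group_index=batch_codes, n_groups=2)
    print(setting, np.unique(theta).size, "distinct values")
```

Finer-grained settings give the model more flexibility at the cost of more parameters to estimate from the same counts.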
- Return type
- Returns
If copy is true, adata is returned. If return_posterior is true, the posterior object is returned. If both copy and return_posterior are true, a tuple of adata and the posterior is returned, in that order.
adata.obsm['X_scvi'] stores the latent representations.
adata.obsm['X_scvi_denoised'] stores the normalized mean of the negative binomial.
adata.obsm['X_scvi_sample_rate'] stores the mean of the negative binomial.
If linear_decoder is true:
adata.uns['ldvae_loadings'] stores the per-gene weights in the linear decoder as a genes by n_latent matrix.
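With linear_decoder set to true, the loadings can be ranked to see which genes drive each latent dimension. The pandas sketch below assumes the loadings form a genes × n_latent table indexed by gene name; so that it runs on its own, a random stand-in matrix with hypothetical gene names and hypothetical column labels `Z_0`, `Z_1`, ... replaces the real `adata.uns['ldvae_loadings']`. If the stored object is a plain array, it could be wrapped in a DataFrame with `adata.var_names` as the index first.

```python
import numpy as np
import pandas as pd

# Stand-in for adata.uns['ldvae_loadings']: a genes x n_latent table.
# Gene names and column labels here are hypothetical.
rng = np.random.default_rng(0)
genes = [f"gene_{i}" for i in range(8)]
loadings = pd.DataFrame(
    rng.normal(size=(len(genes), 3)),
    index=genes,
    columns=[f"Z_{k}" for k in range(3)],
)


def top_genes(loadings, dim, n=3):
    """Names of the n genes with the largest absolute weight on latent dimension `dim`."""
    weights = loadings.iloc[:, dim]
    return weights.abs().sort_values(ascending=False).index[:n].tolist()


for dim, name in enumerate(loadings.columns):
    print(name, top_genes(loadings, dim))
```

Ranking by absolute value treats strongly positive and strongly negative weights alike; check the sign of each selected weight to see the direction of its effect.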