scanpy.external.pp.scvi(adata, n_hidden=128, n_latent=10, n_layers=1, dispersion='gene', n_epochs=400, lr=0.001, train_size=1.0, batch_key=None, use_highly_variable_genes=True, subset_genes=None, linear_decoder=False, copy=False, use_cuda=True, return_posterior=True, trainer_kwargs={}, model_kwargs={})

SCVI [Lopez18].

Fits an scVI model to the raw count data in an AnnData object.

scVI uses stochastic optimization and deep neural networks to aggregate information across similar cells and genes and to approximate the distributions that underlie observed expression values, while accounting for batch effects and limited sensitivity.

To use a linear-decoded Variational AutoEncoder model (an implementation of [Svensson20]), set linear_decoder=True. Compared to the standard VAE, this model is less powerful, but it can be used to inspect which genes contribute to variation in the dataset. It may also be used for all scVI tasks, such as differential expression, batch correction, and imputation. However, batch correction may be less powerful because the decoder assumes a linear model.
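As a minimal sketch of how the linear-decoded model might be requested, the block below collects keyword arguments from the signature above and wraps the call in a helper. `run_ldvae` and `ldvae_kwargs` are hypothetical names introduced for illustration. The sketch assumes a scanpy release that still ships this wrapper, with its scVI dependency installed, and an `adata` whose `X` holds raw counts.

```python
# Keyword arguments taken from the documented signature.
ldvae_kwargs = dict(
    n_latent=10,
    linear_decoder=True,              # parameter name as it appears in the signature
    use_highly_variable_genes=False,  # skip the HVG filter; adata.var may lack 'highly_variable'
    copy=True,                        # work on a copy instead of modifying adata in place
    return_posterior=False,           # only the AnnData copy is returned
    use_cuda=False,                   # CPU-only, for machines without a GPU
)


def run_ldvae(adata):
    """Hypothetical helper: fit the linear-decoded model on raw counts."""
    # Imported lazily so this sketch loads even where scanpy/scVI are absent.
    import scanpy.external as sce

    return sce.pp.scvi(adata, **ldvae_kwargs)
```

With `copy=True` and `return_posterior=False`, the call is expected to return only the annotated copy, which keeps the original object untouched while experimenting.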


More information and bug reports here.

adata : AnnData

An AnnData object whose X attribute holds unnormalized count data

n_hidden : int (default: 128)

Number of nodes per hidden layer

n_latent : int (default: 10)

Dimensionality of the latent space

n_layers : int (default: 1)

Number of hidden layers used for encoder and decoder NNs

dispersion : str (default: 'gene')

One of the following:

* 'gene' - dispersion parameter of NB is constant per gene across cells
* 'gene-batch' - dispersion can differ between different batches
* 'gene-label' - dispersion can differ between different labels
* 'gene-cell' - dispersion can differ for every gene in every cell
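One way to read the four options is as the shape of the negative binomial dispersion parameter. The sketch below is illustrative only: `dispersion_shape` and its arguments are hypothetical names, not part of scanpy or scVI, and the shapes are an interpretation of the descriptions above rather than a statement about the library's internals.

```python
def dispersion_shape(dispersion, n_genes, n_cells=None, n_batches=None, n_labels=None):
    """Hypothetical helper: shape of the dispersion implied by each option."""
    shapes = {
        "gene": (n_genes,),                  # one value per gene
        "gene-batch": (n_genes, n_batches),  # one value per gene and batch
        "gene-label": (n_genes, n_labels),   # one value per gene and label
        "gene-cell": (n_cells, n_genes),     # one value per gene in every cell
    }
    if dispersion not in shapes:
        raise ValueError(f"unknown dispersion option: {dispersion!r}")
    return shapes[dispersion]


print(dispersion_shape("gene", n_genes=2000))
print(dispersion_shape("gene-batch", n_genes=2000, n_batches=3))
```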

n_epochs : int (default: 400)

Number of epochs to train

lr : float (default: 0.001)

Learning rate

train_size : Union[float, int] (default: 1.0)

The train size, either a float between 0 and 1 or an integer for the number of training samples to use
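The two accepted forms of train_size can be illustrated with a small helper. This is a sketch of the rule as described above, not the trainer's actual code; `n_train_cells` is a hypothetical name, and the rounding behaviour for fractional sizes is an assumption.

```python
def n_train_cells(train_size, n_cells):
    """Hypothetical helper: number of cells used for training."""
    if isinstance(train_size, float):
        # A float is read as a fraction of all cells.
        if not 0.0 < train_size <= 1.0:
            raise ValueError("a float train_size must lie in (0, 1]")
        return int(train_size * n_cells)  # assumption: truncate toward zero
    # An integer is read as an absolute number of cells.
    if not 0 < train_size <= n_cells:
        raise ValueError("an integer train_size must lie in [1, n_cells]")
    return int(train_size)


print(n_train_cells(1.0, 1000))  # default: all cells are used for training
print(n_train_cells(0.5, 1000))  # half of the cells
print(n_train_cells(200, 1000))  # exactly 200 cells
```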

batch_key : Optional[str] (default: None)

Column name in anndata.obs for batches. If None, no batch correction is performed. If not None, batch correction is performed per batch category.

use_highly_variable_genes : bool (default: True)

If true, uses only the genes in anndata.var["highly_variable"]

subset_genes : Optional[Sequence[Union[str, int]]] (default: None)

Optional list of indices or gene names to subset anndata. If not None, use_highly_variable_genes is ignored
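The precedence between subset_genes and use_highly_variable_genes can be sketched as follows. `select_genes` is a hypothetical helper written for illustration, operating on plain lists rather than an AnnData object; it mirrors the rule stated above and is not the wrapper's own code.

```python
def select_genes(var_names, highly_variable, use_highly_variable_genes=True, subset_genes=None):
    """Hypothetical helper: names of the genes that would be kept."""
    if subset_genes is not None:
        # subset_genes wins; entries may be gene names (str) or positions (int).
        return [var_names[g] if isinstance(g, int) else g for g in subset_genes]
    if use_highly_variable_genes:
        # Keep only genes flagged as highly variable.
        return [name for name, hv in zip(var_names, highly_variable) if hv]
    return list(var_names)


genes = ["Actb", "Cd3e", "Gapdh", "Ms4a1"]  # toy gene names
hvg = [False, True, False, True]            # toy 'highly_variable' flags

print(select_genes(genes, hvg))
print(select_genes(genes, hvg, subset_genes=[0, "Gapdh"]))
```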

linear_decoder : bool (default: False)

If true, uses LDVAE model, which is an implementation of [Svensson20].

copy : bool (default: False)

If true, a copy of anndata is returned

return_posterior : bool (default: True)

If true, posterior object is returned

use_cuda : bool (default: True)

If true, uses cuda

trainer_kwargs : dict (default: {})

Extra arguments for UnsupervisedTrainer

model_kwargs : dict (default: {})

Extra arguments for VAE or LDVAE model

Return type

Optional[AnnData]


If copy is true, anndata is returned. If return_posterior is true, the posterior object is returned. If both copy and return_posterior are true, a tuple of anndata and the posterior is returned, in that order.
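The four combinations of copy and return_posterior can be summarised with a small sketch. `describe_return` is a hypothetical helper that returns labels rather than real objects; the case where both flags are false is assumed to return None, in line with the Optional return type.

```python
def describe_return(copy=False, return_posterior=True):
    """Hypothetical helper: label what the call would hand back."""
    if copy and return_posterior:
        return ("adata", "posterior")  # tuple, AnnData first
    if copy:
        return "adata"
    if return_posterior:
        return "posterior"
    return None  # assumption: nothing is returned, adata is updated in place


for copy in (False, True):
    for return_posterior in (False, True):
        print(copy, return_posterior, describe_return(copy, return_posterior))
```

Note that with the defaults (copy=False, return_posterior=True) only the posterior object comes back, so the results are read from the AnnData object that was passed in.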

* adata.obsm['X_scvi'] stores the latent representations
* adata.obsm['X_scvi_denoised'] stores the normalized mean of the negative binomial
* adata.obsm['X_scvi_sample_rate'] stores the mean of the negative binomial

If linear_decoder is true: adata.uns['ldvae_loadings'] stores the per-gene weights in the linear decoder as a genes by n_latent matrix.
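One common way to inspect such a loadings matrix is to rank genes by absolute weight within each latent dimension. The sketch below uses a small synthetic array as a stand-in for adata.uns['ldvae_loadings']; `top_genes_per_dimension`, the toy gene names, and the toy values are all illustrative assumptions, and the exact container type stored by the wrapper is not specified here.

```python
import numpy as np


def top_genes_per_dimension(loadings, gene_names, n_top=3):
    """Rank genes by absolute loading within each latent dimension."""
    loadings = np.asarray(loadings)      # genes x n_latent
    gene_names = np.asarray(gene_names)
    top = {}
    for dim in range(loadings.shape[1]):
        # Sort genes from largest to smallest absolute weight.
        order = np.argsort(-np.abs(loadings[:, dim]))[:n_top]
        top[dim] = gene_names[order].tolist()
    return top


# Synthetic stand-in for the loadings (4 genes x 2 latent dimensions).
toy_loadings = np.array([[0.1, -2.0], [1.5, 0.2], [-0.3, 0.9], [0.05, 0.0]])
toy_genes = ["Actb", "Cd3e", "Gapdh", "Ms4a1"]

print(top_genes_per_dimension(toy_loadings, toy_genes, n_top=2))
```

Ranking by absolute value treats strongly negative and strongly positive weights alike, which suits a first look at which genes drive each dimension.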