scanpy.external.pp.dca
scanpy.external.pp.dca(adata, mode='denoise', *, ae_type='nb-conddisp', normalize_per_cell=True, scale=True, log1p=True, hidden_size=(64, 32, 64), hidden_dropout=0.0, batchnorm=True, activation='relu', init='glorot_uniform', network_kwds=mappingproxy({}), epochs=300, reduce_lr=10, early_stop=15, batch_size=32, optimizer='RMSprop', random_state=0, threads=None, learning_rate=None, verbose=False, training_kwds=mappingproxy({}), return_model=False, return_info=False, copy=False)
 Deep count autoencoder [Eraslan et al., 2019].
Fits a count autoencoder to the raw count data given in the anndata object in order to denoise the data and to capture a hidden, low-dimensional representation of cells. The type of the autoencoder and the return values are determined by the parameters.
Note
More information and bug reports are available in the DCA repository.
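A minimal usage sketch of the default 'denoise' mode. The example dataset (sc.datasets.pbmc3k) and the gene-filtering step are illustrative assumptions, not requirements stated by this function.

    import scanpy as sc

    # Illustrative input: any AnnData holding raw (integer) counts.
    adata = sc.datasets.pbmc3k()

    # Removing all-zero genes before running DCA is a common precaution
    # (an assumption here, not mandated by the signature above).
    sc.pp.filter_genes(adata, min_counts=1)

    # Default 'denoise' mode: adata.X is overwritten with denoised expression values.
    # Internal preprocessing (normalize_per_cell, log1p, scale) is enabled by default.
    sc.external.pp.dca(adata, mode='denoise', epochs=300)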
- Parameters:
  - adata : AnnData
    An anndata object with a .raw attribute representing raw counts.
  - mode : Literal['denoise', 'latent'] (default: 'denoise')
    'denoise' overwrites adata.X with denoised expression values. In 'latent' mode, DCA adds adata.obsm['X_dca'] to the given adata object; this matrix holds the latent representation of cells computed by DCA (see the latent-mode sketch after the Returns section).
  - ae_type : Literal['zinb-conddisp', 'zinb', 'nb-conddisp', 'nb'] (default: 'nb-conddisp')
    Type of the autoencoder. Return values and the architecture are determined by the type, e.g. nb does not provide dropout probabilities. Types ending with "-conddisp" assume that the dispersion is mean-dependent.
  - normalize_per_cell : bool (default: True)
    If true, library size normalization is performed using the sc.pp.normalize_per_cell function in Scanpy and saved into the adata object. The mean layer then re-introduces library size differences by scaling the mean value of each cell in the output layer. See the manuscript for more details.
  - scale : bool (default: True)
    If true, the input of the autoencoder is centered using the sc.pp.scale function of Scanpy. Note that the output is kept as raw counts, as the loss functions are designed for count data.
  - log1p : bool (default: True)
    If true, the input of the autoencoder is log-transformed with a pseudocount of one using the sc.pp.log1p function of Scanpy.
  - hidden_size : Sequence[int] (default: (64, 32, 64))
    Width of the hidden layers.
  - hidden_dropout : float | Sequence[float] (default: 0.0)
    Probability of weight dropout in the autoencoder (per layer if a list or tuple is given).
  - batchnorm : bool (default: True)
    If true, batch normalization is performed.
  - activation : str (default: 'relu')
    Activation function of the hidden layers.
  - init : str (default: 'glorot_uniform')
    Initialization method used to initialize the weights.
  - network_kwds : Mapping[str, Any] (default: mappingproxy({}))
    Additional keyword arguments for the autoencoder.
  - epochs : int (default: 300)
    Total number of training epochs.
  - reduce_lr : int (default: 10)
    Reduce the learning rate if the validation loss does not improve within the given number of epochs.
  - early_stop : int (default: 15)
    Stop training if the validation loss does not improve within the given number of epochs.
  - batch_size : int (default: 32)
    Number of samples in the batch used for SGD.
  - optimizer : str (default: 'RMSprop')
    Type of optimization method used for training.
  - random_state : int | RandomState | None (default: 0)
    Seed for Python, NumPy, and TensorFlow.
  - threads : int | None (default: None)
    Number of threads to use in training. All cores are used by default.
  - learning_rate : float | None (default: None)
    Learning rate to use in training.
  - verbose : bool (default: False)
    If true, print additional information about training and architecture.
  - training_kwds : Mapping[str, Any] (default: mappingproxy({}))
    Additional keyword arguments for the training process.
  - return_model : bool (default: False)
    If true, the trained autoencoder object is returned. See "Returns".
  - return_info : bool (default: False)
    If true, all additional parameters of DCA are stored in adata.obsm, such as dropout probabilities (obsm['X_dca_dropout']) and estimated dispersion values (obsm['X_dca_dispersion']), in case the autoencoder is of type zinb or zinb-conddisp.
  - copy : bool (default: False)
    If true, a copy of the anndata object is returned.
- Return type:
  AnnData
- Returns:
  If copy is true and return_model is false, an AnnData object is returned.

  In 'denoise' mode, adata.X is overwritten with the denoised values. In 'latent' mode, the latent low-dimensional representation of cells is stored in adata.obsm['X_dca'] and adata.X is not modified. Note that these values are not corrected for library size effects.

  If return_info is true, all estimated distribution parameters are stored in the AnnData object as follows:

  - .obsm["X_dca_dropout"]
    The mixture coefficient (pi) of the zero component in ZINB, i.e. the dropout probability (if ae_type is zinb or zinb-conddisp).
  - .obsm["X_dca_dispersion"]
    The dispersion parameter of NB.
  - .uns["dca_loss_history"]
    The loss history of the training. See the .history attribute of the Keras History class for more details.

  Finally, the raw counts are stored in the .raw attribute of the AnnData object.

  If return_model is true, the trained model is returned. When both copy and return_model are true, a tuple of anndata and model is returned, in that order.
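A sketch of 'latent' mode combined with return_info=True, assuming a ZINB-type autoencoder so that dropout probabilities are produced. The example dataset, the gene-filtering step, and the downstream neighbors/UMAP calls are illustrative, not part of this function; only the documented output keys are accessed.

    import scanpy as sc

    adata = sc.datasets.pbmc3k()              # illustrative raw-count dataset
    sc.pp.filter_genes(adata, min_counts=1)   # drop all-zero genes (assumed precaution)

    # 'latent' mode leaves adata.X untouched and writes the low-dimensional
    # representation to adata.obsm['X_dca'].
    sc.external.pp.dca(
        adata,
        mode='latent',
        ae_type='zinb-conddisp',   # ZINB types also yield dropout probabilities
        return_info=True,
    )

    latent = adata.obsm['X_dca']                  # cells x latent dimensions
    dropout = adata.obsm['X_dca_dropout']         # ZINB mixture coefficient (pi)
    dispersion = adata.obsm['X_dca_dispersion']   # NB dispersion parameter
    loss_history = adata.uns['dca_loss_history']  # training losses (cf. Keras History.history)

    # The latent representation plugs into the usual Scanpy workflow.
    sc.pp.neighbors(adata, use_rep='X_dca')
    sc.tl.umap(adata)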