scanpy.external.pp.dca
- scanpy.external.pp.dca(adata, mode='denoise', ae_type='nb-conddisp', normalize_per_cell=True, scale=True, log1p=True, hidden_size=(64, 32, 64), hidden_dropout=0.0, batchnorm=True, activation='relu', init='glorot_uniform', network_kwds=mappingproxy({}), epochs=300, reduce_lr=10, early_stop=15, batch_size=32, optimizer='RMSprop', random_state=0, threads=None, learning_rate=None, verbose=False, training_kwds=mappingproxy({}), return_model=False, return_info=False, copy=False)
Deep count autoencoder [Eraslan18].
Fits a count autoencoder to the raw count data given in the anndata object in order to denoise the data and to capture a hidden representation of cells in low dimensions. The type of the autoencoder and the return values are determined by the parameters.
Note
More information and bug reports: https://github.com/theislab/dca
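For orientation, here is a minimal sketch of the default denoising workflow. It assumes the `dca` package and its TensorFlow/Keras dependency are installed, and it uses a small simulated count matrix instead of real data:

```python
import numpy as np
import scanpy as sc
from anndata import AnnData

# Simulate a small raw-count matrix (cells x genes); the DCA loss
# functions expect unnormalized counts.
rng = np.random.default_rng(0)
counts = rng.negative_binomial(5, 0.3, size=(200, 100)).astype(np.float32)
adata = AnnData(counts)

# Default mode='denoise': adata.X is overwritten in place with denoised
# expression values (normalization/log1p/scaling happen internally on the
# network input only).
sc.external.pp.dca(adata)
print(adata.X.shape)  # (200, 100): same shape, now denoised
```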
- Parameters
  - adata : `AnnData`
    An anndata object with a `.raw` attribute representing raw counts.
  - mode : `Literal['denoise', 'latent']` (default: `'denoise'`)
    `denoise` overwrites `adata.X` with denoised expression values. In `latent` mode, DCA adds `adata.obsm['X_dca']` to the given adata object; this matrix represents the latent representation of cells via DCA (see the example after this parameter list).
  - ae_type : `Literal['zinb-conddisp', 'zinb', 'nb-conddisp', 'nb']` (default: `'nb-conddisp'`)
    Type of the autoencoder. Return values and architecture are determined by the type, e.g. `nb` does not provide dropout probabilities. Types ending in “-conddisp” assume that the dispersion is conditioned on the mean (mean-dependent).
  - normalize_per_cell : `bool` (default: `True`)
    If true, library-size normalization is performed using the `sc.pp.normalize_per_cell` function in Scanpy and saved in the adata object. The mean layer then re-introduces library-size differences by scaling the mean value of each cell in the output layer. See the manuscript for more details.
  - scale : `bool` (default: `True`)
    If true, the input of the autoencoder is centered using the `sc.pp.scale` function of Scanpy. Note that the output is kept as raw counts, as the loss functions are designed for count data.
  - log1p : `bool` (default: `True`)
    If true, the input of the autoencoder is log-transformed with a pseudocount of one using the `sc.pp.log1p` function of Scanpy.
  - hidden_size : `Sequence[int]` (default: `(64, 32, 64)`)
    Width of the hidden layers.
  - hidden_dropout : `float | Sequence[float]` (default: `0.0`)
    Probability of weight dropout in the autoencoder (per layer if a list or tuple).
  - batchnorm : `bool` (default: `True`)
    If true, batch normalization is performed.
  - activation : `str` (default: `'relu'`)
    Activation function of hidden layers.
  - init : `str` (default: `'glorot_uniform'`)
    Initialization method used to initialize weights.
  - network_kwds : `Mapping[str, Any]` (default: `mappingproxy({})`)
    Additional keyword arguments for the autoencoder.
  - epochs : `int` (default: `300`)
    Number of total epochs in training.
  - reduce_lr : `int` (default: `10`)
    Reduce the learning rate if the validation loss does not improve within the given number of epochs.
  - early_stop : `int` (default: `15`)
    Stop training if the validation loss does not improve within the given number of epochs.
  - batch_size : `int` (default: `32`)
    Number of samples in the batch used for SGD.
  - optimizer : `str` (default: `'RMSprop'`)
    Type of optimization method used for training.
  - random_state : `None | int | RandomState` (default: `0`)
    Seed for Python, NumPy, and TensorFlow.
  - threads : `int | None` (default: `None`)
    Number of threads to use in training. All cores are used by default.
  - learning_rate : `float | None` (default: `None`)
    Learning rate to use in the training.
  - verbose : `bool` (default: `False`)
    If true, print additional information about training and architecture.
  - training_kwds : `Mapping[str, Any]` (default: `mappingproxy({})`)
    Additional keyword arguments for the training process.
  - return_model : `bool` (default: `False`)
    If true, the trained autoencoder object is returned. See “Returns”.
  - return_info : `bool` (default: `False`)
    If true, all additional parameters of DCA are stored in `adata.obsm`, such as dropout probabilities (`obsm['X_dca_dropout']`) and estimated dispersion values (`obsm['X_dca_dispersion']`), if the autoencoder is of type `zinb` or `zinb-conddisp`.
  - copy : `bool` (default: `False`)
    If true, a copy of the anndata object is returned.
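As referenced in the `mode` description above, here is a sketch of `latent` mode with a few non-default hyperparameters. `sc.datasets.paul15()` is used only as a convenient public raw-count dataset, and the exact settings are illustrative rather than recommendations:

```python
import scanpy as sc

# Paul et al. (2015) myeloid progenitor counts; any AnnData holding raw
# counts works here.
adata = sc.datasets.paul15()

# Latent mode leaves adata.X untouched and stores the bottleneck activations
# (the middle entry of hidden_size, here 32 dimensions) in adata.obsm['X_dca'].
sc.external.pp.dca(
    adata,
    mode='latent',
    ae_type='zinb-conddisp',     # ZINB with mean-dependent dispersion
    hidden_size=(128, 32, 128),  # encoder width, bottleneck, decoder width
    epochs=500,
    early_stop=20,
    random_state=0,
)

# The embedding plugs into the usual downstream steps.
sc.pp.neighbors(adata, use_rep='X_dca')
sc.tl.umap(adata)
```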
- Return type
  `AnnData` | `None`
- Returns
If `copy` is true and `return_model` is false, an `AnnData` object is returned.

In “denoise” mode, `adata.X` is overwritten with the denoised values. In “latent” mode, the latent low-dimensional representation of cells is stored in `adata.obsm['X_dca']` and `adata.X` is not modified. Note that these values are not corrected for library-size effects.

If `return_info` is true, all estimated distribution parameters are stored in the AnnData object as follows:

- `.obsm["X_dca_dropout"]`
  The mixture coefficient (pi) of the zero component in ZINB, i.e. the dropout probability (if `ae_type` is `zinb` or `zinb-conddisp`).
- `.obsm["X_dca_dispersion"]`
  The dispersion parameter of NB.
- `.uns["dca_loss_history"]`
  The loss history of the training. See the `.history` attribute of the Keras History class for more details.

Finally, the raw counts are stored in the `.raw` attribute of the AnnData object.

If `return_model` is true, the trained model is returned. When both `copy` and `return_model` are true, a tuple of anndata and model is returned, in that order.
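To make the return combinations concrete, a sketch with `return_info`, `return_model`, and `copy` all enabled; the key names follow the description above, while the presence of a `'loss'` entry in the history assumes the standard Keras `History.history` dict:

```python
import scanpy as sc

adata = sc.datasets.paul15()

# copy=True plus return_model=True yields an (AnnData, model) tuple,
# in that order.
denoised, model = sc.external.pp.dca(
    adata,
    ae_type='zinb-conddisp',
    return_info=True,
    return_model=True,
    copy=True,
)

pi = denoised.obsm['X_dca_dropout']         # ZINB dropout probabilities (pi)
theta = denoised.obsm['X_dca_dispersion']   # estimated dispersion values
history = denoised.uns['dca_loss_history']  # Keras History.history dict

print(history['loss'][-1])   # final training loss
print(denoised.raw.X.shape)  # raw counts preserved in .raw
```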