scanpy.tl.ingest
- scanpy.tl.ingest(adata, adata_ref, *, obs=None, embedding_method=('umap', 'pca'), labeling_method='knn', neighbors_key=None, inplace=True, **kwargs)
Map labels and embeddings from reference data to new data.
Tutorial: Integrating data using ingest and BBKNN
Integrates embeddings and annotations of an adata with a reference dataset adata_ref by projecting onto a PCA (or alternate model) that has been fitted on the reference data. The function uses a knn classifier for mapping labels and the UMAP package [McInnes et al., 2018] for mapping the embeddings.
Note
We refer to this asymmetric dataset integration as ingesting annotations from reference data to new data. This is different from learning a joint representation that integrates both datasets in an unbiased way, as CCA (e.g. in Seurat) or a conditional VAE (e.g. in scVI) would do.
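The label-mapping idea can be illustrated with a toy sketch in plain Python. This is not scanpy's implementation: the function name knn_transfer_labels, the 2-D coordinates, and the labels are hypothetical, and the coordinates merely stand in for positions in a representation fitted on the reference data (for example, PCA space).

```python
import math
from collections import Counter


def knn_transfer_labels(ref_points, ref_labels, query_points, k=3):
    """Give each query point the majority label of its k nearest reference points.

    Hypothetical helper for illustration only; scanpy's ingest does not expose it.
    """
    mapped = []
    for q in query_points:
        # Rank reference indices by Euclidean distance to the query point.
        order = sorted(range(len(ref_points)), key=lambda i: math.dist(q, ref_points[i]))
        # Majority vote over the labels of the k closest reference points.
        votes = Counter(ref_labels[i] for i in order[:k])
        mapped.append(votes.most_common(1)[0][0])
    return mapped


# Made-up reference coordinates with known labels, and two unlabeled query points.
ref_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
ref_labels = ["T cell", "T cell", "T cell", "B cell", "B cell", "B cell"]
query_points = [(0.15, 0.05), (4.9, 5.1)]

print(knn_transfer_labels(ref_points, ref_labels, query_points))
# ['T cell', 'B cell']
```

The mapping is asymmetric in the sense described in the note: the reference points and labels are never changed, and only the query points receive new annotations.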
You need to run neighbors() on adata_ref before passing it.
- Parameters:
- adata : AnnData
  The annotated data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes. This is the dataset without labels and embeddings.
- adata_ref : AnnData
  The annotated data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes. Variables (n_vars and var_names) of adata_ref should be the same as in adata. This is the dataset with labels and embeddings which need to be mapped to adata.
- obs : str | Iterable[str] | None (default: None)
  Labels' keys in adata_ref.obs which need to be mapped to adata.obs (inferred for observation of adata).
- embedding_method : str | Iterable[str] (default: ('umap', 'pca'))
  Embeddings in adata_ref which need to be mapped to adata. The only supported values are 'umap' and 'pca'.
- labeling_method : str (default: 'knn')
  The method to map labels in adata_ref.obs to adata.obs. The only supported value is 'knn'.
- neighbors_key : str | None (default: None)
  If not specified, ingest looks at adata_ref.uns['neighbors'] for neighbors settings and adata_ref.obsp['distances'] for distances (default storage places for pp.neighbors). If specified, ingest looks at adata_ref.uns[neighbors_key] for neighbors settings and adata_ref.obsp[adata_ref.uns[neighbors_key]['distances_key']] for distances.
- inplace : bool (default: True)
  Only works if return_joint=False. Add labels and embeddings to the passed adata (if True) or return a copy of adata with mapped embeddings and labels.
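The lookup rule described for neighbors_key can be sketched with plain dictionaries standing in for adata_ref.uns and adata_ref.obsp. The helper name resolve_neighbors, the key 'nn_custom', and the stored values are hypothetical and only illustrate the documented rule; this is not scanpy's own code.

```python
def resolve_neighbors(uns, obsp, neighbors_key=None):
    """Return (settings, distances) following the rule described for neighbors_key.

    Hypothetical helper: uns and obsp are plain dicts here, not AnnData attributes.
    """
    if neighbors_key is None:
        # Default storage places used by pp.neighbors.
        return uns["neighbors"], obsp["distances"]
    settings = uns[neighbors_key]
    # The settings entry names the obsp slot that holds the distances.
    return settings, obsp[settings["distances_key"]]


# Made-up contents; real entries would hold parameter dicts and sparse matrices.
uns = {
    "neighbors": {"params": {"n_neighbors": 15}},
    "nn_custom": {"params": {"n_neighbors": 30}, "distances_key": "nn_custom_distances"},
}
obsp = {
    "distances": "default distance matrix",
    "nn_custom_distances": "custom distance matrix",
}

print(resolve_neighbors(uns, obsp)[1])               # default distance matrix
print(resolve_neighbors(uns, obsp, "nn_custom")[1])  # custom distance matrix
```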
- Returns:
Returns None if inplace=True, else returns an AnnData object. Sets the following fields:
adata.obs[obs] : pandas.Series (dtype category)
  Mapped labels.
adata.obsm['X_umap' | 'X_pca'] : numpy.ndarray (dtype float)
  Mapped embeddings. 'X_umap' if embedding_method is 'umap', 'X_pca' if embedding_method is 'pca'.
Example
Call sequence:
>>> import scanpy as sc
>>> sc.pp.neighbors(adata_ref)
>>> sc.tl.umap(adata_ref)
>>> sc.tl.ingest(adata, adata_ref, obs='cell_type')
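A slightly fuller sketch of the same call sequence is given here, under a few assumptions: both objects are AnnData instances that were preprocessed in the same way, the wrapper name ingest_labels and its label_key argument are hypothetical, and restricting both datasets to their shared genes is one possible way to satisfy the requirement that the variables of adata_ref match those of adata.

```python
def ingest_labels(adata, adata_ref, label_key):
    """Map label_key and the embeddings from adata_ref onto a copy of adata.

    Hypothetical wrapper for illustration; assumes both inputs are AnnData
    objects that were normalized in the same way.
    """
    # Imported inside the function so that defining it does not require scanpy.
    import scanpy as sc

    # Restrict both datasets to shared genes so that their variables match.
    shared = adata_ref.var_names.intersection(adata.var_names)
    adata_ref = adata_ref[:, shared].copy()
    adata = adata[:, shared].copy()

    # Fit the reference representation, neighbor graph, and UMAP.
    sc.pp.pca(adata_ref)
    sc.pp.neighbors(adata_ref)
    sc.tl.umap(adata_ref)

    # With the default inplace=True, ingest writes into the local copy of adata.
    sc.tl.ingest(adata, adata_ref, obs=label_key)
    return adata
```

Because the function works on copies, the objects passed in by the caller are left unchanged, and the mapped labels and embeddings are found on the returned object.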