scanpy.get.obs_df

scanpy.get.obs_df(adata, keys=(), obsm_keys=(), *, layer=None, gene_symbols=None, use_raw=False)

Return values for observations in adata.

Parameters:
adata AnnData

AnnData object to get values from.

keys Collection[str] (default: ())

Keys from .var_names, .var[gene_symbols] (when gene_symbols is set), or .obs.columns.

obsm_keys Iterable[tuple[str, int]] (default: ())

Tuples of (key from obsm, column index of obsm[key]).

layer str | None (default: None)

Layer of adata to use as expression values.

gene_symbols str | None (default: None)

Column of adata.var to search for keys in.

use_raw bool (default: False)

Whether to get expression values from adata.raw.

Return type:

DataFrame

Returns:

A dataframe with adata.obs_names as index, and values specified by keys and obsm_keys.

Examples

Getting values for plotting:

>>> import scanpy as sc
>>> pbmc = sc.datasets.pbmc68k_reduced()
>>> plotdf = sc.get.obs_df(
...     pbmc,
...     keys=["CD8B", "n_genes"],
...     obsm_keys=[("X_umap", 0), ("X_umap", 1)]
... )
>>> plotdf.columns
Index(['CD8B', 'n_genes', 'X_umap-0', 'X_umap-1'], dtype='object')
>>> plotdf.plot.scatter("X_umap-0", "X_umap-1", c="CD8B")  
<Axes: xlabel='X_umap-0', ylabel='X_umap-1'>

Calculating mean expression for marker genes by cluster:

>>> pbmc = sc.datasets.pbmc68k_reduced()
>>> marker_genes = ['CD79A', 'MS4A1', 'CD8A', 'CD8B', 'LYZ']
>>> genedf = sc.get.obs_df(
...     pbmc,
...     keys=["louvain", *marker_genes]
... )
>>> grouped = genedf.groupby("louvain", observed=True)
>>> mean, var = grouped.mean(), grouped.var()