scanpy.get.aggregate
- scanpy.get.aggregate(adata, by, func, *, axis=None, mask=None, dof=1, layer=None, obsm=None, varm=None)
Aggregate data matrix based on some categorical grouping.
This function is useful for pseudobulking as well as plotting.
Aggregation to perform is specified by func, which can be a single metric or a list of metrics. Each metric is computed over the group and results in a new layer in the output AnnData object.
If none of layer, obsm, or varm are passed in, X will be used for aggregation data.
- Parameters:
- adata : AnnData
  AnnData to be aggregated.
- by : str | Collection[str]
  Key of the column to be grouped-by.
- func : Union[Literal['count_nonzero', 'mean', 'sum', 'var', 'median'], Iterable[Literal['count_nonzero', 'mean', 'sum', 'var', 'median']]]
  How to aggregate.
- axis : Optional[Literal['obs', 0, 'var', 1]] (default: None)
  Axis on which to find group by column.
- mask : ndarray[Any, dtype[bool]] | str | None (default: None)
  Boolean mask (or key to column containing mask) to apply along the axis.
- dof : int (default: 1)
  Degrees of freedom for variance. Defaults to 1.
- layer : str | None (default: None)
  If not None, key for aggregation data.
- obsm : str | None (default: None)
  If not None, key for aggregation data.
- varm : str | None (default: None)
  If not None, key for aggregation data.
- Return type:
  AnnData
- Returns:
  Aggregated AnnData.
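The dof parameter can be illustrated with a small standard-library sketch. It assumes dof is the usual degrees-of-freedom correction, where the sum of squared deviations is divided by n - dof (the role ddof plays in NumPy); this is an illustration, not scanpy's implementation, and the helper name variance and the toy values are hypothetical.

```python
import statistics

# Toy measurements for one group and one variable (hypothetical values).
values = [2.0, 4.0, 4.0, 6.0]


def variance(values, dof=1):
    """Variance with a degrees-of-freedom correction: sum of squares / (n - dof)."""
    mean = sum(values) / len(values)
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return squared_deviations / (len(values) - dof)


sample_var = variance(values, dof=1)      # divides by n - 1
population_var = variance(values, dof=0)  # divides by n

# The two settings match the standard library's sample and population variance.
print(abs(sample_var - statistics.variance(values)) < 1e-12)       # True
print(abs(population_var - statistics.pvariance(values)) < 1e-12)  # True
```

Under this assumption, the default dof=1 gives the sample variance, while dof=0 would give the population variance.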
Examples
Calculating mean expression and number of nonzero entries per cluster:
>>> import scanpy as sc, pandas as pd
>>> pbmc = sc.datasets.pbmc3k_processed().raw.to_adata()
>>> pbmc.shape
(2638, 13714)
>>> aggregated = sc.get.aggregate(pbmc, by="louvain", func=["mean", "count_nonzero"])
>>> aggregated
AnnData object with n_obs × n_vars = 8 × 13714
    obs: 'louvain'
    var: 'n_cells'
    layers: 'mean', 'count_nonzero'
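To make the two metrics in this example concrete, here is a minimal pure-Python sketch of what a per-group mean and count_nonzero compute on a toy matrix. It only illustrates the idea and is not how scanpy implements aggregation; aggregate_rows, X, and labels are hypothetical names.

```python
from collections import defaultdict


def aggregate_rows(X, labels):
    """Group the rows of X by label; return per-group column means and nonzero counts."""
    groups = defaultdict(list)
    for row, label in zip(X, labels):
        groups[label].append(row)

    means, nonzero_counts = {}, {}
    for label, rows in groups.items():
        columns = list(zip(*rows))  # transpose: one tuple per variable
        means[label] = [sum(col) / len(col) for col in columns]
        nonzero_counts[label] = [sum(1 for v in col if v != 0) for col in columns]
    return means, nonzero_counts


# Toy data: 4 observations (rows) x 2 variables (columns), two groups.
X = [
    [1.0, 0.0],
    [3.0, 2.0],
    [0.0, 0.0],
    [4.0, 6.0],
]
labels = ["a", "a", "b", "b"]

means, nonzero_counts = aggregate_rows(X, labels)
print(means)           # {'a': [2.0, 1.0], 'b': [2.0, 3.0]}
print(nonzero_counts)  # {'a': [2, 1], 'b': [1, 1]}
```

Each group becomes one row of the result and each metric becomes its own table, which mirrors the 'mean' and 'count_nonzero' layers in the output above.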
We can group over multiple columns:
>>> pbmc.obs["percent_mito_binned"] = pd.cut(pbmc.obs["percent_mito"], bins=5)
>>> sc.get.aggregate(pbmc, by=["louvain", "percent_mito_binned"], func=["mean", "count_nonzero"])
AnnData object with n_obs × n_vars = 40 × 13714
    obs: 'louvain', 'percent_mito_binned'
    var: 'n_cells'
    layers: 'mean', 'count_nonzero'
Note that this filters out any combination of groups that wasn’t present in the original data.
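The filtering described in the note can be sketched with plain Python: only combinations of group labels that actually occur among the observations become output groups. The annotation values below are hypothetical toy data, and the snippet shows the idea only, not scanpy's code path.

```python
from itertools import product

# Toy per-observation annotations (hypothetical values).
cluster = ["T", "T", "B", "B", "B"]
mito_bin = ["low", "high", "low", "low", "low"]

# Every combination that could exist in principle.
possible = set(product(set(cluster), set(mito_bin)))

# Combinations that actually occur; only these would become output groups.
observed = set(zip(cluster, mito_bin))

print(len(possible))                 # 4
print(sorted(observed))              # [('B', 'low'), ('T', 'high'), ('T', 'low')]
print(sorted(possible - observed))   # [('B', 'high')]
```

Here the result would have three groups rather than four, because no observation is labelled both 'B' and 'high'.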