scanpy.pp.filter_cells

Contents

scanpy.pp.filter_cells#

scanpy.pp.filter_cells(data, *, min_counts=None, min_genes=None, max_counts=None, max_genes=None, inplace=True, copy=False)[source]#

Filter cell outliers based on counts and numbers of genes expressed.

For instance, only keep cells with at least min_counts counts or min_genes genes expressed. This is to filter measurement outliers, i.e. “unreliable” observations.

Only provide one of the optional parameters min_counts, min_genes, max_counts, max_genes per call.

Parameters:
data AnnData | csr_matrix | csc_matrix | ndarray | Array

The (annotated) data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.

min_counts int | None (default: None)

Minimum number of counts required for a cell to pass filtering.

min_genes int | None (default: None)

Minimum number of genes expressed required for a cell to pass filtering.

max_counts int | None (default: None)

Maximum number of counts required for a cell to pass filtering.

max_genes int | None (default: None)

Maximum number of genes expressed required for a cell to pass filtering.

inplace bool (default: True)

Perform computation inplace or return result.

Return type:

AnnData | tuple[ndarray, ndarray] | None

Returns:

Depending on inplace, returns the following arrays or directly subsets and annotates the data matrix:

cells_subsetndarray

Boolean index mask that does filtering. True means that the cell is kept. False means the cell is removed.

number_per_cellndarray

Depending on what was thresholded (counts or genes), the array stores n_counts or n_cells per gene.

Examples

>>> import scanpy as sc
>>> adata = sc.datasets.krumsiek11()
UserWarning: Observation names are not unique. To make them unique, call `.obs_names_make_unique`.
    utils.warn_names_duplicates("obs")
>>> adata.obs_names_make_unique()
>>> adata.n_obs
640
>>> adata.var_names.tolist()  
['Gata2', 'Gata1', 'Fog1', 'EKLF', 'Fli1', 'SCL',
 'Cebpa', 'Pu.1', 'cJun', 'EgrNab', 'Gfi1']
>>> # add some true zeros
>>> adata.X[adata.X < 0.3] = 0
>>> # simply compute the number of genes per cell
>>> sc.pp.filter_cells(adata, min_genes=0)
>>> adata.n_obs
640
>>> int(adata.obs['n_genes'].min())
1
>>> # filter manually
>>> adata_copy = adata[adata.obs['n_genes'] >= 3]
>>> adata_copy.n_obs
554
>>> int(adata_copy.obs['n_genes'].min())
3
>>> # actually do some filtering
>>> sc.pp.filter_cells(adata, min_genes=3)
>>> adata.n_obs
554
>>> int(adata.obs['n_genes'].min())
3