scanpy.pp.filter_cells
- scanpy.pp.filter_cells(data, *, min_counts=None, min_genes=None, max_counts=None, max_genes=None, inplace=True, copy=False)
Filter cell outliers based on counts and numbers of genes expressed.
For instance, only keep cells with at least min_counts counts or min_genes genes expressed. This is to filter measurement outliers, i.e. “unreliable” observations.
Only provide one of the optional parameters min_counts, min_genes, max_counts, max_genes per call.
- Parameters:
- data : AnnData | spmatrix | ndarray | Array
The (annotated) data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.
- min_counts : int | None (default: None)
Minimum number of counts required for a cell to pass filtering.
- min_genes : int | None (default: None)
Minimum number of genes expressed required for a cell to pass filtering.
- max_counts : int | None (default: None)
Maximum number of counts allowed for a cell to pass filtering.
- max_genes : int | None (default: None)
Maximum number of genes expressed allowed for a cell to pass filtering.
- inplace : bool (default: True)
Perform the computation in place or return the result.
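The thresholds above can be pictured with plain NumPy. The sketch below illustrates the logic only and is not scanpy's implementation: `cell_mask` is a hypothetical helper, and it assumes that a gene counts as expressed in a cell when its value is greater than zero. It also mirrors the rule that exactly one threshold is given per call.

```python
import numpy as np


def cell_mask(X, *, min_counts=None, min_genes=None, max_counts=None, max_genes=None):
    """Hypothetical helper: boolean keep-mask for one threshold, plus the per-cell statistic."""
    given = {
        name: value
        for name, value in [
            ("min_counts", min_counts),
            ("min_genes", min_genes),
            ("max_counts", max_counts),
            ("max_genes", max_genes),
        ]
        if value is not None
    }
    if len(given) != 1:
        raise ValueError(
            "Provide exactly one of min_counts, min_genes, max_counts, max_genes."
        )
    (name, threshold), = given.items()

    X = np.asarray(X)
    if name.endswith("counts"):
        per_cell = X.sum(axis=1)        # total counts per cell (row)
    else:
        per_cell = (X > 0).sum(axis=1)  # assumption: "expressed" means value > 0

    # min_* thresholds are lower bounds, max_* thresholds are upper bounds
    keep = per_cell >= threshold if name.startswith("min") else per_cell <= threshold
    return keep, per_cell


X = np.array([
    [0, 3, 1, 0],  # 2 genes expressed, 4 counts
    [5, 0, 0, 0],  # 1 gene expressed, 5 counts
    [1, 1, 1, 1],  # 4 genes expressed, 4 counts
])
keep, n_genes = cell_mask(X, min_genes=2)
print(keep)     # [ True False  True]
print(n_genes)  # [2 1 4]
```

Indexing with the mask, `X[keep]`, then yields the filtered matrix; this corresponds loosely to the subsetting that happens when the data is filtered in place.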
- Return type:
- Returns:
Depending on inplace, either returns arrays or directly subsets and annotates the data matrix.
Examples
>>> import scanpy as sc
>>> adata = sc.datasets.krumsiek11()
UserWarning: Observation names are not unique. To make them unique, call `.obs_names_make_unique`.
    utils.warn_names_duplicates("obs")
>>> adata.obs_names_make_unique()
>>> adata.n_obs
640
>>> adata.var_names.tolist()
['Gata2', 'Gata1', 'Fog1', 'EKLF', 'Fli1', 'SCL', 'Cebpa', 'Pu.1', 'cJun', 'EgrNab', 'Gfi1']
>>> # add some true zeros
>>> adata.X[adata.X < 0.3] = 0
>>> # simply compute the number of genes per cell
>>> sc.pp.filter_cells(adata, min_genes=0)
>>> adata.n_obs
640
>>> int(adata.obs['n_genes'].min())
1
>>> # filter manually
>>> adata_copy = adata[adata.obs['n_genes'] >= 3]
>>> adata_copy.n_obs
554
>>> int(adata_copy.obs['n_genes'].min())
3
>>> # actually do some filtering
>>> sc.pp.filter_cells(adata, min_genes=3)
>>> adata.n_obs
554
>>> int(adata.obs['n_genes'].min())
3
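Because only one threshold is accepted per call, applying both a lower and an upper bound takes two passes. The NumPy sketch below mimics two successive passes on a toy matrix. It is an illustration only and does not call scanpy: `n_genes_per_cell` is a hypothetical helper, and treating a value greater than zero as "expressed" is an assumption.

```python
import numpy as np


def n_genes_per_cell(X):
    """Hypothetical helper: number of genes with a value > 0 in each row (cell)."""
    return (np.asarray(X) > 0).sum(axis=1)


X = np.array([
    [0, 3, 1, 0],  # 2 genes expressed
    [5, 0, 0, 0],  # 1 gene expressed
    [1, 1, 1, 1],  # 4 genes expressed
    [2, 0, 7, 1],  # 3 genes expressed
])

# Pass 1: lower bound, analogous to a call with min_genes=2
after_min = X[n_genes_per_cell(X) >= 2]

# Pass 2: upper bound, analogous to a second call with max_genes=3
after_both = after_min[n_genes_per_cell(after_min) <= 3]

print(after_min.shape[0])   # 3
print(after_both.shape[0])  # 2
```

The per-cell statistic depends only on that cell's own row, so removing other cells does not change it and the order of the two passes does not affect which cells survive.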