scanpy.pp.filter_cells
- scanpy.pp.filter_cells(data, min_counts=None, min_genes=None, max_counts=None, max_genes=None, inplace=True, copy=False)
Filter cell outliers based on counts and numbers of genes expressed.
For instance, only keep cells with at least
min_counts
counts ormin_genes
genes expressed. This is to filter measurement outliers, i.e. “unreliable” observations.Only provide one of the optional parameters
min_counts
,min_genes
,max_counts
,max_genes
per call.- Parameters:
- data :
AnnData
The (annotated) data matrix of shape
n_obs
×n_vars
. Rows correspond to cells and columns to genes.- min_counts :
Optional
[int
] (default:None
) Minimum number of counts required for a cell to pass filtering.
- min_genes :
Optional
[int
] (default:None
) Minimum number of genes expressed required for a cell to pass filtering.
- max_counts :
Optional
[int
] (default:None
) Maximum number of counts required for a cell to pass filtering.
- max_genes :
Optional
[int
] (default:None
) Maximum number of genes expressed required for a cell to pass filtering.
- inplace :
bool
(default:True
) Perform computation inplace or return result.
- data :
- Return type:
- Returns:
: Depending on
inplace
, returns the following arrays or directly subsets and annotates the data matrix:- cells_subset
Boolean index mask that does filtering.
True
means that the cell is kept.False
means the cell is removed.- number_per_cell
Depending on what was tresholded (
counts
orgenes
), the array storesn_counts
orn_cells
per gene.
Examples
>>> import scanpy as sc >>> adata = sc.datasets.krumsiek11() >>> adata.n_obs 640 >>> adata.var_names ['Gata2' 'Gata1' 'Fog1' 'EKLF' 'Fli1' 'SCL' 'Cebpa' 'Pu.1' 'cJun' 'EgrNab' 'Gfi1'] >>> # add some true zeros >>> adata.X[adata.X < 0.3] = 0 >>> # simply compute the number of genes per cell >>> sc.pp.filter_cells(adata, min_genes=0) >>> adata.n_obs 640 >>> adata.obs['n_genes'].min() 1 >>> # filter manually >>> adata_copy = adata[adata.obs['n_genes'] >= 3] >>> adata_copy.obs['n_genes'].min() >>> adata.n_obs 554 >>> adata.obs['n_genes'].min() 3 >>> # actually do some filtering >>> sc.pp.filter_cells(adata, min_genes=3) >>> adata.n_obs 554 >>> adata.obs['n_genes'].min() 3