scanpy.pp.calculate_qc_metrics

scanpy.pp.calculate_qc_metrics(adata, expr_type='counts', var_type='genes', qc_vars=(), percent_top=(50, 100, 200, 500), inplace=False)

Calculate quality control metrics.

Calculates a number of QC metrics for an AnnData object; see the Returns section for specifics. Largely based on calculateQCMetrics from scater [McCarthy17]. Currently most efficient on a sparse CSR or dense matrix.

Parameters:
adata : AnnData

Annotated data matrix.

expr_type : str, optional (default: "counts")

Name of the kind of values in X. Used to build the names of the returned metrics (e.g. "counts" in "total_counts").

var_type : str, optional (default: "genes")

The kind of thing the variables are. Used to build the names of the returned metrics (e.g. "genes" in "pct_counts_in_top_50_genes").

qc_vars : Container, optional (default: ())

Keys for boolean columns of .var which identify variables you could want to control for (e.g. “ERCC” or “mito”).

percent_top : Container[int], optional (default: (50, 100, 200, 500))

Which proportions of top genes to cover. If empty or None, these metrics are not calculated. Values are 1-indexed: percent_top=[50] finds the cumulative proportion of counts up to and including the 50th most expressed gene.

inplace : bool, optional (default: False)

Whether to place the calculated metrics in .obs and .var.
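
To make the 1-indexed meaning of percent_top concrete, here is a minimal pure-Python sketch of the quantity for a single cell. The helper name pct_counts_in_top and the example counts are hypothetical, and returning 0 for an empty cell is an assumption; this illustrates the definition and is not scanpy's implementation.

```python
def pct_counts_in_top(cell_counts, n):
    """Cumulative percentage of a cell's counts in its n most expressed genes.

    Hypothetical helper for illustration only; not part of scanpy.
    """
    total = sum(cell_counts)
    if total == 0:
        return 0.0  # assumption: report 0 for a cell with no counts
    # n is 1-indexed: n=1 keeps only the single most expressed gene.
    top_n = sorted(cell_counts, reverse=True)[:n]
    return 100.0 * sum(top_n) / total


# One cell measured over five genes (made-up counts).
cell = [10, 0, 5, 4, 1]
top_pcts = {n: pct_counts_in_top(cell, n) for n in (1, 2, 5)}
print(top_pcts)
```

With n equal to the number of genes, the value is 100 by construction, which is a quick sanity check on any real output.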

Returns:

Depending on inplace, either returns the calculated metrics (a tuple of two pd.DataFrame objects) or updates adata’s obs and var.

Observation level metrics include:

  • n_{var_type}_by_{expr_type}
    E.g. “n_genes_by_counts”. Number of genes with positive counts in a cell.
  • total_{expr_type}
    E.g. “total_counts”. Total number of counts for a cell.
  • pct_{expr_type}_in_top_{n}_{var_type} - for n in percent_top
    E.g. “pct_counts_in_top_50_genes”. Cumulative percentage of counts for 50 most expressed genes in a cell.
  • total_{expr_type}_{qc_var} - for qc_var in qc_vars
    E.g. “total_counts_mito”. Total number of counts for the variables flagged by a qc_var (here, mitochondrial genes).
  • pct_{expr_type}_{qc_var} - for qc_var in qc_vars
    E.g. “pct_counts_mito”. Percentage of total counts for a cell which are mitochondrial.
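
The qc_vars-derived columns above can be sketched in plain Python. This is an illustration of the definitions only, not scanpy's code: the helper name qc_var_metrics, the gene names, and the counts are hypothetical, and scaling pct_counts_mito to 0-100 is an assumption based on the pct_ prefix. The mask plays the role of a boolean column in .var.

```python
def qc_var_metrics(counts, mask, qc_var="mito"):
    """Per-cell totals and percentages for the genes flagged by `mask`.

    `counts` is a cells x genes matrix given as a list of rows.
    Hypothetical helper for illustration only; not part of scanpy.
    """
    rows = []
    for cell in counts:
        total = sum(cell)
        flagged = sum(c for c, is_flagged in zip(cell, mask) if is_flagged)
        rows.append({
            "total_counts": total,
            f"total_counts_{qc_var}": flagged,
            # Assumption: expressed as a percentage; 0 for empty cells.
            f"pct_counts_{qc_var}": 100.0 * flagged / total if total else 0.0,
        })
    return rows


# Boolean mask over genes, analogous to a boolean column of .var.
gene_names = ["MT-CO1", "ACTB", "MT-ND1", "GAPDH"]
mito_mask = [name.startswith("MT-") for name in gene_names]

counts = [
    [2, 5, 3, 10],  # cell 0
    [0, 4, 1, 5],   # cell 1
]
cell_metrics = qc_var_metrics(counts, mito_mask)
print(cell_metrics)
```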

Variable level metrics include:

  • total_{expr_type}
    E.g. “total_counts”. Sum of counts for a gene.
  • mean_{expr_type}
    E.g. “mean_counts”. Mean expression over all cells.
  • n_cells_by_{expr_type}
    E.g. “n_cells_by_counts”. Number of cells in which this gene is measured.
  • pct_dropout_by_{expr_type}
    E.g. “pct_dropout_by_counts”. Percentage of cells this feature does not appear in.
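
The variable-level metrics follow the same pattern, computed per gene (column) instead of per cell. The sketch below is a plain-Python illustration of the definitions listed above, not scanpy's implementation; the helper name var_metrics and the counts are hypothetical, and "measured" is taken to mean a positive count.

```python
def var_metrics(counts):
    """Per-gene metrics for a cells x genes matrix given as a list of rows.

    Hypothetical helper for illustration only; not part of scanpy.
    """
    n_cells = len(counts)
    rows = []
    for gene_counts in zip(*counts):  # transpose: iterate over genes
        total = sum(gene_counts)
        # Assumption: a gene is "measured" in a cell if its count is positive.
        n_expressing = sum(1 for c in gene_counts if c > 0)
        rows.append({
            "total_counts": total,
            "mean_counts": total / n_cells,
            "n_cells_by_counts": n_expressing,
            "pct_dropout_by_counts": 100.0 * (n_cells - n_expressing) / n_cells,
        })
    return rows


# Four cells, three genes (made-up counts).
counts = [
    [2, 0, 3],
    [0, 0, 1],
    [4, 0, 0],
    [2, 1, 0],
]
gene_metrics = var_metrics(counts)
print(gene_metrics)
```

Note that n_cells_by_counts and pct_dropout_by_counts carry the same information: the dropout percentage is the share of cells not counted in n_cells_by_counts.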

Return type:

Union[NoneType, Tuple[pd.DataFrame, pd.DataFrame]]

Example

Calculate qc metrics for visualization.

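>>> import scanpy as sc
>>> import seaborn as sns
>>> adata = sc.datasets.pbmc3k()
>>> sc.pp.calculate_qc_metrics(adata, inplace=True)
>>> # Pass the data frame and column names by keyword.
>>> sns.jointplot(data=adata.obs, x="log1p_total_counts", y="log1p_n_genes_by_counts", kind="hex")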