Import Scanpy as:
import scanpy as sc
The typical workflow consists of successive calls to data analysis tools in sc.tl, e.g.:
sc.tl.umap(adata, **tool_params)  # embed a neighborhood graph of the data using UMAP
where adata is an AnnData object.
Each of these calls adds annotation to an expression matrix X,
which stores n_obs observations (cells) of n_vars variables (genes).
For each tool, there typically is an associated plotting function in sc.pl, e.g.:
sc.pl.umap(adata, **plotting_params)
If you pass show=False, a matplotlib Axes instance is returned
and you have all of matplotlib’s detailed configuration possibilities.
To facilitate writing memory-efficient pipelines, Scanpy tools by default
operate in place on adata and return None;
this also makes it easy to transition to out-of-memory pipelines.
If you want to return a copy of the AnnData object
and leave the passed adata unchanged, pass copy=True or inplace=False.
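Scanpy is based on anndata, which provides the AnnData class.
At the most basic level, an AnnData object adata stores
a data matrix adata.X, annotation of observations adata.obs
and variables adata.var as pd.DataFrame, and unstructured
annotation adata.uns as dict. Names of observations and
variables can be accessed via adata.obs_names and adata.var_names, respectively.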
AnnData objects can be sliced like
dataframes, for example,
adata_subset = adata[:, list_of_gene_names].
For more, see this blog post.
To read a data file into an AnnData object, call:
adata = sc.read(filename)
Further annotation can then be added using, e.g., pd.read_csv:
import pandas as pd
anno = pd.read_csv(filename_sample_annotation)
# note: pandas aligns on the index, so anno should be indexed by the names in adata.obs_names
adata.obs['cell_groups'] = anno['cell_groups']  # categorical annotation of type pandas.Categorical
adata.obs['time'] = anno['time']                # numerical annotation of type float
# alternatively, you could also set the whole dataframe
# adata.obs = anno
To write, use:
adata.write(filename)
adata.write_csvs(filename)
adata.write_loom(filename)