scanpy.datasets.pbmc3k

scanpy.datasets.pbmc3k()

3k PBMCs from 10x Genomics.

The data consist of 3k PBMCs from a healthy donor and are freely available from 10x Genomics (file from this webpage).

The same data are also used in Seurat’s basic clustering tutorial.

Note

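The first call to this function downloads 5.9 MB of data and stores it in datasetdir/pbmc3k_raw.h5ad, where datasetdir is the value of scanpy.settings.datasetdir. Later calls read the stored file instead of downloading it again.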
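To control where the file is cached, the dataset directory can be set before the first call. The following is a minimal sketch, assuming that datasetdir in the note refers to the sc.settings.datasetdir setting; the helper names load_pbmc3k and expected_cache_file and the directory name my_datasets are hypothetical and used only for illustration.

```python
from pathlib import Path


def expected_cache_file(datasetdir="data"):
    """Return the path where the note says the downloaded file is stored."""
    return Path(datasetdir) / "pbmc3k_raw.h5ad"


def load_pbmc3k(datasetdir="data"):
    """Point scanpy at `datasetdir`, then load (or download) the dataset.

    scanpy is imported lazily so that this module can be imported
    without scanpy installed.
    """
    import scanpy as sc

    # Assumption: this is the setting the note calls `datasetdir`.
    sc.settings.datasetdir = Path(datasetdir)
    return sc.datasets.pbmc3k()


# Hypothetical usage (downloads the data on the first call):
# adata = load_pbmc3k("my_datasets")
# print(expected_cache_file("my_datasets").exists())
```

The setting is changed before calling pbmc3k() so that both the download and any later reads use the same directory.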

The following code was run to produce the file.

import scanpy as sc

adata = sc.read_10x_mtx(
    # the directory with the `.mtx` file
    './data/filtered_gene_bc_matrices/hg19/',
    # use gene symbols for the variable names (variables-axis index)
    var_names='gene_symbols',
    # write a cache file for faster subsequent reading
    cache=True,
)

adata.var_names_make_unique()  # this is unnecessary if using 'gene_ids'
adata.write('write/pbmc3k_raw.h5ad', compression='gzip')
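The call to var_names_make_unique() above is needed because several genes can share one gene symbol, whereas gene IDs are already unique. The following pure-Python sketch illustrates the idea of de-duplicating names by appending a numeric suffix to repeated entries. It is an illustration under stated assumptions, not the AnnData implementation; the function name make_names_unique and the sample names are hypothetical.

```python
def make_names_unique(names, join="-"):
    """Return a copy of `names` in which repeated entries get a numeric suffix.

    The first occurrence of a name is kept unchanged; later occurrences
    become name-1, name-2, and so on, skipping suffixes that would collide
    with a name already present in the input.
    """
    taken = set(names)   # every name that is already in use
    seen = set()         # names encountered so far in the loop
    next_suffix = {}     # next suffix number to try for each repeated name
    unique = []
    for name in names:
        if name not in seen:
            seen.add(name)
            unique.append(name)
            continue
        n = next_suffix.get(name, 1)
        candidate = f"{name}{join}{n}"
        while candidate in taken:
            n += 1
            candidate = f"{name}{join}{n}"
        taken.add(candidate)
        next_suffix[name] = n + 1
        unique.append(candidate)
    return unique


# Hypothetical gene symbols with one repeated entry.
print(make_names_unique(["GENE_A", "GENE_B", "GENE_A", "GENE_A"]))
```

With var_names='gene_ids' this step can be skipped, as the comment in the code above notes.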
Return type:

AnnData

Returns:

Annotated data matrix.

Examples

>>> import scanpy as sc
>>> sc.datasets.pbmc3k()
AnnData object with n_obs × n_vars = 2700 × 32738
    var: 'gene_ids'