scanpy.datasets.pbmc3k_processed


scanpy.datasets.pbmc3k_processed()

Processed 3k PBMCs from 10x Genomics.

Processed as in the basic tutorial "Preprocessing and clustering 3k PBMCs (legacy workflow)".

For preprocessing, cells with too few detected genes or too high a percent_mito are filtered out. The counts are then logarithmized, and only genes marked by highly_variable_genes() are retained. The effects of the obs variables n_counts and percent_mito are regressed out using regress_out(), and the values are scaled and clipped by scale(). Finally, pca() and neighbors() are computed.
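To make the preprocessing steps above concrete, here is a toy NumPy re-enactment on synthetic counts. It is a sketch under stated assumptions, not scanpy's implementation: the function name `preprocess`, all thresholds, the per-cell normalization to 10,000 counts, the top-variance gene selection (a simplified stand-in for highly_variable_genes()), and the symmetric clipping are illustrative choices. Regressing out is shown as an ordinary least-squares fit per gene whose residuals are kept.

```python
import numpy as np


def preprocess(counts, is_mito, min_genes=20, max_pct_mito=0.15,
               n_top_genes=20, max_value=10.0, n_comps=5):
    """Hypothetical toy version of the preprocessing described above.

    Thresholds and the gene-selection rule are illustrative assumptions.
    """
    # 1. Per-cell QC metrics (cf. obs columns n_genes, n_counts, percent_mito;
    #    here percent_mito is stored as a fraction).
    n_genes = (counts > 0).sum(axis=1)
    n_counts = counts.sum(axis=1)
    percent_mito = counts[:, is_mito].sum(axis=1) / n_counts

    # Drop cells with few detected genes or a high mitochondrial fraction.
    keep = (n_genes >= min_genes) & (percent_mito < max_pct_mito)
    X = counts[keep]
    n_counts, percent_mito = n_counts[keep], percent_mito[keep]

    # 2. Normalize each cell to 10,000 total counts (assumed), then log(1 + x).
    X = np.log1p(X / n_counts[:, None] * 1e4)

    # 3. Keep the most variable genes (simplified stand-in for
    #    highly_variable_genes(), which uses mean-binned dispersions).
    top = np.argsort(X.var(axis=0))[::-1][:n_top_genes]
    X = X[:, np.sort(top)]

    # 4. Regress out n_counts and percent_mito: least-squares fit per gene
    #    (with intercept), keeping only the residuals.
    design = np.column_stack([np.ones(len(X)), n_counts, percent_mito])
    beta, *_ = np.linalg.lstsq(design, X, rcond=None)
    X = X - design @ beta

    # 5. Scale each gene to zero mean and unit variance, then clip at
    #    +/- max_value (symmetric clipping is an assumption of this sketch).
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0
    X = np.clip((X - X.mean(axis=0)) / std, -max_value, max_value)

    # 6. PCA via SVD of the re-centered matrix (clipping can shift means).
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    X_pca = U[:, :n_comps] * S[:n_comps]
    return X, X_pca, keep


# Synthetic data: 200 cells x 50 genes; the first 5 genes pretend to be "MT-" genes.
rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(200, 50)).astype(float)
is_mito = np.zeros(50, dtype=bool)
is_mito[:5] = True

X, X_pca, keep = preprocess(counts, is_mito)
print(X.shape, X_pca.shape)
```

The neighbors() step is omitted here; in scanpy it builds a k-nearest-neighbor graph on the PCA coordinates, stored in obsp as shown in the example output.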

As analysis steps, the tsne() and umap() embeddings are computed, communities are identified using louvain(), and marker genes are ranked using rank_genes_groups().
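The marker-gene step can be illustrated with a one-vs-rest Welch t-statistic, which is the idea behind the 't-test' method that rank_genes_groups() offers. The sketch below is a simplified, hypothetical NumPy version (the name `rank_markers`, the synthetic data, and the cluster labels "A"/"B" are assumptions); it ignores p-values, multiple-testing correction, and the other ranking methods.

```python
import numpy as np


def rank_markers(X, labels, n_top=3):
    """Hypothetical sketch: rank genes per cluster by one-vs-rest Welch t.

    X: cells x genes log-expression matrix; labels: cluster label per cell.
    Returns {cluster: indices of the n_top highest-scoring genes}.
    """
    ranking = {}
    for group in np.unique(labels):
        in_group = labels == group
        a, b = X[in_group], X[~in_group]
        # Welch's t: mean difference divided by the unpooled standard error.
        se = np.sqrt(a.var(axis=0, ddof=1) / len(a)
                     + b.var(axis=0, ddof=1) / len(b))
        t = (a.mean(axis=0) - b.mean(axis=0)) / np.where(se == 0, np.inf, se)
        # Highest t first: genes most up-regulated in this cluster.
        ranking[group] = np.argsort(t)[::-1][:n_top]
    return ranking


# Synthetic data: two clusters of 50 cells, 10 genes of Gaussian noise.
rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(100, 10))
labels = np.array(["A"] * 50 + ["B"] * 50)
X[labels == "A", 2] += 3.0  # gene 2 is a synthetic marker of cluster A
X[labels == "B", 7] += 3.0  # gene 7 is a synthetic marker of cluster B

markers = rank_markers(X, labels)
print(markers["A"][0], markers["B"][0])  # → 2 7
```

In the processed dataset, the clusters come from louvain() rather than being known in advance, and the actual rankings are stored in `uns['rank_genes_groups']`.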

Return type:

AnnData

Returns:

Annotated data matrix.

Examples

>>> import scanpy as sc
>>> sc.datasets.pbmc3k_processed()
AnnData object with n_obs × n_vars = 2638 × 1838
    obs: 'n_genes', 'percent_mito', 'n_counts', 'louvain'
    var: 'n_cells'
    uns: 'draw_graph', 'louvain', 'louvain_colors', 'neighbors', 'pca', 'rank_genes_groups'
    obsm: 'X_pca', 'X_tsne', 'X_umap', 'X_draw_graph_fr'
    varm: 'PCs'
    obsp: 'distances', 'connectivities'