Read spatial omics

[1]:

import SOAPy_st as sp
import pandas as pd

Read Visium

we used 10X Visium data of mouse dorsolateral prefrontal cortex (DLPFC, 151676) as an example to read the 10X Visium data. To prepare raw data, follow these steps:

1.Raw data file could be download from https://research.libd.org/globus/.

2.Click jhpce#HumanPilot10x and 151676 in turn.

3.Download 151676_raw_feature_bc_matrix.h5, tissue_hires_image.png, tissue_lowres_image.png, tissue_positions_list.txt and scalefactors_json.json.

4.Rename tissue_positions_list.txt to tissue_positions_list.csv.

5.Assemble the folders as 151676/151676_raw_feature_bc_matrix.h5 , 151676/spatial/tissue_hires_image.png, 151676/spatial/tissue_lowres_image.png, 151676/spatial/tissue_positions_list.csv and 151676/spatial/scalefactors_json.json.

[2]:

adata_visium = sp.pp.read_visium2adata(
    path = './151676/',
    count_file = '151676_raw_feature_bc_matrix.h5'
)

/home/wangheqi/anaconda3/envs/SpatialOmics/lib/python3.9/site-packages/anndata/_core/anndata.py:1832: UserWarning: Variable names are not unique. To make them unique, call `.var_names_make_unique`.
  utils.warn_names_duplicates("var")

[3]:

adata_visium

[3]:

AnnData object with n_obs × n_vars = 4992 × 33538
    obs: 'in_tissue', 'array_row', 'array_col'
    var: 'gene_ids', 'feature_types', 'genome'
    uns: 'spatial'
    obsm: 'spatial'

Read GeoMx DSP

Read spatial transcriptomics data of NanoString GeoMx DSP. Mouse embryonic development samples are used as examples.

Download Count Results and E13 Images files from https://nanostring.com/products/geomx-digital-spatial-profiler/spatial-organ-atlas/mouse-development/ .

[4]:

adata_dsp = sp.pp.read_dsp2adata(
    # The path of 'Images' files
    xml_file={
        # Using the xml file for two samples as an example,
        # you can add key-value pairs to the dictionary if you need information about sample points for more samples.
        'mu_dev_E13_006': './nanostring_growth/mu_dev_E13_006.ome.xml',
        'mu_dev_E13_011': './nanostring_growth/mu_dev_E13_011.ome.xml'
    },
    # The path of Count Results
    information_file='./nanostring_growth/Export4_NormalizationQ3.xlsx',
)

[5]:

adata_dsp.obs.head()

[5]:

	SlideName	ScanLabel	ROILabel	SegmentLabel	QCFlags	AOISurfaceArea	AOINucleiCount	ROICoordinateX	ROICoordinateY	RawReads	...	Timepoint	ROIID	SegmentID	ScanWidth	ScanHeight	ScanOffsetX	ScanOffsetY	LOQ (Mouse NGS Whole Transcriptome Atlas RNA)	NormalizationFactor	ExpressionFilteringThreshold (Mouse NGS Whole Transcriptome Atlas RNA)
SegmentDisplayName
mu_dev_E9_001 \| 001 \| Full ROI	mu_dev_E9_001	mu_dev_E9_001	1	Full ROI	Low Negative Probe Count for Probe Kit Mouse N...	47287.021916	392	16573	18896	4259786	...	E9	c73163bc-f107-498f-bd40-bbcab9a48993	f057dc6e-68ce-441d-a816-58802fc38258	16904.210938	20578.818359	7932	6094	16.252453	0.536152	16.252453
mu_dev_E9_001 \| 002 \| Full ROI	mu_dev_E9_001	mu_dev_E9_001	2	Full ROI	Low Negative Probe Count for Probe Kit Mouse N...	41175.373907	340	16485	19752	4725639	...	E9	be667b65-38c0-49c4-af51-845ffd8a7a85	09985ba0-449c-4b1a-9c8f-9327991df8fa	16904.210938	20578.818359	7932	6094	17.745085	0.496225	17.745085
mu_dev_E9_001 \| 003 \| Full ROI	mu_dev_E9_001	mu_dev_E9_001	3	Full ROI	Low Negative Probe Count for Probe Kit Mouse N...	43198.870210	403	15756	18824	5958816	...	E9	ba522e1c-7e21-4cc6-b529-118603949d5a	2ac08d0d-c65d-4ab9-b834-5ef7ebbad4cd	16904.210938	20578.818359	7932	6094	18.109046	0.395298	18.109046
mu_dev_E9_001 \| 004 \| Full ROI	mu_dev_E9_001	mu_dev_E9_001	4	Full ROI	Low Negative Probe Count for Probe Kit Mouse N...	44444.810459	368	15722	19675	3703922	...	E9	52d9a6b1-934d-4f42-a80e-a4a78b7ede43	aeed549d-8b7a-4fa4-b22a-c54059e83066	16904.210938	20578.818359	7932	6094	14.509348	0.605782	14.509348
mu_dev_E9_001 \| 005 \| Full ROI	mu_dev_E9_001	mu_dev_E9_001	5	Full ROI	Low Negative Probe Count for Probe Kit Mouse N...	31889.529594	279	15064	18429	3069897	...	E9	a9e0bca3-59a4-4131-90b7-c787ca400759	c53d1b52-712e-4a3d-9af4-5bdb55365eef	16904.210938	20578.818359	7932	6094	12.118616	0.717618	12.118616

5 rows × 35 columns

The sampling points corresponding to each ROI is stored in.uns.point.

[6]:

adata_dsp.uns['point']

[6]:

	slide	roi	x	y
0	mu_dev_E13_006	1	13011.793535	10484.417086
1	mu_dev_E13_006	1	13109.708338	10499.139178
2	mu_dev_E13_006	1	13184.708338	10547.123928
3	mu_dev_E13_006	1	13242.708338	10642.093747
4	mu_dev_E13_006	1	13261.840745	10730.656697
...	...	...	...	...
4171	mu_dev_E13_011	58	4548.893838	10387.159712
4172	mu_dev_E13_011	58	4477.882657	10362.799483
4173	mu_dev_E13_011	58	4437.660936	10371.299021
4174	mu_dev_E13_011	58	4487.202775	10425.060315
4175	mu_dev_E13_011	58	4636.188087	10423.920083

4176 rows × 4 columns

Read other barcode-based data

In most cases, the raw data of barcode-based spatial omics technology can be expressed in two tables: the coordinate information of each cell (spot) and the expression of each cell (spot). This sp.pp.read_csv2adata() generates the Anndata format by providing the two tables by the user.

Here we use slide-seqV2 data from the mouse olfactory bulb as a demonstration. Download Puck_200127_15.digital_expression.txt.gz and Puck_200127_15_bead_locations.csv from https://singlecell.broadinstitute.org/single_cell/study/SCP815.

[7]:

express = pd.read_csv('./Slide_seqV2/Puck_200127_15.digital_expression.txt', index_col=0, header=0, sep='\t')
location = pd.read_csv('./Slide_seqV2/Puck_200127_15_bead_locations.csv', index_col=0, header=0)

[8]:

adata_csv = sp.pp.read_csv2adata(express.T, spatial=location)

[9]:

adata_csv

[9]:

AnnData object with n_obs × n_vars = 21724 × 21220

Read imaged-based data

The image-based spatial omics technology needs to perform cell segmentation first, and quantitatively generate anndata format through the results of cell segmentation. Users are required to provide images of cell segmentation and staining images for each marker.

Here we use one (sample 1) of breast cancer MIBI-TOF dataests as an example. Download from https://mibi-share.ionpath.com. chick Ji, Rubin, Thrane, Jiang et al.’s Download.

[10]:

import tifffile as tiff
import matplotlib.pyplot as plt
import cv2 as cv
image = tiff.imread('./mibi_tof/TA459_multipleCores2_Run-4_Point1.tiff')
mask = cv.imread('./mibi_tof/TA459_multipleCores2_Run-4_Point1/segmentation_interior.png')
mask = mask[:, :, 0]

[11]:

plt.imshow(mask, cmap='gray')
plt.show()

../_images/Tutorials_Read_spatial_omics_20_0.png

[12]:

plt.imshow(image[8, :, :], cmap='gray', vmax=5)
plt.show()

../_images/Tutorials_Read_spatial_omics_21_0.png

Determine the name of each channel and the channel that needs to be removed to obtain the quantitative Anndata.

[13]:

channel_names=[
    'Au','Background','Beta_catenin','Ca','CD11b','CD11c','CD138','CD16','CD20','CD209','CD3',
    'CD31','CD4','CD45','CD45RO','CD56','CD63','CD68','CD8','dsDNA','EGFR','Fe','FoxP3','H3K27me3',
    'H3K9ac','HLA-DR','HLA-I','IDO','CK17','CK6','Ki67','Lag3','MPO','Na','P','p53','PanCK','PD-L1',
    'PD-1','pS6','Si','SMA','Ta','Vimentin'
]

exp_removed = [0,1,3,19,21,23,33,34,40,42]

[14]:

adata_img = sp.pp.read_mult_image2adata(
    image=image,
    mask=mask,
    channel_names=channel_names,
    remove_channels=exp_removed
)

[15]:

adata_img

[15]:

AnnData object with n_obs × n_vars = 5240 × 44
    uns: 'img', 'mask', 'var_for_analysis', 'spatial', 'SOAPy'
    obsm: 'spatial'