Add SpaceM datasets #47

Merged: 3 commits, Aug 13, 2024
31 changes: 31 additions & 0 deletions spacem_helanih3t3/README.md
@@ -0,0 +1,31 @@
### Data

This is a metabolomics dataset from experiments on HeLa and NIH3T3 cells using the [SpaceM](https://doi.org/10.1038/s41592-021-01198-0) method, by the [Alexandrov group, EMBL](https://www.embl.org/groups/alexandrov/).

The data consist of the following items:

- coordinate systems:
  Each set of processed images/labels is registered in its own coordinate system, named with the matching prefix.
  (The "global" coordinate system is not used for this because it is the default for incoming unregistered data and is treated as immutable.)
- images:
  - `….pre_maldi`: Microscopy, with `Trans` and `GFP` channels
  - `….post_maldi`: Microscopy after MALDI measurements, with `Trans` and `Dapi` channels
- labels:
  - `….cells`: Segmentation of pre-MALDI images
  - `….ablation_marks`: Segmentation of post-MALDI images
- shapes:
  - `….layout`: Bounding boxes of wells on a slide
  - `….maldi_regions`: Bounding boxes for the MALDI measurements
- tables:
  - `table`:
    - for all annotated elements: `project_id`, `slide_id`, `well_id`, `maldi_region_id`
    - for segmentations:
      - `object_type`, `replicate`, `treatment`
      - scikit-image region properties
    - `X`: MALDI ion intensities
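
Because every element name carries the prefix of its coordinate system, elements that belong together can be grouped by splitting each name at its last dot. The following is a minimal sketch under that assumption; the function name `group_by_prefix` and the example names are hypothetical and not taken from the dataset.

```python
from collections import defaultdict


def group_by_prefix(element_names):
    """Group element names of the form '<prefix>.<suffix>' by their prefix."""
    groups = defaultdict(list)
    for name in element_names:
        # Split at the last dot so that prefixes containing dots stay intact.
        # Names without any dot (e.g. "table") end up under the empty prefix.
        prefix, _, suffix = name.rpartition(".")
        groups[prefix].append(suffix)
    return dict(groups)


# Hypothetical element names, for illustration only.
names = ["slide1.well1.pre_maldi", "slide1.well1.cells", "slide1.well2.cells", "table"]
print(group_by_prefix(names))
```

With a loaded SpatialData object, the names would presumably come from the keys of its dictionary-like element containers such as `sdata.images` and `sdata.labels`.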

### Download

The dataset is already stored in SpatialData 0.1.2 format, so no conversion is needed.

Download the data with `download.py` (`to_zarr.py` exists solely for consistency).
11 changes: 11 additions & 0 deletions spacem_helanih3t3/download.py
@@ -0,0 +1,11 @@
#!/usr/bin/env python3
import os
import subprocess

URL = "https://s3.embl.de/spatialdata/raw_data/20221014_HeLaNIH3T3.small.zip"

os.chdir(os.path.dirname(os.path.abspath(__file__)))
command = f"curl {URL} --output 'data.zip'"
subprocess.run(command, shell=True, check=True)
subprocess.run("unzip -o data.zip", shell=True, check=True)
subprocess.run("mv spatialdata.zarr data.zarr", shell=True, check=True)
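
The script above relies on `curl`, `unzip` and `mv` being available in the shell. Where they are not, the same three steps could be sketched with the standard library alone. This is an illustrative alternative and not part of the PR: the function name `fetch_and_extract` and its parameters are hypothetical, and it assumes the archive contains a top-level `spatialdata.zarr` directory, as the `mv` step implies.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path


def fetch_and_extract(url, target_dir, extracted_name="spatialdata.zarr", final_name="data.zarr"):
    """Download a zip archive, unpack it and rename the extracted store."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    archive = target_dir / "data.zip"

    # Step 1: download the archive (counterpart of the curl call).
    with urllib.request.urlopen(url) as response, open(archive, "wb") as out:
        shutil.copyfileobj(response, out)

    # Step 2: unpack, overwriting existing files (counterpart of "unzip -o").
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)

    # Step 3: rename the store (counterpart of mv). Unlike mv, an earlier
    # copy of the final directory is deleted first (assumed behavior).
    final = target_dir / final_name
    if final.exists():
        shutil.rmtree(final)
    (target_dir / extracted_name).rename(final)
    return final
```

A call such as `fetch_and_extract(URL, Path(__file__).parent)` would mirror the script. Deleting a previous `data.zarr` before renaming is a design choice of this sketch that makes repeated runs behave the same way each time.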
14 changes: 14 additions & 0 deletions spacem_helanih3t3/to_zarr.py
@@ -0,0 +1,14 @@
#!/usr/bin/env python3

from pathlib import Path
import spatialdata as sd

# Dataset is already in SpatialData format.
path_read = Path(__file__).parent / "data.zarr"
assert path_read.exists()

print('view with "python -m napari_spatialdata view data.zarr"')

# Test reading
sdata = sd.SpatialData.read(path_read)
print(sdata)
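
Before a full read, the existence check in the script could be complemented by a look at the store's directory layout, which needs no extra dependencies. The sketch below assumes that the store is a plain directory whose sub-directories are element groups (for example `images`, `labels`, `shapes`), each holding one sub-directory per element. That layout is an assumption about the on-disk format, and `list_store_elements` is a hypothetical helper.

```python
from pathlib import Path


def list_store_elements(store_path):
    """Return {group name: [element names]} for the directories inside a store."""
    store = Path(store_path)
    if not store.is_dir():
        raise FileNotFoundError(f"No store found at {store}")
    layout = {}
    for group in sorted(p for p in store.iterdir() if p.is_dir()):
        # Each sub-directory of a group is assumed to hold exactly one element.
        layout[group.name] = sorted(child.name for child in group.iterdir() if child.is_dir())
    return layout
```

Calling `list_store_elements(path_read)` only inspects directory names and does not validate the content of any element.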
31 changes: 31 additions & 0 deletions spacem_scseahorse1/README.md
@@ -0,0 +1,31 @@
### Data

This is a metabolomics dataset from single-cell Seahorse experiments on T-cells using the [SpaceM](https://doi.org/10.1038/s41592-021-01198-0) method, by the [Alexandrov group, EMBL](https://www.embl.org/groups/alexandrov/).

The data consist of the following items:

- coordinate systems:
  Each set of processed images/labels is registered in its own coordinate system, named with the matching prefix.
  (The "global" coordinate system is not used for this because it is the default for incoming unregistered data and is treated as immutable.)
- images:
  - `….pre_maldi`: Microscopy, with `Trans` and `GFP` channels
  - `….post_maldi`: Microscopy after MALDI measurements, with `Trans` and `Dapi` channels
- labels:
  - `….cells`: Segmentation of pre-MALDI images
  - `….ablation_marks`: Segmentation of post-MALDI images
- shapes:
  - `….layout`: Bounding boxes of wells on a slide
  - `….maldi_regions`: Bounding boxes for the MALDI measurements
- tables:
  - `table`:
    - for all annotated elements: `project_id`, `slide_id`, `well_id`, `maldi_region_id`
    - for segmentations:
      - `object_type`, `replicate`, `treatment`
      - scikit-image region properties
    - `X`: MALDI ion intensities
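
To illustrate how the annotation columns relate to `X`, the sketch below averages ion intensities per well. It assumes that each row of `X` corresponds to exactly one row of annotations, and it uses plain lists with made-up values as stand-ins for the real table; `mean_intensity_per_well` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean


def mean_intensity_per_well(well_ids, intensities):
    """Average each ion's intensity over all rows that share a well_id.

    well_ids:    one well identifier per row (the `well_id` column).
    intensities: one list of ion intensities per row (a row of `X`).
    """
    if len(well_ids) != len(intensities):
        raise ValueError("annotations and X must have the same number of rows")
    rows_by_well = defaultdict(list)
    for well, row in zip(well_ids, intensities):
        rows_by_well[well].append(row)
    # zip(*rows) transposes the rows so that each tuple holds one ion's values.
    return {well: [mean(col) for col in zip(*rows)] for well, rows in rows_by_well.items()}


# Made-up values: three segmented objects, two ions.
well_ids = ["A1", "A1", "B2"]
X = [[1.0, 4.0], [3.0, 0.0], [5.0, 5.0]]
print(mean_intensity_per_well(well_ids, X))
```

The same grouping could be done over `slide_id`, `maldi_region_id` or `treatment` by passing that column instead.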

### Download

The dataset is already stored in SpatialData 0.1.2 format, so no conversion is needed.

Download the data with `download.py` (`to_zarr.py` exists solely for consistency).
11 changes: 11 additions & 0 deletions spacem_scseahorse1/download.py
@@ -0,0 +1,11 @@
#!/usr/bin/env python3
import os
import subprocess

URL = "https://s3.embl.de/spatialdata/raw_data/20220121_ScSeahorse1.small.zip"

os.chdir(os.path.dirname(os.path.abspath(__file__)))
command = f"curl {URL} --output 'data.zip'"
subprocess.run(command, shell=True, check=True)
subprocess.run("unzip -o data.zip", shell=True, check=True)
subprocess.run("mv spatialdata.zarr data.zarr", shell=True, check=True)
14 changes: 14 additions & 0 deletions spacem_scseahorse1/to_zarr.py
@@ -0,0 +1,14 @@
#!/usr/bin/env python3

from pathlib import Path
import spatialdata as sd

# Dataset is already in SpatialData format.
path_read = Path(__file__).parent / "data.zarr"
assert path_read.exists()

print('view with "python -m napari_spatialdata view data.zarr"')

# Test reading
sdata = sd.SpatialData.read(path_read)
print(sdata)