Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adjust for crc2 dataset #204

Merged
merged 7 commits into from
Sep 9, 2024
Merged

adjust for crc2 dataset #204

merged 7 commits into from
Sep 9, 2024

Conversation

melonora
Copy link
Collaborator

@melonora melonora commented Sep 8, 2024

When reading in the data from https://www.10xgenomics.com/products/visium-hd-spatial-gene-expression/dataset-human-crc a few problems were encountered. First, the lowres, hires etc images were located in a directory with as root level the dataset_id prefix followed by spatial which then contained the spatial directory. Furthermore the naming of the fullres image was different. This PR accounts for that.

@codecov-commenter
Copy link

codecov-commenter commented Sep 8, 2024

Codecov Report

Attention: Patch coverage is 10.71429% with 25 lines in your changes missing coverage. Please review.

Project coverage is 45.50%. Comparing base (54da345) to head (fa77f6a).
Report is 181 commits behind head on main.

Files with missing lines Patch % Lines
src/spatialdata_io/readers/visium_hd.py 0.00% 23 Missing ⚠️
src/spatialdata_io/readers/_utils/_utils.py 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #204      +/-   ##
==========================================
+ Coverage   35.91%   45.50%   +9.58%     
==========================================
  Files          19       22       +3     
  Lines        1715     2147     +432     
==========================================
+ Hits          616      977     +361     
- Misses       1099     1170      +71     
Files with missing lines Coverage Δ
src/spatialdata_io/_constants/_constants.py 100.00% <100.00%> (ø)
src/spatialdata_io/readers/_utils/_utils.py 42.62% <0.00%> (+15.53%) ⬆️
src/spatialdata_io/readers/visium_hd.py 18.78% <0.00%> (-2.96%) ⬇️

... and 4 files with indirect coverage changes

load_image(
path=path / VisiumHDKeys.IMAGE_HIRES_FILE,
path=(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please make this check when constructing the hires_image_path (same for the other images) and add the path in a intermediate variable.

Something like:

root_level_spatial_path = path / VisiumHDKeys.SPATIAL
# if doens't exist
root_level_spatial_path = path / libary_id / VisiumHDKeys.SPATIAL
# if still doens't exist:
# raise exception

And then please remove spatial/ from VisiumHDKeys.IMAGE_HIRES_FILE, ...

In this way the code will be more robust. At the moment spatial appears as a keyword in line 299, and it's hidden in VisiumHDKeys.IMAGE_HIRES_FILE otherwise.

path=(
hires_image_path
if hires_image_path.exists()
else path / f"{filename_prefix}spatial" / VisiumHDKeys.IMAGE_HIRES_FILE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general let's try to have dataset specific keywords, like spatial outside VisiumHDKeys otherwise typos could lead to subtle problems.

fullres_image_filename = fullres_image_filenames[0]
fullres_image_file = path_fullres / fullres_image_filename
fullres_image_paths = [path_fullres / image_filename for image_filename in fullres_image_filenames]
elif list((path_fullres := (path / f"{filename_prefix}tissue_image")).parent.glob(f"{path_fullres.name}.*")):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here as below, please use dataset specific keywords only in VisiumHDKeys.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this one I did not change VisiumHDkeys. The thing here is that these images live within the top level directory where in the current code a microscope_image directory was assumed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words this code already follows sort of your comment of first try this, then that and otherwise raise exception.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can see with the mouse intestine dataset, the actual fullres image is in the top directory as well. Is the microscope_image directory valid?

@LucaMarconato
Copy link
Member

LucaMarconato commented Sep 8, 2024

Thanks for the PR, I have added some comments.

Please after the edits, please verify that:

  • the reader works on the visium hd data in the sandbox (mouse small intestine)
  • the reader works on the crc2 dataset

@LucaMarconato
Copy link
Member

Nice! After verifying the code on the crc2 datasrt, it is good to merge!

@melonora
Copy link
Collaborator Author

melonora commented Sep 9, 2024

Nice! After verifying the code on the crc2 datasrt, it is good to merge!

I added one part to make it more robust with shapes, it can now deal with any level of nestedness as long as binned_outputs is somewhere.

@melonora melonora merged commit 853d3e8 into scverse:main Sep 9, 2024
5 checks passed
@melonora melonora deleted the fix_visiumhd branch September 9, 2024 09:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants