BPPARAM + on disk handling on images #88

Open
andrea-de-micheli opened this issue Jan 18, 2024 · 3 comments
andrea-de-micheli commented Jan 18, 2024

Hello,

I've noticed that cytomapper::measureObjects doesn't execute with multiple workers when the images are stored on disk. Only one CPU core seems to be utilized despite running the following line:

sc_all = measureObjects(masks_disk, image = images_disk, img_id = "sample_id", BPPARAM = MulticoreParam(workers = 32))

This doesn't happen when images are referenced in memory -- multiple cores are used.

I have over 900 images and masks on disk as HDF5 files from which I would like to create a single sce object. What is the best course of action for this task?

Thank you!

@lassedochreden
Collaborator

Hi @andrea-de-micheli,

to create a single SCE object from your images/masks, it would be more efficient to use the steinbock framework for pre-processing (described here: https://bodenmillergroup.github.io/steinbock/latest/ and also here: https://www.nature.com/articles/s41596-023-00881-0).

Afterwards, you can run read_steinbock() from the imcRtools package, which can again be executed with multiple workers via BPPARAM and is more performant than measureObjects. For more information, please refer to ?read_steinbock() and the package vignette: https://bioconductor.org/packages/release/bioc/vignettes/imcRtools/inst/doc/imcRtools.html#3_Read_in_IMC_data
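
As a rough illustration of that call, a minimal sketch is shown here. The wrapper name read_steinbock_parallel, the steinbock_dir argument and the worker count are hypothetical, and the sketch assumes the folder has the standard steinbock output layout (intensities, regionprops, images.csv, panel.csv):

```r
# Sketch only: read a finished steinbock run into a SingleCellExperiment
# using several workers. `steinbock_dir` and `workers` are placeholders.
read_steinbock_parallel <- function(steinbock_dir, workers = 8) {
  imcRtools::read_steinbock(
    steinbock_dir,
    return_as = "sce",  # default is a SpatialExperiment ("spe")
    BPPARAM = BiocParallel::MulticoreParam(workers = workers)
  )
}

# Example call (not run here):
# sce <- read_steinbock_parallel("path/to/steinbock", workers = 8)
```

Packages are only needed when the function is actually called, so the definition itself can be sourced on any machine.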

Anyhow, thanks for the heads up regarding measureObjects/BPPARAM issues with HDF5 files. I will have a closer look at this as well.

Feel free to close the issue if this works better for you.

Best,
Lasse

@andrea-de-micheli
Author

andrea-de-micheli commented Jan 18, 2024

Thanks for your feedback Lasse.
I'm not using IMC images and have a custom pipeline for segmentation, which is why I tried to stay away from steinbock.
steinbock measure intensities runs on my data but sadly does not output anything, maybe due to differences in file formats and directory structures. This is hard to troubleshoot. Any other pointers?

@lassedochreden
Collaborator

Hi,

steinbock should work on non-IMC images as long as the file formats match. You could check: https://bodenmillergroup.github.io/steinbock/latest/cli/preprocessing/#external-images - and if you run into trouble there, consider opening an issue for steinbock.

Regarding measureObjects - One option:

  1. Try loadImages with on_disk = FALSE to load the images into memory (possibly for different subsets of the data, merged afterwards, to avoid memory issues) and then run measureObjects with multiple workers.
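
The batch-wise approach from the option above could be sketched roughly as follows. This is not a tested recipe: the helper names split_into_batches and measure_in_batches, the batch size and the worker count are made up for illustration, and the sketch assumes that image and mask files are listed in matching order, that loadImages can read your files directly, and that channel names are set the same way as in your existing pipeline.

```r
# Sketch only: measure objects batch by batch in memory, then merge.

# Split a vector of file paths into consecutive batches of at most
# `batch_size` elements (base R only).
split_into_batches <- function(files, batch_size) {
  split(files, ceiling(seq_along(files) / batch_size))
}

measure_in_batches <- function(image_files, mask_files,
                               batch_size = 50, workers = 8) {
  stopifnot(length(image_files) == length(mask_files))
  image_batches <- split_into_batches(image_files, batch_size)
  mask_batches  <- split_into_batches(mask_files, batch_size)

  sce_list <- Map(function(img_f, mask_f) {
    # on_disk = FALSE keeps the pixel data of this batch in memory
    images <- cytomapper::loadImages(img_f, on_disk = FALSE)
    masks  <- cytomapper::loadImages(mask_f, on_disk = FALSE)

    # Assumption: the i-th mask belongs to the i-th image, so both get
    # the same "sample_id" entry for matching via img_id
    ids <- S4Vectors::DataFrame(sample_id = names(images))
    S4Vectors::mcols(images) <- ids
    S4Vectors::mcols(masks)  <- ids

    cytomapper::measureObjects(
      masks, image = images, img_id = "sample_id",
      BPPARAM = BiocParallel::MulticoreParam(workers = workers)
    )
  }, image_batches, mask_batches)

  # Merge the per-batch objects column-wise (cells are columns)
  do.call(cbind, unname(sce_list))
}

# Example call (not run here):
# sc_all <- measure_in_batches(image_files, mask_files, batch_size = 50)
```

A smaller batch_size lowers peak memory use at the cost of more loadImages calls; only one batch of images is held in memory at a time.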

Will try to check the measureObjects/BPPARAM issues with HDF5 soon as well.

Best,
Lasse
