
Demo scripts to use iohub with SLURM job arrays #44

Closed
wants to merge 2 commits

Conversation

@ziw-liu (Contributor) commented Jun 8, 2023

POC of czbiohub-sf/iohub#143 with a cropped-down mantis dataset.

The input HCS dataset is scattered into individual FOVs, which are shipped to nodes in an array job and written to temporary storage, where each FOV's arrays are filled with its task ID (a toy stand-in for real processing). The temporary FOVs are then gathered into a single HCS store. Since the relative paths are kept unchanged throughout, the output HCS store has the same plate metadata as the input.
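
For reference, a minimal sketch of the scatter step, assuming iohub's open_ome_zarr and Plate.positions APIs (the store path and variable names are illustrative, not the actual demo scripts):

from iohub import open_ome_zarr

# Scatter: enumerate the FOVs in the input plate so each SLURM array task
# can be handed exactly one position to process.
with open_ome_zarr("input.zarr", mode="r") as plate:  # hypothetical path
    fov_names = [name for name, _ in plate.positions()]

# Each array task then processes fov_names[$SLURM_ARRAY_TASK_ID] and writes
# its result under the same relative row/col/fov path in temporary storage,
# which is what lets the gather step reuse the input plate metadata.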

@edyoshikun (Contributor) left a comment

Thanks for the code @ziw-liu .

Let's iterate on this; it's a good start. I know this is just a demo script, but I'm looking for feedback and would also like to improve it as follows:

The current script expects data.zarr as input, whereas ideally we would want to call these scripts on data.zarr/*/*/* with wildcards. After iterating with @talonchandler, something like the following would probably be useful:

#!/bin/bash

# Collect every FOV path (row/column/fov) inside the HCS plate store
POSPATHS=(/hpc/instruments/cm.mantis/2023_05_10_PCNA_RAC1/timelapse_2_3/timelapse_2_lightsheet_1.zarr/*/*/*)
echo "pospaths ${POSPATHS[0]}"

# Create an array to store the last three directories for each path
OUT_FOV_DIR=()

# Extract the last three directories for each path; split into a separate
# PARTS array so that read does not clobber the accumulator each iteration
for path in "${POSPATHS[@]}"; do
  IFS='/' read -ra PARTS <<< "$path"
  OUT_FOV_DIR+=("${PARTS[-3]}/${PARTS[-2]}/${PARTS[-1]}")
done

# Print the array elements
printf '%s\n' "${OUT_FOV_DIR[@]}"

# Resulting entries, e.g.:
# OUT_FOV_DIR[0]=0/0/0
# OUT_FOV_DIR[1]=0/1/0

@edyoshikun (Contributor)

The other thing to keep in mind, as @ziw-liu mentions, is that the approach of not creating the zarr store beforehand only works for smaller datasets.

@talonchandler added and then removed the bug label on Jun 9, 2023
@ziw-liu (Contributor, Author) commented Jun 9, 2023

@edyoshikun I intentionally kept these scripts minimal so they best serve as a POC of the iohub PR and let us quickly decide on the core issue (the metadata race condition) and the way forward. The extra niceties would be great for a real mantis processing template in a future PR, but they are not necessarily relevant here. Also, this PR is not meant to be merged, similar to mehta-lab/recOrder#374.

@talonchandler (Contributor)

@edyoshikun

The other thing to keep in mind, as @ziw-liu mentions, is that the approach of not creating the zarr store beforehand only works for smaller datasets.

Can you expand on this? I thought the limitation was the chunk size, not the dataset size?

@ziw-liu I just ran this example and I think it's a step in the right direction, assuming we don't have the dataset size limitation that @edyoshikun mentions.

Can I request the following extensions to this example:

  • Use multiple FOVs and multiple wells. Many of the nit-picky metadata-handling issues I'm running into pop up when I try to handle this case. I expect you'll need to incorporate the snippet that @edyoshikun posted above.
  • Use >10 total wells; another set of issues arises in this case. In my 30-well SLURM trials with naive approaches, the metadata order is scrambled (or sorted instead of natsorted), which makes visualization with napari difficult.

I think that we should shoot for a set of SLURM example scripts that merges into the mantis/examples folder and serves our use cases. I'm all for iterating with POCs, but ultimately we want a set of scripts that we can copy into an analysis folder, make minor modifications, then run.

@ziw-liu (Contributor, Author) commented Jun 9, 2023

Can you expand on this? I thought the limitation was the chunk size, not the dataset size?

This is just a simplification in the toy Python processor here; it would work the same if we called create_zeros first.
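
A minimal sketch of what pre-creating would look like, assuming iohub's create_zeros API (the plate path, channel name, array shape, and dtype are illustrative):

import numpy as np
from iohub import open_ome_zarr

# Pre-create the plate and allocate empty arrays before launching the
# array job; tasks then only fill chunks instead of creating arrays.
with open_ome_zarr(
    "output.zarr", layout="hcs", mode="a", channel_names=["GFP"]
) as plate:
    pos = plate.create_position("0", "0", "0")
    # create_zeros writes array metadata without chunk data, so this
    # stays cheap even when the dataset itself is large.
    pos.create_zeros("0", shape=(1, 1, 8, 256, 256), dtype=np.uint16)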

@ziw-liu (Contributor, Author) commented Jun 9, 2023

  • In my 30-well SLURM trials with naive approaches, the metadata order is scrambled (or sorted instead of natsorted), which makes visualization with napari difficult.

This is because if you don't specify row_index and col_index in create_position, they default to the order in which positions were added, and with a SLURM array that order is effectively random. The solution is to supply those arguments explicitly.
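
For example, a sketch with explicit indices (the well layout, channel name, and plate path are illustrative):

from iohub import open_ome_zarr
from natsort import natsorted

# Compute the natsorted well order once, up front, so every task supplies
# the same deterministic indices regardless of when it happens to run.
cols = natsorted([str(i) for i in range(30)])

with open_ome_zarr(
    "output.zarr", layout="hcs", mode="a", channel_names=["GFP"]
) as plate:
    for col_index, col_name in enumerate(cols):
        plate.create_position(
            "0", col_name, "0", row_index=0, col_index=col_index
        )

In an actual array job, each task would make a single create_position call, looking up its precomputed row_index and col_index from the shared ordering.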

@ziw-liu (Contributor, Author) commented Jun 9, 2023

I'm all for iterating with POCs, but ultimately we want a set of scripts that we can copy into an analysis folder, make minor modifications, then run.

Yes, this is the final goal, but I'd like to decide between this and #46 before we invest more effort in quality-of-life features.

@talonchandler talonchandler mentioned this pull request Jun 12, 2023
@ziw-liu (Contributor, Author) commented Jun 20, 2023

Closing in favor of #47

@ziw-liu ziw-liu closed this Jun 20, 2023
@ieivanov ieivanov deleted the demo-slurm-iohub branch August 19, 2023 00:39
@ieivanov ieivanov restored the demo-slurm-iohub branch August 19, 2023 00:40
@ieivanov ieivanov deleted the demo-slurm-iohub branch August 19, 2023 00:40