
Demo scripts to use iohub with SLURM job arrays #44

Closed
wants to merge 2 commits

Conversation

@ziw-liu (Contributor) commented Jun 8, 2023

POC of czbiohub-sf/iohub#143 with a cropped-down mantis dataset.

The input HCS dataset is scattered into individual FOVs, which are shipped to nodes in an array job and written to temporary storage, where each FOV's arrays are filled with its task ID (a toy stand-in for real processing). The temporary FOVs are then gathered into a single HCS store. Since the relative paths are kept unchanged throughout, the output HCS store has the same plate metadata as the input.
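
For reference, a minimal sketch of the scatter step, assuming iohub's open_ome_zarr and Plate.positions APIs (the store path and variable names are illustrative, not the actual demo scripts):

from iohub import open_ome_zarr

# Scatter: enumerate the FOVs in the input plate so each SLURM array task
# can be handed exactly one position to process.
with open_ome_zarr("input.zarr", mode="r") as plate:  # hypothetical path
    fov_names = [name for name, _ in plate.positions()]

# Each array task then processes fov_names[$SLURM_ARRAY_TASK_ID] and writes
# its result under the same relative row/col/fov path in temporary storage,
# which is what lets the gather step reuse the input plate metadata.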

@edyoshikun (Contributor) left a comment

Thanks for the code @ziw-liu .

Let's iterate on this; it's a good start. I know this is just a demo script, but I'm looking for feedback and would also like to improve it as follows:

The current script expects data.zarr as input, whereas ideally we would want to call these scripts on data.zarr/*/*/* with wildcards. After iterating with @talonchandler, something like the following would probably be useful:

#!/bin/bash

# Collect every FOV path (row/column/fov) inside the HCS plate store
POSPATHS=(/hpc/instruments/cm.mantis/2023_05_10_PCNA_RAC1/timelapse_2_3/timelapse_2_lightsheet_1.zarr/*/*/*)
echo "pospaths ${POSPATHS[0]}"

# Create an array to store the last three directories for each path
OUT_FOV_DIR=()

# Extract the last three directories for each path; split into a separate
# PARTS array so that read does not clobber the accumulator each iteration
for path in "${POSPATHS[@]}"; do
  IFS='/' read -ra PARTS <<< "$path"
  OUT_FOV_DIR+=("${PARTS[-3]}/${PARTS[-2]}/${PARTS[-1]}")
done

# Print the array elements
printf '%s\n' "${OUT_FOV_DIR[@]}"

# Resulting entries, e.g.:
# OUT_FOV_DIR[0]=0/0/0
# OUT_FOV_DIR[1]=0/1/0

@edyoshikun (Contributor)

The other thing to keep in mind, as @ziw-liu mentions, is that the approach of not creating the zarr store beforehand only works for smaller datasets.

@talonchandler added and then removed the bug label on Jun 9, 2023
@ziw-liu (Contributor, Author) commented Jun 9, 2023

@edyoshikun I intentionally kept these scripts minimal so they best serve as a POC of the iohub PR and let us quickly decide on the core issue (the metadata race condition) and the way forward. The extra niceties would be great for a real mantis processing template in a future PR, but they are not necessarily relevant here. Also, this PR is not meant to be merged, similar to mehta-lab/recOrder#374.

@talonchandler (Contributor)

@edyoshikun

The other thing to keep in mind, as @ziw-liu mentions, is that the approach of not creating the zarr store beforehand only works for smaller datasets.

Can you expand on this? I thought the limitation was the chunk size, not the dataset size?

@ziw-liu I just ran this example and I think it's a step in the right direction, assuming we don't have the dataset size limitation that @edyoshikun mentions.

Can I request the following extensions to this example:

  • Use multiple FOVs and multiple wells. Many of the nit-picky metadata-handling issues I'm running into pop up when I try to handle this case. I expect you'll need to incorporate the snippet that @edyoshikun posted above.
  • Use >10 total wells; another set of issues arises in this case. In my 30-well SLURM trials with naive approaches, the metadata order is scrambled (or sorted instead of natsorted), which makes visualization with napari difficult.

I think that we should shoot for a set of SLURM example scripts that merges into the mantis/examples folder and serves our use cases. I'm all for iterating with POCs, but ultimately we want a set of scripts that we can copy into an analysis folder, make minor modifications, then run.

@ziw-liu (Contributor, Author) commented Jun 9, 2023

Can you expand on this? I thought the limitation was the chunk size, not the dataset size?

This is just a simplification in the toy Python processor here; it would work the same if we called create_zeros first.
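
A minimal sketch of what pre-creating would look like, assuming iohub's create_zeros API (the plate path, channel name, array shape, and dtype are illustrative):

import numpy as np
from iohub import open_ome_zarr

# Pre-create the plate and allocate empty arrays before launching the
# array job; tasks then only fill chunks instead of creating arrays.
with open_ome_zarr(
    "output.zarr", layout="hcs", mode="a", channel_names=["GFP"]
) as plate:
    pos = plate.create_position("0", "0", "0")
    # create_zeros writes array metadata without chunk data, so this
    # stays cheap even when the dataset itself is large.
    pos.create_zeros("0", shape=(1, 1, 8, 256, 256), dtype=np.uint16)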

@ziw-liu (Contributor, Author) commented Jun 9, 2023

  • In my 30-well SLURM trials with naive approaches, the metadata order is scrambled (or sorted instead of natsorted), which makes visualization with napari difficult.

This is because if you don't specify row_index and col_index in create_position, they default to the order in which positions were added, and with a SLURM array that order is effectively random. The solution is to supply those arguments explicitly.
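
For example, a sketch with explicit indices (the well layout, channel name, and plate path are illustrative):

from iohub import open_ome_zarr
from natsort import natsorted

# Compute the natsorted well order once, up front, so every task supplies
# the same deterministic indices regardless of when it happens to run.
cols = natsorted([str(i) for i in range(30)])

with open_ome_zarr(
    "output.zarr", layout="hcs", mode="a", channel_names=["GFP"]
) as plate:
    for col_index, col_name in enumerate(cols):
        plate.create_position(
            "0", col_name, "0", row_index=0, col_index=col_index
        )

In an actual array job, each task would make a single create_position call, looking up its precomputed row_index and col_index from the shared ordering.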

@ziw-liu (Contributor, Author) commented Jun 9, 2023

I'm all for iterating with POCs, but ultimately we want a set of scripts that we can copy into an analysis folder, make minor modifications, then run.

Yes, this is the final goal, but I'd like to decide between this and #46 before we invest more effort in quality-of-life features.

@talonchandler talonchandler mentioned this pull request Jun 12, 2023
@ziw-liu (Contributor, Author) commented Jun 20, 2023

Closing in favor of #47

@ziw-liu ziw-liu closed this Jun 20, 2023
@ieivanov ieivanov deleted the demo-slurm-iohub branch August 19, 2023 00:39
@ieivanov ieivanov restored the demo-slurm-iohub branch August 19, 2023 00:40
@ieivanov ieivanov deleted the demo-slurm-iohub branch August 19, 2023 00:40