Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify the manifest organization and data access #45

Open
fedorov opened this issue Jan 3, 2024 · 7 comments
Open

Simplify the manifest organization and data access #45

fedorov opened this issue Jan 3, 2024 · 7 comments

Comments

@fedorov
Copy link
Member

fedorov commented Jan 3, 2024

As discussed today, we should be able to parameterize execution of the workflow with just the list of SeriesInstanceUIDs and the version of IDC that should be used. We should also switch to using idc-index for resolving to series URLs and downloading the files.

This is dependent on update of idc-index to IDC v17, and other updates which I am working on in ImagingDataCommons/idc-index#23.

@vkt1414
Copy link
Collaborator

vkt1414 commented Jan 4, 2024

I was able to pass the list of SeriesInstanceUIDs to papermill and to use idc-index. But as I imagine how these lists are passed on from Terra and Seven Bridges (SB-CGC), I realized that batching in SB-CGC is possible only by files. i.e previously I was passing the csv manifests to papermill on both Terra and SB-CGC. Now with the lists, we cannot run SB-CGC in batches without csv manifests..

one way to make this work is, we continue to use manifests for SB-CGC, but the papermill commands for Terra and SB-CGC will be different if we use list of SeriesInstanceUIDs

what do you think?

image

@fedorov
Copy link
Member Author

fedorov commented Jan 4, 2024

I am not sure I understand - I thought we keep using CSV, just change its content. I thought about this issue as about simplifying manifest - not getting rid of it.

@vkt1414
Copy link
Collaborator

vkt1414 commented Jan 4, 2024

I see. Getting rid of it makes sense for Terra users as they can simply import the sample data table we will provide. But with csv manifests..they will need to update the location of the csv manifests in the data table to get started.

this is all that's needed for Terra users
image

@fedorov
Copy link
Member Author

fedorov commented Jan 4, 2024

I don't have the experience to weigh in, I am afraid .... But we should make sure IDC version is included as a column!

@vkt1414
Copy link
Collaborator

vkt1414 commented Jan 4, 2024

I forgot about the idc-version. I will add the column.
I think passing the list of seriesInstanceUIDs directly from the data table is the cleanest and most simplest approach for Terra. So if it is okay with you to use different papermill commands on Terra and SB-CGC, I'll use this approach. Otherwise I can stick to the csv manifest approach.

@fedorov
Copy link
Member Author

fedorov commented Jan 8, 2024

So if it is okay with you to use different papermill commands on Terra and SB-CGC

Yes, it's ok - at least for now

@vkt1414
Copy link
Collaborator

vkt1414 commented Jan 8, 2024

I realized I could use the same papermill command but use an additional step for SB-CGC, which is just reading the manifest and passing the list of seriesInstanceUIDs to the papermill command.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants