Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cd: create a gha that generate an index for each query in the src/sql folder #5
cd: create a gha that generate an index for each query in the src/sql folder #5
Changes from all commits
df0c196
dc81122
9933808
67f1e30
fc64bfd
52954f1
b2a5d45
17c9e13
e002336
2503daa
985c9c8
3324ffe
0ba0527
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jcfr if we want to build this for growth from the start, we cannot hard-code the file names. The generator code will now produce CSV/Parquet for each of the queries in the sql folder, but any changes to the SQL queries file names or addition of new queries will break the code at the moment.
It would make more sense to use wildcards to package all CSV/Parquet files depending on what is configured with the flags. Vamsi is going to give it a try to replace with the wildcard, unless you have other thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using wildcard as such will not work as the target check the timestamp of the output (aka the filename) to decide if it should "re-build" or not.
Once this is integrated, I suggest you create an issue describing the type, name and size of index files, and how these related to queries for generating them (as well as which domain they relates) (e.g microscopy, ...)
Let's also not that with working CI and CD pipeline refactoring will be straightforward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this line be instead
$<$<BOOL:IDC_INDEX_DATA_GENERATE_CSV_ARCHIVE>:--generate-csv-archive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I do not know if the boolean value for parquet is passed along to the index manager properly, as it seem to generate both csv and parquet although the latter is disabled in cmakelists.
This was from latest CD run:
https://github.com/ImagingDataCommons/idc-index-data/actions/runs/8425241913