Automate index updates and packaging #2

Open
fedorov opened this issue Nov 21, 2023 · 4 comments

fedorov (Member) commented Nov 21, 2023

@vkt1414 I suggest we add the following GitHub Actions:

  1. Daily action that checks if there is an update to the IDC release BQ tables. If an update is detected, it will
    1. run all queries in the queries folder, and save the result of each query as <query_prefix>.csv.zip in the "latest" release
    2. make a PR to update the IDC version in https://github.com/ImagingDataCommons/idc-index/blob/main/idc_index/index.py#L65.
  2. Commit-triggered action that will look for release tags that follow our versioning pattern. When a release tag is detected, it will:
    1. create a GitHub release with the release tag
    2. attach indices from "latest" to the new release
    3. trigger a PyPI package release
  3. Commit-triggered action that will
    1. run all the tests
    2. if queries are updated, re-run them and update the CSVs in the latest release
  4. PR-triggered action that will
    1. if queries are updated, re-run the queries first and save the resulting CSVs in a place accessible during tests
    2. run all the tests (I think for this it will be beneficial to be able to run the tests with a manually configured location of the table passed via the constructor)
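The trigger conditions in actions 2 to 4 reduce to two checks: does a tag match the release pattern, and did a commit or PR touch the queries folder. Below is a minimal Python sketch of those checks. It assumes a hypothetical `vMAJOR.MINOR.PATCH` tag scheme (the issue does not spell out the actual versioning pattern) and that the list of changed paths is obtained elsewhere, for example from `git diff --name-only`; the function names are illustrative only.

```python
import re

# Hypothetical release tag scheme: "1.2.3" or "v1.2.3".
# The actual versioning pattern is not specified in this issue.
RELEASE_TAG = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")


def is_release_tag(tag):
    """Return True if the tag matches the assumed release pattern (action 2)."""
    return RELEASE_TAG.fullmatch(tag) is not None


def queries_changed(changed_paths, queries_dir="queries"):
    """Return True if any changed file lives in the queries folder (actions 3 and 4)."""
    prefix = queries_dir.rstrip("/") + "/"
    return any(path.startswith(prefix) for path in changed_paths)
```

In a workflow file the tag filter would more commonly be written as an `on.push.tags` glob pattern; the Python form only makes the rule explicit and easy to test.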

What do you think? Did I miss anything?

vkt1414 (Collaborator) commented Nov 28, 2023

Re task 1: how should we handle the case where a pull request is not addressed within a day?

fedorov (Member, Author) commented Nov 29, 2023

Good question! I think we should overwrite the branch corresponding to the PR. Also, now that I think about it, the latest release and its attachments should be committed only on merge, not when the PR is created.

fedorov (Member, Author) commented Feb 15, 2024

Based on further thinking and discussion, here is the revised proposed behavior of the GHA for facilitating index updates:

  1. Manual trigger only, for now.
  2. Take all of the queries in the queries folder and run them.
  3. Create artifacts containing the result of each query saved as CSV and Parquet. Files should be named consistently with the query file name and include the IDC version, determined by figuring out what idc_current maps to at the time the query is executed.
  4. To get the number of the latest IDC version, list all of the datasets in the bigquery-public-data project and take the latest idc_v*.
  5. Create an issue that includes links to the artifacts generated by the GHA, titled "[github-action] Index updates IDC v..." (something like that).
  6. Replace idc_current with the actual version in each query and save each of the queries as a GHA artifact.
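Steps 3, 4 and 6 could be sketched in Python roughly as below. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the artifact naming scheme `<query stem>_idc_v<N>.<ext>` is only one possible reading of "named consistently with the query file name", and the dataset listing assumes the `google-cloud-bigquery` package with default credentials. Versions are compared as integers, so `idc_v10` ranks above `idc_v9`, which a plain string sort would get wrong.

```python
import re
from pathlib import Path

# Dataset IDs that are exactly "idc_v<N>"; "idc_current" and suffixed names
# such as "idc_v17_clinical" deliberately do not match.
IDC_VERSIONED_DATASET = re.compile(r"idc_v(\d+)")


def latest_idc_version(dataset_ids):
    """Return the highest N among dataset IDs of the form idc_v<N> (step 4)."""
    versions = []
    for dataset_id in dataset_ids:
        match = IDC_VERSIONED_DATASET.fullmatch(dataset_id)
        if match:
            versions.append(int(match.group(1)))  # numeric, not lexicographic
    if not versions:
        raise ValueError("no idc_v* datasets found")
    return max(versions)


def list_public_dataset_ids():
    """List dataset IDs in the bigquery-public-data project.

    Assumes google-cloud-bigquery is installed and default credentials are
    configured; imported lazily so the pure helpers work without it.
    """
    from google.cloud import bigquery

    client = bigquery.Client()
    return [d.dataset_id for d in client.list_datasets(project="bigquery-public-data")]


def pin_query_version(sql, version):
    """Replace the idc_current dataset name with idc_v<version> (step 6)."""
    # \b leaves e.g. "idc_current_clinical" untouched, since "_" is a word character.
    return re.sub(r"\bidc_current\b", f"idc_v{version}", sql)


def artifact_names(query_path, version):
    """Hypothetical naming scheme for step 3: <query stem>_idc_v<version>.<ext>."""
    stem = Path(query_path).stem
    return {ext: f"{stem}_idc_v{version}.{ext}" for ext in ("csv", "parquet", "sql")}
```

In the workflow, `latest_idc_version(list_public_dataset_ids())` would give the version number used both for pinning each query and for naming its CSV, Parquet and SQL artifacts.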

jcfr transferred this issue from ImagingDataCommons/idc-index Mar 11, 2024
fedorov (Member, Author) commented Mar 12, 2024

@vkt1414 we discussed this with JC, and with the new layout of the repositories, it makes sense to move queries from idc-index to this repo, and upload the resulting CSV/Parquet file to PyPI. We won't need to attach the zip file to the release.
