General speedups for the pipeline; Fix functionality for export #90

Merged: 3 commits merged on Feb 5, 2025

@C-Loftus commented Feb 5, 2025

  • The prov graph and the data graph are separate, so they don't need a dependency on each other in the scheduler pipeline.
    • The final export only exports the data graph, not the prov graph, so exports don't need to wait on prov.
  • The schedule clears the partition status when it runs, so the export won't run again until everything completes.
    • I'm fairly sure this gives the behavior we want: graphs are only generated when the schedule requests a new crawl, not after a one-off asset completion.
  • Remove the compose container that created the buckets and create them directly in Python instead.
  • Download the gleaner and nabu images in parallel using threading.
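A minimal sketch of the dependency change, using the standard-library `graphlib`. The asset names are hypothetical (the PR doesn't show the real pipeline graph); each key maps an asset to its direct dependencies, and `export` lists only the data graph:

```python
from graphlib import TopologicalSorter

# Hypothetical asset names; each entry maps an asset to its direct dependencies.
PIPELINE: dict[str, set[str]] = {
    "crawl": set(),
    "data_graph": {"crawl"},
    "prov_graph": {"crawl"},
    "export": {"data_graph"},  # deliberately no edge to prov_graph
}


def upstream(graph: dict[str, set[str]], node: str) -> set[str]:
    """Return every asset that `node` transitively depends on."""
    seen: set[str] = set()
    stack = list(graph.get(node, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, ()))
    return seen


def run_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order; raises graphlib.CycleError on cycles."""
    return list(TopologicalSorter(graph).static_order())
```

Here `upstream(PIPELINE, "export")` is `{"data_graph", "crawl"}`, so a scheduler can start the export as soon as the data graph finishes, even while the prov graph is still being built.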
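The reset-on-schedule behavior can be modeled as a small state machine. This is a hypothetical in-memory sketch, not the pipeline's actual scheduler code; `CrawlRound` and its methods are illustrative names:

```python
from dataclasses import dataclass, field


@dataclass
class CrawlRound:
    """Hypothetical tracker for one scheduled crawl over a fixed set of partitions."""

    partitions: frozenset[str]
    completed: set[str] = field(default_factory=set)
    exported: bool = False

    def on_schedule_tick(self) -> None:
        # A new scheduled crawl clears every partition's status.
        self.completed.clear()
        self.exported = False

    def on_partition_complete(self, partition: str) -> bool:
        """Record a completion; return True only if it should trigger the export."""
        if partition not in self.partitions:
            raise KeyError(partition)
        self.completed.add(partition)
        # Export once per round, and only after every partition has completed.
        if not self.exported and self.completed == self.partitions:
            self.exported = True
            return True
        return False
```

A one-off re-run of a single partition after the export returns `False`, so nothing is regenerated until the next schedule tick resets the round.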
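Creating the buckets in Python instead of a separate compose container could look roughly like the following. The PR doesn't show which client it uses; this sketch assumes an S3-compatible client with `bucket_exists` and `make_bucket` methods (the MinIO Python SDK's `Minio` class has methods with these names), and `ensure_buckets` is a hypothetical helper:

```python
from typing import Iterable, Protocol


class BucketClient(Protocol):
    """Minimal interface assumed here; a MinIO-style client is one possible fit."""

    def bucket_exists(self, bucket_name: str) -> bool: ...

    def make_bucket(self, bucket_name: str) -> None: ...


def ensure_buckets(client: BucketClient, names: Iterable[str]) -> list[str]:
    """Create any missing buckets and return the names that were newly created.

    Safe to call on every startup: existing buckets are left untouched.
    """
    created: list[str] = []
    for name in names:
        if not client.bucket_exists(bucket_name=name):
            client.make_bucket(bucket_name=name)
            created.append(name)
    return created
```

Because the check is idempotent, it can run at the start of every pipeline run without a separate setup step.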
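A sketch of pulling both images concurrently with the standard library's `ThreadPoolExecutor`. Threads help here because the downloads are I/O-bound. The `pull` callable is a hypothetical stand-in for whatever actually fetches an image (for example, the Docker SDK's `client.images.pull`), and `pull_all` is an illustrative name:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def pull_all(
    images: Iterable[str], pull: Callable[[str], str], max_workers: int = 4
) -> dict[str, str]:
    """Run `pull` for each image concurrently and return {image: result}.

    Any exception raised by `pull` is re-raised here when results are collected.
    """
    image_list = list(images)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with image_list.
        results = pool.map(pull, image_list)
        return dict(zip(image_list, results))
```

With two images, the total wait is roughly that of the slower pull rather than the sum of both.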

@C-Loftus C-Loftus requested a review from webb-ben February 5, 2025 16:42
@C-Loftus C-Loftus merged commit 4fb73de into main Feb 5, 2025
4 checks passed
@C-Loftus C-Loftus deleted the generalSpeedups branch February 5, 2025 20:07