Spikes in the duration of bqetl_merino_newtab_extract_to_gcs #6202

Open
data-sync-user opened this issue Sep 13, 2024 · 0 comments

data-sync-user commented Sep 13, 2024

Context

The Airflow job bqetl_merino_newtab_extract_to_gcs is scheduled to run every 20 minutes, and runs normally take less than 5 minutes. The job aggregates engagement data for recommendations on New Tab so that Merino can show highly engaging items to more people. We would like this engagement data to have a low delay (~30 minutes) so that New Tab can show the most engaging stories.

Issue

Over the last two days, duration has spiked at:

In both runs, all three tasks individually had a short duration, and the long run duration was caused by a delay in queueing up tasks after the first one ended.
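To make the distinction between task time and queueing delay concrete, here is a minimal sketch of how the gaps between tasks in a single run could be measured. It assumes the three tasks run one after another, and the task names and timestamps are hypothetical placeholders, not values taken from the affected runs.

```python
from datetime import datetime, timedelta


def queue_gaps(tasks):
    """Summarize one DAG run given (task_id, start, end) tuples.

    Returns the wall-clock run duration, the summed task durations,
    and the idle gap between each task's end and the next task's start.
    Assumes tasks run sequentially (an assumption, not confirmed by the issue).
    """
    ordered = sorted(tasks, key=lambda t: t[1])
    run_duration = max(t[2] for t in ordered) - ordered[0][1]
    total_task_time = sum((t[2] - t[1] for t in ordered), timedelta())
    gaps = [
        (prev[0], nxt[0], nxt[1] - prev[2])
        for prev, nxt in zip(ordered, ordered[1:])
    ]
    return run_duration, total_task_time, gaps


# Hypothetical timestamps for illustration only.
t0 = datetime(2024, 1, 1, 0, 0)
example_run = [
    ("task_a", t0, t0 + timedelta(minutes=1)),
    ("task_b", t0 + timedelta(minutes=41), t0 + timedelta(minutes=42)),
    ("task_c", t0 + timedelta(minutes=42, seconds=30), t0 + timedelta(minutes=44)),
]

run_duration, total_task_time, gaps = queue_gaps(example_run)
for upstream, downstream, gap in gaps:
    print(f"{upstream} -> {downstream}: waited {gap}")
print(f"run duration: {run_duration}, time spent in tasks: {total_task_time}")
```

In this made-up run the tasks account for only a few minutes, and almost all of the run duration is the wait between the first and second task, which is the pattern described above.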

!image-20240913-175622.png|width=100%,alt="image-20240913-175622.png"!

Initial investigation

Cluster activity on Sept 12th from 3:30am - 4:30am UTC shows only 22 task instances, of which 12 were skipped. In contrast, the subsequent 60 minutes had 179 task instances, of which 12 were skipped.

This suggests a performance issue that affected more jobs than just the one above. Airflow's scheduler might have had performance problems during that hour, or it might have been hitting a limit.
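As a rough worked comparison of the two windows, the sketch below subtracts skipped task instances from the totals reported above. Treating "total minus skipped" as a proxy for scheduler throughput is an assumption made for illustration; non-skipped instances may include states other than success.

```python
def non_skipped(total, skipped):
    """Task instances that were not skipped in a given window."""
    return total - skipped


# Counts from the cluster activity view described in this issue.
slow_hour = {"total": 22, "skipped": 12}   # Sept 12th, 3:30am - 4:30am UTC
next_hour = {"total": 179, "skipped": 12}  # the subsequent 60 minutes

slow_count = non_skipped(**slow_hour)
next_count = non_skipped(**next_hour)
ratio = next_count / slow_count

print(f"non-skipped task instances: {slow_count} vs {next_count}")
print(f"the following hour had about {ratio:.1f}x as many")
```

By this measure the slow hour had 10 non-skipped task instances against 167 in the following hour.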

!image-20240913-182332.png|width=100%,alt="image-20240913-182332.png"!

!image-20240913-182359.png|width=100%,alt="image-20240913-182359.png"!

┆Issue is synchronized with this Jira Bug
┆Attachments: image-20240913-175622.png | image-20240913-182332.png | image-20240913-182359.png
