
SEAB-6097: Nightly CLI build failure should notify in Slack #284

Merged
merged 7 commits into develop from seab-6097/cli-slack
Apr 10, 2024

Conversation

Contributor

@ll5zh ll5zh commented Apr 1, 2024

Description
This PR restores the Slack notification for nightly CLI build failures, which was removed after moving the nightly cron to CircleCI (#268). Notifications are set up to appear in #dockstore-dev-alerts.

Review Instructions
In config.yml, verify that the Slack notification is configured as expected. See https://oicr.slack.com/archives/C0151UDNR9R/p1711479933594839 for a successful test run of the slack/notify setup (without the nightly condition).

Since the CLI nightly workflow consists of multiple jobs, the PR currently proposes separate Slack notifications for the build, unit tests, and integration tests. Are there any jobs that should be added/removed?

Note: I originally created a new Slack app called "Dockstore CLI Build", with its own CircleCI context called oicr-slack-cli, to do the notifying (test run: https://oicr.slack.com/archives/C0151UDNR9R/p1711465118013749). But it looks like the notification also works with our pre-existing "dockstore build notifications" app and oicr-slack context. I decided to stick with the old app for consistency, but do comment if you'd prefer otherwise.
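
For reviewers who don't have config.yml open, here is a minimal sketch of the moving parts, assuming the reusable notify-slack command from this PR; the orb version and surrounding layout are illustrative rather than copied from the diff, while the slack/notify parameters match the ones shown in the diff below:

orbs:
  slack: circleci/slack@4  # CircleCI Slack orb; the exact pinned version here is illustrative

commands:
  notify-slack:
    steps:
      - slack/notify:
          channel: $slack_id      # channel ID for #dockstore-dev-alerts, supplied by the oicr-slack context
          event: fail             # only post to Slack when the job fails
          template: basic_fail_1  # stock failure template from the orb

Each nightly job then finishes with a plain - notify-slack step and runs with the oicr-slack context so the orb can read the Slack credentials.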

Issue
SEAB-6097

Security
If there are any concerns that require extra attention from the security team, highlight them here.

Please make sure that you've checked the following before submitting your pull request. Thanks!

  • Check that you pass the basic style checks and unit tests by running ./mvnw clean install
  • If this PR is for a user-facing feature, create and link a documentation ticket for this feature (usually in the same milestone as the linked issue). Style points if you create a documentation PR directly and link that instead.

@ll5zh ll5zh self-assigned this Apr 1, 2024

codecov bot commented Apr 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.09%. Comparing base (7cb07a8) to head (796e718).
Report is 1 commit behind head on develop.

Additional details and impacted files
@@              Coverage Diff              @@
##             develop     #284      +/-   ##
=============================================
- Coverage      70.34%   70.09%   -0.25%     
+ Complexity      1077     1058      -19     
=============================================
  Files             47       47              
  Lines           6069     6069              
  Branches         801      801              
=============================================
- Hits            4269     4254      -15     
- Misses          1466     1469       +3     
- Partials         334      346      +12     
Flag Coverage Δ
bitbuckettests 9.83% <ø> (ø)
confidentialtooltests 55.70% <ø> (-0.27%) ⬇️
confidentialworkflowtests 30.28% <ø> (ø)
nonconfidentialtests 32.39% <ø> (-0.02%) ⬇️
singularitytests 16.59% <ø> (ø)
unittests 8.35% <ø> (ø)

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@ll5zh ll5zh marked this pull request as ready for review April 2, 2024 13:24
Comment on lines 144 to 145
- when:
condition: << pipeline.parameters.run_against_develop_core >>
Contributor

Suggesting that we move the when condition into notify-slack so you don't have to specify it multiple times.
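
A rough sketch of this suggestion, assuming notify-slack stays a reusable command and reusing the slack/notify parameters already present in the diff (the surrounding layout is illustrative):

commands:
  notify-slack:
    steps:
      - when:
          condition: << pipeline.parameters.run_against_develop_core >>
          steps:
            - slack/notify:
                channel: $slack_id
                event: fail
                template: basic_fail_1

Callers would then add a bare - notify-slack step without repeating the nightly condition at each call site.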

@@ -43,17 +44,23 @@ common_jobs: &common_jobs
<<: *common_filters
requires:
- find-develop-snapshot
context:
Contributor

Since the CLI nightly workflow consists of multiple jobs, the PR currently proposes separate Slack notifications for the build, unit tests, and integration tests. Are there any jobs that should be added/removed?

GitHub won't let me comment on the lines I want, but is it possible to call notify-slack within common_jobs, after the build job? That way the Slack notification would alert us if the nightly workflow fails as a whole, and we wouldn't have to add the notify-slack command to multiple jobs.

common_jobs: &common_jobs
  jobs:
    # find-develop-snapshot job is run for both "everything" and "nightly" builds even though it's only needed for nightly.
    # See https://github.com/dockstore/dockstore-cli/pull/268#discussion_r1419429151
    - find-develop-snapshot
    - build:

Contributor Author

From what I've seen, I don't think the slack/notify command can assess the cumulative status of multiple jobs in a workflow (see "Is it possible to receive only the final status of the Workflow?" in https://github.com/CircleCI-Public/slack-orb/wiki/FAQ).

I'm looking into whether we can integrate Slack via a webhook instead of the orb -- it looks like CircleCI has a workflow-completed event (https://circleci.com/docs/webhooks/#event-specifications) which could satisfy the suggested feedback.

Contributor

Ah, I see, I wasn't sure if it was possible so you can ignore that feedback 🙂

Is it possible to add notify-slack to setup_and_run_integration_tests? It's the only command called by non-confidential-tests, confidential-workflow-tests, confidential-tool-tests, and integration-tests. That way you don't have to add it to each integration test job

setup_and_run_integration_tests:
  steps:
    - setup_for_integration_tests
    - restore_bash_env
    - run_tests
    - save_test_results
    - send_coverage
    - persist_coverage

Contributor Author

I think that would work, since notify-slack would be a step within a job 👍 It would prevent repetition in the config file, but I guess we'd still have separate notifications for each job (as opposed to one workflow notification).

Will give it a try first, so we can decide how we feel about the multiple notifications

Member
@denis-yuen denis-yuen Apr 2, 2024

It does seem a bit excessive to send separate Slack notifications for the build, unit tests, and integration tests.
We could consider notifying after SonarCloud, which should only pass if all the other builds pass. (I don't feel strongly about this one, but the number of notifications we get can be somewhat overwhelming)

Contributor Author

A possible concern with only notifying after SonarCloud (if I'm understanding CircleCI correctly): if the build fails, the unit/integration tests (which require the build) won't run at all, and then SonarCloud (which requires those tests) won't run either, so we wouldn't be notified at all.

To be fair, if we're only notifying on failures, the notifications shouldn't be a daily occurrence anyways

Member
@denis-yuen denis-yuen Apr 2, 2024

Oh, I see. So you would be able to notify on success, but not on failure. 💭

To be fair, if we're only notifying on failures, the notifications shouldn't be a daily occurrence anyways

Ah good point, if this PR is notifying only on failure that would be a massive improvement in the first place.

Contributor
@coverbeck coverbeck left a comment

Agreed with Kathy's feedback, nothing else to add. Please re-request review when you've responded to it, thanks.

- notifies slack for build, unit tests, and all integration tests (separately)
@@ -536,6 +553,7 @@ commands:
- save_test_results
- send_coverage
- persist_coverage
- notify-slack
Contributor Author
@ll5zh ll5zh Apr 4, 2024

For testing purposes, this is what the Slack notifications looked like when notified separately on each job's success: https://oicr.slack.com/archives/C0151UDNR9R/p1712088975956109

Things to consider: (re-requesting reviews for discussion on this)

  • If we choose to notify Slack only on failed jobs, these messages probably won't occur too frequently; nightly runs will probably succeed most of the time
  • I don't think it'd be possible to get failed notifications from the build, unit tests, and integration tests all at once. If the build fails/notifies, for example, the dependent jobs (i.e. all the tests) should be skipped and can't notify. Worst case scenario is if the build passes, and all the unit/integration tests fail?
  • Also just noticed that some integration tests produce multiple notifications (e.g. 3 for confidential-tool/workflow-tests, 2 for non-confidential-tests); this appears to be due to the parallelism (see the sketch below)
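
A small sketch of why the parallel jobs post more than once, assuming the parallelism values implied by the notification counts above (the step layout is illustrative):

non-confidential-tests:
  parallelism: 2   # CircleCI runs this job's steps on 2 separate containers
  steps:
    - setup_and_run_integration_tests   # now includes notify-slack, so each failing container posts its own message

Because every parallel container executes the job's steps independently, a failing run can produce one Slack message per container rather than one per job.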

Member

  • If we choose to notify Slack only on failed jobs, these messages probably won't occur too frequently; nightly runs will probably succeed most of the time

I think notifying on failed jobs is better since those are the actionable ones. That said, we can run into the problem that the notifications break ...

Especially with the parallel notifications,

some integration tests produce multiple notifications

I think there are too many notifications.

Worst case scenario is if the build passes, and all the unit/integration tests fail?

If I understand correctly, we'd get messages for all the failures if we were notifying on failures only.
I think that's ok. Numerous failure messages in a worst-case scenario are better than numerous success messages (which may lead us to ignore messages).

Open to ideas

@ll5zh ll5zh requested review from coverbeck and kathy-t April 5, 2024 20:58
Contributor
@coverbeck coverbeck left a comment

Oh, I see. So you would be able to notify on success, but not on failure. 💭

To be fair, if we're only notifying on failures, the notifications shouldn't be a daily occurrence anyways

Ah good point, if this PR is notifying only on failure that would be a massive improvement in the first place.

Agreed it would be very nice if we only got one failure notification, but I don't see a way to do it.

@@ -6,9 +6,13 @@ parameters:
run_against_develop_core:
type: boolean
default: false
test_slack: # added temporarily for testing purposes
type: boolean
default: true
Contributor

Is this supposed to be here still?

channel: $slack_id
event: fail
template: basic_fail_1
- slack/notify: # added temporarily for testing purposes
Contributor

Same comment, should this be here?


sonarcloud bot commented Apr 8, 2024

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

@ll5zh ll5zh merged commit b2835c5 into develop Apr 10, 2024
17 checks passed
@ll5zh ll5zh deleted the seab-6097/cli-slack branch April 10, 2024 15:27