Azure VM outage #4287

Closed
dduportal opened this issue Sep 10, 2024 · 2 comments

@dduportal
Contributor

dduportal commented Sep 10, 2024

Service(s)

cert.ci.jenkins.io, ci.jenkins.io, infra.ci.jenkins.io, trusted.ci.jenkins.io

Summary

Since Monday 9 September 2024 at 3:00 pm UTC, we have seen an increased failure rate when spinning up Azure VM agents on ci.jenkins.io.

  • Nothing on the Azure status site 🤦
  • The Azure portal shows, in the "Azure Monitor" -> "Service Health" section, that we are subject to 2 incidents

Screenshot 2024-09-09 at 17 33 17

Screenshot 2024-09-09 at 17 33 41

🎯 Impact is on the Azure VM agents and sometimes on the Linux container agents when scaling up: builds look slow because they are stuck waiting for agents to spin up.

@dduportal
Contributor Author

Update:

  • Impact on today's weekly release 2.476
    • trusted.ci is set up to retain its agents for 60 min (to reuse them) to speed up builds (Puppet is disabled)
  • Opened an incident on status.jenkins.io: Azure outage (status#539)
  • Added a message on ci.jenkins.io:

Screenshot 2024-09-10 at 19 37 13

@dduportal
Contributor Author

The problem is now gone, as Azure teams added capacity and software fixes on 12 and 13 September.
