From 3f94c46c2872bcf972f99510bf1a35408922a723 Mon Sep 17 00:00:00 2001 From: Damien Duportal Date: Thu, 24 Oct 2024 11:30:36 +0200 Subject: [PATCH] feat(blog) add a new post about the Update Center (5th brownout) Signed-off-by: Damien Duportal --- .../2024-10-24-update-center-brownouts-5.adoc | 64 +++++++++++++++++++ 1 file changed, 64 insertions(+) create mode 100644 content/blog/2024/10/24/2024-10-24-update-center-brownouts-5.adoc diff --git a/content/blog/2024/10/24/2024-10-24-update-center-brownouts-5.adoc b/content/blog/2024/10/24/2024-10-24-update-center-brownouts-5.adoc new file mode 100644 index 000000000000..6d7fe4293ceb --- /dev/null +++ b/content/blog/2024/10/24/2024-10-24-update-center-brownouts-5.adoc @@ -0,0 +1,64 @@ +--- +layout: post +title: "Brownout on Update Center (updates.jenkins.io):\n 24 and 25 October 2024" +tags: +- jenkins +- jenkins-infra +- update-center +authors: +- dduportal +opengraph: + image: /images/post-images/2023/01/12/jenkins-newsletter/infrastructure.png +overview: "We'll be using the new Update Center implementation in production for 1 day" +discourse: true +--- + +== Summary (link:https://en.wikipedia.org/wiki/Wikipedia:Too_long;_didn%27t_read[TL;DR]) + +Note: this is a follow up of link:/blog/2024/09/25/update-center-brownouts-4/[the 26 and 27 September 24 hours brownout]. + +The service https://updates.jenkins.io will switch its implementation to a new system during 1 day: + +- From Thursday 24 October 2024 from 10:00am UTC until Friday 25 October 2024 10:00am UTC + +All Jenkins users are impacted but should not see any functional change. + +⚠️ Please, check that your organization respects the advertised DNS TTL or you might be stuck in the brownout longer than expected. + +Under the hood, any HTTP request made to this service will be redirected to a mirror close to user locations during these brownouts. + +== What is the "Update Center"? + +Jenkins link:https://updates.jenkins.io[Update Center] is a web server at the core of the Jenkins public infrastructure which distributes the plugins, tool installers, and versions index to all Jenkins servers across the world. + +From the installation wizard to regular plugin updates, if you run Jenkins, then you use this service under the hood. + +Today, it serves around 50 Tb of data (outbound bandwidth) each month from a single virtual machine on AWS, which costs around $6,000 per month. + +In order to sustain this service and improve it, the Jenkins infrastructure team has worked relentlessly during the past years to have a new sustainable implementation for this service. + +The new Update Center implementation features a highly available system that redirects user requests to a download mirror close to their location. +Additional information is available in the link:https://github.com/jenkins-infra/helpdesk/issues/2649[GitHub issue]. + +== Why this Fifth "link:https://en.wikipedia.org/wiki/Brownout_(electricity)[Brownout]"? + +Our functional tests and performance tests are meeting our expectations after the initial work and the four brownouts we ran. + +However the previous (fourth) brownout raised 2 issues: +- We've seen (and were reported) HTTP/404 errors due to broken links in some HTML pages (not used a lot) and missing endpoints used for healthchecks in link:https://github.com/jenkins-infra/helpdesk/issues/4311[] +- We've also seen HTTP/404 errors after 8-9 hours or production usage due to network file share (SMB/CIFS) issues with Apache. The service is self-healing but we are trying to fine tune this: if it fails, then we'll have to update the Apache architecture to stop using a network file share. See https://github.com/jenkins-infra/helpdesk/issues/4312 for details. + +As such, a fifth brownout serves to verify the above (minor) problems are solved under a production workload. + +== How + +Both current and new Update Centers are updated at the same time and serve the same index. + +Starting 12 weeks ago, the Jenkins infrastructure has been using the new Update Center (link:https://azure.updates.jenkins.io) with a client-side DNS override (`updates.jenkins.io` hostname points to this new service). + +During this brownout, we'll simply switch the DNS entry `updates.jenkins.io` to this new service and watch for the logs and error rate. +At the end, we'll switch DNS back to the normal service and then analyze metrics and logs to see how the system behaved. + +We are confident the new system will perform as expected + +Please refer to the link:https://github.com/jenkins-infra/helpdesk/issues/2649[helpdesk ticket] for more information.