Failing to start an instance in the cloud creates undeletable zombie nodes #471

Open
Theoderich opened this issue Sep 8, 2024 · 0 comments · May be fixed by #470
Labels
bug Something isn't working

Comments

@Theoderich

Jenkins and plugins versions report

Environment Jenkins: 2.462.1 OS: Linux - 6.1.0-23-cloud-amd64 Java: 17.0.12 - Debian (OpenJDK 64-Bit Server VM) --- ansicolor:1.0.4 antisamy-markup-formatter:162.v0e6ec0fcfcf6 apache-httpcomponents-client-4-api:4.5.14-208.v438351942757 asm-api:9.7-33.v4d23ef79fcc8 authentication-tokens:1.119.v50285141b_7e1 basic-branch-build-strategies:81.v05e333931c7d blueocean:1.27.14 blueocean-autofavorite:1.2.5 blueocean-bitbucket-pipeline:1.27.14 blueocean-commons:1.27.14 blueocean-config:1.27.14 blueocean-core-js:1.27.14 blueocean-dashboard:1.27.14 blueocean-display-url:2.4.3 blueocean-events:1.27.14 blueocean-git-pipeline:1.27.14 blueocean-github-pipeline:1.27.14 blueocean-i18n:1.27.14 blueocean-jwt:1.27.14 blueocean-personalization:1.27.14 blueocean-pipeline-api-impl:1.27.14 blueocean-pipeline-editor:1.27.14 blueocean-pipeline-scm-api:1.27.14 blueocean-rest:1.27.14 blueocean-rest-impl:1.27.14 blueocean-web:1.27.14 bootstrap5-api:5.3.3-1 bouncycastle-api:2.30.1.78.1-248.ve27176eb_46cb_ branch-api:2.1178.v969d9eb_c728e caffeine-api:3.1.8-133.v17b_1ff2e0599 checks-api:2.2.0 cloudbees-bitbucket-branch-source:888.v8e6d479a_1730 cloudbees-folder:6.928.v7c780211d66e command-launcher:115.vd8b_301cc15d0 commons-lang3-api:3.14.0-76.vda_5591261cfe commons-text-api:1.12.0-129.v99a_50df237f7 config-file-provider:973.vb_a_80ecb_9a_4d0 credentials:1371.vfee6b_095f0a_3 credentials-binding:681.vf91669a_32e45 data-tables-api:2.0.8-1 discard-old-build:1.07 display-url-api:2.204.vf6fddd8a_8b_e9 docker-commons:443.v921729d5611d docker-workflow:580.vc0c340686b_54 durable-task:568.v8fb_5c57e8417 echarts-api:5.5.0-1 eddsa-api:0.3.0-4.v84c6f0f4969e favorite:2.221.v19ca_666b_62f5 font-awesome-api:6.5.2-1 gatling:1.3.0 git:5.3.0 git-client:5.0.0 git-parameter:0.9.19 git-server:126.v0d945d8d2b_39 github:1.40.0 github-api:1.321-468.v6a_9f5f2d5a_7e github-branch-source:1793.v1831e9c68d77 gitlab-plugin:1.8.1 google-compute-engine:4.575.v6969b_7c435eb_ google-login:109.v022b_cf87b_e5b_ 
google-metadata-plugin:0.5 google-oauth-plugin:1.330.vf5e86021cb_ec google-storage-plugin:1.360.v6ca_38618b_41f gson-api:2.11.0-41.v019fcf6125dc h2-api:11.1.4.199-30.v1c64e772f3a_c handy-uri-templates-2-api:2.1.8-30.v7e777411b_148 htmlpublisher:1.36 instance-identity:185.v303dc7c645f9 ionicons-api:74.v93d5eb_813d5f jackson2-api:2.17.0-379.v02de8ec9f64c jakarta-activation-api:2.1.3-1 jakarta-mail-api:2.1.3-1 javadoc:280.v050b_5c849f69 javax-activation-api:1.2.0-7 javax-mail-api:1.6.2-10 jaxb:2.3.9-1 jdk-tool:80.v8a_dee33ed6f0 jenkins-design-language:1.27.14 jersey2-api:2.44-151.v6df377fff741 jira:3.13 jjwt-api:0.11.5-112.ve82dfb_224b_a_d joda-time-api:2.12.7-29.v5a_b_e3a_82269a_ jquery:1.12.4-1 jquery3-api:3.7.1-2 jsch:0.2.16-86.v42e010d9484b_ json-api:20240303-41.v94e11e6de726 json-path-api:2.9.0-58.v62e3e85b_a_655 junit:1284.vf75d778f98c5 ldap:725.v3cb_b_711b_1a_ef locale:511.v212370760160 lockable-resources:1255.vf48745da_35d0 mailer:472.vf7c289a_4b_420 matrix-auth:3.2.2 matrix-project:832.va_66e270d2946 maven-plugin:3.23 mina-sshd-api-common:2.13.1-117.v2f1a_b_66ff91d mina-sshd-api-core:2.13.1-117.v2f1a_b_66ff91d oauth-credentials:0.653.v14cf2088e950 okhttp-api:4.11.0-172.vda_da_1feeb_c6e pipeline-build-step:540.vb_e8849e1a_b_d8 pipeline-graph-analysis:216.vfd8b_ece330ca_ pipeline-groovy-lib:727.ve832a_9244dfa_ pipeline-input-step:495.ve9c153f6067b_ pipeline-maven:1421.v610fa_b_e2d60e pipeline-maven-api:1421.v610fa_b_e2d60e pipeline-milestone-step:119.vdfdc43fc3b_9a_ pipeline-model-api:2.2205.vc9522a_9d5711 pipeline-model-definition:2.2205.vc9522a_9d5711 pipeline-model-extensions:2.2205.vc9522a_9d5711 pipeline-rest-api:2.34 pipeline-stage-step:312.v8cd10304c27a_ pipeline-stage-tags-metadata:2.2205.vc9522a_9d5711 pipeline-stage-view:2.34 plain-credentials:183.va_de8f1dd5a_2b_ plugin-util-api:4.1.0 pubsub-light:1.18 purge-job-history:1.6 role-strategy:743.v142ea_b_d5f1d3 scm-api:696.v778d637b_a_762 scmskip:50.vfb_3a_f04242a_a_ script-security:1341.va_2819b_414686 
slack:734.v7f9ec8b_66975 snakeyaml-api:2.2-121.v5a_68b_9300b_d4 sse-gateway:1.27 ssh-credentials:343.v884f71d78167 sshd:3.330.vc866a_8389b_58 structs:338.v848422169819 test-results-analyzer:0.4.1 throttle-concurrents:2.14 token-macro:400.v35420b_922dcb_ trilead-api:2.147.vb_73cc728a_32e variant:60.v7290fc0eb_b_cd workflow-aggregator:600.vb_57cdd26fdd7 workflow-api:1332.vc21122317a_8e workflow-basic-steps:1058.vcb_fc1e3a_21a_9 workflow-cps:3922.va_f73b_7c4246b_ workflow-durable-task-step:1364.v2fd76fb_6fd41 workflow-job:1400.v7fd111b_ec82f workflow-multibranch:783.787.v50539468395f workflow-scm-step:427.v4ca_6512e7df1 workflow-step-api:678.v3ee58b_469476 workflow-support:920.v59f71ce16f04

What Operating System are you using (both controller, and any agents involved in the problem)?

Debian 12 on the controller and CentOS 7 on the agent

Reproduction steps

  1. Make sure your gcloud quota is exceeded in such a way that no new agents can be started
  2. Start a build that creates one-shot agents in gcloud
  3. The agent fails to start because the quota is exceeded

Relevant log:

Sep 02 17:54:56 gcp-dtag-sec-jenkins jenkins[519]: 2024-09-02 15:54:56.069+0000 [id=578213]        INFO        c.g.j.p.c.ComputeEngineComputerLauncher#launch: Launch failed while waiting for operation operation-1725275364327-62120f93aff3d-342c5890-dc9162a0 to complete. Operation error was Quota 'SSD_TOTAL_GB' exceeded.  Limit: 4000.0 in region europe-west3.

Expected Results

  • No Jenkins slave is created, since the agent did not start in gcloud
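
The expected behaviour can be illustrated with a minimal sketch. This is not the plugin's code and not the fix from the linked pull request; `InstanceLauncher`, `LaunchFailedException`, and `AgentProvisioner` are hypothetical names used only to show the idea of rolling back the node registration when the cloud launch fails.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for whatever starts the cloud instance.
interface InstanceLauncher {
    void launch(String name) throws LaunchFailedException;
}

// Hypothetical failure type, e.g. for an exceeded quota.
class LaunchFailedException extends Exception {
    LaunchFailedException(String message) {
        super(message);
    }
}

// Hypothetical provisioner that keeps its node list consistent with the cloud.
final class AgentProvisioner {
    private final List<String> registered = new ArrayList<>();
    private final InstanceLauncher launcher;

    AgentProvisioner(InstanceLauncher launcher) {
        this.launcher = launcher;
    }

    // Registers the node, then launches the instance.
    // If the launch fails, the registration is rolled back so no zombie remains.
    boolean provision(String name) {
        registered.add(name);
        try {
            launcher.launch(name);
            return true;
        } catch (LaunchFailedException e) {
            registered.remove(name);
            return false;
        }
    }

    List<String> registeredAgents() {
        return List.copyOf(registered);
    }
}
```

With a launcher that always fails, `provision` returns `false` and `registeredAgents()` stays empty.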

Actual Results

  • For each failed startup, a 'zombie' Jenkins slave is created. This slave is permanently displayed as offline.
  • The slave cannot be deleted via normal methods, because deletion first tries to delete the instance in GCP, and that fails because no such instance exists in GCP
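
The deletion problem described above can be sketched as follows. This is an illustration under assumptions, not the plugin's implementation: `CloudClient`, `InstanceNotFoundException`, and `NodeRegistry` are hypothetical names. The point is that a "not found" answer from the cloud is treated as "already deleted", so the local node can still be removed.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a cloud API client; not the plugin's real client.
interface CloudClient {
    // Throws InstanceNotFoundException when the named instance does not exist.
    void deleteInstance(String name) throws InstanceNotFoundException;
}

// Hypothetical exception signalling that the instance is absent in the cloud.
class InstanceNotFoundException extends Exception {
    InstanceNotFoundException(String name) {
        super("instance not found: " + name);
    }
}

// Hypothetical registry of nodes known to the controller.
final class NodeRegistry {
    private final Set<String> nodes = new HashSet<>();
    private final CloudClient cloud;

    NodeRegistry(CloudClient cloud) {
        this.cloud = cloud;
    }

    void register(String name) {
        nodes.add(name);
    }

    boolean contains(String name) {
        return nodes.contains(name);
    }

    // Removes the node even if the backing instance is already gone.
    // Returns true if the node was known and has been removed.
    boolean terminate(String name) {
        try {
            cloud.deleteInstance(name);
        } catch (InstanceNotFoundException e) {
            // Nothing to delete in the cloud: treat it as already terminated.
        }
        return nodes.remove(name);
    }
}
```

In this sketch only the "not found" case is swallowed; any other kind of cloud failure would need its own handling and is deliberately left out.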

Anything else?

No response

Are you interested in contributing a fix?

I have created a fix and will open a pull request soon. I am currently testing my fix on our Jenkins server.

@Theoderich Theoderich added the bug Something isn't working label Sep 8, 2024
@Theoderich Theoderich linked a pull request Sep 8, 2024 that will close this issue