cml runner aggressively shutting down instance with active job running #1054
Comments
I'm pretty sure this is fixed by #1030.
Tried this branch with the following workflow and package.json:
Workflow:
Package.json:
@danieljimeneznz If you can, open the GCP web console, edit the cml instance, and check this box to prevent the instance from being destroyed:
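For reference, the same setting can also be toggled from the command line instead of the web console. A minimal sketch with gcloud, where the instance name and zone are placeholders:

```bash
# Enable deletion protection on the runner instance so it survives whatever is
# tearing it down, leaving it available for SSH and log inspection.
gcloud compute instances update cml-runner-instance \
  --zone=us-central1-a \
  --deletion-protection

# Turn protection back off once debugging is finished.
gcloud compute instances update cml-runner-instance \
  --zone=us-central1-a \
  --no-deletion-protection
```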
@danieljimeneznz I see. I was thinking that the runner was not honouring the incoming job, but what is probably happening is that your code is hitting OOM. Is the train job not displaying any logs? Remember that the logs in the train job should be accessible. If not, as @dacbd says, keeping the machine alive should be enough to verify what's going on. The runner's logs reside in the installation folder; in your example that was at
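If OOM is the suspicion, one quick check once the machine is kept alive is to look for kernel OOM-killer messages. A minimal sketch using generic Linux commands, not anything CML-specific:

```bash
# Search the kernel log for out-of-memory kills; an OOM-killed training process
# is a common reason a job dies while the runner itself still looks healthy.
sudo dmesg -T | grep -iE "out of memory|oom"

# The same information via the systemd journal, restricted to kernel messages.
sudo journalctl -k | grep -iE "out of memory|oom"
```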
Just to confirm some things,
some helpful commands you can use:
shouldn't be needed but:
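The commands originally posted in this comment are not preserved above; as a rough sketch, inspection along these lines is generally useful on a CML-provisioned GCP instance (the systemd unit names here are assumptions and may differ on your image):

```bash
# See whether the runner is registered as a systemd unit and what state it is in.
systemctl list-units --all | grep -i cml

# Logs from the startup script that bootstraps the runner; the
# google-startup-scripts.service unit ships with standard Google guest images.
sudo journalctl -u google-startup-scripts.service --no-pager

# Everything the instance logged around the time of the shutdown.
sudo journalctl --since "30 minutes ago" --no-pager
```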
Instance shutdown was usually occurring at around 5 minutes, and then the GitHub action/workflow would hang with an infinitely spinning yellow circle. That being said, the strangest thing: I went to do some investigation today to try and determine the cause of the issue, but everything seems to be working fine this morning and, no matter what I do, I can't seem to break it again... 🤷 I tried the following things:
My best guess at the issue is that the runner was hitting 403 authentication errors against the GitHub API, which could mean that the termination logic of the Iterative Terraform provider gets triggered on the GitHub Actions side, causing the runner to shut down. But based on all the changes I tried above, I'm not so sure this was the case. The final workflow we landed on that works quite well is:
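Their workflow file itself is not reproduced here; purely as an illustration, the provisioning step at the heart of such a workflow is a `cml runner` launch along these lines, where every value except the n2-standard-4 machine type mentioned in the issue is a placeholder:

```bash
# Hypothetical, illustrative launch: provisions a GCP instance, registers it as a
# self-hosted runner, and tears it down after the idle timeout expires.
# REPO_TOKEN is expected to hold a personal access token with repo scope.
export REPO_TOKEN="<personal-access-token>"

cml runner \
  --cloud=gcp \
  --cloud-region=us-west1-a \
  --cloud-type=n2-standard-4 \
  --cloud-hdd-size=50 \
  --labels=cml-runner \
  --idle-timeout=300
```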
For completeness' sake, @dacbd, here is the output that my colleague @TessaPhillips was able to capture with the suggestions you made (enabling delete protection + running the job without a container and with
@danieljimeneznz Thanks for the detailed response. The unreliableness and the ~5 minute nature of it make me think this is the latest incarnation of the cursed ghost of #808. This should not happen with the refactored timeout logic in #1030 (note that your first attempt to use the mentioned branch only applied to the invoking cml runner and not the instance's version of cml runner). If it occurs again and you are up to it, you can fork cml and litter the code with logging everywhere; to use a custom branch/repo on the instance use:
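The exact snippet suggested here is not preserved above; as one generic possibility (an assumption, not necessarily the mechanism dacbd had in mind), npm can install cml globally straight from a fork and branch:

```bash
# Hypothetical example: install cml from a fork/branch so instrumented code with
# extra logging runs instead of the published npm release.
# <your-user> and <your-branch> are placeholders.
sudo npm install --global "github:<your-user>/cml#<your-branch>"
```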
with your use of
The
If I see it happen again I'll do a deep dive and try to find the underlying cause! Thanks for the info about the branch/API calls. I had a quick read through of the current code in that PR (#1030), and the logic seems sound enough to me. I wonder if using
Lines 286 to 291 in 9b0a217
^With the logic above, combined with the logs I posted, the
One other thing that I'm curious about though (which isn't necessarily related to this issue): for the following lines, won't a job on GitHub be unable to run for longer than 1 hour? (if you don't provide the
Lines 297 to 322 in 9b0a217
If initialization takes too long then the
All the operations in the
I don't follow; both sets of logs show the first job being picked up within 10 seconds of the GitHub Actions client starting?
If I understand your question correctly, no, that is not the case, but perhaps
Line 17 in 9b0a217
Ah didn't see that the
Ah yep - realized that in #1030 the
Yeah, probably a misnomer - I'm curious where the 72hr timeout came from? The usage limits show 35 days for maximum workflow runtime - did they maybe change this timeout, or does it come from somewhere else? Also, feel free to close this issue; I can open another one and reference this (and the ghost of #808) if it happens again - hopefully with a better investigation/potential solution! 😄
It does appear that they have made changes for self-hosted as well as their provided runners 🙃
@danieljimeneznz feel free to join our Discord if you have more questions not directly related to an issue.
Similar to #808, we have been seeing our GCP VM instance shut down randomly in the first few minutes even though a job is still running (we noticed that the GitHub pull, etc. starts on the runner, so I don't think it's an authentication-based problem from the runner); log below from the VM. We have been trying to get CML working on an n2-standard-4 (slowly beefing up the server), which has 4 vCPUs, 16 GB RAM, a 50 GB HDD, and a 10 Gbps network. Is there any way to debug what might be causing the issues on the VM (i.e. a flag that will stop the aggressive instance destruction so that we can debug this)?
Workflow File:
Output Logs:
GitHub Actions Hanging:
My colleague @TessaPhillips and I have been trying to figure this out to no avail - more than happy to contribute if you can point us in the right direction!