Debugging
Helio Machado edited this page Jun 17, 2022 · 2 revisions
Publishing some old snippets I wrote months ago, moved from https://github.com/iterative/cml/issues/852#issuecomment-1014955752
The following code snippets produce a full trace-level log of the Terraform provider, useful for diagnosing many hard-to-reproduce bugs related to cml-runner --cloud and cloud instances.
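# GitLab CI/CD job: run cml-runner with trace logging, then print the log
debug:
  when: always
  image: iterativeai/cml
  variables:
    TF_LOG: trace
    TF_LOG_PATH: /tmp/terraform.log
  script:
    # "|| true" keeps the job going so the log is printed even on failure
    - cml-runner
      --cloud=aws
      --cloud-region=us-west-1
      --cloud-type=t2.micro
      || true
    - cat "$TF_LOG_PATH"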
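# GitHub Actions workflow: same idea, with the variables set at workflow level
on: push
env:
  TF_LOG: trace
  TF_LOG_PATH: /tmp/terraform.log
jobs:
  debug:
    runs-on: ubuntu-latest
    steps:
      - uses: iterative/setup-cml@v1
      # "|| true" keeps the job going so the next step can print the log
      - run: >-
          cml-runner
          --cloud=aws
          --cloud-region=us-west-1
          --cloud-type=t2.micro
          || true
      - run: cat "$TF_LOG_PATH"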
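A trace-level log is long, so it can help to look at the warnings and errors first. The following is a minimal sketch, not part of CML: `show_log_problems` is a hypothetical helper name, and it assumes the log lines carry bracketed level markers such as `[WARN]` and `[ERROR]`, as Terraform's own log output does.

```shell
# Hypothetical helper: print only the warning and error lines of a Terraform log.
# Assumes bracketed level markers ([WARN], [ERROR]) in the log lines.
# Falls back to $TF_LOG_PATH, then /tmp/terraform.log, when no file is given.
show_log_problems() {
  log_file="${1:-${TF_LOG_PATH:-/tmp/terraform.log}}"
  # "|| true" avoids a non-zero exit status when nothing matches
  grep -E '\[(WARN|ERROR)\]' "$log_file" || true
}
```

In a CI step this could be called as `show_log_problems "$TF_LOG_PATH"` right after `cat "$TF_LOG_PATH"`, so the full log stays available while the likely culprits are repeated at the end.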
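# GitLab CI/CD job: generate an SSH key pair and open a tmate session
debug:
  when: always
  script:
    # Create a passphrase-less RSA key, overwriting any existing one
    - mkdir -p ~/.ssh && printf 'y\n\n' | ssh-keygen -q -t rsa -N '' -f ~/.ssh/id_rsa
    - apt update && apt install --yes tmate expect
    # unbuffer (from the expect package) gives tmate a terminal to run in
    - TERM=xterm unbuffer ./tmate -FS /tmp/tmate.sock | cat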
cml runner ··· --cloud-ssh-private="$(cat ~/.ssh/id_rsa)"
You can get the instance address by setting the TF_LOG and TF_LOG_PATH environment variables and searching for instance address in the logs.
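One way to do that search is sketched below. The page does not say which log field holds the address, so this sketch simply lists every IPv4-looking string in the log. `list_log_addresses` is a hypothetical helper name, and the pattern will also match things like four-part version numbers, so treat the output as a list of candidates.

```shell
# Hypothetical helper: list the unique IPv4-looking strings in a Terraform log.
# Assumption: the instance address appears in the log as a dotted-quad IPv4 address.
# Falls back to $TF_LOG_PATH, then /tmp/terraform.log, when no file is given.
list_log_addresses() {
  log_file="${1:-${TF_LOG_PATH:-/tmp/terraform.log}}"
  grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' "$log_file" | sort -u
}
```

With a candidate address in hand, connecting would look something like `ssh -i ~/.ssh/id_rsa ubuntu@ADDRESS`, where `ADDRESS` is a placeholder and the `ubuntu` user name is an assumption about the instance image.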
# GitHub Actions workflow: deploy a cloud runner, then dump its logs from a job running on it
on: push
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: iterative/setup-cml@v1
      - run: >-
          cml-runner
          --labels=test
          --cloud=aws
          --cloud-region=eu-west
          --cloud-type=g4dn.xlarge
          --cloud-spot
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  run:
    needs: deploy
    # The "test" label matches --labels=test above, so this job runs on the new instance
    runs-on:
      - self-hosted
      - test
    steps:
      - run: |
          set -x
          cat /var/log/cloud-init.log || true
          cat /var/log/cloud-init-output.log || true
          journalctl -u cml || true
          nvidia-smi || true