add profile runner deploy scripts
Signed-off-by: lrq619 <[email protected]>
leokondrashov authored and lrq619 committed Jan 16, 2024
1 parent 6f46bda commit 01ccebc
Showing 3 changed files with 119 additions and 11 deletions.
77 changes: 77 additions & 0 deletions scripts/github_runner/README.md
@@ -0,0 +1,77 @@
These are the deployment scripts for the GitHub self-hosted runners, which execute some of the unit tests.

There are four self-hosted runners in total:
* cri-firecracker: Used for [firecracker cri tests](../../.github/workflows/integration_tests.yml)
* cri-gvisor: Used for [gvisor cri tests](../../.github/workflows/gvisor_cri_tests.yml)
* integ: Used for [integration tests](../../.github/workflows/integration_tests.yml)
* profile: Used for [profile unit tests](../../.github/workflows/unit_tests.yml), job: `profile-unit-test`

# Deploy Runners
Physical node configuration for the runners:
Four nodes, each with 4 cores and 8 GB of RAM (4C-8G) and 100 GB of storage. Suggested system image: `ubuntu-20.04-2nic`

How to deploy the four nodes:
1. Build the runner deployer
```
go build .
```
2. Edit `conf.json`

The format of `conf.json` is as follows:
```
{
"ghOrg": "<GitHub account>",
"ghPat": "<GitHub PAT>",
"hostUsername": "<username>",
"runners": {
"<hostname-1>": {
"type": "cri",
"sandbox": "firecracker"
},
"<hostname-2>": {
"type": "cri",
"sandbox": "gvisor"
},
"<hostname-3>": {
"type": "integ",
"num": 2,
"restart": false
},
"<hostname-4>": {
"type": "profile"
}
}
}
```

Note that in `conf.json`, `ghOrg` is `vhive-serverless` and `ghPat` is your own account's Personal Access Token (PAT). Your account must have the correct permissions for the `vhive-serverless` org.

`<username>` and `<hostname-1/2/3/4>` are the SSH username and hostnames, so if you use `SCSE` cloud nodes as runners, `<hostname-1/2/3/4>` should be their IP addresses.
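The layout above can be checked before deploying. The following is a minimal sketch, not the deployer's actual code: the JSON keys come from the example above, while the `parseConf` helper and the rule that a `cri` runner must name `firecracker` or `gvisor` as its sandbox are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// RunnerConf describes one entry under "runners" in conf.json.
// Field tags follow the example above; the struct layout is an assumption.
type RunnerConf struct {
	Type    string `json:"type"`
	Sandbox string `json:"sandbox"`
	Num     int    `json:"num"`
	Restart bool   `json:"restart"`
}

// DeployerConf describes the top level of conf.json.
type DeployerConf struct {
	GhOrg        string                `json:"ghOrg"`
	GhPat        string                `json:"ghPat"`
	HostUsername string                `json:"hostUsername"`
	Runners      map[string]RunnerConf `json:"runners"`
}

// parseConf (hypothetical helper) decodes conf.json content and rejects
// unknown runner types. Strict JSON decoding also rejects trailing commas.
func parseConf(data []byte) (*DeployerConf, error) {
	var conf DeployerConf
	if err := json.Unmarshal(data, &conf); err != nil {
		return nil, fmt.Errorf("invalid conf.json: %w", err)
	}
	for host, r := range conf.Runners {
		switch r.Type {
		case "cri":
			// Assumption: only the two sandboxes listed in this README are valid.
			if r.Sandbox != "firecracker" && r.Sandbox != "gvisor" {
				return nil, fmt.Errorf("%s: cri runner needs sandbox firecracker or gvisor, got %q", host, r.Sandbox)
			}
		case "integ", "profile":
			// No extra checks in this sketch.
		default:
			return nil, fmt.Errorf("%s: invalid runner type %q", host, r.Type)
		}
	}
	return &conf, nil
}

func main() {
	sample := []byte(`{
	  "ghOrg": "vhive-serverless",
	  "ghPat": "<GitHub PAT>",
	  "hostUsername": "<username>",
	  "runners": {
	    "<hostname-1>": {"type": "cri", "sandbox": "firecracker"},
	    "<hostname-4>": {"type": "profile"}
	  }
	}`)
	conf, err := parseConf(sample)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%d runners configured for %s\n", len(conf.Runners), conf.GhOrg)
}
```

Returning an error instead of exiting lets a caller report every misconfigured host before any SSH connection is opened.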

After editing `conf.json`, deploy the runners remotely by running:
```
./deploy_runners
```

If it fails with an error like `dial unix: missing address`, start an SSH agent and add your key:
```
eval `ssh-agent`
ssh-add ~/.ssh/<private_key>
```
Here `<private_key>` should be a key that has SSH access to all four runners; typically it is `id_rsa`.
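The `dial unix: missing address` message is what Go's `net.Dial` reports for an empty Unix socket path, so the most likely cause is an unset `SSH_AUTH_SOCK`. The sketch below shows a pre-flight check under that assumption; `agentSocketPath` is a hypothetical helper and not part of `deploy_runners.go`.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
)

// agentSocketPath (hypothetical helper) validates the value of SSH_AUTH_SOCK
// and returns it, or an error explaining how to start an agent.
func agentSocketPath(authSock string) (string, error) {
	if authSock == "" {
		return "", errors.New("SSH_AUTH_SOCK is empty: run eval `ssh-agent` and ssh-add first")
	}
	return authSock, nil
}

func main() {
	path, err := agentSocketPath(os.Getenv("SSH_AUTH_SOCK"))
	if err != nil {
		fmt.Println("ssh-agent check failed:", err)
		return
	}
	// Dialing the socket confirms that an agent is actually listening.
	conn, err := net.Dial("unix", path)
	if err != nil {
		fmt.Println("cannot reach ssh-agent:", err)
		return
	}
	defer conn.Close()
	fmt.Println("ssh-agent socket reachable at", path)
}
```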

# Restart Runners
On `SCSE` cloud, rebuild the four nodes and redeploy them.

# When to Restart Runners
Restart the runners when a firecracker or gvisor CRI test gets stuck at `helloworld is waiting for a Revision to be ready`:
<img width="814" alt="bc67c34ef2308282b8285077534667f" src="https://github.com/vhive-serverless/vHive/assets/58351056/78cea3f8-b42f-4807-ad7a-10fea14a8eea">

This usually means that the firecracker and gvisor CRI runners need to be restarted. (If only one of the two tests is stuck, you can restart just that runner.)

If instead the firecracker or gvisor CRI test passes the `Setup vHive CRI test environment` step and fails in the `Run vHive CRI tests` step, it is typically a sporadic failure. Re-running the tests with the re-run button on the GitHub web page is enough to resolve it.

# Notes on the GitHub PAT
Steps for generating a GitHub PAT:
1. On your personal GitHub page, open `Developer settings` > `Personal access tokens` > `Tokens (classic)`. Do not generate `Fine-grained tokens`.
2. You can choose any expiration date. For the PAT scopes, checking `repo` is sufficient.
3. **NEVER** push your PAT to GitHub or any other public space. It puts your GitHub account at risk, and when GitHub detects a publicly exposed PAT, it revokes the token.
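The setup scripts use the PAT to request a short-lived runner registration token from the GitHub API. The sketch below mirrors the `curl` call in `setup_profile_runner.sh` and is useful for checking that a PAT has sufficient scope. The helper names and the `GH_PAT` environment variable are assumptions made for this example; the endpoint and headers are taken from the script.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

const githubAPI = "https://api.github.com"

// newTokenRequest (hypothetical helper) builds the POST request that asks
// GitHub for a runner registration token for <org>/vhive.
func newTokenRequest(apiBase, org, pat string) (*http.Request, error) {
	url := fmt.Sprintf("%s/repos/%s/vhive/actions/runners/registration-token", apiBase, org)
	req, err := http.NewRequest(http.MethodPost, url, nil)
	if err != nil {
		return nil, err
	}
	// Same headers as the curl call in setup_profile_runner.sh.
	req.Header.Set("Accept", "application/vnd.github.everest-preview+json")
	req.Header.Set("Authorization", "token "+pat)
	return req, nil
}

// parseTokenResponse (hypothetical helper) extracts the token from the
// response body. GitHub answers a successful request with 201 Created.
func parseTokenResponse(status int, body []byte) (string, error) {
	if status != http.StatusCreated {
		return "", fmt.Errorf("unexpected HTTP status %d (check the PAT and its scopes)", status)
	}
	var payload struct {
		Token string `json:"token"`
	}
	if err := json.Unmarshal(body, &payload); err != nil {
		return "", err
	}
	if payload.Token == "" {
		return "", fmt.Errorf("response contained no token")
	}
	return payload.Token, nil
}

func main() {
	// Assumption: the PAT is supplied through the environment, never hard-coded.
	pat := os.Getenv("GH_PAT")
	if pat == "" {
		fmt.Println("GH_PAT is not set; skipping the request")
		return
	}
	req, err := newTokenRequest(githubAPI, "vhive-serverless", pat)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	token, err := parseTokenResponse(resp.StatusCode, body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Print only the length so that the token itself never reaches a log.
	fmt.Println("received a registration token of length", len(token))
}
```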
16 changes: 5 additions & 11 deletions scripts/github_runner/deploy_runners.go
@@ -69,9 +69,7 @@ func main() {
 "num": 2
 }
 "pc75.cloudlab.umass.edu": {
-"type": "integ",
-"num": 6,
-"restart": true
+"type": "profile"
 }
 }
 }
@@ -124,7 +122,7 @@ func deployRunner(host string, runnerConf RunnerConf, deployerConf *DeployerConf
 }

 log.Debugf("Cloning vHive repository on %s@%s", deployerConf.HostUsername, host)
-out, err := client.Exec(fmt.Sprintf("rm -rf ./vhive ./runner && git clone --depth=1 https://github.com/%s/vhive", deployerConf.GhOrg))
+out, err := client.Exec(fmt.Sprintf("rm -rf ./vhive ./actions-runner && git clone --depth=1 https://github.com/%s/vhive", deployerConf.GhOrg))
 log.Debug(string(out))
 if err != nil {
 log.Fatalf("Failed to clone vHive repository on %s@%s: %s", deployerConf.HostUsername, host, err)
@@ -136,12 +134,6 @@ func deployRunner(host string, runnerConf RunnerConf, deployerConf *DeployerConf
 log.Debugf("Adding redeploy crontab task to %s@%s", deployerConf.HostUsername, host)
 setupCmd = fmt.Sprintf("cd vhive && ./scripts/github_runner/setup_bare_metal_runner.sh %s %s %s", deployerConf.GhOrg,
 deployerConf.GhPat, runnerConf.Sandbox)
-var redeploySetupCmd string = fmt.Sprintf("echo '10 4 * * * root rm -rf ./runners/ && %s' >> /etc/crontab", setupCmd)
-out, err = client.Exec(redeploySetupCmd)
-log.Debug(string(out))
-if err != nil {
-log.Fatalf("Failed to setup redeploy task on %s@%s: %s", deployerConf.HostUsername, host, err)
-}
 case "integ":
 var restart string
 if runnerConf.Restart {
@@ -160,8 +152,10 @@ func deployRunner(host string, runnerConf RunnerConf, deployerConf *DeployerConf

 setupCmd = fmt.Sprintf("cd vhive && ./scripts/github_runner/setup_integ_runners.sh %d %s %s %s", runnerConf.Num,
 deployerConf.GhOrg, deployerConf.GhPat, restart)
+case "profile":
+setupCmd = fmt.Sprintf("cd vhive && chmod +x ./scripts/github_runner/setup_profile_runner.sh && ./scripts/github_runner/setup_profile_runner.sh %s %s", deployerConf.GhOrg, deployerConf.GhPat)
 default:
-log.Fatalf("Invalid runner type: '%s', expected 'cri' or 'integ'", runnerConf.Type)
+log.Fatalf("Invalid runner type: '%s', expected 'cri', 'integ' or 'profile'", runnerConf.Type)
 }

 log.Debugf("Setting up runner on %s@%s", deployerConf.HostUsername, host)
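The dispatch on runner type in this hunk can be read in isolation as a function from configuration to remote command. The following is a simplified sketch rather than the code in `deploy_runners.go`: the structs are trimmed to the fields used here, `setupCommand` is a hypothetical name, and the `integ` case is left out because the diff does not show how its `restart` argument is computed.

```go
package main

import (
	"fmt"
)

// Trimmed-down versions of the deployer's config structs (assumed layout).
type RunnerConf struct {
	Type    string
	Sandbox string
}

type DeployerConf struct {
	GhOrg string
	GhPat string
}

// setupCommand (hypothetical helper) returns the shell command that is run on
// the remote host for a given runner type. Unlike the original, which calls
// log.Fatalf, it returns an error so that the caller decides how to fail.
func setupCommand(r RunnerConf, d DeployerConf) (string, error) {
	switch r.Type {
	case "cri":
		return fmt.Sprintf("cd vhive && ./scripts/github_runner/setup_bare_metal_runner.sh %s %s %s",
			d.GhOrg, d.GhPat, r.Sandbox), nil
	case "profile":
		return fmt.Sprintf("cd vhive && chmod +x ./scripts/github_runner/setup_profile_runner.sh && ./scripts/github_runner/setup_profile_runner.sh %s %s",
			d.GhOrg, d.GhPat), nil
	default:
		// "integ" is deliberately not covered by this sketch.
		return "", fmt.Errorf("runner type %q is not covered by this sketch", r.Type)
	}
}

func main() {
	d := DeployerConf{GhOrg: "vhive-serverless", GhPat: "<GitHub PAT>"}
	for _, r := range []RunnerConf{{Type: "cri", Sandbox: "gvisor"}, {Type: "profile"}, {Type: "unknown"}} {
		cmd, err := setupCommand(r, d)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(cmd)
	}
}
```

Because the PAT is interpolated into the command string, the command should not be written to logs at a level that is shared with others.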
37 changes: 37 additions & 0 deletions scripts/github_runner/setup_profile_runner.sh
@@ -0,0 +1,37 @@
# setup runner for profile unit test

GH_ORG=$1
GH_PAT=$2

sudo apt-get update
sudo apt-get install -y jq tmux

# Based on https://github.com/actions/runner/blob/0484afeec71b612022e35ba80e5fe98a99cd0be8/scripts/create-latest-svc.sh#L112-L131
RUNNER_TOKEN=$(curl -s -X POST https://api.github.com/repos/"$GH_ORG"/vhive/actions/runners/registration-token -H "accept: application/vnd.github.everest-preview+json" -H "authorization: token $GH_PAT" | jq -r '.token')
if [ "null" = "$RUNNER_TOKEN" ] || [ -z "$RUNNER_TOKEN" ]; then
    echo "Failed to get a runner token"
    exit 1
fi

cd "$HOME"
if [ ! -d "$HOME/actions-runner" ]; then
    mkdir actions-runner && cd actions-runner
    LATEST_VERSION=$(curl -s https://api.github.com/repos/actions/runner/releases/latest | grep 'browser_' | cut -d\" -f4 | grep 'linux-x64-[0-9\.]*.tar.gz')
    curl -o actions-runner-linux-x64.tar.gz -L -C - "$LATEST_VERSION"
    tar xzf "./actions-runner-linux-x64.tar.gz"
    rm actions-runner-linux-x64.tar.gz
    chmod +x ./config.sh
    chmod +x ./run.sh
    RUNNER_ALLOW_RUNASROOT=1 ./config.sh --url "https://github.com/$GH_ORG/vHive" \
        --token "${RUNNER_TOKEN}" \
        --name "profile-test-github-runner" \
        --work "$HOME/actions-runner/_work" \
        --labels "profile" \
        --unattended \
        --replace
fi

cd "$HOME/actions-runner"
tmux new-session -d -s session_name 'RUNNER_ALLOW_RUNASROOT=1 ./run.sh'
echo "SETUP PROFILE FINISHED"
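The script finds the runner tarball by grepping the release JSON for `browser_` lines. The same selection can be sketched with a JSON parser, which does not depend on how the API formats its output. This is an illustration only: `runnerDownloadURL` is a hypothetical helper, and the sample JSON in `main` is hand-written with a made-up version number and host, not real API output.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"regexp"
)

// linuxX64Asset matches file names such as actions-runner-linux-x64-1.2.3.tar.gz,
// the same shape the script's grep pattern targets.
var linuxX64Asset = regexp.MustCompile(`linux-x64-[0-9.]+\.tar\.gz$`)

// runnerDownloadURL (hypothetical helper) picks the linux-x64 tarball URL out
// of the JSON returned by the "latest release" endpoint of actions/runner.
func runnerDownloadURL(releaseJSON []byte) (string, error) {
	var release struct {
		Assets []struct {
			URL string `json:"browser_download_url"`
		} `json:"assets"`
	}
	if err := json.Unmarshal(releaseJSON, &release); err != nil {
		return "", err
	}
	for _, a := range release.Assets {
		if linuxX64Asset.MatchString(a.URL) {
			return a.URL, nil
		}
	}
	return "", errors.New("no linux-x64 runner asset found in release")
}

func main() {
	// Hand-written sample; the version and host are placeholders.
	sample := []byte(`{"assets": [
	  {"browser_download_url": "https://example.invalid/actions-runner-osx-x64-1.2.3.tar.gz"},
	  {"browser_download_url": "https://example.invalid/actions-runner-linux-x64-1.2.3.tar.gz"}
	]}`)
	url, err := runnerDownloadURL(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(url)
}
```

Returning an error when no asset matches avoids the script's failure mode where an empty `LATEST_VERSION` is passed on to `curl`.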
