Merge pull request #196 from sunya-ch/tekton
add Tekton pipelinerun
rootfs authored Nov 30, 2023
2 parents e2a3b1b + cc77c08 commit 873aef7
Showing 15 changed files with 948 additions and 1 deletion.
6 changes: 5 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -4,7 +4,7 @@
This repository contains source code related to the Kepler power model. The modules in this repository connect to the [core Kepler project](https://github.com/sustainable-computing-io/kepler) and [kepler-model-db](https://github.com/sustainable-computing-io/kepler-model-db) as shown below.
![](./fig/model-server-components-simplified.png)

## Deployment
## Model server and estimator deployment

Deploy with estimator sidecar
```
@@ -16,6 +16,10 @@ Deploy with estimator sidecar and model server
OPTS="ESTIMATOR SERVER" make deploy
```

## Model training
- [Use Tekton pipeline](./tekton)
- [Use Bash script with CPE operator](./model_training/)

## Local test
### via docker
1. Build image for testing, run
Binary file added fig/tekton-complete-train.png
Binary file added fig/tekton-single-train.png
91 changes: 91 additions & 0 deletions tekton/README.md
@@ -0,0 +1,91 @@
# Kepler power model training with Tekton
<!-- TOC tocDepth:2..3 chapterDepth:2..6 -->

- [Pre-requisite](#pre-requisite)
- [Deploy Tekton tasks and pipelines](#deploy-tekton-tasks-and-pipelines)
- [Run Tekton pipeline](#run-tekton-pipeline)
  - [Single train run](#single-train-run)
  - [Original complete run](#original-complete-run)

<!-- /TOC -->
## Pre-requisite
1. A cluster with Tekton installed
2. A PersistentVolumeClaim named `task-pvc` for the workspace

For a simple hostpath volume, run:
```
kubectl apply -f pvc/hostpath.yaml
```
> The query results, preprocessed data, and models are stored under the hostpath `/mnt`.
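
The contents of `pvc/hostpath.yaml` are not shown here. As a rough sketch only, a hostpath-backed volume and a claim named `task-pvc` could look like the following. The volume name, storage class, and size are assumptions made for illustration, and the file in the repository may differ.

```yaml
# Hypothetical sketch; not the repository's pvc/hostpath.yaml.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv              # assumed name
spec:
  storageClassName: manual   # assumed storage class
  capacity:
    storage: 5Gi             # assumed size
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt               # hostpath mentioned in the note above
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: task-pvc             # claim name referenced by the PipelineRun examples
spec:
  storageClassName: manual   # must match the volume's storage class
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```
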
## Deploy Tekton tasks and pipelines
```
kubectl apply -f tasks
kubectl apply -f pipelines
```
## Run Tekton pipeline
### Single train run
A single flow that applies a set of trainers to a specific feature group and energy source.

![](../fig/tekton-single-train.png)

See the [single-train](./pipelines/single-train.yaml) pipeline.

Example for the AbsPower model:
```
kubectl apply -f abs-train-pipelinerun.yaml
```
Example for the DynPower model:
```
kubectl apply -f dyn-train-pipelinerun.yaml
```
To customize the feature metrics, set:
parameters|value
---|---
THIRDPARTY_METRICS|customized metric list (use comma as delimiter)
FEATURE_GROUP|`ThirdParty`
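
As an illustration of the two parameters above, a PipelineRun for the single-train pipeline might look like the sketch below. The run name, the pipeline name value, and the metric names are placeholders, not values taken from the repository.

```yaml
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  name: example-thirdparty-train-pipeline   # hypothetical run name
spec:
  workspaces:
  - name: mnt
    persistentVolumeClaim:
      claimName: task-pvc
  params:
  - name: PIPELINE_NAME
    value: ThirdPartyTrainPipelineExample   # hypothetical output prefix/folder
  - name: OUTPUT_TYPE
    value: AbsPower
  - name: FEATURE_GROUP
    value: ThirdParty
  - name: THIRDPARTY_METRICS
    value: "my_metric_a,my_metric_b"        # placeholder metric names, comma-delimited
  pipelineRef:
    name: single-train-pipeline
```
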
To customize the stressng workload, set:
parameters|value
---|---
STRESS_BREAK_INTERVAL|break interval between each stress load
STRESS_TIMEOUT|stress duration (timeout to stop stress)
STRESS_ARGS|array of arguments for CPU frequency and stressng workload<br>- `CPU_FREQUENCY;STRESS_LOAD;STRESS_INSTANCE_NUM;STRESS_EXTRA_PARAM_KEYS;STRESS_EXTRA_PARAM_VALS`<br>* use `none` if not applicable for `CPU_FREQUENCY`, `STRESS_EXTRA_PARAM_KEYS`, and `STRESS_EXTRA_PARAM_VALS`
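
The stress parameters could be combined as in the following sketch. The first `STRESS_ARGS` entry and the interval and timeout values mirror the defaults declared in `complete-train.yaml`. The second entry and the run and pipeline names are made up for illustration, and the sketch assumes the single-train pipeline accepts these parameters under the same names as the table above.

```yaml
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  name: example-custom-stress-pipeline      # hypothetical run name
spec:
  workspaces:
  - name: mnt
    persistentVolumeClaim:
      claimName: task-pvc
  params:
  - name: PIPELINE_NAME
    value: CustomStressPipelineExample      # hypothetical output prefix/folder
  - name: OUTPUT_TYPE
    value: AbsPower
  - name: STRESS_BREAK_INTERVAL
    value: "5"
  - name: STRESS_TIMEOUT
    value: "30"
  - name: STRESS_ARGS
    # Format: CPU_FREQUENCY;STRESS_LOAD;STRESS_INSTANCE_NUM;STRESS_EXTRA_PARAM_KEYS;STRESS_EXTRA_PARAM_VALS
    value:
    - "3900000;cpu;8;none;none"             # default entry from complete-train.yaml
    - "none;cpu;4;none;none"                # illustrative entry: no CPU frequency, 4 instances
  pipelineRef:
    name: single-train-pipeline
```
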
To customize the preprocessing and training components, set:
parameters|value
---|---
PIPELINE_NAME|pipeline name (output prefix/folder)
EXTRACTOR|extractor class (default or smooth)
ISOLATOR|isolator class (none, min, profile, or trainer)<br>For the `trainer` isolator, `ABS_PIPELINE_NAME` must be set so that an existing trained pipeline can be used to estimate background power.
TRAINERS|list of trainer classes (use comma as delimiter)
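
A sketch of a DynPower run that uses the `trainer` isolator is shown below. It assumes the AbsPower example above has already produced a pipeline named `AbsPowerTrainPipelineExample`. The trainer class names are assumptions; replace them with the classes available in your model server image.

```yaml
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  name: example-custom-components-pipeline  # hypothetical run name
spec:
  workspaces:
  - name: mnt
    persistentVolumeClaim:
      claimName: task-pvc
  params:
  - name: PIPELINE_NAME
    value: CustomComponentsPipelineExample  # hypothetical output prefix/folder
  - name: OUTPUT_TYPE
    value: DynPower
  - name: EXTRACTOR
    value: smooth
  - name: ISOLATOR
    value: trainer
  - name: ABS_PIPELINE_NAME
    value: AbsPowerTrainPipelineExample     # previously trained AbsPower pipeline
  - name: TRAINERS
    value: "XgboostFitTrainer,SGDRegressorTrainer"   # assumed trainer class names
  pipelineRef:
    name: single-train-pipeline
```
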
### Original complete run
Applies a set of trainers to all available feature groups and energy sources.

![](../fig/tekton-complete-train.png)

See the [complete-train](./pipelines/complete-train.yaml) pipeline.
```
kubectl apply -f complete-pipelinerun.yaml
```
To customize the `ThirdParty` feature group, set:
parameters|value
---|---
THIRDPARTY_METRICS|customized metric list (use comma as delimiter)

The stressng load can be set in the same way as in the [single train run](#single-train-run).

To customize the pipeline components, `PIPELINE_NAME`, `EXTRACTOR`, `ISOLATOR`, and `ABS_PIPELINE_NAME` can be set in the same way as in the [single train run](#single-train-run).

Instead of `TRAINERS`, the original complete run uses `ABS_TRAINERS` and `DYN_TRAINERS` to specify the lists of trainers for AbsPower training and DynPower training, respectively.
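
For example, a complete run with separate trainer lists might be sketched as follows. `default` is the default value declared in `complete-train.yaml`, and `rapl` is one of its default energy sources. The run name, pipeline name value, and the AbsPower trainer class names are assumptions for illustration.

```yaml
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  name: example-custom-complete-pipeline    # hypothetical run name
spec:
  workspaces:
  - name: mnt
    persistentVolumeClaim:
      claimName: task-pvc
  params:
  - name: PIPELINE_NAME
    value: CustomCompletePipelineExample    # hypothetical output prefix/folder
  - name: ENERGY_SOURCE
    value: rapl
  - name: ABS_TRAINERS
    value: "XgboostFitTrainer,SGDRegressorTrainer"   # assumed trainer class names
  - name: DYN_TRAINERS
    value: default
  pipelineRef:
    name: complete-train-pipeline
```
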
18 changes: 18 additions & 0 deletions tekton/abs-train-pipelinerun.yaml
@@ -0,0 +1,18 @@
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  name: example-abs-train-pipeline
spec:
  workspaces:
  - name: mnt
    persistentVolumeClaim:
      claimName: task-pvc
  params:
  - name: PIPELINE_NAME
    value: AbsPowerTrainPipelineExample
  - name: OUTPUT_TYPE
    value: AbsPower
  - name: IDLE_COLLECT_INTERVAL
    value: 1
  pipelineRef:
    name: single-train-pipeline
14 changes: 14 additions & 0 deletions tekton/complete-pipelinerun.yaml
@@ -0,0 +1,14 @@
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  name: example-complete-train-pipeline
spec:
  workspaces:
  - name: mnt
    persistentVolumeClaim:
      claimName: task-pvc
  params:
  - name: PIPELINE_NAME
    value: CompleteTrainPipelineExample
  pipelineRef:
    name: complete-train-pipeline
16 changes: 16 additions & 0 deletions tekton/dyn-train-pipelinerun.yaml
@@ -0,0 +1,16 @@
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  name: example-dyn-train-pipeline
spec:
  workspaces:
  - name: mnt
    persistentVolumeClaim:
      claimName: task-pvc
  params:
  - name: PIPELINE_NAME
    value: DynPowerTrainPipelineExample
  - name: OUTPUT_TYPE
    value: DynPower
  pipelineRef:
    name: single-train-pipeline
166 changes: 166 additions & 0 deletions tekton/pipelines/complete-train.yaml
@@ -0,0 +1,166 @@
##############################################
##
## complete-train-pipeline:
##
## - presteps (collect metrics at idle state and record start time)
## - run stressng workloads (tasks/stressng-task.yaml)
## - collect metrics (record end time and collect metrics when running stressng)
## - run original model server pipeline which produces AbsPower and DynPower
## for all available feature groups
##
##############################################
apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  name: complete-train-pipeline
spec:
  workspaces:
  - name: mnt
    description: Mount path
  params:
  - name: MODEL_SERVER_IMAGE
    description: Specify model server image
    default: quay.io/sustainable_computing_io/kepler_model_server:v0.7
  - name: PIPELINE_NAME
    description: Specify pipeline name (output prefix/folder)
  - name: ENERGY_SOURCE
    description: Specify target energy sources (check https://sustainable-computing.io/kepler_model_server/pipeline/#energy-source)
    default: acpi,rapl
  - name: IDLE_COLLECT_INTERVAL
    description: Specify interval time to collect profile (idle) data before starting the workload
    default: 100
  - name: STRESS_BREAK_INTERVAL
    description: Specify break interval between each stress load
    default: 5
  - name: STRESS_TIMEOUT
    description: Specify stress duration (timeout to stop stress)
    default: 30
  - name: STRESS_ARGS
    description: List arguments for CPU frequency and stressng workload (CPU_FREQUENCY;STRESS_LOAD;STRESS_INSTANCE_NUM;STRESS_EXTRA_PARAM_KEYS;STRESS_EXTRA_PARAM_VALS)
    type: array
    default:
    - "3900000;cpu;8;none;none"
  - name: EXTRACTOR
    description: Specify extractor class (default or smooth)
    default: default
  - name: ISOLATOR
    description: Specify isolator class (none, min, profile, or trainer (if ABS_PIPELINE_NAME is set))
    default: min
  - name: ABS_TRAINERS
    description: Specify trainer names for AbsPower training (use comma(,) as delimiter)
    default: default
  - name: DYN_TRAINERS
    description: Specify trainer names for DynPower training (use comma(,) as delimiter)
    default: default
  - name: THIRDPARTY_METRICS
    description: Specify list of third-party metrics to export (required only for ThirdParty feature group)
    default: ""
  - name: ABS_PIPELINE_NAME
    description: Specify pipeline name to be used for initializing trainer isolator
    default: ""
  tasks:
  - name: presteps
    params:
    - name: IDLE_COLLECT_INTERVAL
      value: $(params.IDLE_COLLECT_INTERVAL)
    - name: THIRDPARTY_METRICS
      value: $(params.THIRDPARTY_METRICS)
    - name: MODEL_SERVER_IMAGE
      value: $(params.MODEL_SERVER_IMAGE)
    taskSpec:
      workspaces:
      - name: mnt
        optional: true
      params:
      - name: IDLE_COLLECT_INTERVAL
      - name: THIRDPARTY_METRICS
      - name: MODEL_SERVER_IMAGE
      results:
      - name: stress-start-time
        description: The time recorded before running the workload
      steps:
      - name: collect-idle
        image: $(params.MODEL_SERVER_IMAGE)
        args:
        - cmd/main.py
        - query
        - --data-path=$(workspaces.mnt.path)/data
        - --interval=$(params.IDLE_COLLECT_INTERVAL)
        - --thirdparty-metrics="$(params.THIRDPARTY_METRICS)"
        - --benchmark=idle
        - -o=idle
        command: ["python3.8"]
        env:
        - name: PROM_SERVER
          value: http://prometheus-k8s.monitoring.svc:9090
      - name: record-start-time
        image: bash:v0.0.0
        script: |
          #!/usr/bin/env bash
          echo -n $(date +%Y-%m-%dT%H:%M:%SZ) > $(results.stress-start-time.path)
  - name: run-stressng
    runAfter: [presteps]
    taskRef:
      name: run-stressng
    params:
    - name: INTERVAL
      value: $(params.STRESS_BREAK_INTERVAL)
    - name: TIMEOUT
      value: $(params.STRESS_TIMEOUT)
    - name: arguments
      value: $(params.STRESS_ARGS[*])
  - name: collect-metric
    runAfter: [run-stressng]
    params:
    - name: THIRDPARTY_METRICS
      value: $(params.THIRDPARTY_METRICS)
    - name: MODEL_SERVER_IMAGE
      value: $(params.MODEL_SERVER_IMAGE)
    taskSpec:
      workspaces:
      - name: mnt
        optional: true
      params:
      - name: BENCHMARK
        default: stressng
      - name: THIRDPARTY_METRICS
      - name: MODEL_SERVER_IMAGE
      steps:
      - name: collect-stressng
        image: $(params.MODEL_SERVER_IMAGE)
        args:
        - cmd/main.py
        - query
        - --data-path=$(workspaces.mnt.path)/data
        - --start-time=$(tasks.presteps.results.stress-start-time)
        - --end-time=$(tasks.run-stressng.results.stress-end-time)
        - --thirdparty-metrics="$(params.THIRDPARTY_METRICS)"
        - --benchmark=stressng
        - -o=kepler_query
        command: ["python3.8"]
        env:
        - name: PROM_SERVER
          value: http://prometheus-k8s.monitoring.svc:9090
  - name: train-from-query
    runAfter: [collect-metric]
    workspaces:
    - name: mnt
    taskRef:
      name: original-pipeline-task
    params:
    - name: MODEL_SERVER_IMAGE
      value: $(params.MODEL_SERVER_IMAGE)
    - name: PIPELINE_NAME
      value: $(params.PIPELINE_NAME)
    - name: EXTRACTOR
      value: $(params.EXTRACTOR)
    - name: ISOLATOR
      value: $(params.ISOLATOR)
    - name: ABS_TRAINERS
      value: $(params.ABS_TRAINERS)
    - name: DYN_TRAINERS
      value: $(params.DYN_TRAINERS)
    - name: ENERGY_SOURCE
      value: $(params.ENERGY_SOURCE)
    - name: THIRDPARTY_METRICS
      value: $(params.THIRDPARTY_METRICS)