@@ -0,0 +1,115 @@
# K8s Power Manager

## Components
1. **PowerManager Controller**: ensures the actual state of the cluster matches the desired state.
2. **PowerConfig Controller**: watches the PowerConfig created by the user and uses a DaemonSet to deploy a Power Node Agent onto each specified node (see the sketch after this list).
   - powerNodeSelector: a key/value map of node labels that a node must satisfy for the operator's node agent to be deployed.
   - powerProfiles: the list of PowerProfiles that the user wants available on the nodes.
3. **Power Node Agent**: a containerized application that communicates with the node's Kubelet PodResources endpoint to discover the exact CPUs allocated to each container and tunes the core frequencies as requested.
4. **Power Profile**: a predefined configuration that specifies how the system should manage power consumption for components such as CPUs and GPUs. It includes host-level settings such as CPU frequency and governor.
5. **Power Workload**: the object used to define the list of CPUs configured with a particular PowerProfile. A PowerWorkload is created for each PowerProfile on each node with the Power Node Agent deployed. A PowerWorkload is represented in the Intel Power Optimization Library by a Pool. A Pool holds the PowerProfile used, its frequencies, and the CPUs that need to be configured. Creating the Pool, and adding CPUs to it, then carries out the changes.
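For illustration, applying a PowerConfig could look roughly like the sketch below. The `powerNodeSelector` and `powerProfiles` fields follow the description above, but the API group/version, namespace, node label, and profile name are assumptions; check the CRDs installed by the setup script for the exact schema.
```bash
# Hypothetical PowerConfig; adjust apiVersion, namespace, labels, and profile
# names to match the operator version deployed by setup_power_manager.sh.
cat <<EOF | kubectl apply -f -
apiVersion: power.intel.com/v1
kind: PowerConfig
metadata:
  name: power-config
  namespace: intel-power
spec:
  powerNodeSelector:
    feature.node.kubernetes.io/power-node: "true"
  powerProfiles:
    - "performance"
EOF
```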
## Setup

Execute the following **as a non-root user with sudo rights** using **bash**:
1. Follow the [quick-start guide](docs/quickstart_guide.md) to set up a Knative cluster for the experiments.
2. Four vSwarm benchmarks are used to run the experiments (Spinning, Sleeping, AES, Auth). On the master node, deploy these benchmarks on the Knative cluster:
```bash
git clone --depth=1 https://github.com/vhive-serverless/vSwarm.git

cd $HOME/vSwarm/tools/test-client && go build ./test-client.go

kn service apply -f $HOME/vSwarm/benchmarks/auth/yamls/knative/kn-auth-python.yaml
kn service apply -f $HOME/vSwarm/benchmarks/aes/yamls/knative/kn-aes-python.yaml
kn service apply -f $HOME/vSwarm/benchmarks/sleeping/yamls/knative/kn-sleeping-go.yaml
kn service apply -f $HOME/vSwarm/benchmarks/spinning/yamls/knative/kn-spinning-go.yaml
```
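Optionally, check that all four services report `READY` before continuing:
```bash
kn service list
```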
### Experiment 1: Workload sensitivity
A 2-node cluster is needed.
1. On the master node, run the node setup script:
```bash
./scripts/power_manager/setup_power_manager.sh;
```
Then run the experiment:
```bash
go run $HOME/vhive/examples/power_manager/workload_sensitivity/main.go;
```
### Experiment 2: Internode scaling
A 3-node cluster is needed. Three scenarios are run:
- Scenario 1: all worker nodes run at low frequency
- Scenario 2: all worker nodes run at high frequency
- Scenario 3: one worker node runs at high frequency and the other at low frequency (tune the frequencies manually, as in Experiment 3, steps 4 and 5 below)

1. On the master node, run the node setup script:
```bash
./scripts/power_manager/setup_power_manager.sh;
```
Then run the experiment:
```bash
go run $HOME/vhive/examples/power_manager/internode_scaling_exp/main.go;
```
### Experiment 3: Class Assignment
A 3-node cluster is needed. We need to be able to assign each workload to a specific node.

1. On the master node, enable the `nodeSelector` feature in Knative:
```bash
kubectl patch configmap config-features -n knative-serving -p '{"data": {"kubernetes.podspec-nodeselector": "enabled"}}'
```
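To confirm the flag was applied (this assumes the standard `config-features` ConfigMap layout):
```bash
kubectl get configmap config-features -n knative-serving \
  -o jsonpath='{.data.kubernetes\.podspec-nodeselector}'
# should print: enabled
```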
2. Modify the Knative YAML files of these benchmarks in vSwarm to add a `nodeSelector` before deployment (Spinning and AES to the worker-high class, Sleeping and Auth to the worker-low class). For example, for the Sleeping benchmark:
````yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: sleeping-go
  namespace: default
spec:
  template:
    spec:
      nodeSelector:
        loader-nodetype: worker-low
      containers:
        - image: docker.io/vhiveease/relay:latest
          ports:
            - name: h2c
              containerPort: 50000
          args:
            - --addr=0.0.0.0:50000
            - --function-endpoint-url=0.0.0.0
            - --function-endpoint-port=50051
            - --function-name=sleeping-go
        - image: docker.io/kt05docker/sleeping-go:latest
          args:
            - --addr=0.0.0.0:50051
````
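After editing a benchmark's YAML, re-apply it so the `nodeSelector` takes effect; for example:
```bash
kn service apply -f $HOME/vSwarm/benchmarks/sleeping/yamls/knative/kn-sleeping-go.yaml
```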
3. On the master node, label the worker nodes:
```bash
kubectl label node node-1.kt-cluster.ntu-cloud-pg0.utah.cloudlab.us loader-nodetype=worker-low
kubectl label node node-2.kt-cluster.ntu-cloud-pg0.utah.cloudlab.us loader-nodetype=worker-high
```
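To verify that the labels were applied:
```bash
kubectl get nodes -L loader-nodetype
```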
Run the node setup script:
```bash
./scripts/power_manager/setup_power_manager.sh;
```
4. On worker node 1, manually set all CPU cores to 1.2 GHz, i.e. run the commands below for every core (a loop covering all cores is sketched after this list):
```bash
echo performance | sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo 1200000 | sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo 1200000 | sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
```
5. On worker node 2, manually set all CPU cores to 2.4 GHz:
```bash
echo performance | sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo 2400000 | sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo 2400000 | sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
```
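A minimal convenience sketch that applies the settings from step 4 to every core through the sysfs cpufreq interface; substitute `2400000` on worker node 2:
```bash
# Iterate over all CPU cores exposed under sysfs and pin them to 1.2 GHz.
for cpu in /sys/devices/system/cpu/cpu[0-9]*; do
    echo performance | sudo tee "$cpu/cpufreq/scaling_governor"
    echo 1200000 | sudo tee "$cpu/cpufreq/scaling_min_freq"
    echo 1200000 | sudo tee "$cpu/cpufreq/scaling_max_freq"
done
```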
@@ -0,0 +1,122 @@
package main

import (
	"encoding/csv"
	"fmt"
	"os"
	"os/exec"
	"sync"
	"time"

	powermanager "github.com/vhive-serverless/vhive/examples/power_manager"
)

var (
	// Tracks whether a service has already been pinned to a performance class.
	serviceAssignment = map[string]bool{
		"spinning-go": false,
		"sleeping-go": false,
		"aes-python":  false,
		"auth-python": false,
	}
)

// processLatencies inspects the recorded latencies of a service and, based on the
// spread between the 5th and 90th percentiles, patches the Knative service with a
// nodeSelector that pins it to the high- or low-performance worker class.
func processLatencies(records []int64, serviceName string) {
	if len(records) == 0 {
		fmt.Println("No data to process")
		return
	}

	fifthPercentile := powermanager.GetDataAtPercentile(records, 5)
	ninetiethPercentile := powermanager.GetDataAtPercentile(records, 90)
	difference := float64(ninetiethPercentile-fifthPercentile) / float64(fifthPercentile)
	if difference >= 0.40 && !serviceAssignment[serviceName] { // Assign to high performance class
		fmt.Println("Assigning to high performance class")
		command := fmt.Sprintf("kubectl patch service.serving.knative.dev %s --type merge --patch '{\"spec\":{\"template\":{\"spec\":{\"nodeSelector\":{\"loader-nodetype\":\"worker-high\"}}}}}' --namespace default", serviceName)
		cmd := exec.Command("bash", "-c", command)
		_, err := cmd.CombinedOutput()
		if err != nil {
			fmt.Printf("Error assigning to high performance class: %+v\n", err)
			return
		}
		serviceAssignment[serviceName] = true
	}
	if difference < 0.10 && !serviceAssignment[serviceName] { // Assign to low performance class
		fmt.Println("Assigning to low performance class")
		command := fmt.Sprintf("kubectl patch service.serving.knative.dev %s --type merge --patch '{\"spec\":{\"template\":{\"spec\":{\"nodeSelector\":{\"loader-nodetype\":\"worker-low\"}}}}}' --namespace default", serviceName)
		cmd := exec.Command("bash", "-c", command)
		_, err := cmd.CombinedOutput()
		if err != nil {
			fmt.Printf("Error assigning to low performance class: %+v\n", err)
			return
		}
		serviceAssignment[serviceName] = true
	}
}

// assignWorkload collects latencies for a service and re-evaluates its class
// assignment once per minute, or one final time when the latency channel is closed.
func assignWorkload(ch_latency <-chan int64, serviceName string, wg *sync.WaitGroup) {
	defer wg.Done()

	ticker := time.NewTicker(1 * time.Minute)
	defer ticker.Stop()

	var records []int64

	for {
		select {
		case record, ok := <-ch_latency:
			if !ok {
				// Channel is closed, process remaining data
				processLatencies(records, serviceName)
				return
			}
			records = append(records, record)
		case <-ticker.C:
			// Time to process the data
			processLatencies(records, serviceName)
		}
	}
}

func main() {
	file, err := os.Create("metrics2.csv")
	if err != nil {
		panic(err)
	}
	defer file.Close()

	writer := csv.NewWriter(file)
	defer writer.Flush()

	err = writer.Write([]string{"startTime", "endTime", "spinningLatency", "sleepingLatency"})
	if err != nil {
		fmt.Printf("Error writing metrics to the CSV file: %v\n", err)
	}

	ch := make(chan []string)
	ch_latency_spinning := make(chan int64)
	ch_latency_sleeping := make(chan int64)

	var wg sync.WaitGroup
	wg.Add(3)
	go powermanager.WriteToCSV(writer, ch, &wg)
	go assignWorkload(ch_latency_spinning, "spinning-go", &wg)
	go assignWorkload(ch_latency_sleeping, "sleeping-go", &wg)

	// Invoke the spinning and sleeping benchmarks concurrently for 5 minutes,
	// once per second, feeding the measured latencies to the assignment goroutines.
	now := time.Now()
	for time.Since(now) < (time.Minute * 5) {
		go powermanager.InvokeConcurrently(5, powermanager.SleepingURL, ch, ch_latency_spinning, ch_latency_sleeping, false)
		go powermanager.InvokeConcurrently(5, powermanager.SpinningURL, ch, ch_latency_spinning, ch_latency_sleeping, true)

		time.Sleep(1 * time.Second) // Wait for 1 second before invoking again
	}
	close(ch)
	close(ch_latency_spinning)
	close(ch_latency_sleeping)
	wg.Wait()

	err = writer.Write([]string{"-", "-", "-", "-"})
	if err != nil {
		fmt.Printf("Error writing metrics to the CSV file: %v\n", err)
	}
	fmt.Println("done")
}
@@ -0,0 +1,3 @@
module github.com/vhive-serverless/vhive/examples/power_manager

go 1.19
@@ -0,0 +1,68 @@
package main

import (
	"encoding/csv"
	"fmt"
	"os"
	"sync"
	"time"

	powermanager "github.com/vhive-serverless/vhive/examples/power_manager"
)

func main() {
	file, err := os.Create("metrics1.csv")
	if err != nil {
		panic(err)
	}
	defer file.Close()

	writer := csv.NewWriter(file)
	defer writer.Flush()

	err = writer.Write([]string{"startTime", "endTime", "spinningLatency", "sleepingLatency"})
	if err != nil {
		fmt.Printf("Error writing metrics to the CSV file: %v\n", err)
	}

	ch := make(chan []string)
	ch_latency_spinning := make(chan int64)
	ch_latency_sleeping := make(chan int64)

	var wg sync.WaitGroup
	wg.Add(1)
	go powermanager.WriteToCSV(writer, ch, &wg)

	frequencies := map[string]int64{
		powermanager.LowFrequencyPowerProfile:  1200,
		powermanager.HighFrequencyPowerProfile: 2400,
	} // for 50/50, need to manually tune the frequency of the individual node
	// Note: map iteration order is not deterministic, so the low- and
	// high-frequency rounds may run in either order.
	for powerProfile, freq := range frequencies {
		// Apply the same power profile to both worker nodes, then drive load for 5 minutes.
		err := powermanager.SetPowerProfileToNode(powerProfile, powermanager.Node1Name, freq)
		if err != nil {
			fmt.Printf("Error setting up power profile for node1: %+v\n", err)
		}
		err = powermanager.SetPowerProfileToNode(powerProfile, powermanager.Node2Name, freq)
		if err != nil {
			fmt.Printf("Error setting up power profile for node2: %+v\n", err)
		}

		now := time.Now()
		for time.Since(now) < (time.Minute * 5) {
			go powermanager.InvokeConcurrently(5, powermanager.SleepingURL, ch, ch_latency_spinning, ch_latency_sleeping, false)
			go powermanager.InvokeConcurrently(5, powermanager.SpinningURL, ch, ch_latency_spinning, ch_latency_sleeping, true)

			time.Sleep(1 * time.Second) // Wait for 1 second before invoking again
		}
	}

	// Close the channels only after both frequency rounds have finished so they
	// are not closed twice, then wait for the CSV writer to drain.
	close(ch)
	close(ch_latency_spinning)
	close(ch_latency_sleeping)
	wg.Wait()

	err = writer.Write([]string{"-", "-", "-", "-"})
	if err != nil {
		fmt.Printf("Error writing metrics to the CSV file: %v\n", err)
	}
	fmt.Println("done")
}