Merge pull request #93 from akash-network/shm-additions

update: add shm support for inventory operator

chainzero authored Apr 1, 2024
2 parents 7804128 + a93c88b commit 9d2c84c

Showing 2 changed files with 253 additions and 3 deletions.
A complete deployment has the following sections:
- [persistent storage](/docs/network-features/persistent-storage/)
- [gpu support](#gpu-support)
- [stable payment](#stable-payment)
- [shared memory (shm)](stack-definition-language.md#shared-memory-shm)


An example deployment configuration can be found [here](https://github.com/akash-network/docs/tree/62714bb13cfde51ce6210dba626d7248847ba8c1/sdl/deployment.yaml).

This says that the 20 instances of the `web` service should be deployed to a datacenter matching the `westcoast` placement profile.

GPUs can be added to your workload via inclusion in the compute profile section. The placement of the GPU stanza can be viewed in the full compute profile example shown below.

> _**NOTE**_ - currently the only accepted vendor is `nvidia` but others will be added soon
> _**NOTE**_ - when declaring the GPU model - e.g. `rtx4090` in this example - ensure that the model name aligns with the conventions found in this [list](https://github.com/akash-network/provider-configs/blob/main/devices/pcie/gpus.json).

```
profiles:
  compute:
    gpu:
      resources:
        # ... cpu and memory definitions collapsed in the diff view ...
        gpu:
          units: 1
          attributes:
            vendor:
              nvidia:
                - model: rtx4090
        storage:
          size: 1Gi
```
```
gpu:
  attributes:
    vendor:
      nvidia:
        - model: rtx4090
        - model: t4
```
Use of Stable Payments is supported in the Akash SDL and is declared in the placement section.
#### Full Stable Payment SDL Example
To view an example Stable Payment enabled SDL in full for greater context, review this [example](https://gist.github.com/chainzero/040d19bdb20d632009b8ae206fb548f5).
## Shared Memory (SHM)

A new storage class named `ram` may be added to the SDL to enable shared memory access for multiple services running in the same container.

#### SHM Definition

> _**NOTE**_ - SHM must not be persistent. The SDL validations will error if SHM is defined as persistent.
```
profiles:
  compute:
    grafana:
      resources:
        cpu:
          units: 1
        memory:
          size: 1Gi
        storage:
          - size: 512Mi
          - name: data
            size: 1Gi
            attributes:
              persistent: true
              class: beta2
          - name: shm
            size: 1Gi
            attributes:
              class: ram
```
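For contrast, a storage entry like the following would fail SDL validation, because it combines the `ram` class with `persistent: true` (an illustrative, intentionally invalid fragment):

```
# INVALID - the ram class must not be persistent; SDL validation rejects this
- name: shm
  size: 1Gi
  attributes:
    persistent: true
    class: ram
```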
#### SHM Use

Use the defined SHM profile within a service:

```
services:
  web:
    image: <docker image>
    expose:
      - port: 80
        as: 80
        http_options:
          max_body_size: 2097152
          next_cases:
            - off
        accept:
          - hello.localhost
        to:
          - global: true
    params:
      storage:
        shm:
          mount: /dev/shm
```
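Once the deployment is running, the mount can be checked from a shell inside the container (a quick sanity check, assuming the image ships a shell and standard utilities):

```shell
# Confirm the shared memory mount exists and check its size
df -h /dev/shm

# Exercise it by writing and removing a small file
dd if=/dev/zero of=/dev/shm/shm-test bs=1M count=1
ls -lh /dev/shm/shm-test
rm /dev/shm/shm-test
```

The reported filesystem size should match the `size` declared for the `ram` storage profile.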
#### Full SHM SDL Example
To view an example SHM enabled SDL in full for greater context, review this [example](https://gist.github.com/chainzero/0dea9f2e1c4241d2e4d490b37153ec86).
---
categories: ["Providers"]
tags: []
weight: 2
title: "Shared Memory (SHM) Support (Optional Step)"
linkTitle: "Shared Memory (SHM) Support (Optional Step)"
---

# Shared Memory (SHM) Enablement

## Update Provider Configuration File

Providers must be updated with attributes in order to bid on SHM deployments.

> _**NOTE**_ - in the Akash Provider build documentation a `provider.yaml` file was created which stores provider attributes and other settings. In this section we will update that `provider.yaml` file with SHM related attributes. The remainder of the pre-existing file should be left unchanged.
### Access Provider Configuration File

* Steps included in this code block open the pre-existing `provider.yaml` file in the expected directory

```
cd ~
cd provider
vim provider.yaml
```

### **Update the Provider YAML File With SHM Attribute**

* When the update is complete, the `provider.yaml` file should include attributes like the following example.

```
- key: capabilities/storage/3/class
  value: ram
- key: capabilities/storage/3/persistent
  value: false
```

#### Example Provider Config File

```
---
from: "$ACCOUNT_ADDRESS"
key: "$(cat ~/key.pem | openssl base64 -A)"
keysecret: "$(echo $KEY_PASSWORD | openssl base64 -A)"
domain: "$DOMAIN"
node: "$AKASH_NODE"
withdrawalperiod: 12h
attributes:
  - key: host
    value: akash
  - key: tier
    value: community
  - key: capabilities/storage/3/class
    value: ram
  - key: capabilities/storage/3/persistent
    value: false
```
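The `key` and `keysecret` fields above rely on command substitution to inline base64-encoded values. The encoding can be sanity-checked outside the config with a throwaway value (not a real key):

```shell
# Encode a throwaway value the same way the config does
ENCODED="$(printf 'example-secret' | openssl base64 -A)"
echo "$ENCODED"

# Decoding returns the original value, confirming the round trip
printf '%s' "$ENCODED" | openssl base64 -d -A
```

The `-A` flag keeps the output on a single line, which is required for the value to be embedded in the YAML field.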

## Update Provider Via Helm

```
helm upgrade --install akash-provider akash/provider -n akash-services -f provider.yaml \
--set bidpricescript="$(cat /root/provider/price_script_generic.sh | openssl base64 -A)"
```

## Update the Inventory Operator with SHM Support

> _**NOTE**_ - when your Akash Provider was initially installed a step was included to also install the Akash Inventory Operator. In this step we will make any necessary changes to the inventory operator for SHM support.
### Helm Chart - values.yaml file

* The default `values.yaml` file for the inventory operator is shown below
* To support SHM we must update the inventory operator to include the SHM/`ram` storage class. We will update the inventory operator with such support in the subsequent step.

```
# Default values for inventory-operator.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

image:
  repository: ghcr.io/akash-network/provider
  pullPolicy: IfNotPresent

inventoryConfig:
  # Allow users to specify cluster storage options
  cluster_storage:
    - default
    - beta2
  exclude:
    nodes: []
    node_storage: []
```

#### Update Cluster Storage Settings

* Use this command to update the cluster storage settings with SHM support

> NOTE - in the example we include support for the persistent storage type `beta3` as well. Adjust this section appropriately based on your provider's support of persistent storage.
```
helm upgrade --install inventory-operator akash/akash-inventory-operator -n akash-services --set inventoryConfig.cluster_storage[0]=default,inventoryConfig.cluster_storage[1]=beta3,inventoryConfig.cluster_storage[2]=ram
```
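The same storage classes can equally be supplied via a values file instead of indexed `--set` flags; a sketch, with `shm-values.yaml` as an arbitrary file name:

```
# shm-values.yaml - equivalent to the indexed --set flags above
inventoryConfig:
  cluster_storage:
    - default
    - beta3
    - ram
```

Apply it with `helm upgrade --install inventory-operator akash/akash-inventory-operator -n akash-services -f shm-values.yaml`.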

#### Expected Output

```
root@node1:~/helm-charts/charts# helm install inventory-operator akash/akash-inventory-operator -n akash-services
NAME: inventory-operator
LAST DEPLOYED: Thu May 5 18:15:57 2022
NAMESPACE: akash-services
STATUS: deployed
REVISION: 1
TEST SUITE: None
```

## Verify Health of Akash Provider

Use the following command to verify the health of the Akash Provider and Hostname Operator pods

```
kubectl get pods -n akash-services
```

#### Example/Expected Output

```
root@node1:~/provider# kubectl get pods -n akash-services
NAME READY STATUS RESTARTS AGE
akash-hostname-operator-5c59757fcc-kt7dl 1/1 Running 0 17s
akash-provider-0 1/1 Running 0 59s
```

## Verify Provider Attributes On Chain

* In this step we ensure that your updated Akash Provider attributes have been updated on the blockchain. Ensure that the SHM related storage attributes are now in place via this step.

> _**NOTE**_ - conduct this verification from your Kubernetes control plane node
```
# Ensure that a RPC node environment variable is present for query
export AKASH_NODE=https://rpc.akashnet.net:443
# Replace the provider address with your own value
provider-services query provider get <provider-address>
```

#### Example/Expected Output

```
provider-services query provider get akash1mtnuc449l0mckz4cevs835qg72nvqwlul5wzyf

attributes:
- key: region
  value: us-central
- key: host
  value: akash
- key: tier
  value: community
- key: organization
  value: akash test provider
- key: capabilities/storage/3/class
  value: ram
- key: capabilities/storage/3/persistent
  value: false
host_uri: https://provider.akashtestprovider.xyz:8443
info:
  email: ""
  website: ""
owner: akash1mtnuc449l0mckz4cevs835qg72nvqwlul5wzyf
```

## Verify Akash Provider Image

Verify the Provider image is correct by running this command:

```
kubectl -n akash-services get pod akash-provider-0 -o yaml | grep image: | uniq -c
```

#### Expected/Example Output

```
root@node1:~/provider# kubectl -n akash-services get pod akash-provider-0 -o yaml | grep image: | uniq -c
4 image: ghcr.io/akash-network/provider:0.5.4
```
