Merge pull request #93 from akash-network/shm-additions

update: add shm support for inventory operator

chainzero authored Apr 1, 2024
2 parents 7804128 + a93c88b commit 9d2c84c

Showing 2 changed files with 253 additions and 3 deletions.
A complete deployment has the following sections:
- [persistent storage](/docs/network-features/persistent-storage/)
- [gpu support](#gpu-support)
- [stable payment](#stable-payment)
- [shared memory (shm)](stack-definition-language.md#shared-memory-shm)


An example deployment configuration can be found [here](https://github.com/akash-network/docs/tree/62714bb13cfde51ce6210dba626d7248847ba8c1/sdl/deployment.yaml).

This says that the 20 instances of the `web` service should be deployed to a datacenter matching the `westcoast` placement profile.

GPUs can be added to your workload via inclusion in the compute profile section. The placement of the GPU stanza can be viewed in the full compute profile example shown below.

> _**NOTE**_ - currently the only accepted vendor is `nvidia` but others will be added soon
> _**NOTE**_ - when declaring the GPU model - e.g. `rtx4090` in this example - ensure that the model name aligns with the conventions found in this [list](https://github.com/akash-network/provider-configs/blob/main/devices/pcie/gpus.json).

```
profiles:
  compute:
    gpu:
      resources:
        # ... cpu and memory definitions collapsed in the diff view ...
        gpu:
          units: 1
          attributes:
            vendor:
              nvidia:
                - model: rtx4090
        storage:
          size: 1Gi
```
```
gpu:
  attributes:
    vendor:
      nvidia:
        - model: rtx4090
        - model: t4
```
Use of Stable Payments is supported in the Akash SDL and is declared in the placement section.
#### Full Stable Payment SDL Example
To view an example Stable Payment enabled SDL in full for greater context, review this [example](https://gist.github.com/chainzero/040d19bdb20d632009b8ae206fb548f5).
## Shared Memory (SHM)

A new storage class named `ram` may be added to the SDL to enable shared memory access for multiple services running in the same container.

#### SHM Definition

> _**NOTE**_ - SHM must not be persistent. The SDL validations will error if SHM is defined as persistent.
```
profiles:
  compute:
    grafana:
      resources:
        cpu:
          units: 1
        memory:
          size: 1Gi
        storage:
          - size: 512Mi
          - name: data
            size: 1Gi
            attributes:
              persistent: true
              class: beta2
          - name: shm
            size: 1Gi
            attributes:
              class: ram
```
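For contrast, a storage entry like the following would fail SDL validation, because it combines the `ram` class with `persistent: true` (an illustrative, intentionally invalid fragment):

```
# INVALID - the ram class must not be persistent; SDL validation rejects this
- name: shm
  size: 1Gi
  attributes:
    persistent: true
    class: ram
```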
#### SHM Use

Use the defined SHM profile within a service:

```
services:
  web:
    image: <docker image>
    expose:
      - port: 80
        as: 80
        http_options:
          max_body_size: 2097152
          next_cases:
            - off
        accept:
          - hello.localhost
        to:
          - global: true
    params:
      storage:
        shm:
          mount: /dev/shm
```
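Once the deployment is running, the mount can be checked from a shell inside the container (a quick sanity check, assuming the image ships a shell and standard utilities):

```shell
# Confirm the shared memory mount exists and check its size
df -h /dev/shm

# Exercise it by writing and removing a small file
dd if=/dev/zero of=/dev/shm/shm-test bs=1M count=1
ls -lh /dev/shm/shm-test
rm /dev/shm/shm-test
```

The reported filesystem size should match the `size` declared for the `ram` storage profile.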
#### Full SHM SDL Example
To view an example SHM enabled SDL in full for greater context, review this [example](https://gist.github.com/chainzero/0dea9f2e1c4241d2e4d490b37153ec86).
---
categories: ["Providers"]
tags: []
weight: 2
title: "Shared Memory (SHM) Support (Optional Step)"
linkTitle: "Shared Memory (SHM) Support (Optional Step)"
---

# Shared Memory (SHM) Enablement

## Update Provider Configuration File

Providers must be updated with attributes in order to bid on SHM deployments.

> _**NOTE**_ - in the Akash Provider build documentation a `provider.yaml` file was created which stores provider attributes and other settings. In this section we will update that `provider.yaml` file with SHM related attributes. The remainder of the pre-existing file should be left unchanged.
### Access Provider Configuration File

* Steps included in this code block open the pre-existing `provider.yaml` file in the expected directory

```
cd ~
cd provider
vim provider.yaml
```

### **Update the Provider YAML File With SHM Attribute**

* When the update is complete, the `provider.yaml` file should include attributes like the following example.

```
- key: capabilities/storage/3/class
  value: ram
- key: capabilities/storage/3/persistent
  value: false
```

#### Example Provider Config File

```
---
from: "$ACCOUNT_ADDRESS"
key: "$(cat ~/key.pem | openssl base64 -A)"
keysecret: "$(echo $KEY_PASSWORD | openssl base64 -A)"
domain: "$DOMAIN"
node: "$AKASH_NODE"
withdrawalperiod: 12h
attributes:
  - key: host
    value: akash
  - key: tier
    value: community
  - key: capabilities/storage/3/class
    value: ram
  - key: capabilities/storage/3/persistent
    value: false
```
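The `key` and `keysecret` fields above rely on command substitution to inline base64-encoded values. The encoding can be sanity-checked outside the config with a throwaway value (not a real key):

```shell
# Encode a throwaway value the same way the config does
ENCODED="$(printf 'example-secret' | openssl base64 -A)"
echo "$ENCODED"

# Decoding returns the original value, confirming the round trip
printf '%s' "$ENCODED" | openssl base64 -d -A
```

The `-A` flag keeps the output on a single line, which is required for the value to be embedded in the YAML field.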

## Update Provider Via Helm

```
helm upgrade --install akash-provider akash/provider -n akash-services -f provider.yaml \
--set bidpricescript="$(cat /root/provider/price_script_generic.sh | openssl base64 -A)"
```

## Update the Inventory Operator with SHM Support

> _**NOTE**_ - when your Akash Provider was initially installed a step was included to also install the Akash Inventory Operator. In this step we will make any necessary changes to the inventory operator for SHM support.
### Helm Chart - values.yaml file

* The default `values.yaml` file for the inventory operator is shown below
* To support SHM we must update the inventory operator to include the SHM/`ram` storage class. We will update the inventory operator with such support in the subsequent step.

```
# Default values for inventory-operator.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

image:
  repository: ghcr.io/akash-network/provider
  pullPolicy: IfNotPresent

inventoryConfig:
  # Allow users to specify cluster storage options
  cluster_storage:
    - default
    - beta2
  exclude:
    nodes: []
    node_storage: []
```

#### Update Cluster Storage Settings

* Use this command to update the cluster storage settings with SHM support

> NOTE - in the example we include support for the persistent storage type `beta3` as well. Adjust this section appropriately based on your provider's support of persistent storage.
```
helm upgrade --install inventory-operator akash/akash-inventory-operator -n akash-services --set inventoryConfig.cluster_storage[0]=default,inventoryConfig.cluster_storage[1]=beta3,inventoryConfig.cluster_storage[2]=ram
```
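The same storage classes can equally be supplied via a values file instead of indexed `--set` flags; a sketch, with `shm-values.yaml` as an arbitrary file name:

```
# shm-values.yaml - equivalent to the indexed --set flags above
inventoryConfig:
  cluster_storage:
    - default
    - beta3
    - ram
```

Apply it with `helm upgrade --install inventory-operator akash/akash-inventory-operator -n akash-services -f shm-values.yaml`.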

#### Expected Output

```
root@node1:~/helm-charts/charts# helm install inventory-operator akash/akash-inventory-operator -n akash-services
NAME: inventory-operator
LAST DEPLOYED: Thu May 5 18:15:57 2022
NAMESPACE: akash-services
STATUS: deployed
REVISION: 1
TEST SUITE: None
```

## Verify Health of Akash Provider

Use the following command to verify the health of the Akash Provider and Hostname Operator pods

```
kubectl get pods -n akash-services
```

#### Example/Expected Output

```
root@node1:~/provider# kubectl get pods -n akash-services
NAME READY STATUS RESTARTS AGE
akash-hostname-operator-5c59757fcc-kt7dl 1/1 Running 0 17s
akash-provider-0 1/1 Running 0 59s
```

## Verify Provider Attributes On Chain

* In this step we ensure that your updated Akash Provider attributes have been updated on the blockchain. Ensure that the SHM related storage attributes are now in place via this step.

> _**NOTE**_ - conduct this verification from your Kubernetes control plane node
```
# Ensure that a RPC node environment variable is present for query
export AKASH_NODE=https://rpc.akashnet.net:443
# Replace the provider address with your own value
provider-services query provider get <provider-address>
```

#### Example/Expected Output

```
provider-services query provider get akash1mtnuc449l0mckz4cevs835qg72nvqwlul5wzyf

attributes:
- key: region
  value: us-central
- key: host
  value: akash
- key: tier
  value: community
- key: organization
  value: akash test provider
- key: capabilities/storage/3/class
  value: ram
- key: capabilities/storage/3/persistent
  value: false
host_uri: https://provider.akashtestprovider.xyz:8443
info:
  email: ""
  website: ""
owner: akash1mtnuc449l0mckz4cevs835qg72nvqwlul5wzyf
```

## Verify Akash Provider Image

Verify the Provider image is correct by running this command:

```
kubectl -n akash-services get pod akash-provider-0 -o yaml | grep image: | uniq -c
```

#### Expected/Example Output

```
root@node1:~/provider# kubectl -n akash-services get pod akash-provider-0 -o yaml | grep image: | uniq -c
4 image: ghcr.io/akash-network/provider:0.5.4
```
