Docs and RN updates (#3669)
* (Issue #3619) 'Runs archiving' doc
* (Issue #3573) 'Container limits' doc
* (Issue #3568) 'Compose a Dockerfile' doc
* (Issue #3576) 'GPU statistics monitor' doc
* (Issue #3602) 'Pod network consumption alert and restriction' doc
NShaforostov authored Sep 2, 2024
1 parent 0c48195 commit 4782ea6
Showing 81 changed files with 591 additions and 98 deletions.
19 changes: 9 additions & 10 deletions docs/md/api/API_tutorials/JavaScript_example.md
@@ -2,14 +2,14 @@

The JavaScript example implementation of the [Usage scenario](API_tutorials.md) offers a web form that allows specifying the job input parameters and submitting the job to the Cloud Pipeline via the API.

# Prerequisites
## Prerequisites

* NodeJS 10.15.3+
* npm 6.4.1+
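
To check that the locally installed versions satisfy these prerequisites, you can run, for example:

```bash
# Verify the local toolchain versions
node --version   # expect v10.15.3 or newer
npm --version    # expect 6.4.1 or newer
```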

# Setup the configuration
## Setup the configuration

Locate the JavaScript sample application at [js_example](../attachments/js_example/) or clone the the Cloud Pipeline repository
Locate the JavaScript sample application at [js_example](https://github.com/epam/cloud-pipeline/tree/develop/docs/md/api/API_tutorials/attachments/js_example) or clone it from the Cloud Pipeline repository:

```bash
git clone https://github.com/epam/cloud-pipeline
cd cloud-pipeline/docs/md/api/API_tutorials/attachments/js_example/
```
Open the configuration file [config.js](attachments/js_example/src/api/config.js) and replace the following values:

* `<host>` - set to the host of the Cloud Pipeline API
* `<storage_id>` - set to the ID of the bucket, that is going to be used as a "working directory"
FASTQ files and processing results will placed into this bucket
* `<storage_id>` - set to the ID of the bucket that is going to be used as a "working directory". FASTQ files and processing results will be placed into this bucket
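
For convenience, the placeholders can also be substituted from the command line. This is only a minimal sketch, assuming the placeholders appear in `src/api/config.js` literally as `<host>` and `<storage_id>`; the host and storage ID values below are illustrative:

```bash
# Substitute the config.js placeholders (illustrative values - use your own deployment's host and bucket ID)
sed -i 's|<host>|https://your-cloud-pipeline-host|' src/api/config.js
sed -i 's|<storage_id>|1234|' src/api/config.js
```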

# Start the application
## Start the application

```bash
# Install dependencies and start the app
npm install
npm run start
```

# Application description
## Application description

Once the app is built and loaded in the web browser, one can perform the following operations:

* Setup the cellranger parameters
* Set the location of the FASTQ files
* Choose the transcriptome
* Specify the "workdir", where the job will keep the results
* Set the location of the FASTQ files
* Choose the transcriptome
* Specify the "workdir", where the job will keep the results

![API_tutorials_01](attachments/png/API_tutorials_01.png)

2 changes: 0 additions & 2 deletions docs/md/css/extra.js

This file was deleted.

2 changes: 1 addition & 1 deletion docs/md/manual/08_Manage_Data_Storage/8.8._Data_sharing.md
@@ -12,7 +12,7 @@ Users can share data storages within a Cloud Platform for enabling of getting d

> For external partners to be able to get data files, note that the external partner must have their own CP account and R/W permissions for the storage.
1. Start creating a new object storage (for more details see [here](8.1._Create_and_edit_storage.md#create-storage)), fill **Info** items.
1. Start creating a new object storage (for more details see [here](8.1._Create_and_edit_storage.md#create-object-storage)), fill **Info** items.
2. Set **Enable sharing**.
3. Click **Create** button:
![CP_DataSharing](attachments/DataSharing_01.png)
109 changes: 99 additions & 10 deletions docs/md/manual/09_Manage_Cluster_nodes/9._Manage_Cluster_nodes.md
@@ -8,9 +8,13 @@
- [GENERAL INFO](#general-info)
- [JOBS](#jobs)
- [MONITOR](#monitor)
- [Filters](#filters)
- [Zoom and scroll](#zooming-and-scrolling-features)
- [Export data](#export-utilization-data)
- [General statistics](#general-statistics)
- [Filters](#filters)
- [Zoom and scroll](#zooming-and-scrolling-features)
- [Export data](#export-utilization-data)
- [GPU statistics](#gpu-statistics)
- [GPU Filters](#gpu-filters)
- [Zoom and scroll GPU statistics](#zoom-and-scroll-gpu-statistics)

**_Note_**: Nodes remain for the time that is already paid for, even if all runs at the node have finished execution. So if you restart a pipeline, new nodes will not be initialized, saving time and money.

@@ -66,13 +70,21 @@ This tab allows seeing general info about the node, including:

### MONITOR

"MONITOR" tab displays a dashboard with following diagrams:
"MONITOR" tab displays dashboards with different charts of node characteristics.
This tab includes the following subtabs:

- [**General statistics**](#general-statistics) - contains charts of general statistics that are available for any node
- [**GPU statistics**](#gpu-statistics) - contains charts of statistics that are available for GPU nodes only

#### General statistics

This subtab includes the following diagrams:

| Diagram | Description |
|---|---|
| **CPU usage** | A diagram represents **CPU usage (cores) - time** graph. The usage is displayed in fractions according to left vertical axis. There are two lines (data-series) - the first displays `max` value of the CPU usage in each moment and the second shows the `average` value of the usage during the time |
| **Memory usage** | A diagram represents **memory usage - time** graph. One type of graph represents usage in MB according to left vertical axis (includes two lines - first shows the `max` value in each moment and another shows the `average` values during the time). Another type of graph represents usage in % of available amounts of memory according to right vertical axis (analogically, includes two lines - `max` and `average`). |
| **Network connection speed** | A diagram represents **connection speed (bytes) - time** graph. **Blue** graph (**TX**) represents "transmit" speed. **Red** graph (**RX**) represents "receive" speed. Drop-down at the top of the section allows changing connection protocol. |
| **CPU usage** | A diagram that represents the **CPU usage (cores) - time** graph. The usage is displayed in fractions according to the left vertical axis.<br>There are two lines (data series) - the first displays the `max` value of the CPU usage at each moment and the second shows the `average` value of the usage over time. |
| **Memory usage** | A diagram that represents the **memory usage - time** graph.<br>One type of graph represents usage in MB according to the left vertical axis. It includes two lines - the first shows the `max` value at each moment and the other shows the `average` value over time.<br>Another type of graph represents usage as % of the available memory according to the right vertical axis (analogously, it includes two lines - `max` and `average`). |
| **Network connection speed** | A diagram that represents the **connection speed (bytes) - time** graph. The **blue** graph (**TX**) represents the "transmit" speed. The **red** graph (**RX**) represents the "receive" speed.<br>The drop-down at the top of the section allows changing the connection protocol. |
| **File system load** | Represents all the disks of the machine and their load. |

![CP_ManageClusterNodes](attachments/ManageClusterNodes_4.png)
Expand All @@ -95,7 +107,7 @@ To view the monitor of resources utilization for the completed run:
> Please note, the resources utilization data for the completed run is available for **`system.resource.monitoring.stats.retention.period`** days.
> If you try to view the monitor of the completed run after the specified period is over - the monitor will be empty.
#### Filters
##### Filters

Users can manage the plots' date configuration. For this purpose, the system has a number of filters:
![CP_ManageClusterNodes](attachments/ManageClusterNodes_5.png)
@@ -140,7 +152,7 @@ If the user focuses on the calendar icon or the whole field at any of date field
If you click this button in the _Start_ date field - the node creation date will be substituted into that filter.
If you click this button in the _End_ date field - the date in that field will be erased and the system will interpret it as the current datetime.

#### Zooming and scrolling features
##### Zooming and scrolling features

In addition, the user can _scroll_ plots.
To do so:
@@ -158,7 +170,7 @@ To zoom a chart:
![CP_ManageClusterNodes](attachments/ManageClusterNodes_10.png)
Click the desired button and the chart will be zoomed.

#### Export utilization data
##### Export utilization data

Users have the ability to export the utilization information into a **`csv`** or **`xls`** file.
This can be useful if the user wants to keep the information locally for a longer period of time than defined by the preference **`system.resource.monitoring.stats.retention.period`**.
@@ -198,3 +210,80 @@ Here:
- once the settings are configured click the **EXPORT** button - the report will be prepared and downloaded

> You can also export the node usage report via **CLI**. See [here](../../manual/14_CLI/14.6._View_cluster_nodes_via_CLI.md#export-cluster-utilization).
#### GPU statistics

This subtab displays different characteristics of the node's GPU card utilization for the selected time range (by default - from the node initialization till the current moment):
![CP_ManageClusterNodes](attachments/ManageClusterNodes_23.png)

The dashboard includes 3 parts:

- **Global GPU metrics** - the header at the top of the subtab with the following metrics:
- **GPU utilization** - the `mean`/`max`/`min` of the average GPU utilization of all GPU cards for the selected node's run time range
- **GPU Memory utilization** - the `mean`/`max`/`min` of the average GPU memory utilization of all GPU cards for the selected node's run time range
![CP_ManageClusterNodes](attachments/ManageClusterNodes_24.png)
- **Global chart** - line chart for the following metrics:
- **Time GPU Active** (_blue line_) - at each timepoint of the selected range, shows the percentage of GPU cards whose GPU utilization is more than 0
- **GPU Utilization** (_green line_) - at each timepoint of the selected range, shows the `mean`/`max`/`min` GPU utilization (in percent) among all the node's GPU cards
- **GPU Memory** (_red line_) - at each timepoint of the selected range, shows the `mean`/`max`/`min` GPU memory utilization (in percent) among all the node's GPU cards
![CP_ManageClusterNodes](attachments/ManageClusterNodes_25.png)
- When hovering over any point of this chart, a tooltip is shown with details:
- the period of time that this point describes
- detailed GPU Utilization: each GPU card is shown as a separate rectangle with a value of that card's utilization. The higher the utilization value, the more saturated the color of the rectangle is. Near the block, one aggregated value of GPU Utilization is shown - `mean`/`max`/`min` according to the display settings.
- detailed GPU Memory Utilization: each GPU card is shown as a separate rectangle with a value of that card's memory utilization (in percent and GB). The higher the utilization value, the more saturated the color of the rectangle is. Near the block, one aggregated value of GPU Memory Utilization is shown - `mean`/`max`/`min` according to the display settings.
- no matter on which line a point is selected - the tooltip for a given time point will be the same for all line charts
![CP_ManageClusterNodes](attachments/ManageClusterNodes_26.png)
- **Detailed heatmap** - shows the **Time GPU Active**, **GPU Utilization** and **GPU Memory** metrics as a heatmap at each time point:
- the heatmap is divided into blocks vertically, where each block presents a single metric
- in each heatmap block, one row corresponds to one GPU card. Therefore, the total number of heatmap rows equals `3 * <GPU_cards>`, where `<GPU_cards>` is the number of GPU cards of the node
- each heatmap cell (at the intersection of a row - GPU card ID, and a column - time point) is shown as a separate rectangle colorized according to the metric value at that timepoint - with a saturation gradient: the higher the metric value, the more saturated the color of the rectangle is. If the metric value is 0%, the rectangle is not shown
![CP_ManageClusterNodes](attachments/ManageClusterNodes_27.png)
- when hovering over any point of this heatmap, a tooltip similar to the one for line charts is shown. The one difference is that the rectangle of the selected GPU card has a bold color border:
![CP_ManageClusterNodes](attachments/ManageClusterNodes_28.png)

The historical GPU resources utilization is also available for completed runs (during the specified time storage period).
It can be useful for debugging/optimization purposes and looks similar to the GPU resources utilization dashboard of the active run, e.g.:
![CP_ManageClusterNodes](attachments/ManageClusterNodes_32.png)

> Please note, the resources utilization data for the completed run is available for **`system.resource.monitoring.stats.retention.period`** days.
> If you try to view the monitor of the completed run after the specified period is over - the monitor will be empty.
##### GPU Filters

The data displayed at the GPU statistics dashboard can be configured. For this purpose, there is a number of filters:
![CP_ManageClusterNodes](attachments/ManageClusterNodes_29.png)

**Measure (1)**
A dropdown list that allows selecting how metrics will be calculated on the line chart and heatmap:

- `Average` (default) - the value at each point is calculated as the _mean_ of that metric among all the node's GPU cards
- `Min` - the value at each point is calculated as the _minimum_ of that metric among all the node's GPU cards
- `Max` - the value at each point is calculated as the _maximum_ of that metric among all the node's GPU cards

For example, line chart when **Measure** is `Max`:
![CP_ManageClusterNodes](attachments/ManageClusterNodes_30.png)

**_Note_**: this filter affects the display of the **GPU Utilization** and **GPU Memory** metrics at the **Global GPU metrics** header, the **Global chart** and the **Detailed heatmap**.

**Set range (2)**
This filter allows selecting the time range for which metrics shall be calculated and shown on the charts:

- Last week
- Last day
- Last hour

**Date filter (3)**
This filter allows specifying the _Start_ and _End_ dates for the charts.
By default, the _Start_ date (the left field of the filter) is the node creation datetime, and the _End_ date (the right field of the filter) is the current datetime.
Charts are automatically redrawn when a new range is set (if applicable).

##### Zoom and scroll GPU statistics

To _scroll_ over the charts' time range: focus on the chart/heatmap, hold the left mouse button and move the mouse in the desired direction (left or right). All charts will be redrawn simultaneously.

To _zoom_ over the charts' time range:

- focus on the chart/heatmap, then, holding the Shift key, scroll via the mouse. The time range will be automatically zoomed and the charts will be redrawn according to the newly selected time range
- _OR_ focus on the chart, then, holding the Shift key, select an area on the chart via the mouse. The area will be highlighted:
![CP_ManageClusterNodes](attachments/ManageClusterNodes_31.png)
Then release the Shift key and the highlighted area will be automatically zoomed according to the newly selected time range.
37 changes: 37 additions & 0 deletions docs/md/manual/10_Manage_Tools/10.4._Edit_a_Tool.md
@@ -4,6 +4,9 @@
- [Edit a Tool version](#edit-a-tool-version)
- [Run/Delete a Tool version](#rundelete-a-tool-version)
- [Commit a Tool](#commit-a-tool)
- [Committing features](#committing-features)
- [Pre/post-commit hooks](#prepost-commit-hooks)
- [Container size limits](#container-size-limits)
- [Edit a Tool settings](#edit-a-tool-settings)
- [Delete a Tool](#delete-a-tool)

@@ -59,6 +62,8 @@ When it is complete COMMITING status on the right side of the screen will change

#### Committing features

##### Pre/post-commit hooks

In certain use-cases, extra steps shall be executed before/after running the commit command in the container. For example, to avoid warning messages about terminating the previous session (which was committed) of the tool application in a non-graceful manner. Some applications may require extra cleanup to be performed before the termination.

To work around such issues, an approach of "**pre/post-commit hooks**" is implemented in the **Cloud Pipeline**. It allows performing some graceful cleanup/restore before/after the commit itself.
@@ -97,6 +102,38 @@ Consider an example with **RStudio** tool, that **Cloud Pipeline** provides "out
- (**2**) the post-commit script is found at the specified path in the docker image - and it is executed
- (**3**) the post-commit script was performed successfully
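
For illustration only, a post-commit hook might be a small shell script like the sketch below. The actual script name, its location inside the image and the way it is invoked are defined by the tool's hook settings, so all paths and commands here are hypothetical:

```bash
#!/bin/bash
# Hypothetical post-commit hook sketch - all paths and commands are illustrative only.
# Such a script could perform a graceful cleanup after the commit has been taken.
set -euo pipefail

rm -f /tmp/*.session.lock                                        # e.g. drop stale session locks
echo "$(date -u) post-commit cleanup finished" >> /tmp/hooks.log # e.g. leave a trace for debugging
```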

##### Container size limits

Docker images can be extremely large. Therefore, **Cloud Pipeline** supports a mechanism to warn users about, or reject, the creation of such big images.

To configure that mechanism, there is a special [system preference](../12_Manage_Settings/12.10._Manage_system-level_settings.md#commit) - **`commit.container.size.limits`**.
This preference has the following format:

```
{
"soft": <soft_limit_size>,
"hard": <hard_limit_size>
}
```

This preference defines "**soft**" and "**hard**" limits for a container size in bytes:

- if a user tries to perform a commit operation and the container size exceeds the "soft" limit - the user will get a warning notification, but can proceed with the commit at their own risk
- if the container size exceeds the "hard" limit - the commit operation will be unavailable
- if any limit is set to `0` - this means there is no limitation of that type

Example of the preference configuration:
![CP_EditTool](attachments/EditTool_25.png)
In this example, the "**soft**" limit is set to 1 GB and there is no "**hard**" limit.
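
Assuming the limits are specified in bytes (as noted above) and taking 1 GB as `1073741824` bytes, the preference value from this example could look as follows (`0` means no "**hard**" limit):

```
{
    "soft": 1073741824,
    "hard": 0
}
```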

In case, when "**soft**" limit is set more than `0` and user tries to commit some tool which container exceeds this limit, the following warning will appear:
![CP_EditTool](attachments/EditTool_26.png)
At the same time, commit operation is available.

In case, when "**hard**" limit is set more than `0` and user tries to commit some tool which container exceeds this limit, the following error will appear:
![CP_EditTool](attachments/EditTool_27.png)
Commit operation is unavailable.

## Edit a Tool settings

Settings in this tab are applied to all Tool versions (i.e. these settings will be the default for all Tool versions).