Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Updating documentation with name change
Signed-off-by: Karl W Schulz <[email protected]>
Showing 5 changed files with 78 additions and 78 deletions.
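The hunks shown here only swap the old project name for the new one, so a natural follow-up check is to search the tree for any remaining spelling of the old name (`Omniwatch`, `omniwatch`, `OMNIWATCH`). This is a minimal Python sketch, not part of the repository; the function name `find_old_name` and the set of file suffixes it scans are illustrative assumptions.

```python
import re
from pathlib import Path

# Any capitalization of the old project name: omniwatch, Omniwatch, OMNIWATCH, ...
OLD_NAME = re.compile(r"omniwatch", re.IGNORECASE)

# Illustrative set of text-file suffixes to scan; extend as needed.
SUFFIXES = {".md", ".rst", ".txt", ".py", ".toml", ".cfg", ".yml", ".yaml",
            ".service", ".default"}


def find_old_name(root="."):
    """Return (path, line_number, line) for every scanned line still mentioning the old name."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        # Skip directories, Git internals, and files outside the suffix list.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        if path.suffix not in SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if OLD_NAME.search(line):
                hits.append((path, number, line.strip()))
    return hits
```

For example, `for path, number, line in find_old_name("."): print(f"{path}:{number}: {line}")` lists each leftover occurrence; an empty result means no scanned file still mentions the old name.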
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
````diff
@@ -8,10 +8,10 @@
 
 ## System-wide deployment
 
-There are different ways to deploy and install Omniwatch in a data center, and
+There are different ways to deploy and install Omnistat in a data center, and
 each system will generally require a certain level of customization. This
-section first describes the basic manual steps to install the Omniwatch client
-and server, and then provides an example of how to deploy Omniwatch in a data
+section first describes the basic manual steps to install the Omnistat client
+and server, and then provides an example of how to deploy Omnistat in a data
 center using Ansible.
 
 ### Node-level deployment (client)
````
````diff
@@ -24,72 +24,72 @@ as a package.
 
 1. Clone repository.
 ```
-$ git clone https://github.com/AMDResearch/omniwatch.git
+$ git clone https://github.com/AMDResearch/omnistat.git
 ```
 
 2. Install dependencies.
 ```
-$ cd omniwatch
+$ cd omnistat
 $ pip install --user -r requirements.txt
 ```
 
 3. Launch client with `gunicorn`. Needs to be executed from the root
-directory of the Omniwatch project.
+directory of the Omnistat project.
 ```
-$ gunicorn -b 0.0.0.0:8000 "omniwatch.node_monitoring:app"
+$ gunicorn -b 0.0.0.0:8000 "omnistat.node_monitoring:app"
 ```
 
 #### Option B. Install package
 
 1. Clone repository.
 ```
-$ git clone https://github.com/AMDResearch/omniwatch.git
+$ git clone https://github.com/AMDResearch/omnistat.git
 ```
 
 2. Create a virtual environment, with Python 3.8, 3.9, or 3.10.
 ```
-$ cd omniwatch
-$ python -m venv /opt/omniwatch
+$ cd omnistat
+$ python -m venv /opt/omnistat
 ```
 
-3. Install omniwatch in a virtual environment. The virtual environment can
-also be used by sourcing the `./opt/omniwatch/bin/activate` file, and that
+3. Install omnistat in a virtual environment. The virtual environment can
+also be used by sourcing the `./opt/omnistat/bin/activate` file, and that
 way there is no need to keep using the complete `./venv/bin` path every
 time. This guide uses the complete path for clarity. Needs to be
-executed from the root directory of the Omniwatch repository.
+executed from the root directory of the Omnistat repository.
 ```
-$ /opt/omniwatch/bin/python -m pip install .
+$ /opt/omnistat/bin/python -m pip install .
 ```
-Alternatively, use the following line to install Omniwatch with the
-optional dependencies for the `omniwatch-query` tool.
+Alternatively, use the following line to install Omnistat with the
+optional dependencies for the `omnistat-query` tool.
 ```
-$ /opt/omniwatch/bin/python -m pip install .[query]
+$ /opt/omnistat/bin/python -m pip install .[query]
 ```
 
 4. Launch the client with `gunicorn`. To make sure the installed version of
-Omniwatch is being used, this shouldn't be executed from the root directory
+Omnistat is being used, this shouldn't be executed from the root directory
 of the project.
 ```
-$ /opt/omniwatch/bin/gunicorn -b 0.0.0.0:8000 "omniwatch.node_monitoring:app"
+$ /opt/omnistat/bin/gunicorn -b 0.0.0.0:8000 "omnistat.node_monitoring:app"
 ```
 
 #### Configure client
 
-Launching the Omniwatch client as described above will load the default
+Launching the Omnistat client as described above will load the default
 configuration options. To use a different configuration file, use the
-`OMNIWATCH_CONFIG` environment variable.
+`OMNISTAT_CONFIG` environment variable.
 ```
-$ OMNIWATCH_CONFIG=/path/to/config/file gunicorn -b 0.0.0.0:8000 "omniwatch.node_monitoring:app"
+$ OMNISTAT_CONFIG=/path/to/config/file gunicorn -b 0.0.0.0:8000 "omnistat.node_monitoring:app"
 ```
 
 A [sample configuration
-file](https://github.com/AMDResearch/omniwatch/blob/main/omniwatch.default) is
+file](https://github.com/AMDResearch/omnistat/blob/main/omnistat.default) is
 available in the respository.
 
 #### Check installation
 
 As a sanity check, this is the expected output you should see when launching
-the Omniwatch client:
+the Omnistat client:
 ```
 [2024-06-08 18:50:56 -0400] [5834] [INFO] Starting gunicorn 21.2.0
 [2024-06-08 18:50:56 -0400] [5834] [INFO] Listening at: http://0.0.0.0:8000 (5834)
````
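The client's metrics can also be checked from a script. The sketch below parses the Prometheus text exposition format, which is the form of lines such as `card0_rocm_utilization 0.0` (that line appears in the file just before the systemd section). It assumes the client serves its metrics at `http://localhost:8000/metrics`; that path is not shown in this diff, and `parse_metrics` and `fetch_metrics` are hypothetical helper names.

```python
import urllib.request


def parse_metrics(text):
    """Parse simple Prometheus text-format lines into {series: value}.

    Comment lines (# HELP / # TYPE) and blank lines are skipped. Labels, if
    present, stay part of the series name; trailing timestamps are ignored.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split the series (name plus optional {labels}) from the value field.
        if "}" in line:
            series, _, rest = line.partition("}")
            series += "}"
        else:
            series, _, rest = line.partition(" ")
        fields = rest.split()
        if fields:
            metrics[series.strip()] = float(fields[0])
    return metrics


def fetch_metrics(url="http://localhost:8000/metrics", timeout=5):
    # Assumed endpoint: adjust host, port, and path to the actual deployment.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

With the client running, `fetch_metrics()` returns a dictionary keyed by series name; a connection error or an empty dictionary suggests the client is not exporting data.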
````diff
@@ -125,17 +125,17 @@ card0_rocm_utilization 0.0
 
 #### Enable systemd service
 
-To run the Omniwatch client permanently on a host, configure the service via
+To run the Omnistat client permanently on a host, configure the service via
 systemd. An [example service
-file](https://github.com/AMDResearch/omniwatch/blob/main/omniwatch.service) is
+file](https://github.com/AMDResearch/omnistat/blob/main/omnistat.service) is
 available in the repository, including the following key lines:
 ```
-Environment="OMNIWATCH_CONFIG=/etc/omniwatch/config"
-Environment="OMNIWATCH_PORT=8000"
-ExecStart=/opt/omniwatch/bin/gunicorn -b 0.0.0.0:${OMNIWATCH_PORT} "omniwatch.node_monitoring:app"
+Environment="OMNISTAT_CONFIG=/etc/omnistat/config"
+Environment="OMNISTAT_PORT=8000"
+ExecStart=/opt/omnistat/bin/gunicorn -b 0.0.0.0:${OMNISTAT_PORT} "omnistat.node_monitoring:app"
 ```
-Please set `OMNIWATCH_CONFIG` and `OMNIWATCH_PORT` as needed depending on how
-Omniwatch is installed.
+Please set `OMNISTAT_CONFIG` and `OMNISTAT_PORT` as needed depending on how
+Omnistat is installed.
 
 ### Prometheus installation and configuration (server)
 
````
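Rather than editing the installed unit file, `OMNISTAT_CONFIG` and `OMNISTAT_PORT` can be overridden with a standard systemd drop-in: a `.conf` file under `/etc/systemd/system/<unit>.d/` whose `Environment=` lines are read after the unit's own and therefore take precedence. The helpers below are a hypothetical sketch that assumes the unit is installed as `omnistat.service` (the name the Ansible example in this diff uses); the configuration path and port are placeholders.

```python
from pathlib import Path


def render_dropin(config_path, port):
    """Return drop-in text overriding the unit's OMNISTAT_* environment variables."""
    return (
        "[Service]\n"
        f'Environment="OMNISTAT_CONFIG={config_path}"\n'
        f'Environment="OMNISTAT_PORT={int(port)}"\n'
    )


def write_dropin(text, unit="omnistat.service", root="/etc/systemd/system"):
    """Write the drop-in as <root>/<unit>.d/override.conf (root privileges needed for /etc)."""
    directory = Path(root) / f"{unit}.d"
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / "override.conf"
    target.write_text(text)
    return target
```

After writing the file, `systemctl daemon-reload` followed by `systemctl restart omnistat` makes the new values reach the `${OMNISTAT_PORT}` reference in `ExecStart`.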
````diff
@@ -159,7 +159,7 @@ On a separate server with access to compute nodes, install and configure
 which nodes to poll and at what frequency. For example:
 ```
 scrape_configs:
-  - job_name: "omniwatch"
+  - job_name: "omnistat"
     scrape_interval: 30s
     scrape_timeout: 5s
     static_configs:
````
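With many compute nodes, the target list is easier to generate than to type. This sketch builds a `scrape_configs` entry with the job name and intervals from the excerpt above and prints it as JSON, which YAML parsers also accept. The node names and port 8000 are assumptions (the port matches the client examples); this diff does not show the original `static_configs` contents, so the `targets` layout here is the standard Prometheus form rather than a copy of the file's.

```python
import json


def scrape_config(nodes, port=8000, job_name="omnistat", interval="30s", timeout="5s"):
    """Build a Prometheus scrape_configs entry that polls each node's Omnistat client."""
    return {
        "scrape_configs": [
            {
                "job_name": job_name,
                "scrape_interval": interval,
                "scrape_timeout": timeout,
                "static_configs": [
                    {"targets": [f"{node}:{port}" for node in nodes]}
                ],
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical node names: replace with the cluster's compute nodes.
    print(json.dumps(scrape_config(["node01", "node02"]), indent=2))
```

The printed `scrape_configs` section can be merged into `prometheus.yml` alongside any existing global settings, after which Prometheus needs to be restarted or reloaded.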
````diff
@@ -173,83 +173,83 @@ On a separate server with access to compute nodes, install and configure
 ### Ansible example
 
 For a cluster or data center deployment, management tools like Ansible may be
-used to install Omniwatch.
+used to install Omnistat.
 
-The following Ansible playbook will fetch the Omniwatch repository in each
-node, create a virtual environment for Omniwatch under `/opt/omniwatch`,
-install a configuration file under `/etc/omniwatch`, and enable Omniwatch as a
+The following Ansible playbook will fetch the Omnistat repository in each
+node, create a virtual environment for Omnistat under `/opt/omnistat`,
+install a configuration file under `/etc/omnistat`, and enable Omnistat as a
 systemd service. This is only an example and will likely need to be adapted
 depending on the characteristics and scale of the system.
 
 ```
 - hosts: all
   vars:
-    - omniwatch_url: git@github.com:AMDResearch/omniwatch.git
-    - omniwatch_tmp: /tmp/omniwatch-install
-    - omniwatch_dir: /opt/omniwatch
+    - omnistat_url: git@github.com:AMDResearch/omnistat.git
+    - omnistat_tmp: /tmp/omnistat-install
+    - omnistat_dir: /opt/omnistat
   tasks:
-    - name: Fetch copy of omniwatch repository for installation
+    - name: Fetch copy of omnistat repository for installation
       git:
-        repo: "{{ omniwatch_url }}"
-        dest: "{{ omniwatch_tmp }}"
+        repo: "{{ omnistat_url }}"
+        dest: "{{ omnistat_tmp }}"
         version: jorda/python-package
         single_branch: true
-    - name: Install omniwatch in virtual environment
+    - name: Install omnistat in virtual environment
       pip:
-        name: "{{ omniwatch_tmp }}[query]"
-        virtualenv: "{{ omniwatch_dir }}"
+        name: "{{ omnistat_tmp }}[query]"
+        virtualenv: "{{ omnistat_dir }}"
         virtualenv_command: /usr/bin/python3 -m venv
     - name: Create configuration directory
       file:
-        path: /etc/omniwatch
+        path: /etc/omnistat
         state: directory
         mode: "0755"
     - name: Copy configuration file
       copy:
         remote_src: true
-        src: "{{ omniwatch_tmp }}/omniwatch/config/omniwatch.default"
-        dest: /etc/omniwatch/config
+        src: "{{ omnistat_tmp }}/omnistat/config/omnistat.default"
+        dest: /etc/omnistat/config
         mode: "0644"
     - name: Copy service file
       copy:
         remote_src: true
-        src: "{{ omniwatch_tmp }}/omniwatch.service"
+        src: "{{ omnistat_tmp }}/omnistat.service"
         dest: /etc/systemd/system
         mode: "0644"
     - name: Enable service
       service:
-        name: omniwatch
+        name: omnistat
         enabled: yes
         state: started
     - name: Delete temporary installation files
       file:
-        path: "{{ omniwatch_tmp }}"
+        path: "{{ omnistat_tmp }}"
         state: absent
 ```
 
 ---
 
 ## User-mode execution with SLURM
 
-### Installing Omniwatch
+### Installing Omnistat
 
 1. Create a virtual environment in a shared directory, with Python 3.8, 3.9,
 or 3.10.
 ```
-$ python -m venv ~/omniwatch
+$ python -m venv ~/omnistat
 ```
 
-2. From to root directory of the Omniwatch repository, install omniwatch in
+2. From to root directory of the Omnistat repository, install omnistat in
 the virtual environment.
 ```
-$ ~/omniwatch/bin/python -m pip install .[query]
+$ ~/omnistat/bin/python -m pip install .[query]
 ```
 
 ### Running a SLURM Job
````
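The two user-mode installation steps can also be scripted with Python's standard `venv` module, for example in a setup script shared by a project group. This is a minimal sketch: the function names are hypothetical, `SUPPORTED` only mirrors the "Python 3.8, 3.9, or 3.10" requirement stated in step 1, and `install_checkout` must be pointed at a local clone of the repository.

```python
import os
import subprocess
import sys
import venv
from pathlib import Path

# Versions listed in the installation steps; other versions are untested here.
SUPPORTED = {(3, 8), (3, 9), (3, 10)}


def interpreter_supported(version_info=sys.version_info):
    """True if the interpreter's major.minor version is in the documented range."""
    return tuple(version_info[:2]) in SUPPORTED


def create_env(env_dir, with_pip=True):
    """Equivalent of `python -m venv <env_dir>`; returns the environment's interpreter."""
    env_dir = Path(env_dir).expanduser()
    # `python -m venv` uses symlinks by default on POSIX systems.
    venv.create(env_dir, with_pip=with_pip, symlinks=(os.name != "nt"))
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_checkout(python, repo_dir, extras="query"):
    """Equivalent of `<env>/bin/python -m pip install .[query]` run inside repo_dir."""
    spec = f".[{extras}]" if extras else "."
    subprocess.run([str(python), "-m", "pip", "install", spec], cwd=repo_dir, check=True)
```

For example, `install_checkout(create_env("~/omnistat"), "/path/to/omnistat")` reproduces both steps, where `/path/to/omnistat` stands for the clone location.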
````diff
@@ -258,34 +258,34 @@ In the SLURM job script, add the following lines to start and stop the data
 collection before and after running the application.
 
 ```
-export OMNIWATCH_CONFIG=~/omniwatch/omniwatch.config
+export OMNISTAT_CONFIG=~/omnistat/omnistat.config
 # Start data collector
-~/omniwatch/bin/omniwatch-util --start --interval 1
+~/omnistat/bin/omnistat-util --start --interval 1
 # Run application
 sleep 10
 # Stop data collector
-~/omniwatch/bin/omniwatch-util --stop
+~/omnistat/bin/omnistat-util --stop
 # Query server to generate job report
-~/omniwatch/bin/omniwatch-util --startserver
-~/omniwatch/bin/omniwatch-util --job ${SLURM_JOB_ID}
-~/omniwatch/bin/omniwatch-util --stopserver
+~/omnistat/bin/omnistat-util --startserver
+~/omnistat/bin/omnistat-util --job ${SLURM_JOB_ID}
+~/omnistat/bin/omnistat-util --stopserver
 ```
 
 ### Exploring results with a local Docker environment
 
-To explore results generated for user-mode executions of Omniwatch, we provide
+To explore results generated for user-mode executions of Omnistat, we provide
 a Docker environment that will automatically launch the required services
 locally. That includes Prometheus to read and query the stored data, and
 Grafana as visualization platform to display time series and other metrics.
 
 To explore results:
 
-1. Copy Prometheus data collected with Omniwatch to `./prometheus-data`. The
-entire `datadir` defined in the Omniwatch configuration needs to be copied
+1. Copy Prometheus data collected with Omnistat to `./prometheus-data`. The
+entire `datadir` defined in the Omnistat configuration needs to be copied
 (e.g. a `data` directory should be present under `./prometheus-data`).
 2. Start services:
 ```
````
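When the application launch is driven from Python instead of directly from the job script, the same start, stop, and report sequence can be wrapped in `try`/`finally` so the collector is stopped even if the application fails to launch, and a SIGTERM handler lets that cleanup run when SLURM cancels the job or it reaches its time limit (the cleanup must finish before SLURM follows up with SIGKILL). This is a hypothetical sketch: it uses only the `omnistat-util` options shown in the job script above, assumes the `~/omnistat` install location and an already exported `OMNISTAT_CONFIG`, and `run_with_collection` and `install_sigterm_handler` are illustrative names.

```python
import os
import signal
import subprocess

# Assumed install location, matching the job script above.
UTIL = os.path.expanduser("~/omnistat/bin/omnistat-util")


def install_sigterm_handler():
    """Turn SIGTERM into SystemExit so pending `finally` blocks run before exit."""
    def handler(signum, frame):
        raise SystemExit(128 + signum)
    signal.signal(signal.SIGTERM, handler)


def run_with_collection(app_cmd, util=UTIL, job_id=None, run=subprocess.run):
    """Start the collector, run app_cmd, always stop the collector, then build the report.

    Returns the application's exit code. OMNISTAT_CONFIG must already be set.
    """
    job_id = job_id or os.environ.get("SLURM_JOB_ID")
    run([util, "--start", "--interval", "1"], check=True)
    try:
        result = run(app_cmd)  # a non-zero exit status does not raise here
    finally:
        run([util, "--stop"], check=False)
    if job_id:
        # Query the local server to generate the job report.
        run([util, "--startserver"], check=True)
        try:
            run([util, "--job", str(job_id)], check=True)
        finally:
            run([util, "--stopserver"], check=False)
    return result.returncode
```

A driver script would call `install_sigterm_handler()` first and then `sys.exit(run_with_collection(["./my_app"]))`, where `./my_app` is a placeholder for the real application command.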