From 6081e912db4ba895c47e707cbeba0e7afc11d3e3 Mon Sep 17 00:00:00 2001
From: Jonathan Halverson <52128661+jdh4@users.noreply.github.com>
Date: Sun, 3 Nov 2024 11:53:54 -0500
Subject: [PATCH] Update index.md

---
 docs/index.md | 107 ++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 100 insertions(+), 7 deletions(-)

diff --git a/docs/index.md b/docs/index.md
index 61a8125..bf7d453 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -2,13 +2,106 @@
 
 Jobstats is a free and open-source job monitoring platform designed for CPU and GPU clusters that use the Slurm workload manager. It was released in 2023 under the GNU GPL v2 license.
 
-- GPU utilization
-- accurate CPU memory for multinode jobs
+## What are the main benefits of Jobstats over other platforms?
+
+The main advantages of Jobstats are:
+
+- GPU utilization and memory usage for each allocated GPU
 - automatically cancel jobs with 0% GPU utilization
-- guide user with custom job notes
-- Grafana dashboard
-- works with Open OnDemand
+- accurate CPU memory usage for single-node and multi-node jobs
+- graphical interface for inspecting job metrics versus time
+- custom job efficiency emails with job-specific notes
+- automated emails to users for instances of underutilization
+- periodic reports on usage and efficiency for users and group leaders
+- all of the above features work with Open OnDemand jobs
+
+## How does Jobstats work?
+
+Jobstats is composed of data exporters, a Prometheus database, a Grafana visualization interface, and the Slurm database. Measurements made on the compute nodes are stored in the time-series Prometheus database. Job efficiency reports are generated from this data and from the Slurm database.
+
+## Which institutions are using Jobstats?
+
+Jobstats is used by these institutions:
+
+- Brown University - Center for Computation and Visualization
+- Free University of Berlin - High-Performance Computing
+- Princeton University - Computer Science Department
+- Princeton University - Research Computing
+- Yale University - Center for Research Computing
+- and many more
+
+## What does a Jobstats efficiency report look like?
+
+The `jobstats` command generates a job report:
+
+```
+$ jobstats 39798795
+
+================================================================================
+                              Slurm Job Statistics
+================================================================================
+         Job ID: 39798795
+  NetID/Account: aturing/math
+       Job Name: sys_logic_ordinals
+          State: COMPLETED
+          Nodes: 2
+      CPU Cores: 48
+     CPU Memory: 256GB (5.3GB per CPU-core)
+           GPUs: 4
+  QOS/Partition: della-gpu/gpu
+        Cluster: della
+     Start Time: Fri Mar 4, 2022 at 1:56 AM
+       Run Time: 18:41:56
+     Time Limit: 4-00:00:00
+
+                              Overall Utilization
+================================================================================
+  CPU utilization  [|||||                                          10%]
+  CPU memory usage [|||                                             6%]
+  GPU utilization  [||||||||||||||||||||||||||||||||||             68%]
+  GPU memory usage [|||||||||||||||||||||||||||||||||              66%]
+
+                              Detailed Utilization
+================================================================================
+  CPU utilization per node (CPU time used/run time)
+      della-i14g2: 1-21:41:20/18-16:46:24 (efficiency=10.2%)
+      della-i14g3: 1-18:48:55/18-16:46:24 (efficiency=9.5%)
+  Total used/runtime: 3-16:30:16/37-09:32:48, efficiency=9.9%
+
+  CPU memory usage per node - used/allocated
+      della-i14g2: 7.9GB/128.0GB (335.5MB/5.3GB per core of 24)
+      della-i14g3: 7.8GB/128.0GB (334.6MB/5.3GB per core of 24)
+  Total used/allocated: 15.7GB/256.0GB (335.1MB/5.3GB per core of 48)
+
+  GPU utilization per node
+      della-i14g2 (GPU 0): 65.7%
+      della-i14g2 (GPU 1): 64.5%
+      della-i14g3 (GPU 0): 72.9%
+      della-i14g3 (GPU 1): 67.5%
+
+  GPU memory usage per node - maximum used/total
+      della-i14g2 (GPU 0): 26.5GB/40.0GB (66.2%)
+      della-i14g2 (GPU 1): 26.5GB/40.0GB (66.2%)
+      della-i14g3 (GPU 0): 26.5GB/40.0GB (66.2%)
+      della-i14g3 (GPU 1): 26.5GB/40.0GB (66.2%)
+
+                                     Notes
+================================================================================
+  * This job only used 6% of the 256GB of total allocated CPU memory. For
+    future jobs, please allocate less memory by using a Slurm directive such
+    as --mem-per-cpu=1G or --mem=10G. This will reduce your queue times and
+    make the resources available to other users. For more info:
+      https://researchcomputing.princeton.edu/support/knowledge-base/memory
+
+  * For additional job metrics including metrics plotted against time:
+    https://mydella.princeton.edu/pun/sys/jobstats  (VPN required off-campus)
+```
+
+## Other Job Monitoring Platforms
 
-# Comparison to Other Platforms
+Consider these alternatives to Jobstats:
 
-It is most similar to Open XDMod.
+- [XDMod (SUPReMM)](https://supremm.xdmod.org/7.0/supremm-architecture.html)
+- [LLload](https://dl.acm.org/doi/10.1145/3626203.3670565)
+- [TACC Stats](https://tacc.utexas.edu/research/tacc-research/tacc-stats/)
+- [REMORA](https://docs.tacc.utexas.edu/software/remora/)