Improve docs (#109)
**Description**
Improve usage page and fix some typos.

**Major Revision**
- Improve usage page
- fix some typos
tocean authored Oct 30, 2023
1 parent ceadfea commit 3a4ba14
Showing 3 changed files with 13 additions and 9 deletions.
2 changes: 1 addition & 1 deletion docs/getting-started/run-msamp.md
@@ -2,7 +2,7 @@
id: run-msamp
---

# Run examples
# Run Examples

After installing MS-AMP, you can run several simple examples. Please note that before running these commands, you need to change the working directory to [examples](https://github.com/Azure/MS-AMP/tree/main/examples).

6 changes: 3 additions & 3 deletions docs/introduction.md
@@ -26,17 +26,17 @@ MS-AMP has the following benefit comparing with Transformer Engine:

### Model performance

We evaluated the training loss and validation performance of four typical models, GPT-3, Swin-Transformer, DeiT and RoBERTa, using both MS-AMP and FP16 AMP/BF16. Our observations showed that the models trained with MS-AMP achieved comparable performance to those trained using FP16 AMP/BF16. This demonstrates the effectiveness of the mixed FP8 in MS-AMP.
We evaluated the training loss and validation performance of four typical models, GPT-3, Swin-Transformer, DeiT and RoBERTa, using both MS-AMP and FP16/BF16 AMP. Our observations show that models trained with MS-AMP achieve comparable performance to those trained using FP16/BF16 AMP. This demonstrates the effectiveness of mixed FP8 in MS-AMP.

Here are the results for GPT-3, Swin-T, DeiT-S and RoBERTa-B.

![image](./assets/gpt-loss.png)

![image](./assets/performance.png)

### System peroformance
### System performance

MS-AMP preserves high-precision's accuracy while using only a fraction of the memory footprint on a range of tasks, including GPT-3, DeiT and Swin Transformer. For example, when training GPT-175B on NVIDIA H100 platform, MS-AMP achieves a notable 42% reduction in real memory usage compared with BF16 mixed-precision aproch and reduces training time by 17% compared with Transformer Engine. For small models, MS-AMP with O2 mode can achieve 44% memory saving for Swin-1.0B and 26% memory saving for ViT-1.2B, comparing with FP16 AMP.
MS-AMP preserves the accuracy of high-precision training while using only a fraction of the memory footprint on a range of tasks, including GPT-3, DeiT and Swin Transformer. For example, when training GPT-175B on the NVIDIA H100 platform, MS-AMP achieves a notable 42% reduction in real memory usage compared with the BF16 mixed-precision approach and reduces training time by 17% compared with Transformer Engine. For small models, MS-AMP with O2 mode can achieve 44% memory saving for Swin-1.0B and 26% memory saving for ViT-1.2B, compared with FP16 AMP.

Here are the results for GPT-3:

14 changes: 9 additions & 5 deletions docs/user-tutorial/usage.md
@@ -6,7 +6,7 @@ id: usage

## Basic usage

Enabling MS-AMP is very simple when traning model w/ or w/o data parallelism on a single node, you only need to add one line of code `msamp.initialize(model, optimizer, opt_level)` after defining model and optimizer.
Enabling MS-AMP is very simple when training a model without any distributed parallel technologies; you only need to add one line of code `msamp.initialize(model, optimizer, opt_level)` after defining the model and optimizer.

Example:

@@ -22,20 +22,24 @@ model, optimizer = msamp.initialize(model, optimizer, opt_level="O2")
...
```
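
For reference, here is a minimal runnable sketch of this basic usage, assuming a toy linear model, an AdamW optimizer and a CUDA device; only `msamp.initialize` and the `opt_level` argument come from the documentation above, the rest is illustrative:

```python
# Minimal sketch of basic MS-AMP usage (no distributed parallelism).
# The toy model, optimizer and training loop are illustrative assumptions;
# only msamp.initialize and opt_level come from the documentation above.
# Assumes a CUDA GPU, since MS-AMP targets FP8-capable hardware.
import torch
import msamp

model = torch.nn.Linear(128, 10).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Wrap the model and optimizer so that supported ops use low-precision formats.
model, optimizer = msamp.initialize(model, optimizer, opt_level="O2")

for _ in range(10):
    x = torch.randn(32, 128, device="cuda")
    loss = model(x).float().pow(2).mean()  # dummy loss for illustration
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```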

## Usage in distributed parallel training
## Usage in DeepSpeed

MS-AMP supports FP8 for distributed parallel training and can integrate with advanced distributed training frameworks. We have integrated MS-AMP with several popular distributed training frameworks such as DeepSpeed, Megatron-DeepSpeed and Megatron-LM to demonstrate this capability.

For enabling MS-AMP when using ZeRO in DeepSpeed, add one line of code `import msamp` and a "msamp" section in DeepSpeed config file:
To enable MS-AMP in DeepSpeed, add one line of code `from msamp import deepspeed` at the beginning of your script and a "msamp" section in the DeepSpeed config file:

```json
"msamp": {
"enabled": true,
"opt_level": "O3"
"opt_level": "O1|O2|O3"
}
```

For applying MS-AMP to Megatron-DeepSpeed and Megatron-LM, you need to do very little code change for applying it. Here is the instruction of applying MS-AMP for running [gpt-3](https://github.com/Azure/MS-AMP-Examples/tree/main/gpt3) in both Megatron-DeepSpeed and Megatron-LM.
"O3" is designed for FP8 in ZeRO optimizer, so please make sure ZeRO is enabled when using "O3".

## Usage in Megatron-DeepSpeed and Megatron-LM

To integrate MS-AMP with Megatron-DeepSpeed and Megatron-LM, you need to make some code changes. We provide a patch as a reference for the integration. Here are the instructions for integrating MS-AMP with Megatron-DeepSpeed/Megatron-LM and running [gpt-3](https://github.com/Azure/MS-AMP-Examples/tree/main/gpt3) with MS-AMP.

Runnable, simple examples demonstrating good practices can be found [here](https://azure.github.io//MS-AMP/docs/getting-started/run-msamp).
For more comprehensive examples, please go to [MS-AMP-Examples](https://github.com/Azure/MS-AMP-Examples).
