Full-stack infrastructure software, from PyTorch to GPUs, for the LLM era.
Decouple AI infrastructure from specific hardware vendors.
Virtualize all GPUs/NPUs in a cluster for higher utilization and failover.
Scale to thousands of GPUs/NPUs with automatic parallelization and optimization.
Support any multi-billion- or multi-trillion-parameter model for training and serving.
🚀 Designed to unlock the full potential of your AI infrastructure!
The moai-examples repository is designed to work with a cluster where the MoAI Platform is installed.
To test these scripts, please contact us.
Recommended Specifications
The optimized MAF version, Torch version, and flavor for each model are as follows:
Model | MAF Version | Torch Version | Python Version | Flavor | Train Batch | Eval Batch |
---|---|---|---|---|---|---|
Qwen/Qwen-14B | 25.1.202 | 2.1.0 | 3.10 | xLarge.512GB | 64 | 16 |
Qwen/Qwen-72B | 25.1.202 | 2.1.0 | 3.10 | 4xLarge.2048GB | 256 | 8 |
Qwen/Qwen2-72B-Instruct | 25.1.202 | 2.1.0 | 3.10 | 4xLarge.2048GB | 32 | 32 |
baichuan-inc/Baichuan-13B-Chat | 25.1.202 | 2.1.0 | 3.10 | xLarge.512GB | 64 | 16 |
internlm/internlm2_5-20b-chat | 25.1.202 | 2.1.0 | 3.10 | 2xLarge.1024GB | 64 | 16 |
meta-llama/Meta-Llama-3-8B | 25.1.202 | 2.1.0 | 3.10 | xLarge.512GB | 64 | 32 |
meta-llama/Meta-Llama-3-70B-Instruct | 25.1.202 | 2.1.0 | 3.10 | 4xLarge.2048GB | 256 | 64 |
meta-llama/Meta-Llama-3-70B-Instruct (with LoRA) | 25.1.202 | 2.1.0 | 3.10 | xLarge.512GB | 16 | 16 |
google/gemma-2-27b-it | 25.1.202 | 2.1.0 | 3.10 | 2xLarge.1024GB | 64 | 32 |
THUDM/chatglm3-6b | 25.1.202 | 2.1.0 | 3.10 | xLarge.512GB | 64 | 16 |
mistralai/Mistral-7B-v0.3 | 25.1.202 | 2.1.0 | 3.10 | xLarge.512GB | 64 | 32 |
To install the MAF-compatible Torch package, run:

```bash
pip install torch==2.1.0+moreh25.1.202 torchvision==0.16.0 sympy
```
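After installation, you can verify that the Moreh build of Torch is active. A quick check (the expected version string follows from the install command above):

```bash
# Should print 2.1.0+moreh25.1.202 if the Moreh build is installed
python -c "import torch; print(torch.__version__)"
```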
You can check the current MoAI version and flavor through `moreh-smi`:

```
$ moreh-smi
+-----------------------------------------------------------------------------------------------+
|                     Current Version: 25.1.202   Latest Version: 25.1.202                       |
+-----------------------------------------------------------------------------------------------+
|  Device  |          Name          |   Model   |  Memory Usage  |  Total Memory  | Utilization  |
+===============================================================================================+
|  * 0     |  Ambre AI Accelerator  |   micro   |  -             |  -             |      -       |
+-----------------------------------------------------------------------------------------------+
```
If they are set differently, please refer to the following links to adjust the torch version and flavor accordingly:
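As a sketch only (these commands are assumptions about the MoAI tooling; the linked guides are authoritative), adjusting the Torch version and flavor typically looks like:

```bash
# Assumed MoAI commands -- verify against the platform documentation before use
update-moreh --target 25.1.202   # switch the MAF/Torch build to the recommended version
moreh-switch-model               # interactively select a different flavor
```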
To fine-tune the model, run the training script as follows:
```bash
cd moai-examples/finetuning_codes
pip install -r requirements.txt
bash scripts/train_{model}.sh
```
For training `qwen_14b` and `qwen_72b`, additional environment setup is required using the following command:

```bash
pip install -r requirements/requirements_qwen.txt
```
By substituting one of the example model names listed below for `{model}`, you can also run the other examples (see the usage sketch after the table).
List of Example Models | Name in `{model}` |
---|---|
Qwen/Qwen-14B | qwen_14b |
Qwen/Qwen-72B | qwen_72b |
Qwen/Qwen2-72B-Instruct | qwen2_72b |
baichuan-inc/Baichuan-13B-Chat | baichuan |
internlm/internlm2_5-20b-chat | internlm |
meta-llama/Meta-Llama-3-8B | llama_8b |
meta-llama/Meta-Llama-3-70B-Instruct | llama_70b |
meta-llama/Meta-Llama-3-70B-Instruct (with LoRA) | llama_70b_lora |
google/gemma-2-27b-it | gemma |
THUDM/chatglm3-6b | chatglm |
mistralai/Mistral-7B-v0.3 | mistral_7b |
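For example, to fine-tune `meta-llama/Meta-Llama-3-8B` using the name from the table:

```bash
bash scripts/train_llama_8b.sh
```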
The scripts are as follows:
```bash
#!/bin/bash
# example of train_qwen_14b.sh

START_TIME=$(TZ="Asia/Seoul" date)
current_time=$(date +"%y%m%d_%H%M%S")

TRANSFORMERS_VERBOSITY=info accelerate launch \
    --config_file $CONFIG_PATH \
    train.py \
    --model Qwen/Qwen-14B \
    --dataset alespalla/chatbot_instruction_prompts \
    --lr 0.0001 \
    --train-batch-size 64 \
    --eval-batch-size 16 \
    --num-epochs 5 \
    --max-steps -1 \
    --log-interval 20 \
    --save-path $SAVE_DIR \
    |& tee $LOG_DIR

echo "Start: $START_TIME"
echo "End: $(TZ="Asia/Seoul" date)"
```
The above script assumes execution from the `moai-examples/finetuning_codes` directory. If modifications are required, please adjust it to fit your client or platform specifications. Additionally, paths such as `CONFIG_PATH`, `SAVE_DIR`, and `LOG_DIR` should be updated to match the container in use.
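As a minimal sketch (these paths are illustrative assumptions, not fixed by the repository), the variables might be set like this before launching:

```bash
# Illustrative values -- adjust to your container layout
export CONFIG_PATH=./config.yaml            # accelerate config shipped in finetuning_codes
export SAVE_DIR=../checkpoints/qwen_14b     # where fine-tuned checkpoints are written
export LOG_DIR=./logs/train_qwen_14b.log    # log file captured by tee
bash scripts/train_qwen_14b.sh
```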
For inference, please refer to `inference_codes/README.md`.
The structure of the entire repository is as follows:

```
moai-examples
├── README.md            # Project overview and instructions
├── checkpoints          # Directory to store model checkpoints during fine-tuning
├── finetuning_codes     # Code related to model fine-tuning
├── git-hooks            # Git hooks for code formatting and other pre-/post-commit tasks
├── inference_codes      # Code for running inference with the trained model
└── pretrained_models    # Pretrained weights obtained from Hugging Face
```
The `finetuning_codes` directory contains training code, model configs, and the scripts necessary for fine-tuning.

```
finetuning_codes
├── config.yaml      # Config file for accelerate
├── model            # Directory containing model-related files
├── requirements     # Additional dependencies or packages required for fine-tuning
├── scripts          # Shell scripts for different fine-tuning setups
├── train.py         # Main Python script for initiating the fine-tuning process
└── utils.py         # Utility functions for train.py/train_internlm.py
```
The `inference_codes` directory contains scripts for model inference.

```
inference_codes
├── agent_client.py        # Python script for model loading
├── benchmark_client.py    # Python script to evaluate inference performance
├── requirements.txt       # Requirements for inference
├── chat.py                # Python script for human evaluation of the loaded model
└── client_utils.py        # Utility functions for chat.py/benchmark_client.py/agent_client.py
```
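As a rough sketch of how these scripts fit together (the exact invocations are assumptions; `inference_codes/README.md` is authoritative):

```bash
cd moai-examples/inference_codes
pip install -r requirements.txt    # install inference dependencies
python agent_client.py             # load the model (assumed entry point)
python chat.py                     # chat with the loaded model for human evaluation
```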
Section | Description |
---|---|
Portal | Overview of the technology and the company |
Documentation | Detailed explanation of the technology and tutorials |
ModelHub | Chatbot using the MoAI Platform solution |