
MoAI Platform

Full-stack infrastructure software spanning PyTorch to GPUs for the LLM era.
Decouples AI infrastructure from specific hardware vendors.
Virtualizes all GPUs/NPUs in a cluster for higher utilization and failover.
Scales to thousands of GPUs/NPUs with automatic parallelization and optimization.
Supports any multi-billion or multi-trillion parameter model for training and serving.


🚀 Designed to unlock the full potential of your AI infrastructure!

[Figure: MoAI Platform overview]

QuickStart

The moai-examples repository is designed to work with a cluster where the MoAI Platform is installed.
To test these scripts, please contact us.

Recommended Specifications

The recommended MAF, torch, and Python versions, flavor, and batch sizes for each model are as follows:

| Model | MAF Version | Torch Version | Python Version | Flavor | Train Batch | Eval Batch |
|---|---|---|---|---|---|---|
| Qwen/Qwen-14B | 25.1.202 | 2.1.0 | 3.10 | xLarge.512GB | 64 | 16 |
| Qwen/Qwen-72B | 25.1.202 | 2.1.0 | 3.10 | 4xLarge.2048GB | 256 | 8 |
| Qwen/Qwen2-72B-Instruct | 25.1.202 | 2.1.0 | 3.10 | 4xLarge.2048GB | 32 | 32 |
| baichuan-inc/Baichuan-13B-Chat | 25.1.202 | 2.1.0 | 3.10 | xLarge.512GB | 64 | 16 |
| internlm/internlm2_5-20b-chat | 25.1.202 | 2.1.0 | 3.10 | 2xLarge.1024GB | 64 | 16 |
| meta-llama/Meta-Llama-3-8B | 25.1.202 | 2.1.0 | 3.10 | xLarge.512GB | 64 | 32 |
| meta-llama/Meta-Llama-3-70B-Instruct | 25.1.202 | 2.1.0 | 3.10 | 4xLarge.2048GB | 256 | 64 |
| meta-llama/Meta-Llama-3-70B-Instruct (with LoRA) | 25.1.202 | 2.1.0 | 3.10 | xLarge.512GB | 16 | 16 |
| google/gemma-2-27b-it | 25.1.202 | 2.1.0 | 3.10 | 2xLarge.1024GB | 64 | 32 |
| THUDM/chatglm3-6b | 25.1.202 | 2.1.0 | 3.10 | xLarge.512GB | 64 | 16 |
| mistralai/Mistral-7B-v0.3 | 25.1.202 | 2.1.0 | 3.10 | xLarge.512GB | 64 | 32 |

Install

Install the Moreh build of torch together with torchvision and sympy:

pip install torch==2.1.0+moreh25.1.202 torchvision==0.16.0 sympy
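
After installation you can quickly confirm that the Moreh build of torch is the one being imported; the "+moreh" suffix in the version string is an assumption based on the package version installed above.

python -c "import torch; print(torch.__version__)"   # expected: 2.1.0+moreh25.1.202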

MoAI Accelerator

You can check the current MoAI version and flavor through moreh-smi.

moreh-smi

+-----------------------------------------------------------------------------------------------+
|                                          Current Version: 25.1.202  Latest Version: 25.1.202  |
+-----------------------------------------------------------------------------------------------+
|  Device  |          Name          |  Model  |  Memory Usage  |  Total Memory  |  Utilization  |
+===============================================================================================+
|  * 0     |  Ambre AI Accelerator  |  micro  |  -             |  -             |  -            |
+-----------------------------------------------------------------------------------------------+

If the version or flavor differs from the table above, refer to the MoAI Platform documentation to adjust the torch version and flavor accordingly.
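
The commands below are only an assumption based on common MoAI tooling (update-moreh and moreh-switch-model); confirm the exact commands against your cluster's documentation before running them.

# Assumed MoAI CLI usage; verify against your platform documentation.
update-moreh --target 25.1.202   # switch the MAF/torch build to a specific version
moreh-smi                        # confirm the version after the update
moreh-switch-model               # interactively select a different flavor (e.g. xLarge.512GB)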

Training

To fine-tune the model, run the training script as follows:

cd moai-examples/finetuning_codes
pip install -r requirements.txt
bash scripts/train_{model}.sh

Training qwen_14b or qwen_72b requires additional dependencies, which can be installed with the following command:

pip install -r requirements/requirements_qwen.txt

You can run the other examples by substituting one of the model names listed above for {model}.

The scripts are as follows:

#!/bin/bash
# example of train_qwen_14b.sh
START_TIME=$(TZ="Asia/Seoul" date)
current_time=$(date +"%y%m%d_%H%M%S")

TRANSFORMERS_VERBOSITY=info accelerate launch \
    --config_file $CONFIG_PATH \
    train.py \
    --model Qwen/Qwen-14B \
    --dataset alespalla/chatbot_instruction_prompts \
    --lr 0.0001 \
    --train-batch-size 64 \
    --eval-batch-size 16 \
    --num-epochs 5 \
    --max-steps -1 \
    --log-interval 20 \
    --save-path $SAVE_DIR \
    |& tee $LOG_DIR

echo "Start: $START_TIME"
echo "End: $(TZ="Asia/Seoul" date)"

The above script assumes execution from the moai-examples/finetuning_codes directory.
If modifications are required, please adjust it to fit your client or platform specifications.
Additionally, paths such as CONFIG_PATH, SAVE_DIR, and LOG_DIR should be updated to match the context of the container in use, as in the sketch below.
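
For example, these paths can be exported before launching a script; every path here is a placeholder, so replace it with your container's actual layout.

# Placeholder paths; adjust to your container's filesystem layout.
export CONFIG_PATH=/workspace/moai-examples/finetuning_codes/config.yaml
export SAVE_DIR=/workspace/checkpoints/qwen_14b
export LOG_DIR=/workspace/logs/train_qwen_14b.log
bash scripts/train_qwen_14b.sh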

Inference

Please refer to inference_codes/README.md.

Directory and Code Details

Repo Structure

The structure of the entire repository is as follows:

moai-examples
├── README.md                 # Project overview and instructions
├── checkpoints               # Directory to store model checkpoints during finetuning
├── finetuning_codes          # Code related to model fine-tuning
├── git-hooks                 # Git hooks directory for code formatting and other pre/post-commit tasks
├── inference_codes           # Code for running inference with the trained model
└── pretrained_models         # Pretrained weights obtained from Huggingface

finetuning_codes

The finetuning_codes directory contains training code, model configs, and scripts necessary for fine-tuning.

finetuning_codes
├── config.yaml                   # Config file for accelerate
├── model                         # Directory containing model-related files
├── requirements                  # Folder for additional dependencies or packages required for fine-tuning
├── scripts                       # Directory containing shell scripts for different fine-tuning setups
├── train.py                      # Main Python script for initiating the fine-tuning process
└── utils.py                      # Utility functions for train.py/train_internlm.py
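
For reference, a Hugging Face accelerate config generally contains fields like the ones below. This is only a generic sketch using standard accelerate fields, not the contents of the repository's config.yaml, so use the file shipped with finetuning_codes for actual runs.

# Generic accelerate config sketch (illustrative only; the repository's config.yaml may differ).
cat > my_accelerate_config.yaml <<'EOF'
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
num_machines: 1
machine_rank: 0
num_processes: 4
mixed_precision: bf16
EOF
# The training scripts pass the config path to accelerate via $CONFIG_PATH (see the script above).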

inference_codes

The inference_codes directory contains scripts for model inference.

inference_codes
├── agent_client.py            # Python script for model loading
├── benchmark_client.py        # Python script to evaluate inference performance 
├── requirements.txt           # Requirements for inference 
├── chat.py                    # Python script for human evaluation of loaded model
└── client_utils.py            # Utility functions for chat.py/benchmark_client.py/agent_client.py
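
A minimal way to try these scripts is sketched below; the exact command-line arguments are an assumption, so consult inference_codes/README.md for the actual usage.

cd moai-examples/inference_codes
pip install -r requirements.txt
# Interactive chat with the loaded model (arguments omitted; see README.md in this directory)
python chat.py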

Learn More

| Section | Description |
|---|---|
| Portal | Overview of technologies and company |
| Documentation | Detailed explanations of the technology and tutorials |
| ModelHub | Chatbot using the MoAI Platform solution |
