diff --git a/README.md b/README.md
index 735e8cb1..798083e7 100755
--- a/README.md
+++ b/README.md
@@ -19,7 +19,7 @@ FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensibl
 
     FlagAI provides an API that allows you to quickly download pre-trained models and fine-tune them on a wide range of datasets collected from [SuperGLUE](https://super.gluebenchmark.com/) and [CLUE](https://github.com/CLUEbenchmark/CLUE) benchmarks for both Chinese and English text.
 
-    FlagAI now supports over 30 mainstream models, including multilingual text and image representation model [**AltCLIP**](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltCLIP), text-to-image generation model [**AltDiffusion**](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltDiffusion) [![Huggingface space](https://img.shields.io/badge/🤗-Huggingface%20Space-cyan.svg)](https://huggingface.co/spaces/BAAI/bilingual_stable_diffusion), [**WuDao GLM**](/docs/GLM.md) (with a maximum of 10 billion parameters), [**EVA-CLIP**](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/EVA_CLIP), **OPT**, **BERT**, **RoBERTa**, **GPT2**, **T5**, **ALM**, and models from **Huggingface Transformers**, etc.
+    FlagAI now supports over 30 mainstream models, including Language Model [**Aquila**](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/Aquila), multilingual text and image representation model [**AltCLIP**](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltCLIP), text-to-image generation model [**AltDiffusion**](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltDiffusion) [![Huggingface space](https://img.shields.io/badge/🤗-Huggingface%20Space-cyan.svg)](https://huggingface.co/spaces/BAAI/bilingual_stable_diffusion), [**WuDao GLM**](/docs/GLM.md) (with a maximum of 10 billion parameters), [**EVA-CLIP**](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/EVA_CLIP), **OPT**, **BERT**, **RoBERTa**, **GPT2**, **T5**, **ALM**, and models from **Huggingface Transformers**, etc.
     
 
 2. **Parallel train with fewer than 10 lines of code**
@@ -56,6 +56,7 @@ FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensibl
 
 |   Model          |  Task    | Train | Finetune | Inference/Generate | Examples       |                                                         
 | :---------------- | :------- | :-- |:-- | :-- | :--------------------------------------------- |
+| Aquila      | Natural Language Processing  | ✅  | ✅  | ✅  | [README.md](examples/Aquila/README.md) 
 | ALM          | Arabic Text Generation  |  ✅  | ❌  | ✅  | [README.md](/examples/ALM/README.md)  |                         
 | AltCLIP       | Image-Text Matching  | ✅  | ✅  | ✅  | [README.md](/examples/AltCLIP/README.md)   |  
 | AltCLIP-m18      | Image-Text Matching  | ✅  | ✅  | ✅  | [README.md](examples/AltCLIP-m18/README.md)   |                             
diff --git a/README_zh.md b/README_zh.md
index 3800c70b..11256995 100755
--- a/README_zh.md
+++ b/README_zh.md
@@ -26,7 +26,7 @@
       
     提供 API 方便你快速下载模型，并在给定（中/英文）文本上使用这些预训练模型，在从[SuperGLUE](https://super.gluebenchmark.com/)和[CLUE](https://github.com/CLUEbenchmark/CLUE) benchmarks收集的广泛使用的数据集上对它们进行微调。
      
-      FlagAI 现已支持 30+ 主流模型，包括多模态模型 [**AltCLIP**](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltCLIP) 、文生图模型 [**AltDiffusion**](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltDiffusion) [![Huggingface space](https://img.shields.io/badge/🤗-Huggingface%20Space-cyan.svg)](https://huggingface.co/spaces/BAAI/bilingual_stable_diffusion)、最高百亿参数的 **[悟道GLM](/doc_zh/GLM.md)**，[**EVA-CLIP**](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/EVA_CLIP)、**[Galactica](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/galactica)**、**OPT**、**BERT**、**RoBERTa**、**GPT2**、**T5**、**ALM**、**Huggingface Transformers** 等。
+      FlagAI 现已支持 30+ 主流模型，包括语言模型[**Aquila**](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/Aquila), 多模态模型 [**AltCLIP**](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltCLIP) 、文生图模型 [**AltDiffusion**](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltDiffusion) [![Huggingface space](https://img.shields.io/badge/🤗-Huggingface%20Space-cyan.svg)](https://huggingface.co/spaces/BAAI/bilingual_stable_diffusion)、最高百亿参数的 **[悟道GLM](/doc_zh/GLM.md)**，[**EVA-CLIP**](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/EVA_CLIP)、**[Galactica](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/galactica)**、**OPT**、**BERT**、**RoBERTa**、**GPT2**、**T5**、**ALM**、**Huggingface Transformers** 等。
       
 2.  **仅用十行代码即可进行并行训练**
 
@@ -59,6 +59,7 @@
 
 |    模型名称            | 任务      | 训练 | 微调 | 推理 | 样例           |                                                         
 | :---------------- | :------- | :-- |:-- | :-- | :--------------------------------------------- |
+| Aquila      | 自然语言处理  | ✅  | ✅  | ✅  | [README.md](examples/Aquila/README.md) 
 | ALM          | 阿拉伯语文本生成   |  ✅  | ❌  | ✅  | [README.md](/examples/ALM/README.md)  |                         
 | AltCLIP       | 文图匹配 | ✅  | ✅  | ✅  | [README.md](/examples/AltCLIP/README.md)   |  
 | AltCLIP-m18      | 文图匹配  | ✅  | ✅  | ✅  | [README.md](examples/AltCLIP-m18/README.md)   |                             
diff --git a/examples/Aquila/Aquila-sft/Aquila-sft.yaml b/examples/Aquila/Aquila-chat/Aquila-chat.yaml
similarity index 100%
rename from examples/Aquila/Aquila-sft/Aquila-sft.yaml
rename to examples/Aquila/Aquila-chat/Aquila-chat.yaml
diff --git a/examples/Aquila/Aquila-sft/README_AquilaChat.md b/examples/Aquila/Aquila-chat/README.md
similarity index 100%
rename from examples/Aquila/Aquila-sft/README_AquilaChat.md
rename to examples/Aquila/Aquila-chat/README.md
diff --git a/examples/Aquila/Aquila-sft/aquila_sft.py b/examples/Aquila/Aquila-chat/aquila_chat.py
similarity index 99%
rename from examples/Aquila/Aquila-sft/aquila_sft.py
rename to examples/Aquila/Aquila-chat/aquila_chat.py
index 28416509..e926fe8e 100755
--- a/examples/Aquila/Aquila-sft/aquila_sft.py
+++ b/examples/Aquila/Aquila-chat/aquila_chat.py
@@ -13,7 +13,7 @@
 from flagai.env_trainer_v1 import EnvTrainer
 import jsonlines
 import numpy as np
-from examples.Aquila import cyg_conversation as conversation_lib
+import cyg_conversation as conversation_lib
 from flagai.model.tools.lora.prepare_lora import lora_transfer
 device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 
diff --git a/examples/Aquila/Aquila-sft/bmtrain_mgpu.sh b/examples/Aquila/Aquila-chat/bmtrain_mgpu.sh
similarity index 100%
rename from examples/Aquila/Aquila-sft/bmtrain_mgpu.sh
rename to examples/Aquila/Aquila-chat/bmtrain_mgpu.sh
diff --git a/examples/Aquila/cyg_conversation.py b/examples/Aquila/Aquila-chat/cyg_conversation.py
similarity index 100%
rename from examples/Aquila/cyg_conversation.py
rename to examples/Aquila/Aquila-chat/cyg_conversation.py
diff --git a/examples/Aquila/Aquila-sft/data/sft_samples.jsonl b/examples/Aquila/Aquila-chat/data/sft_samples.jsonl
similarity index 100%
rename from examples/Aquila/Aquila-sft/data/sft_samples.jsonl
rename to examples/Aquila/Aquila-chat/data/sft_samples.jsonl
diff --git a/examples/Aquila/Aquila-sft/dist_trigger_docker.sh b/examples/Aquila/Aquila-chat/dist_trigger_docker.sh
similarity index 100%
rename from examples/Aquila/Aquila-sft/dist_trigger_docker.sh
rename to examples/Aquila/Aquila-chat/dist_trigger_docker.sh
diff --git a/examples/Aquila/Aquila-sft/generate_sft.py b/examples/Aquila/Aquila-chat/generate_chat.py
similarity index 97%
rename from examples/Aquila/Aquila-sft/generate_sft.py
rename to examples/Aquila/Aquila-chat/generate_chat.py
index a35b4110..8f243968 100755
--- a/examples/Aquila/Aquila-sft/generate_sft.py
+++ b/examples/Aquila/Aquila-chat/generate_chat.py
@@ -92,7 +92,7 @@ def convo_tokenize(convo_obj, tokenizer):
     print('-'*80)
     print(f"text is {text}")
 
-    from examples.Aquila.cyg_conversation import default_conversation
+    from cyg_conversation import default_conversation
 
     conv = default_conversation.copy()
     conv.append_message(conv.roles[0], text)
diff --git a/examples/Aquila/Aquila-chat/generate_chat_bminf.py b/examples/Aquila/Aquila-chat/generate_chat_bminf.py
new file mode 100755
index 00000000..7681d526
--- /dev/null
+++ b/examples/Aquila/Aquila-chat/generate_chat_bminf.py
@@ -0,0 +1,109 @@
+import os
+import torch
+from flagai.auto_model.auto_loader import AutoLoader
+from flagai.model.predictor.predictor import Predictor
+from flagai.model.predictor.aquila import aquila_generate
+from flagai.data.tokenizer import Tokenizer
+import bminf
+
+state_dict = "/data2/yzd/checkpoints/converted_models_ldwang"
+model_name = 'aquilachat-7b'
+
+loader = AutoLoader(
+    "lm",
+    model_dir=state_dict,
+    model_name=model_name,
+    use_cache=True)
+model = loader.get_model()
+tokenizer = loader.get_tokenizer()
+cache_dir = os.path.join(state_dict, model_name)
+
+model.eval()
+model.half()
+
+with torch.cuda.device(0):
+    model = bminf.wrapper(model, quantization=False, memory_limit=2 << 30)
+
+predictor = Predictor(model, tokenizer)
+
+texts = [
+        "北京为什么是中国的首都？",
+        "1+1=",
+        "为什么湘菜那么甜？",
+        "东三省和海南岛的区别？",
+        ]
+## 
+def pack_obj(text):
+    obj = dict()
+    obj['id'] = 'demo'
+
+    obj['conversations'] = []
+    human = dict()
+    human['from'] = 'human'
+    human['value'] = text
+    obj['conversations'].append(human)
+    # dummy bot
+    bot = dict()
+    bot['from'] = 'gpt'
+    bot['value'] = ''
+    obj['conversations'].append(bot)
+
+    obj['instruction'] = ''
+
+    return obj
+
+def delete_last_bot_end_singal(convo_obj):
+    conversations = convo_obj['conversations']
+    assert len(conversations) > 0 and len(conversations) % 2 == 0
+    assert conversations[0]['from'] == 'human'
+
+    last_bot = conversations[len(conversations)-1]
+    assert last_bot['from'] == 'gpt'
+
+    ## from _add_speaker_and_signal
+    END_SIGNAL = "\n"
+    len_end_singal = len(END_SIGNAL)
+    len_last_bot_value = len(last_bot['value'])
+    last_bot['value'] = last_bot['value'][:len_last_bot_value-len_end_singal]
+    return
+
+def convo_tokenize(convo_obj, tokenizer):
+    chat_desc = convo_obj['chat_desc']
+    instruction = convo_obj['instruction']
+    conversations = convo_obj['conversations']
+            
+    # chat_desc
+    example = tokenizer.encode_plus(f"{chat_desc}", None, max_length=None)['input_ids']
+    EOS_TOKEN = example[-1]
+    example = example[:-1] # remove eos
+    # instruction
+    instruction = tokenizer.encode_plus(f"{instruction}", None, max_length=None)['input_ids']
+    instruction = instruction[1:-1] # remove bos & eos
+    example += instruction
+
+    for conversation in conversations:
+        role = conversation['from']
+        content = conversation['value']
+        print(f"role {role}, raw content {content}")
+        content = tokenizer.encode_plus(f"{content}", None, max_length=None)['input_ids']
+        content = content[1:-1] # remove bos & eos
+        print(f"role {role}, content {content}")
+        example += content
+    return example
+
+for text in texts:
+    print('-'*80)
+    print(f"text is {text}")
+
+    from cyg_conversation import default_conversation
+
+    conv = default_conversation.copy()
+    conv.append_message(conv.roles[0], text)
+    conv.append_message(conv.roles[1], None)
+
+    tokens = tokenizer.encode_plus(f"{conv.get_prompt()}", None, max_length=None)['input_ids']
+    tokens = tokens[1:-1]
+
+    with torch.no_grad():
+        out = aquila_generate(tokenizer, model, [text], max_gen_len:=200, top_p=0.95, prompts_tokens=[tokens])
+        print(f"pred is {out}")
\ No newline at end of file
diff --git a/examples/Aquila/Aquila-sft/hostfile b/examples/Aquila/Aquila-chat/hostfile
similarity index 100%
rename from examples/Aquila/Aquila-sft/hostfile
rename to examples/Aquila/Aquila-chat/hostfile
diff --git a/examples/Aquila/Aquila-code/README_AquilaCode.md b/examples/Aquila/Aquila-code/README.md
similarity index 94%
rename from examples/Aquila/Aquila-code/README_AquilaCode.md
rename to examples/Aquila/Aquila-code/README.md
index 4539dbe6..8a06352e 100755
--- a/examples/Aquila/Aquila-code/README_AquilaCode.md
+++ b/examples/Aquila/Aquila-code/README.md
@@ -148,7 +148,7 @@ bash dist_trigger_docker.sh hostfile Aquila-sft.yaml [aquilacode-7b-nv/aquilacod
 
 ## 证书/License
 
-AquilaCode-7B-NV开源模型使用 [智源Aquila系列模型许可协议](https://huggingface.co/BAAI/AquilaCode-7B-NV/resolve/main/BAAI%20Aquila%20Model%20License%20Agreement.pdf), 原始代码基于[Apache Licence 2.0](https://www.apache.org/licenses/LICENSE-2.0)。
+AquilaCode-7B-NV和AquilaCode-7B-TS开源模型使用 [智源Aquila系列模型许可协议](https://huggingface.co/BAAI/AquilaCode-7B-NV/resolve/main/BAAI%20Aquila%20Model%20License%20Agreement.pdf), 原始代码基于[Apache Licence 2.0](https://www.apache.org/licenses/LICENSE-2.0)。
 
 
-AquilaCode-7B-NV open-source model is licensed under [ BAAI Aquila Model Licence Agreement](https://huggingface.co/BAAI/AquilaCode-7B-NV/resolve/main/BAAI%20Aquila%20Model%20License%20Agreement.pdf). The source code is under [Apache Licence 2.0](https://www.apache.org/licenses/LICENSE-2.0).
+AquilaCode-7B-NV and AquilaCode-7B-TSopen-source model is licensed under [ BAAI Aquila Model Licence Agreement](https://huggingface.co/BAAI/AquilaCode-7B-NV/resolve/main/BAAI%20Aquila%20Model%20License%20Agreement.pdf). The source code is under [Apache Licence 2.0](https://www.apache.org/licenses/LICENSE-2.0).
diff --git a/examples/Aquila/Aquila-code/aquila_code.py b/examples/Aquila/Aquila-code/aquila_code.py
deleted file mode 100755
index 39c7c347..00000000
--- a/examples/Aquila/Aquila-code/aquila_code.py
+++ /dev/null
@@ -1,223 +0,0 @@
-# Copyright © 2022 BAAI. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License")
-import os
-import torch
-from torch.utils.data import Dataset
-import gc
-gc.collect()
-torch.cuda.empty_cache()
-from flagai.auto_model.auto_loader import AutoLoader
-from flagai.data.tokenizer import Tokenizer
-from flagai.env_args import EnvArgs
-from flagai.env_trainer_v1 import EnvTrainer
-
-device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
-
-# You can input all parameters by the command line.
-# For example: python train_env_trainer.py --epochs=300 --batch_size=4 --env_type=pytorch
-env_args = EnvArgs(
-    env_type="bmtrain",
-    experiment_name="aquila",
-    batch_size=1,
-    gradient_accumulation_steps=1,
-    lr=2e-4,
-    weight_decay=1e-3,
-    epochs=100,
-    log_interval=10,
-    eval_interval=5000,
-    num_gpus=1,
-    load_dir=None,
-    pytorch_device=device,
-    save_dir="checkpoints_aquila",
-    checkpoint_activations=False,
-    save_interval=5000,
-    fp16=True,
-    training_script=__file__,
-)
-env_args = env_args.parse_args()
-#env_args.wandb = False
-
-# overwrite
-if env_args.yaml_config:
-    import yaml
-    file_data = open(env_args.yaml_config, 'r', encoding="utf-8").read()
-    data = yaml.load_all(file_data)
-    delattr(env_args, 'yaml_config')
-    arg_dict = env_args.__dict__
-    for subdata in data:
-        for key, value in subdata.items():
-            if isinstance(value, list):
-                for v in value:
-                    arg_dict[key].append(v)
-            else:
-                arg_dict[key] = value
-trainer = EnvTrainer(env_args)
-
-# Trainer as Trigger
-if not env_args.not_call_launch:
-    import sys
-    sys.exit(0)
-
-print(f"Trainer effective env_args={env_args} local_rank={trainer.local_rank}", flush=True)
-
-checkpoints = env_args.pre_load_dir
-
-model_name = env_args.model_name
-
-env_args.enable_sft_conversations_dataset_v3 = True
-
-
-print('*'*20, "model_name", model_name, flush=True)
-
-'''
-auto_loader = AutoLoader(
-    "lm",
-    model_name=model_name,
-    model_dir=checkpoints,
-    only_download_config=True,
-)
-model = auto_loader.get_model()
-tokenizer = auto_loader.get_tokenizer()
-print('*'*20, "model", model)
-trainer.pre_train(model)
-print('*'*20, "model", model)
-
-'''
-
-cache_dir = os.path.join(checkpoints, model_name)
-print('*'*20, "cache_dir", cache_dir)
-tokenizer = Tokenizer.from_pretrained(model_name, cache_dir=cache_dir)
-print('*'*20, "tokenizer", tokenizer)
-
-# avoid sync loading models in case of Mem OOM
-if env_args.bmt_async_load:
-    import time
-    time.sleep(10*60*(trainer.local_rank%4))
-
-
-config_file = os.path.join(cache_dir, 'config.json')
-from flagai.model.aquila_model import AQUILAModel
-model = AQUILAModel.init_from_json(config_file=config_file)
-print('*'*20, "model", model)
-
-## bmt_pre_load
-checkpoint_path = os.path.join(cache_dir, "pytorch_model.bin")
-if env_args.bmt_pre_load:
-    model.load_weights(checkpoint_path)
-
-trainer.pre_train(model)
-
-print('*'*20, "model", model, flush=True)
-
-assert env_args.enable_sft_dataset_dir is not None and \
-        env_args.enable_sft_dataset_file is not None
-
-cur_dir = env_args.enable_sft_dataset_dir
-jsonl_data = os.path.join(cur_dir, env_args.enable_sft_dataset_file)
-max_seq_len = 2048
-
-import jsonlines
-import numpy as np
-def read_file():
-    conversations = []
-    with jsonlines.open(jsonl_data) as reader:
-        for line in reader:
-            if 'chat_desc' not in line or 'instruction' not in line or 'conversations' not in line:
-                continue
-            obj = dict()
-            obj['chat_desc'] = line['chat_desc']
-            obj['conversations'] = line['conversations']
-            obj['instruction'] = line['instruction']
-            conversations.append(obj)
-    return conversations
-
-class ConversationDataset(Dataset):
-    def __init__(self, conversations, tokenizer, maxlen=512):
-        super(ConversationDataset, self).__init__()
-        self.conversations = conversations
-        self.tokenizer = tokenizer
-        self.maxlen = maxlen
-
-    def __getitem__(self, i):
-        chat_desc = self.conversations[i]['chat_desc']
-        instruction = self.conversations[i]['instruction']
-        conversations = self.conversations[i]['conversations']
-        
-        # chat_desc
-        example = self.tokenizer.encode_plus(f"{chat_desc}", None, max_length=None)['input_ids']
-        EOS_TOKEN = example[-1]
-        example = example[:-1] # remove eos
-        # instruction
-        instruction = self.tokenizer.encode_plus(f"{instruction}", None, max_length=None)['input_ids']
-        instruction = instruction[1:-1] # remove bos & eos
-        example += instruction
-
-        import copy
-        labels = copy.deepcopy(example)
-
-        for conversation in conversations:
-            role = conversation['from']
-            content = conversation['value']
-            content = self.tokenizer.encode_plus(f"{content}", None, max_length=None)['input_ids']
-            content = content[1:-1] # remove bos & eos
-            example += content
-            if role == 'gpt':
-                role_labels = copy.deepcopy(content)
-            else:
-                # masking
-                role_labels = [env_args.IGNORE_INDEX] * len(content)
-            labels += role_labels
-
-        example.append(EOS_TOKEN)
-        labels.append(EOS_TOKEN)
-
-        ## maxlen
-        example = example[:self.maxlen]
-        labels = labels[:self.maxlen]
-
-        output = {
-            "input_ids": example,
-            "labels": labels,
-        }
-        return output
-
-    def __len__(self):
-        return len(self.conversations)
-
-    @staticmethod
-    def collate_fn(batch):
-        def padding(indice, max_length, pad_idx=0):
-            pad_indice = [
-                item + [pad_idx] * max(0, max_length - len(item)) for item in indice
-            ]
-            return torch.tensor(pad_indice)
-
-        input_ids = [data["input_ids"] for data in batch]
-        labels = [data["labels"] for data in batch]
-        max_length = max_seq_len
-        input_ids = padding(input_ids, max_length)[:,:max_length]
-        labels = padding(labels, max_length, pad_idx=env_args.IGNORE_INDEX)[:,:max_length]
-
-        data = {
-            "input_ids": input_ids,
-            "labels": labels
-        }
-        return data
-
-conversations = read_file()
-data_len = len(conversations)
-#train_size = int(data_len * 0.95)
-train_size = data_len
-train_conversations = conversations[:train_size]
-
-train_dataset = ConversationDataset(train_conversations,
-                                    tokenizer=tokenizer,
-                                    maxlen=max_seq_len)
-
-trainer.do_train(
-    train_dataset=train_dataset,
-    valid_dataset=None,
-    collate_fn=ConversationDataset.collate_fn,
-    optimizer=None,
-    rank_split=False)
\ No newline at end of file
diff --git a/examples/Aquila/Aquila-code/aquila_code_pretrain.py b/examples/Aquila/Aquila-code/aquila_code_pretrain.py
new file mode 100755
index 00000000..043688ca
--- /dev/null
+++ b/examples/Aquila/Aquila-code/aquila_code_pretrain.py
@@ -0,0 +1,130 @@
+# Copyright © 2022 BAAI. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License")
+import os
+import torch
+from torch.utils.data import Dataset
+import gc
+gc.collect()
+torch.cuda.empty_cache()
+from flagai.auto_model.auto_loader import AutoLoader
+from flagai.data.tokenizer import Tokenizer
+from flagai.env_args import EnvArgs
+from flagai.env_trainer_v1 import EnvTrainer
+from flagai.model.aquila_model import AQUILAModel
+from flagai.data.datasets.indexed_dataset.build_index_mappings import _build_train_valid_test_datasets,_build_train_valid_test_weighted_datasets
+
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+# You can input all parameters by the command line.
+# For example: python train_env_trainer.py --epochs=300 --batch_size=4 --env_type=pytorch
+env_args = EnvArgs(
+    env_type="bmtrain",
+    experiment_name="aquila",
+    batch_size=1,
+    gradient_accumulation_steps=1,
+    lr=2e-4,
+    weight_decay=1e-3,
+    epochs=100,
+    log_interval=10,
+    eval_interval=5000,
+    num_gpus=1,
+    load_dir=None,
+    pytorch_device=device,
+    save_dir="checkpoints_aquila",
+    checkpoint_activations=False,
+    save_interval=5000,
+    fp16=True,
+    training_script=__file__,
+)
+env_args = env_args.parse_args()
+#env_args.wandb = False
+
+# overwrite
+if env_args.yaml_config:
+    import yaml
+    file_data = open(env_args.yaml_config, 'r', encoding="utf-8").read()
+    data = yaml.load_all(file_data)
+    delattr(env_args, 'yaml_config')
+    arg_dict = env_args.__dict__
+    for subdata in data:
+        for key, value in subdata.items():
+            if isinstance(value, list):
+                for v in value:
+                    arg_dict[key].append(v)
+            else:
+                arg_dict[key] = value
+trainer = EnvTrainer(env_args)
+
+# Trainer as Trigger
+if not env_args.not_call_launch:
+    import sys
+    sys.exit(0)
+
+print(f"Trainer effective env_args={env_args} local_rank={trainer.local_rank}", flush=True)
+checkpoints = env_args.pre_load_dir
+model_name = env_args.model_name
+
+print('*'*20, "model_name", model_name, flush=True)
+
+cache_dir = os.path.join(checkpoints, model_name)
+print('*'*20, "cache_dir", cache_dir)
+tokenizer = Tokenizer.from_pretrained(model_name, cache_dir=cache_dir)
+print('*'*20, "tokenizer", tokenizer)
+
+# avoid sync loading models in case of Mem OOM
+if env_args.bmt_async_load:
+    import time
+    time.sleep(10*60*(trainer.local_rank%4))
+
+config_file = os.path.join(cache_dir, 'config.json')
+model = AQUILAModel.init_from_json(config_file=config_file)
+
+## bmt_pre_load
+checkpoint_path = os.path.join(cache_dir, "pytorch_model.bin")
+if env_args.bmt_pre_load:
+    model.load_weights(checkpoint_path)
+
+trainer.pre_train(model)
+
+print('*'*20, "model", model, flush=True)
+
+## Use Prebuilt DataSets
+data_prefix = '../../indexed_dataset/data/demo_text_document'
+data_impl = 'mmap'
+splits_string = '90,10'
+train_valid_test_num_samples = [90, 10]
+seq_length = 1024
+seed = 2023
+skip_warmup = True
+
+train_dataset, valid_dataset, _ = _build_train_valid_test_datasets(
+    data_prefix, data_impl, splits_string,
+    train_valid_test_num_samples,
+    seq_length, seed, skip_warmup)
+print("Total train_dataset: ", len(train_dataset), flush=True)
+print("Total valid_dataset: ", len(valid_dataset), flush=True)
+
+def collate_fn(batch):
+    def padding(indice, max_length, pad_idx=tokenizer.token_end_id):
+        pad_indice = [
+            item.tolist() + [pad_idx] * max(0, max_length - len(item.tolist())) for item in indice
+        ]
+        return torch.tensor(pad_indice)
+
+    input_ids = [data["input_ids"] for data in batch]
+    max_length = max([len(t) for t in input_ids])
+    input_ids = padding(input_ids, max_length)[:,:seq_length]
+
+    data = {
+        "input_ids": input_ids,
+        "labels": input_ids
+    }
+    return data
+
+trainer.do_train(
+    train_dataset=train_dataset,
+    valid_dataset=None,
+    collate_fn=collate_fn,
+    optimizer=None,
+    rank_split=False)
diff --git a/examples/Aquila/Aquila-pretrain-33B.yaml b/examples/Aquila/Aquila-pretrain-33B.yaml
new file mode 100755
index 00000000..ca3e3c59
--- /dev/null
+++ b/examples/Aquila/Aquila-pretrain-33B.yaml
@@ -0,0 +1,10 @@
+batch_size: 10
+gradient_accumulation_steps: 1
+lr: 1.5e-4
+warm_up: 0.01
+save_interval: 1000
+log_interval: 10
+bmt_loss_scale: 131072
+save_optim: True
+save_rng: True
+eps: 1.e-8
\ No newline at end of file
diff --git a/examples/Aquila/Aquila-pretrain.yaml b/examples/Aquila/Aquila-pretrain.yaml
new file mode 100755
index 00000000..49ee411b
--- /dev/null
+++ b/examples/Aquila/Aquila-pretrain.yaml
@@ -0,0 +1,10 @@
+batch_size: 10
+gradient_accumulation_steps: 1
+lr: 3.0e-4
+warm_up: 0.01
+save_interval: 1000
+log_interval: 10
+bmt_loss_scale: 131072
+save_optim: True
+save_rng: True
+eps: 1.e-8
\ No newline at end of file
diff --git a/examples/Aquila/Aquila-pretrain/README_Aquila.md b/examples/Aquila/Aquila-pretrain/README.md
similarity index 99%
rename from examples/Aquila/Aquila-pretrain/README_Aquila.md
rename to examples/Aquila/Aquila-pretrain/README.md
index 1468dc93..ec6d2217 100755
--- a/examples/Aquila/Aquila-pretrain/README_Aquila.md
+++ b/examples/Aquila/Aquila-pretrain/README.md
@@ -161,4 +161,4 @@ with torch.no_grad():
 Aquila-7B和Aquila-33B开源模型使用 [智源Aquila系列模型许可协议](https://huggingface.co/BAAI/AquilaCode-7B-NV/resolve/main/BAAI%20Aquila%20Model%20License%20Agreement.pdf), 原始代码基于[Apache Licence 2.0](https://www.apache.org/licenses/LICENSE-2.0)。
 
 
-Aquila-7B and Aquila-33B open-source model is licensed under [ BAAI Aquila Model Licence Agreement](https://huggingface.co/BAAI/AquilaCode-7B-NV/resolve/main/BAAI%20Aquila%20Model%20License%20Agreement.pdf). The source code is under [Apache Licence 2.0](https://www.apache.org/licenses/LICENSE-2.0).
+Aquila-7B and Aquila-33B open-source model is licensed under [ BAAI Aquila Model Licence Agreement](https://huggingface.co/BAAI/AquilaCode-7B-NV/resolve/main/BAAI%20Aquila%20Model%20License%20Agreement.pdf). The source code is under [Apache Licence 2.0](https://www.apache.org/licenses/LICENSE-2.0)
diff --git a/examples/Aquila/README.md b/examples/Aquila/README.md
new file mode 100755
index 00000000..ec6d2217
--- /dev/null
+++ b/examples/Aquila/README.md
@@ -0,0 +1,164 @@
+license: [Apache License 2.0](https://model.baai.ac.cn/use-agreement)
+
+
+# Aquila
+
+## 简介/Overview
+Aquila语言大模型在技术上继承了GPT-3、LLaMA等的架构设计优点，替换了一批更高效的底层算子实现、重新设计实现了中英双语的tokenizer，升级了BMTrain并行训练方法，在Aquila的训练过程中实现了比Magtron+DeepSpeed zero-2将近８倍的训练效率。Aquila语言大模型是在中英文高质量语料基础上从０开始训练的，通过数据质量的控制、多种训练的优化方法，实现在更小的数据集、更短的训练时间，获得比其它开源模型更优的性能。也是首个支持中英双语知识、支持商用许可协议、符合国内数据合规需要的大规模开源语言模型。
+
+The Aquila language model inherits the architectural design advantages of GPT-3 and LLaMA, replacing a batch of more efficient underlying operator implementations and redesigning the tokenizer for Chinese-English bilingual support. It upgrades the BMTrain parallel training method, achieving nearly 8 times the training efficiency of Magtron+DeepSpeed ZeRO-2 in the training process of Aquila. The Aquila language model is trained from scratch on high-quality Chinese and English corpora. Through data quality control and various training optimization methods, it achieves better performance than other open-source models with smaller datasets and shorter training times. It is also the first large-scale open-source language model that supports Chinese-English-Knowledge, commercial licensing, and complies with domestic data regulations.
+
+
+我们同时也支持[Huggingface平台](https://huggingface.co/BAAI)。
+
+We also support [Huggingface](https://huggingface.co/BAAI).
+
+## 模型细节/Model details
+
+|   模型/Model          |  状态/State    | 能否商用/Commercial use?  |  所用显卡/GPU   |                                    
+| :---------------- | :------- | :-- |:-- |   
+| <font color=red>Aquila-7B </font>         | 已发布  |   ✅   | Nvidia-A100  | 
+| <font color=red>Aquila-30B </font>         | 敬请期待  |   ✅   | Nvidia-A100  | 
+| AquilaCode-7B-NV          |已发布  |    ✅   |   Nvidia-A100   | 
+| AquilaCode-7B-TS           |已发布 |   ✅    |  Tianshu-BI-V100   |
+| AquilaChat-7B           |已发布  |    ✅    | Nvidia-A100  | 
+
+我们使用了一系列更高效的底层算子来辅助模型训练，其中包括参考[flash-attention](https://github.com/HazyResearch/flash-attention)的方法并替换了一些中间计算，同时还使用了RMSNorm。在此基础上，我们升级了[BMtrain](https://github.com/OpenBMB/BMTrain)技术进行轻量化的并行训练，该技术采用了数据并行、ZeRO（零冗余优化器）、优化器卸载、检查点和操作融合、通信-计算重叠等方法来优化模型训练过程。
+
+Aquila模型所采用的tokenizer是由我们从头开始训练的，支持中英双语。我们在处理英文、中文以及代码数据时，采用了不同的分词器对一万个样本进行了抽取。随后，我们统计了每个样本的token数量，并将其记录在表格中。Aquila tokenizer与其他tokenizer的参数对比见下表:
+
+We used a series of more efficient low-level operators to assist with model training, including methods referenced from [flash-attention](https://github.com/HazyResearch/flash-attention) and replacing some intermediate calculations, as well as using RMSNorm. Building upon this foundation, we applied the [BMtrain](https://github.com/OpenBMB/BMTrain) for lightweight parallel training, which utilizes methods such as data parallelism, ZeRO (zero redundancy optimizer), optimizer offloading, checkpoint and operation fusion, and communication-computation overlap to optimize the model training process.
+
+The tokenizer used in the Aquila model was trained from scratch by us and supports both English and Chinese. We used different tokenizers to extract ten thousand data samples from English, Chinese, and code data respectively, obtained the count of tokens for each sample, and also included it in the table. The parameters of this tokenizer are compared to those of other tokenizers in the table below:
+
+
+
+| 模型/Model | 词表大小/Vocab size | 说明/Note |英文平均tokens量/Avg tokens(English)| 中文平均tokens量/Avg tokens(Chinesse)|代码平均tokens量/Avg tokens(code)  |
+|  -----  | ----  | -----  | ----  | -----  | ----  | 
+| GPT2 | 50527 | bpe|1717 | 1764|2323 |
+| LLaMA | 32000 | sp(bpe)|1805| 1257|1970 |
+| Aquila | 100000 | bpe|1575 | 477|1679 |
+
+
+
+## 训练数据集/Training data 
+Aquila预训练使用了Pile，[RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T), [Wikipedia](https://huggingface.co/datasets/wikipedia), [C4](https://huggingface.co/datasets/c4), 悟道中文数据集、电子书、专利、百科、论坛, github数据等, 详情可见下图。
+
+The Aquila-7B model was pretrained on Pile，[RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T), [Wikipedia](https://huggingface.co/datasets/wikipedia), [C4](https://huggingface.co/datasets/c4), Wudao Corpus、e-book、Patent, encyclopedia, forum, github etc. Details are given in the figure below.
+
+![Screenshot](../img/data_dist.png)
+
+
+
+## 使用方式/How to use
+
+### 1. 预训练/Pre-training
+#### Step 1: 修改参数/Modify Parameters
+
+* `cd /examples/aquila`
+* 配置`hostfile`文件, 参考[这里](../../../doc_zh/TUTORIAL_8_ENVIRONMENT_SETUP.md#a配置hostfilehostfile-中的v100-1-与sshconfig-对应) ; Configure the `hostfile` file, refer to [here](../../../docs/TUTORIAL_8_ENVIRONMENT_SETUP.md)
+* 配置`bmtrain_mgpu.sh`文件, 将`SCRIPT_FILE`改成`aquila_pretrain.py`; configure the `bmtrain_mgpu.sh` file, change `SCRIPT_FILE` to `aquila_pretrain.py`
+* (可选) 在`Aquila-pretrain.yaml`文件里更改参数 ; (optional) change parameters in `Aquila-pretrain.yaml`
+
+| 参数名 Parameter             | 类型 Type | 描述 Description                                        |
+|--------------------------------|------------|-------------------------------------------------------|
+| batch_size | int   | 每次迭代训练时，从数据集中抽取的样本数。一般来说，它越大，处理速度越快，但会占用更多的内存; The number of samples extracted from the dataset for each iteration during training. Generally, a larger batch size can speed up processing but may also consume more memory                    |
+| gradient_accumulation_steps | int   | 在更新模型权重之前，要对多个小批次进行梯度计算的次数。主要应用于GPU显存较小的情况下，可以使用小的batch_size，通过梯度累积达到与大batch_size相同的效果; The number of samples extracted from the dataset for each iteration during training. Generally, a larger batch size can speed up processing but may also consume more memoryimages                  |
+| lr | float   | 指控制模型更新参数时的步长或速率。学习率过高可能导致模型不收敛，而学习率过低则可能导致训练时间过长或者陷入局部最优解; The step size or rate at which the model updates its parameters during training. A high learning rate may cause the model not to converge, while a low learning rate may result in long training times or being stuck in a local optimum                  |
+| warm_up | float   | 初始学习率与原始学习率的比例; The ratio between the initial learning rate and the original learning rate
+| save_interval | int  | 模型保存的间隔，即每训练多少个iteration保存一次模型。当训练时间较长时，保存间隔可以避免因突然中断或出现错误导致训练成果全部丢失; The interval at which the model is saved, i.e., how many iterations the model is saved during training. When training takes a long time, saving intervals can prevent all training achievements from being lost due to sudden interruptions or errors.                    |
+
+* 我们的演示数据集放在`../indexed_dataset/data/demo_text_document`里。 如果想修改预训练数据集，可更改`aquila_pretrain.py`里的`data_prefix`参数; Our demo dataset is located in `../indexed_dataset/data/demo_text_document`. If you want to modify the pre-training dataset, you can change the data_prefix parameter in `aquila_pretrain.py`.
+#### Step 2: 启动训练/Start training
+对于Aquila-7B模型
+```
+bash dist_trigger_docker.sh hostfile Aquila-pretrain.yaml aquila-7b [实验名]
+```   
+
+对于Aquila-33B模型
+```
+bash dist_trigger_docker.sh hostfile Aquila-pretrain-33B.yaml aquila-33b [实验名]
+```   
+
+ 接下来会输出下列信息，注意`NODES_NUM`应该与节点数相等，`LOGFILE`是模型运行的日志文件；The following information will be output. Note that `NODES_NUM` should be equal to the number of nodes, and `LOGFILE` is the log file for the model run.
+
+![Screenshot](../img/info.jpg)
+
+成功训练之前能看到如下信息(具体参数可能不同)； Before successful training, you may see the following information with parameters that may differ:
+
+![Screenshot](../img/info2.jpg)
+  
+### 2. 可监督微调/Supervised Fine-tuning(SFT)
+#### Step 1: 修改参数/Modify Parameters
+* `cd /examples/aquila`
+* 配置`hostfile`文件, 参考[这里](../../../doc_zh/TUTORIAL_8_ENVIRONMENT_SETUP.md#a配置hostfilehostfile-中的v100-1-与sshconfig-对应) ; Configure the `hostfile` file, refer to [here](../../../docs/TUTORIAL_8_ENVIRONMENT_SETUP.md)
+* 配置`bmtrain_mgpu.sh`文件, 将`SCRIPT_FILE`改成`aquila_pretrain.py`; configure the `bmtrain_mgpu.sh` file, change `SCRIPT_FILE` to `aquila_pretrain.py`
+* (可选) 在`Aquila-pretrain.yaml`文件里更改参数 ; (optional) change parameters in `Aquila-pretrain.yaml`
+
+
+
+#### Step 2: 启动可监督微调/Start SFT
+```
+cd ../Aquila-sft/
+```
+对于Aquila-7B模型：
+```
+bash dist_trigger_docker.sh hostfile Aquila-sft.yaml aquila-7b [实验名 experiment name]
+```
+对于Aquila-33B模型:
+```
+bash dist_trigger_docker.sh hostfile Aquila-sft.yaml aquila-33b [实验名 experiment name]
+```
+接下来会输出下列信息，注意`NODES_NUM`应该与节点数相等，`LOGFILE`是模型运行的日志文件；The following information will be output. Note that `NODES_NUM` should be equal to the number of nodes, and `LOGFILE` is the log file for the model run.
+
+![Screenshot](../img/info.jpg)
+
+成功训练之前能在日志里看到如下信息(具体参数可能不同)； Before successful training, you may see the following information in the log file with parameters that may differ:
+
+![Screenshot](../img/info2.jpg)
+
+### 3. 推理/Inference
+
+```python
+import os
+import torch
+from flagai.auto_model.auto_loader import AutoLoader
+from flagai.model.predictor.predictor import Predictor
+from flagai.data.tokenizer import Tokenizer
+import bminf
+
+state_dict = "./checkpoints_in/"
+model_name = 'aquila-7b' # 'aquila-33b'
+
+loader = AutoLoader(
+    "lm",
+    model_dir=state_dict,
+    model_name=model_name,
+    use_cache=True)
+model = loader.get_model()
+tokenizer = loader.get_tokenizer()
+
+model.eval()
+model.half()
+model.cuda()
+
+predictor = Predictor(model, tokenizer)
+
+text = "北京在哪儿?"
+text = f'{text}' 
+print(f"text is {text}")
+with torch.no_grad():
+    out = predictor.predict_generate_randomsample(text, out_max_length=200, temperature=0)
+    print(f"pred is {out}")
+
+```
+
+
+
+
+## 证书/License
+
+Aquila-7B和Aquila-33B开源模型使用 [智源Aquila系列模型许可协议](https://huggingface.co/BAAI/AquilaCode-7B-NV/resolve/main/BAAI%20Aquila%20Model%20License%20Agreement.pdf), 原始代码基于[Apache Licence 2.0](https://www.apache.org/licenses/LICENSE-2.0)。
+
+
+Aquila-7B and Aquila-33B open-source model is licensed under [ BAAI Aquila Model Licence Agreement](https://huggingface.co/BAAI/AquilaCode-7B-NV/resolve/main/BAAI%20Aquila%20Model%20License%20Agreement.pdf). The source code is under [Apache Licence 2.0](https://www.apache.org/licenses/LICENSE-2.0)
diff --git a/examples/Aquila/aquila_pretrain.py b/examples/Aquila/aquila_pretrain.py
new file mode 100755
index 00000000..043688ca
--- /dev/null
+++ b/examples/Aquila/aquila_pretrain.py
@@ -0,0 +1,130 @@
+# Copyright © 2022 BAAI. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License")
+import os
+import torch
+from torch.utils.data import Dataset
+import gc
+gc.collect()
+torch.cuda.empty_cache()
+from flagai.auto_model.auto_loader import AutoLoader
+from flagai.data.tokenizer import Tokenizer
+from flagai.env_args import EnvArgs
+from flagai.env_trainer_v1 import EnvTrainer
+from flagai.model.aquila_model import AQUILAModel
+from flagai.data.datasets.indexed_dataset.build_index_mappings import _build_train_valid_test_datasets,_build_train_valid_test_weighted_datasets
+
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+# You can input all parameters by the command line.
+# For example: python train_env_trainer.py --epochs=300 --batch_size=4 --env_type=pytorch
+env_args = EnvArgs(
+    env_type="bmtrain",
+    experiment_name="aquila",
+    batch_size=1,
+    gradient_accumulation_steps=1,
+    lr=2e-4,
+    weight_decay=1e-3,
+    epochs=100,
+    log_interval=10,
+    eval_interval=5000,
+    num_gpus=1,
+    load_dir=None,
+    pytorch_device=device,
+    save_dir="checkpoints_aquila",
+    checkpoint_activations=False,
+    save_interval=5000,
+    fp16=True,
+    training_script=__file__,
+)
+env_args = env_args.parse_args()
+#env_args.wandb = False
+
+# overwrite
+if env_args.yaml_config:
+    import yaml
+    file_data = open(env_args.yaml_config, 'r', encoding="utf-8").read()
+    data = yaml.load_all(file_data)
+    delattr(env_args, 'yaml_config')
+    arg_dict = env_args.__dict__
+    for subdata in data:
+        for key, value in subdata.items():
+            if isinstance(value, list):
+                for v in value:
+                    arg_dict[key].append(v)
+            else:
+                arg_dict[key] = value
+trainer = EnvTrainer(env_args)
+
+# Trainer as Trigger
+if not env_args.not_call_launch:
+    import sys
+    sys.exit(0)
+
+print(f"Trainer effective env_args={env_args} local_rank={trainer.local_rank}", flush=True)
+checkpoints = env_args.pre_load_dir
+model_name = env_args.model_name
+
+print('*'*20, "model_name", model_name, flush=True)
+
+cache_dir = os.path.join(checkpoints, model_name)
+print('*'*20, "cache_dir", cache_dir)
+tokenizer = Tokenizer.from_pretrained(model_name, cache_dir=cache_dir)
+print('*'*20, "tokenizer", tokenizer)
+
+# avoid sync loading models in case of Mem OOM
+if env_args.bmt_async_load:
+    import time
+    time.sleep(10*60*(trainer.local_rank%4))
+
+config_file = os.path.join(cache_dir, 'config.json')
+model = AQUILAModel.init_from_json(config_file=config_file)
+
+## bmt_pre_load
+checkpoint_path = os.path.join(cache_dir, "pytorch_model.bin")
+if env_args.bmt_pre_load:
+    model.load_weights(checkpoint_path)
+
+trainer.pre_train(model)
+
+print('*'*20, "model", model, flush=True)
+
+## Use Prebuilt DataSets
+data_prefix = '../../indexed_dataset/data/demo_text_document'
+data_impl = 'mmap'
+splits_string = '90,10'
+train_valid_test_num_samples = [90, 10]
+seq_length = 1024
+seed = 2023
+skip_warmup = True
+
+train_dataset, valid_dataset, _ = _build_train_valid_test_datasets(
+    data_prefix, data_impl, splits_string,
+    train_valid_test_num_samples,
+    seq_length, seed, skip_warmup)
+print("Total train_dataset: ", len(train_dataset), flush=True)
+print("Total valid_dataset: ", len(valid_dataset), flush=True)
+
+def collate_fn(batch):
+    def padding(indice, max_length, pad_idx=tokenizer.token_end_id):
+        pad_indice = [
+            item.tolist() + [pad_idx] * max(0, max_length - len(item.tolist())) for item in indice
+        ]
+        return torch.tensor(pad_indice)
+
+    input_ids = [data["input_ids"] for data in batch]
+    max_length = max([len(t) for t in input_ids])
+    input_ids = padding(input_ids, max_length)[:,:seq_length]
+
+    data = {
+        "input_ids": input_ids,
+        "labels": input_ids
+    }
+    return data
+
+trainer.do_train(
+    train_dataset=train_dataset,
+    valid_dataset=None,
+    collate_fn=collate_fn,
+    optimizer=None,
+    rank_split=False)
diff --git a/examples/Aquila/bmtrain_mgpu.sh b/examples/Aquila/bmtrain_mgpu.sh
new file mode 100755
index 00000000..f0b25e27
--- /dev/null
+++ b/examples/Aquila/bmtrain_mgpu.sh
@@ -0,0 +1,91 @@
+# Defined by User
+export TRIGGER_FILE=bmtrain_mgpu.sh
+export SCRIPT_FILE=aquila_pretrain.py
+
+# ENVS
+export PROJ_HOME=$PWD
+export PRE_LOAD_DIR=$PROJ_HOME/checkpoints_in
+export NCCL_SOCKET_IFNAME=eth0
+export NCCL_IB_DISABLE=0
+export NCCL_IB_CUDA_SUPPORT=1
+export NCCL_IB_GID_INDEX=0
+export NCCL_IB_HCA=mlx5_2,mlx5_5
+export NCCL_DEBUG=debug
+export OMP_NUM_THREADS=4
+
+echo "[INFO] $0: hostfile configfile model_name exp_name exp_version"
+set -u
+  hostfile=$1
+  configfile=$2
+  model_name=$3
+  exp_name=$4
+  exp_version=$5
+set +u
+
+# DIST
+export HOSTFILE=$hostfile
+export CONFIGFILE=$configfile
+export NODE_ADDR=$(ifconfig -a|grep inet|grep -v 127.0.0.1|grep -v inet6|awk '{print $2;}'|tr -d "addr:")
+export GPU_NUM_PER_NODE=$(awk -F" |=" '{ranks[$1]=$NF;}END{print ranks["'$NODE_ADDR'"];}' $HOSTFILE)
+export NODES_NUM=$(cat $HOSTFILE | wc -l)
+export MASTER_ADDR=$(head -n1 $HOSTFILE | awk '{print $1;}')
+export RANK=$(awk '{ranks[$1]=(FNR-1);}END{print ranks["'$NODE_ADDR'"];}' $HOSTFILE)
+export MASTER_PORT=23456
+
+
+## wandb
+export WANDB_MODE=offline
+
+## EXP
+export MODEL_NAME=$model_name
+export EXP_NAME=$exp_name
+export WANDB_DIR=$PROJ_HOME/wandb/${EXP_NAME}/$exp_version
+mkdir -p $PROJ_HOME/checkpoints_out
+export SAVE_DIR=$PROJ_HOME/checkpoints_out/${EXP_NAME}/$exp_version
+mkdir -p $SAVE_DIR
+mkdir -p $WANDB_DIR
+## Backup ckpts & scripts into exp versions
+cp -r $PRE_LOAD_DIR/$MODEL_NAME $SAVE_DIR
+cp -r $PROJ_HOME/$TRIGGER_FILE $SAVE_DIR
+cp -r $hostfile $SAVE_DIR
+cp -r $configfile $SAVE_DIR
+
+export EPOCH_NUM=1
+export BATCH_SIZE=6
+export GRADIENT_ACCUM_STEPS=1
+export LR=3.0e-4
+export LR=1.0e-5
+export LR=6.0e-5
+export WARMUP_RATE=0.008
+export WARMUP_RATE=0.02
+export WARMUP_RATE=0.1
+export WARMUP_RATE=0.2
+
+## EXTRA OPTS
+OPTS=" --batch_size $BATCH_SIZE \
+       --epochs $EPOCH_NUM \
+       --gradient_accumulation_steps $GRADIENT_ACCUM_STEPS \
+       --lr $LR \
+       --warm_up $WARMUP_RATE \
+       --weight_decay 0.1 \
+       --adam_beta1 0.9 \
+       --adam_beta2 0.95 \
+       --save_dir $SAVE_DIR \
+       --pre_load_dir $PRE_LOAD_DIR \
+       --experiment_name $EXP_NAME \
+       --model_name $MODEL_NAME \
+       --wandb_dir $WANDB_DIR \
+       --yaml_config $CONFIGFILE"
+
+## Trigger job on Each Node when bmt or ddp.
+
+mkdir -p $PRE_LOAD_DIR
+python -m torch.distributed.launch \
+       --nproc_per_node $GPU_NUM_PER_NODE \
+       --nnodes $NODES_NUM \
+       --node_rank $RANK \
+       --master_addr $MASTER_ADDR \
+       --master_port $MASTER_PORT \
+       $SCRIPT_FILE \
+       --not_call_launch \
+       $OPTS
\ No newline at end of file
diff --git a/examples/Aquila/generate.py b/examples/Aquila/generate.py
new file mode 100755
index 00000000..1a93c924
--- /dev/null
+++ b/examples/Aquila/generate.py
@@ -0,0 +1,36 @@
+import os
+import torch
+from flagai.auto_model.auto_loader import AutoLoader
+from flagai.model.predictor.predictor import Predictor
+from flagai.data.tokenizer import Tokenizer
+import bminf
+
+state_dict = "./checkpoints_in/"
+model_name = 'aquila-7b'
+
+loader = AutoLoader(
+    "lm",
+    model_dir=state_dict,
+    model_name=model_name,
+    use_cache=True)
+model = loader.get_model()
+tokenizer = loader.get_tokenizer()
+
+model.eval()
+model.half()
+
+model.cuda()
+
+predictor = Predictor(model, tokenizer)
+
+texts = [
+        "汽车EDR是什么",
+        ]
+
+for text in texts:
+    print('-'*80)
+    text = f'{text}' 
+    print(f"text is {text}")
+    with torch.no_grad():
+        out = predictor.predict_generate_randomsample(text, out_max_length=200,top_p=0.95)
+        print(f"pred is {out}")
diff --git a/examples/Aquila/hostfile b/examples/Aquila/hostfile
new file mode 100755
index 00000000..7a8d88a8
--- /dev/null
+++ b/examples/Aquila/hostfile
@@ -0,0 +1 @@
+192.168.21.2 slots=4
diff --git a/setup.py b/setup.py
index 918b52d3..8f1e804d 100755
--- a/setup.py
+++ b/setup.py
@@ -5,7 +5,7 @@
 
 setup(
     name="flagai",
-    version="v1.6.3",
+    version="v1.7.0",
     description="FlagAI aims to help researchers and developers to freely train and test large-scale models for NLP/CV/VL tasks.",
     long_description=open("README.md", encoding="utf-8").read(),
     long_description_content_type="text/markdown",