Merge pull request #75 from mikkaatje/patch-1
Improved English README
ZhaoFancy authored Nov 9, 2023
2 parents 6d68516 + 60bd7b9 commit fd6b989
Showing 2 changed files with 9 additions and 9 deletions.
demo/README.md (2 changes: 1 addition & 1 deletion)
@@ -13,7 +13,7 @@ python text_generation.py \

You can also provide an extra `--prompt` argument to try some other prompts.

-When dealing with extreme long input sequence, you may need multiple GPU devices and to enable tensor parallelism acceleration during inference to avoid insufficient memory error.
+When dealing with extremely long input sequences, you may need multiple GPU devices and to enable tensor parallelism acceleration during inference to avoid insufficient memory error.

To run text generation task using tensor parallelism acceleration with 2 GPU devices:
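(The exact command is collapsed in this diff view. Purely as an illustrative sketch, a 2-GPU tensor-parallel launch could look like the block below; `--prompt` is the only argument confirmed by the excerpt above, while the use of `torchrun` and the `--model` flag are assumptions.)

```bash
# Illustrative sketch only: the real command is hidden in the collapsed diff.
# ASSUMPTIONS: text_generation.py can be launched under torchrun across 2
# processes for tensor parallelism and accepts a --model flag; only --prompt
# is confirmed by the README excerpt above.
torchrun --nproc_per_node 2 text_generation.py \
    --model /path/to/Yi-model \
    --prompt "Write a short story about a robot learning to paint."
```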

finetune/README.md (16 changes: 8 additions & 8 deletions)
@@ -45,15 +45,15 @@ pip install torch==2.0.1 deepspeed==0.10 tensorboard transformers datasets sente

## Hardware Setup

-For Yi-6B model, a node with 4 GPUs, each has GPU mem larger than 60GB is recommended.
+For the Yi-6B model, a node with 4 GPUs, each has GPU mem larger than 60GB is recommended.

-For Yi-34B model, because the usage of zero-offload technique takes a lot CPU mem, please be careful to limit the GPU numbers in 34B finetune training. Please use CUDA_VISIBLE_DEVICES to limit the GPU number (as shown in scripts/run_sft_Yi_34b.sh).
+For the Yi-34B model, because the usage of zero-offload technique takes a lot CPU memory, please be careful to limit the GPU numbers in 34B finetune training. Please use CUDA_VISIBLE_DEVICES to limit the GPU number (as shown in scripts/run_sft_Yi_34b.sh).

A typical hardware setup for finetuning 34B model is a node with 8GPUS (limit to 4 in running by CUDA_VISIBLE_DEVICES=0,1,2,3), each has GPU mem larger than 80GB, with total CPU mem larger than 900GB.
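As a concrete illustration of the device limit described above (the device list and script name are taken from the text; running it from the scripts folder follows the Quick Start section below):

```bash
# Expose only the first 4 of the node's 8 GPUs to the 34B finetune run,
# as recommended above, so zero-offload does not exhaust host CPU memory.
cd finetune/scripts
CUDA_VISIBLE_DEVICES=0,1,2,3 bash run_sft_Yi_34b.sh
```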

## Quick Start

-Download a LLM-base model to MODEL_PATH (6B and 34B). A typical folder of model is like:
+Download a LLM-base model to MODEL_PATH (6B and 34B). A typical folder of models is like:

```bash
|-- $MODEL_PATH
@@ -80,7 +80,7 @@ Download a dataset from huggingface to local storage DATA_PATH, e.g. Dahoas/rm-s
| |-- README.md
```

-`finetune/yi_example_dataset` has example datasets, which is modified from [BAAI/COIG](https://huggingface.co/datasets/BAAI/COIG)
+`finetune/yi_example_dataset` has example datasets, which are modified from [BAAI/COIG](https://huggingface.co/datasets/BAAI/COIG)

```bash
|-- $DATA_PATH
@@ -89,17 +89,17 @@
|-- eval.jsonl
```

-`cd` into scripts folder, copy and paste the script and run. For example:
+`cd` into the scripts folder, copy and paste the script, and run. For example:

```bash
cd finetune/scripts

bash run_sft_Yi_6b.sh
```

-For Yi-6B base model, setting training_debug_steps=20 and num_train_epochs=4 can output a chat model, which takes about 20 minutes.
+For the Yi-6B base model, setting training_debug_steps=20 and num_train_epochs=4 can output a chat model, which takes about 20 minutes.
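One possible way to apply those quick-debug settings is sketched below; whether run_sft_Yi_6b.sh forwards extra command-line arguments to its training entrypoint is an assumption, so you may instead need to edit the two values inside the script.

```bash
# Quick-debug run for the Yi-6B base model (about 20 minutes per the text above).
# ASSUMPTION: the launcher script forwards these flags to the trainer; if it
# does not, set training_debug_steps=20 and num_train_epochs=4 inside
# run_sft_Yi_6b.sh instead.
cd finetune/scripts
bash run_sft_Yi_6b.sh --training_debug_steps 20 --num_train_epochs 4
```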

-For Yi-34B base model, it takes a relatively long time for initialization. Please be patient.
+For the Yi-34B base model, it takes a relatively long time for initialization. Please be patient.

## Evaluation

@@ -109,4 +109,4 @@ cd finetune/scripts
bash run_eval.sh
```

-Then you'll see the answer from both base model and finetuned model
+Then you'll see the answer from both the base model and the finetuned model
