-`--load-pretrained-model`: Whether to load model weights. Usually `true`.
-`--task-name`: The task name or the path of a `jsonl` file. For multi-task training, separate task names with `,`.
There is an optional sampling weight after each task name, separated by `:` (default is 1.0). Sampling weights will be normalized.
For example: `--task-name cot:0.1,/path_task0.jsonl:1.0,/path_task0.jsonl:1.0,/path_task0.jsonl:1.0`.
The number after the colon indicates the sampling weight for the task during training. For example, `cot:0.1` means the `cot` task will be sampled with a weight of 0.1.
-`--checkpoint-path`: Path to save fine-tuned checkpoints.
-`--checkpoint-steps`: Save a checkpoint every `checkpoint-steps` steps.
-`--total-steps`: Total number of steps for training. (This counts all `gradient-accumulate-step`s.)
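The `--task-name` format described above (comma-separated entries, each with an optional `:weight` suffix that defaults to 1.0 and is then normalized) can be illustrated with a minimal sketch. This is not the repository's actual parser; `parse_task_names` is a hypothetical helper, and it assumes that an entry without a weight contains no colon.

```python
def parse_task_names(task_name_arg):
    """Split a --task-name value into (name, normalized_weight) pairs.

    Hypothetical helper for illustration only; the training script's
    real parsing logic may differ.
    """
    entries = []
    for item in task_name_arg.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the last colon; assumes unweighted entries contain no colon.
        name, sep, weight = item.rpartition(":")
        if sep:
            entries.append((name, float(weight)))
        else:
            entries.append((item, 1.0))  # default sampling weight

    total = sum(weight for _, weight in entries)
    if total <= 0:
        raise ValueError("sampling weights must sum to a positive number")
    # Normalize so the weights sum to 1.
    return [(name, weight / total) for name, weight in entries]


if __name__ == "__main__":
    for name, prob in parse_task_names("cot:0.1,/path_task0.jsonl:1.0"):
        print(f"{name}: {prob:.4f}")
```

Under these assumptions, `cot:0.1` next to a single task with weight 1.0 ends up with a sampling probability of 0.1 / 1.1, so the weights are relative rather than absolute.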
The following arguments usually do not change:
-`--fp16`: Flag to enable FP16 mixed-precision training. Always add it with the current implementation.
-`--pp-mode`: always `gpipe`
-`--profiling`: {no-profiling, tidy_profiling}. `tidy_profiling` will generate profile jsons.

## Adding Your Own Data to the DATASETS

To add your own data to the training process, you should create a `jsonl` file where each line is a JSON object representing a single training example. Once you have your `jsonl` file, you can include it in the `--task-name` argument with an appropriate sampling weight. For instance, if your file is located at `/path_to_your_data/your_data.jsonl` and you wish to give it a sampling weight of 0.5, you would add `/path_to_your_data/your_data.jsonl:0.5` to the `--task-name` argument.
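As a rough sketch of the steps above, the snippet below writes a few examples to a `jsonl` file (one JSON object per line) and builds the matching `--task-name` entry. The `text` field name, the example contents, and the helpers `write_jsonl` and `task_name_entry` are assumptions made for illustration; check the repository's data-loading code for the fields it actually expects.

```python
import json
import os
import tempfile


def write_jsonl(path, examples):
    """Write one JSON object per line, as the jsonl format requires."""
    with open(path, "w", encoding="utf-8") as f:
        for example in examples:
            f.write(json.dumps(example, ensure_ascii=False) + "\n")


def task_name_entry(path, weight=1.0):
    """Format a single `path:weight` entry for the --task-name argument."""
    return f"{path}:{weight}"


# Hypothetical training examples; the "text" key is an assumed schema.
examples = [
    {"text": "First training example."},
    {"text": "Second training example."},
]

# Written to a temporary directory here so the sketch runs anywhere.
data_path = os.path.join(tempfile.mkdtemp(), "your_data.jsonl")
write_jsonl(data_path, examples)

# Combine the new file with an existing task, e.g. `cot` at weight 0.1.
print("--task-name", "cot:0.1," + task_name_entry(data_path, 0.5))
```

In practice you would write the file to a stable location such as `/path_to_your_data/your_data.jsonl` and pass the resulting entry alongside any other tasks.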
If you have any questions or need further assistance, please refer to the [OpenDataHub](https://github.com/togethercomputer/OpenDataHub) repository or contact us through our [website](https://www.together.ai/contact).