The global batch size is the total number of examples processed in the current optimizer step. We split each batch across our GPUs, so each device handles a smaller "device" batch (global batch size / number of devices). A single GPU doesn't have enough memory to run its whole device batch in one forward + backward pass, so we further split the device batch into micro batches and run a separate forward + backward pass for each. After all the micro batches are done, we take the optimizer step.
Overall, the micro batch size is just about avoiding memory issues and getting good throughput; it should not affect training results. You'll want the micro batch size to be a divisor of the device batch size. See the sketch below.
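As a minimal sketch of how the three batch sizes relate on a single device, here is a PyTorch-style gradient-accumulation loop. The sizes, model, and data are hypothetical placeholders, not OLMo's actual trainer:

```python
import torch
import torch.nn as nn

# Hypothetical numbers for illustration only.
global_batch_size = 2048
num_devices = 64                                            # e.g. 8 nodes x 8 GPUs
device_batch_size = global_batch_size // num_devices        # 32 examples per GPU per step
micro_batch_size = 8                                        # must evenly divide device_batch_size
grad_accum_steps = device_batch_size // micro_batch_size    # 4 forward+backward passes per step

model = nn.Linear(128, 1)                                   # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

# One optimizer step on one device: accumulate gradients over micro batches.
device_batch_x = torch.randn(device_batch_size, 128)
device_batch_y = torch.randn(device_batch_size, 1)

optimizer.zero_grad()
for i in range(grad_accum_steps):
    start, end = i * micro_batch_size, (i + 1) * micro_batch_size
    loss = loss_fn(model(device_batch_x[start:end]), device_batch_y[start:end])
    # Scale so the accumulated gradient matches one full device-batch pass.
    (loss / grad_accum_steps).backward()
optimizer.step()
```

Because the gradient is averaged over the same total number of examples either way, changing the micro batch size only changes peak memory and speed, not the training result (up to floating-point nondeterminism).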
Our slurm jobs run in singularity containers (there may be ways to use other container runtimes on your system). The -B flag mounts directories from outside the container into the container, and $PROJECT_DIR/containers/$OLMO_CONTAINER is the path to the container image.
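As a rough illustration only (not the actual script used for the official runs), a multi-node sbatch script launching training inside a Singularity container might look like the sketch below. The node/GPU counts, $DATA_DIR mount, rendezvous setup, and the scripts/train.py entrypoint are assumptions you'd adapt to your own cluster and the OLMo version you're using:

```bash
#!/bin/bash
#SBATCH --job-name=olmo-7b
#SBATCH --nodes=8                # hypothetical node count
#SBATCH --gpus-per-node=8
#SBATCH --ntasks-per-node=1      # one launcher task per node
#SBATCH --time=48:00:00

# Rendezvous point for the distributed job: the first node in the allocation.
export MASTER_ADDR=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
export MASTER_PORT=29500

# -B mounts host directories into the container; --nv exposes the NVIDIA GPUs.
# $PROJECT_DIR/containers/$OLMO_CONTAINER is the container image on the host filesystem.
srun singularity exec --nv \
    -B "$PROJECT_DIR:$PROJECT_DIR" \
    -B "$DATA_DIR:$DATA_DIR" \
    "$PROJECT_DIR/containers/$OLMO_CONTAINER" \
    torchrun \
        --nnodes "$SLURM_NNODES" \
        --nproc_per_node 8 \
        --rdzv_backend c10d \
        --rdzv_endpoint "$MASTER_ADDR:$MASTER_PORT" \
        scripts/train.py configs/official/OLMo-7B.yaml
```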
❓ The question
Do you know the slurm script for configs/official/OLMo-7B.yaml? I'm looking for a multi-node slurm script.