The global batch size is the total number of examples processed in the current optimizer step. We split each batch across our GPUs, so each device handles a smaller "device" batch (global batch size / number of devices). A single GPU doesn't have enough memory to run its whole device batch in one forward + backward pass, so we further split the device batch into micro batches and run a separate forward + backward pass for each. After all the micro batches are done, we take the optimizer step.
Overall, the micro batch size is just about avoiding memory issues and getting good throughput; it should not affect training results. You'll want the micro batch size to be a divisor of the device batch size. See the sketch below.
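As a minimal sketch of how the three batch sizes relate on a single device, here is a PyTorch-style gradient-accumulation loop. The sizes, model, and data are hypothetical placeholders, not OLMo's actual trainer:

```python
import torch
import torch.nn as nn

# Hypothetical numbers for illustration only.
global_batch_size = 2048
num_devices = 64                                            # e.g. 8 nodes x 8 GPUs
device_batch_size = global_batch_size // num_devices        # 32 examples per GPU per step
micro_batch_size = 8                                        # must evenly divide device_batch_size
grad_accum_steps = device_batch_size // micro_batch_size    # 4 forward+backward passes per step

model = nn.Linear(128, 1)                                   # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

# One optimizer step on one device: accumulate gradients over micro batches.
device_batch_x = torch.randn(device_batch_size, 128)
device_batch_y = torch.randn(device_batch_size, 1)

optimizer.zero_grad()
for i in range(grad_accum_steps):
    start, end = i * micro_batch_size, (i + 1) * micro_batch_size
    loss = loss_fn(model(device_batch_x[start:end]), device_batch_y[start:end])
    # Scale so the accumulated gradient matches one full device-batch pass.
    (loss / grad_accum_steps).backward()
optimizer.step()
```

Because the gradient is averaged over the same total number of examples either way, changing the micro batch size only changes peak memory and speed, not the training result (up to floating-point nondeterminism).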
Our slurm jobs run in singularity containers (there may be ways to use other container runtimes on your system). The -B flag mounts directories from outside the container into the container, and $PROJECT_DIR/containers/$OLMO_CONTAINER is the path to the container image.
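As a rough illustration only (not the actual script used for the official runs), a multi-node sbatch script launching training inside a Singularity container might look like the sketch below. The node/GPU counts, $DATA_DIR mount, rendezvous setup, and the scripts/train.py entrypoint are assumptions you'd adapt to your own cluster and the OLMo version you're using:

```bash
#!/bin/bash
#SBATCH --job-name=olmo-7b
#SBATCH --nodes=8                # hypothetical node count
#SBATCH --gpus-per-node=8
#SBATCH --ntasks-per-node=1      # one launcher task per node
#SBATCH --time=48:00:00

# Rendezvous point for the distributed job: the first node in the allocation.
export MASTER_ADDR=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
export MASTER_PORT=29500

# -B mounts host directories into the container; --nv exposes the NVIDIA GPUs.
# $PROJECT_DIR/containers/$OLMO_CONTAINER is the container image on the host filesystem.
srun singularity exec --nv \
    -B "$PROJECT_DIR:$PROJECT_DIR" \
    -B "$DATA_DIR:$DATA_DIR" \
    "$PROJECT_DIR/containers/$OLMO_CONTAINER" \
    torchrun \
        --nnodes "$SLURM_NNODES" \
        --nproc_per_node 8 \
        --rdzv_backend c10d \
        --rdzv_endpoint "$MASTER_ADDR:$MASTER_PORT" \
        scripts/train.py configs/official/OLMo-7B.yaml
```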
❓ The question
Do you know the slurm script for configs/official/OLMo-7B.yaml? I'm looking for a multi-node slurm script.