Timeline:

18/10/24

I have not been able to test the model at all. Every attempt to run it either takes forever or exhausts my data limit on the BITS Wi-Fi, probably because of the sheer size of the 70B-parameter model.
The code looks convincing enough, but it needs actual testing before I can be confident in it.

Updated the code to automate GPU allocation; it now supports multiple GPUs.
There is no need to load the model onto GPUs manually anymore: this is handled by load_checkpoint_and_dispatch from the accelerate library.
Still haven't tested the code, so again, unsure if it'll actually work.
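
For reference, here is a minimal sketch of how automated dispatch with accelerate might look. It is not the code in PipelineParallelism.py. The helper names (build_max_memory, load_sharded), the per-GPU memory budgets, and the use of transformers to build the empty model are all assumptions made for illustration, and like the rest of this entry it has not been run against the real 70B checkpoint.

```python
from typing import Dict, List, Optional, Union


def build_max_memory(num_gpus: int, per_gpu_gib: int,
                     cpu_gib: int = 0) -> Dict[Union[int, str], str]:
    """Hypothetical helper: build the max_memory mapping accelerate accepts.

    Keys are GPU indices (plus an optional "cpu" entry); values are
    size strings such as "40GiB".
    """
    if num_gpus < 0 or per_gpu_gib <= 0:
        raise ValueError("num_gpus must be >= 0 and per_gpu_gib must be > 0")
    budget: Dict[Union[int, str], str] = {
        i: f"{per_gpu_gib}GiB" for i in range(num_gpus)
    }
    if cpu_gib > 0:
        budget["cpu"] = f"{cpu_gib}GiB"
    return budget


def load_sharded(model_name: str, checkpoint_dir: str,
                 max_memory: Dict[Union[int, str], str],
                 no_split: Optional[List[str]] = None):
    """Hypothetical wrapper: spread a checkpoint across the available GPUs."""
    # Imported here so build_max_memory works without the heavy dependencies.
    from accelerate import init_empty_weights, load_checkpoint_and_dispatch
    from transformers import AutoConfig, AutoModelForCausalLM

    config = AutoConfig.from_pretrained(model_name)
    # Build the model skeleton without allocating memory for the weights.
    with init_empty_weights():
        model = AutoModelForCausalLM.from_config(config)

    # device_map="auto" lets accelerate decide which layers go on which device.
    return load_checkpoint_and_dispatch(
        model,
        checkpoint=checkpoint_dir,
        device_map="auto",
        max_memory=max_memory,
        no_split_module_classes=no_split,
    )
```

Calling build_max_memory(2, 40) yields a budget for two GPUs of 40 GiB each; passing that to load_sharded along with a local checkpoint directory would let accelerate place the layers. The budgets are placeholders and should be set to match the actual hardware.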
