0.18.4rc3
Pre-release
peterschmidt85 released this on 26 Jun 14:49
This is a preview build of the upcoming 0.18.4 release. See below for what's new.
TPU
One of the major new features in this update is initial support for Google Cloud TPU.
To request a TPU, specify the system architecture of the required TPU, prefixed with `tpu-`, in the `gpu` property under `resources`:
```yaml
type: task

python: "3.11"

commands:
  - pip install torch~=2.3.0 torch_xla[tpu]~=2.3.0 torchvision -f https://storage.googleapis.com/libtpu-releases/index.html
  - git clone --recursive https://github.com/pytorch/xla.git
  - python3 xla/test/test_train_mp_imagenet.py --fake_data --model=resnet50 --num_epochs=1

resources:
  gpu: tpu-v2-8
```
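The `tpu-` prefix convention can be illustrated with a short Python sketch. The `parse_tpu_spec` helper and the `TpuSpec` class below are hypothetical and not part of dstack. They assume the value follows the `tpu-<version>-<size>` pattern seen in `tpu-v2-8`, and they only show how such a value splits into the TPU version and the trailing size, which together form the Google Cloud accelerator type (for example, `v2-8`).

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical illustration only; dstack's actual parsing and validation may differ.
# Assumed pattern: "tpu-" + version (e.g. "v2", "v5p", "v5litepod") + "-" + size.
_TPU_PATTERN = re.compile(r"^tpu-(?P<version>v\d+[a-z]*)-(?P<size>\d+)$")


@dataclass(frozen=True)
class TpuSpec:
    version: str  # TPU generation, e.g. "v2"
    size: int     # trailing number of the accelerator type, e.g. 8 in "v2-8"

    @property
    def accelerator_type(self) -> str:
        # The value without the "tpu-" prefix, e.g. "v2-8"
        return f"{self.version}-{self.size}"


def parse_tpu_spec(gpu: str) -> Optional[TpuSpec]:
    """Return a TpuSpec for values like "tpu-v2-8", or None for non-TPU values."""
    match = _TPU_PATTERN.match(gpu.strip().lower())
    if match is None:
        return None
    return TpuSpec(version=match.group("version"), size=int(match.group("size")))


if __name__ == "__main__":
    print(parse_tpu_spec("tpu-v2-8"))  # TpuSpec(version='v2', size=8)
    print(parse_tpu_spec("A100"))      # None
```

Values that do not match the assumed pattern, such as a regular GPU name, yield `None`, so a caller could fall back to ordinary GPU handling.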
Important
You cannot yet request multiple nodes for TPU tasks (that is, to run in parallel across multiple TPU devices). This feature is coming soon.
You're very welcome to try the initial support and share your feedback.
Major bug fixes
Besides TPU support, this update fixes a few important bugs.
- Fix `cudo` backend stuck && Improve docs for `cudo` by @smokfyz in #1347
- Fix `nvidia-smi` not available on `lambda` by @r4victor in #1357
- Respect `registry_auth` for RunPod by @smokfyz in #1333
- Support multi-node tasks on `oci` by @jvstme in #1334
Other
- Show warning on required `ssh` version by @loghijiaha in #1313
- Add OCI packer templates by @jvstme in #1316
- Support `oci` Bare Metal instances by @jvstme in #1325
- Support `oci` `BM.Optimized3.36` instance by @jvstme in #1328
- [Docs] Update `dstack pool` docs by @jvstme in #1329
- Add TPU support in `gcp` by @Bihan in #1323
- Fix failing `runner-test` workflow by @r4victor in #1336
- Document OCI permissions by @jvstme in #1338
- Limit the gateway's open ports to `22`, `80`, and `443` by @smokfyz in #1335
- Update `serve.dstack.yml` - infinity by @michaelfeil in #1340
- Support instances without public IP for GCP by @smokfyz in #1341
- [Internal] Automate OCI images publishing by @jvstme in #1346
- Fix slow `/api/pools/list_instances` by @r4victor in #1320
- Respect `gcp` VPC config when provisioning TPUs by @r4victor in #1332
- [Internal] Fix linter errors by @jvstme in #1322
- TPU support enhancements by @r4victor in #1330
- TPU initial release by @Bihan in #1354
- TPUs fixes by @r4victor in #1360
- Minor refactoring to support custom backends in dstack Sky by @r4victor in #1319
- Even more flexible OCI client credentials by @jvstme in #1317
New contributors
- @loghijiaha made their first contribution in #1313
- @smokfyz made their first contribution in #1333
- @michaelfeil made their first contribution in #1340
Full changelog: 0.18.3...0.18.4rc3