
Releases: dstackai/dstack

0.18.8

01 Aug 10:02
fcbf4df

GCP volumes

#1477 added support for volumes in the gcp backend:

type: volume
name: my-gcp-volume
backend: gcp
region: europe-west1
size: 100GB

Previously, volumes were only supported in the aws and runpod backends.
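
Once created, the volume can be attached to a dev environment, task, or service the same way as other volumes. A minimal sketch (the mount path is illustrative):

type: dev-environment
ide: vscode

volumes:
  - name: my-gcp-volume
    path: /volume_data  # illustrative mount path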

Major bugfixes

#1486 fixed a major bug introduced in 0.18.7 that could lead to instances not being terminated in the cloud.

Other

New Contributors

Full changelog: 0.18.7...0.18.8

0.18.7

29 Jul 12:54
02decfe

Fleets

With fleets, you can now describe clusters declaratively and create them in both cloud and on-prem with a single command. Once a fleet is created, it can be used with dev environments, tasks, and services.

Cloud fleets

To provision a fleet in the cloud, specify the required resources, number of nodes, and other optional parameters.

type: fleet
name: my-fleet
placement: cluster
nodes: 2
resources:
  gpu: 24GB

On-prem fleets

To create a fleet from on-prem servers, specify their hosts along with the user, port, and SSH key for connection via SSH.

type: fleet
name: my-fleet
placement: cluster
ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 3.255.177.51
    - 3.255.177.52

To create or update the fleet, simply call the dstack apply command:

dstack apply -f examples/fleets/my-fleet.dstack.yml

Learn more about fleets in the documentation.
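
Once the fleet is provisioned, runs can reuse its instances. For example, a multi-node task might look like the sketch below (the name and command are illustrative; dstack matches the run against the fleet's instances):

type: task
name: train-distributed

# runs across two nodes of the cluster fleet
nodes: 2

commands:
  - python train.py  # illustrative command

resources:
  gpu: 24GB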

Deprecating dstack run

Now that we support dstack apply for gateways, volumes, and fleets, we have extended this support to dev environments, tasks, and services. Instead of using dstack run WORKING_DIR -f CONFIG_FILE, you can now use dstack apply -f CONFIG_FILE.
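
For example, assuming a configuration file .dstack.yml in the current directory:

# Before (deprecated)
dstack run . -f .dstack.yml

# Now
dstack apply -f .dstack.yml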

Also, it's now possible to specify a name for dev environments, tasks, and services, just like for gateways, volumes, and fleets.

type: dev-environment
name: my-ide

python: "3.11"

ide: vscode

resources:
  gpu: 80GB

This name is used as the run name and is more convenient than a random one. However, if you don't specify a name, dstack will assign a random name as before.

RunPod Volumes

In other news, we've added support for volumes in the runpod backend. Previously, they were only supported in the aws backend.

type: volume
name: my-new-volume

backend: runpod
region: ca-mtl-3
size: 100GB

A great feature of runpod volumes is their ability to attach to multiple instances simultaneously. This makes it possible to persist a cache across multiple service replicas or to support distributed training tasks.
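
For example, multiple replicas of a service could mount the same volume. A minimal sketch (the command and port are illustrative):

type: service
name: my-service

replicas: 2
commands:
  - python app.py  # illustrative command
port: 8000         # illustrative port

volumes:
  - name: my-new-volume
    path: /volume_data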

Major bugfixes

Important

This update fixes the kubernetes backend, which had been broken for the last few releases.

Other

New contributors

Full changelog: 0.18.6...0.18.7

0.18.7rc2

26 Jul 12:02
Pre-release

This is a preview build of the upcoming 0.18.7 update, bringing a few major new features and many bug fixes.

Fleets

Important

With fleets, you can now describe clusters declaratively and create them in both cloud and on-prem with a single command. Once a fleet is created, it can be used with dev environments, tasks, and services.

Cloud fleets

To provision a fleet in the cloud, specify the required resources, number of nodes, and other optional parameters.

type: fleet
name: my-fleet
placement: cluster
nodes: 2
resources:
  gpu: 24GB

On-prem fleets

To create a fleet from on-prem servers, specify their hosts along with the user, port, and SSH key for connection via SSH.

type: fleet
name: my-fleet
placement: cluster
ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 3.255.177.51
    - 3.255.177.52

To create or update the fleet, simply call the dstack apply command:

dstack apply -f examples/fleets/my-fleet.dstack.yml

Learn more about fleets in the documentation.

Deprecating dstack run

Important

Now that we support dstack apply for gateways, volumes, and fleets, we have extended this support to dev environments, tasks, and services. Instead of using dstack run WORKING_DIR -f CONFIG_FILE, you can now use dstack apply -f CONFIG_FILE.

Also, it's now possible to specify a name for dev environments, tasks, and services, just like for gateways, volumes, and fleets.

type: dev-environment
name: my-ide

python: "3.11"

ide: vscode

resources:
  gpu: 80GB

This name is used as the run name and is more convenient than a random one. However, if you don't specify a name, dstack will assign a random name as before.

RunPod Volumes

Important

In other news, we've added support for volumes in the runpod backend. Previously, they were only supported in the aws backend.

type: volume
name: my-new-volume

backend: runpod
region: ca-mtl-3
size: 100GB

A great feature of runpod volumes is their ability to attach to multiple instances simultaneously. This makes it possible to persist a cache across multiple service replicas or to support distributed training tasks.

Major bugfixes

Important

This update fixes the kubernetes backend, which had been broken for the last few releases.

Other

  • [UX] Make --gpu override YAML's gpu by @r4victor in #1455 (#1431)
  • [Performance] Speed up listing runs for Python API and CLI by @r4victor in #1430
  • [Performance] Speed up project loading by @r4victor in #1425
  • [Bugfix] Remove busy offers from the top of offers list by @jvstme in #1452
  • [Bugfix] Prioritize cheaper offers from the pool by @jvstme in #1453
  • [Bugfix] Fix spot offers suggested for on-demand dev envs by @jvstme in #1450
  • [Feature] Implement dstack volume delete by @r4victor in #1434
  • [UX] Instances were always shown as provisioning for container backends by @r4victor
  • [Docs] Fix typos by @jvstme in #1426
  • [Docs] Fix a bad link by @tamanobi in #1422
  • [Internal] Add DSTACK_SENTRY_PROFILES_SAMPLE_RATE by @r4victor in #1428
  • [Internal] Update ruff to 0.5.3 by @jvstme in #1421
  • [Internal] Update GitHub Actions dependencies by @jvstme in #1436

New contributors

Full changelog: 0.18.6...0.18.7rc2

0.18.6

18 Jul 14:44
50d6d41

Major fixes

  • Support for GitLab's authorization when the repo is using HTTP/HTTPS by @jvstme in #1412
  • Add a multi-node example to the Hugging Face Alignment Handbook example by @deep-diver in #1409
  • Fix the issue where idle instances weren't offered (occurred when a GPU name was in uppercase) by @jvstme in #1417
  • Fix the issue where an exception was thrown for non-standard Git repo host URLs using HTTP/HTTPS by @jvstme in #1410
  • Support H100 with the gcp backend by @jvstme in #1405

Warning

If you have idle instances in your pool, it is recommended to re-create them after upgrading to version 0.18.6. Otherwise, there is a risk that these instances won't be able to execute jobs.

Other

Full changelog: 0.18.5...0.18.6

0.18.5

12 Jul 13:09

Read below to learn about the new features and bug fixes in this release.

Volumes

When you run anything with dstack, you can configure the disk size. However, once the run finishes, all data on the disk is erased unless you've stored it in external storage. With 0.18.5, we're adding support for network volumes that allow data to persist across runs.

Once you've created a volume (e.g. named my-new-volume), you can attach it to a dev environment, task, or service.

type: dev-environment
ide: vscode
volumes:
  - name: my-new-volume
    path: /volume_data

The data stored in the volume will persist across runs.

dstack allows you to create new volumes and register existing ones. To learn more about how volumes work, check out the docs.
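
For reference, defining and creating a new aws volume might look like this sketch (the region and file name are illustrative):

type: volume
name: my-new-volume

backend: aws
region: eu-west-1  # illustrative region
size: 100GB

dstack apply -f volume.dstack.yml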

Important

Volumes are currently experimental and only work with the aws backend. Support for other backends is coming soon.

PostgreSQL

By default, dstack stores its state in ~/.dstack/server/data using SQLite. With this update, it's now possible to configure dstack to store its state in PostgreSQL. Just pass the DSTACK_DATABASE_URL environment variable.

DSTACK_DATABASE_URL="postgresql+asyncpg://myuser:mypassword@localhost:5432/mydatabase" dstack server

Important

Despite PostgreSQL support, dstack still requires that you run only one instance of the dstack server. However, this requirement will be lifted in a future update.

On-prem clusters

Previously, dstack didn't allow the use of on-prem clusters (added via dstack pool add-ssh) if there were no backends configured. This update fixes that bug. Now, you don't have to configure any backends if you only plan to use on-prem clusters.
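
For example, adding on-prem hosts to the pool looks something like this (the host address and key path are illustrative):

dstack pool add-ssh -i ~/.ssh/id_rsa ubuntu@3.255.177.51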

Supported GPUs

Previously, dstack didn't support L4 and H100 GPUs with the aws backend. Now you can use them.
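
For example, a dev environment can now request an H100 on AWS. A minimal sketch:

type: dev-environment
ide: vscode

resources:
  gpu: H100  # request the GPU by name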

Full changelog

See more: 0.18.4...0.18.5

0.18.5rc1

08 Jul 11:06
Pre-release

This is a release candidate build of the upcoming 0.18.5 release. Read below to learn about its new features and bug fixes.

Volumes

When you run anything with dstack, you can configure the disk size. However, once the run finishes, all data on the disk is erased unless you've stored it in external storage. With 0.18.5, we're adding support for network volumes that allow data to persist across runs.

Once you've created a volume (e.g. named my-new-volume), you can attach it to a dev environment, task, or service.

type: dev-environment
ide: vscode
volumes:
  - name: my-new-volume
    path: /volume_data

The data stored in the volume will persist across runs.

dstack allows you to create new volumes and register existing ones. To learn more about how volumes work, check out the docs.

Important

Volumes are currently experimental and only work with the aws backend. Support for other backends is coming soon.

PostgreSQL

By default, dstack stores its state in /root/.dstack/server/data using SQLite. With this update, it's now possible to configure dstack to store its state in PostgreSQL. Just pass the DSTACK_DATABASE_URL environment variable.

DSTACK_DATABASE_URL="postgresql+asyncpg://myuser:mypassword@localhost:5432/mydatabase" dstack server

Important

Despite PostgreSQL support, dstack still requires that you run only one instance of the dstack server. However, this requirement will be lifted in a future update.

On-prem clusters

Previously, dstack didn't allow the use of on-prem clusters (added via dstack pool add-ssh) if there were no backends configured. This update fixes that bug. Now, you don't have to configure any backends if you only plan to use on-prem clusters.

Supported GPUs

Previously, dstack didn't support L4 and H100 GPUs with the aws backend. Now you can use them.

Full changelog

See more: 0.18.4...0.18.5rc1

0.18.4

27 Jun 12:14
f6395c6

Google Cloud TPU

This update introduces initial support for Google Cloud TPU.

To request a TPU, specify the TPU architecture prefixed by tpu- (in gpu under resources):

type: task

python: "3.11"

commands:
  - pip install torch~=2.3.0 torch_xla[tpu]~=2.3.0 torchvision -f https://storage.googleapis.com/libtpu-releases/index.html
  - git clone --recursive https://github.com/pytorch/xla.git
  - python3 xla/test/test_train_mp_imagenet.py --fake_data --model=resnet50 --num_epochs=1

resources:
  gpu: tpu-v2-8

Important

Currently, you can only request 8 TPU cores (e.g. tpu-v2-8), which means only single TPU device workloads are supported. Support for multiple TPU devices is coming soon.

Private subnets with GCP

Additionally, the update allows configuring the gcp backend to use only private subnets. To achieve this, set public_ips to false.

projects:
  - name: main
    backends:
      - type: gcp
        creds:
          type: default

        public_ips: false

Major bug-fixes

Besides TPU, the update fixes a few important bugs.

Other

New contributors

Full changelog: 0.18.3...0.18.4

0.18.4rc3

26 Jun 14:49
3e89218
Pre-release

This is a preview build of the upcoming 0.18.4 release. See below for what's new.

TPU

One of the major new features in this update is the initial support for Google Cloud TPU.

To request a TPU, specify the required TPU architecture, prefixed with tpu-, in gpu under resources:

type: task

python: "3.11"

commands:
  - pip install torch~=2.3.0 torch_xla[tpu]~=2.3.0 torchvision -f https://storage.googleapis.com/libtpu-releases/index.html
  - git clone --recursive https://github.com/pytorch/xla.git
  - python3 xla/test/test_train_mp_imagenet.py --fake_data --model=resnet50 --num_epochs=1

resources:
  gpu: tpu-v2-8

Important

You cannot yet request multiple nodes for tasks (to run in parallel on multiple TPU devices). This feature is coming soon.

You're very welcome to try the initial support and share your feedback.

Major bug-fixes

Besides TPU, the update fixes a few important bugs.

Other

New contributors

Full changelog: 0.18.3...0.18.4rc3

0.18.3

06 Jun 10:55

Oracle Cloud Infrastructure

With the new update, it is now possible to run workloads with your Oracle Cloud Infrastructure (OCI) account. The backend is called oci and can be configured as follows:

projects:
  - name: main
    backends:
      - type: oci
        creds:
          type: default

The supported credential types include default and client. If default is used, dstack automatically picks up the default OCI credentials from ~/.oci/config.
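
For client credentials, the configuration might look like the sketch below. The field names follow the standard OCI SDK configuration and are assumptions here; check the reference page for the exact schema:

projects:
  - name: main
    backends:
      - type: oci
        creds:
          type: client
          # field names below are assumptions based on the standard OCI SDK config
          user: ocid1.user.oc1..example
          tenancy: ocid1.tenancy.oc1..example
          region: eu-frankfurt-1
          fingerprint: 00:11:22:33:44:55:66:77:88:99:aa:bb:cc:dd:ee:ff
          key_file: ~/.oci/private_key.pem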

Just like other backends, oci supports dev environments, tasks, and services.

Note

Support for spot instances, multi-node tasks, and gateways is coming soon.

Find more documentation on using Oracle Cloud Infrastructure on the reference page.

Retry policy

We have reworked how to configure the retry policy and how it is applied to runs. Here's an example:

type: task

commands: 
  - python train.py

retry:
  on_events: [no-capacity]
  duration: 2h

Now, if you run such a task, dstack will keep trying to find capacity for up to 2 hours. Once capacity is found, dstack will run the task.

The on_events property also supports error (in case the run fails with an error) and interruption (if the run is using a spot instance and it was interrupted).

Previously, dstack only allowed retries when spot instances were interrupted.
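
For a task running on spot instances, the events can be combined. A sketch:

type: task

commands:
  - python train.py

retry:
  # retry when there is no capacity and when a spot instance is interrupted
  on_events: [no-capacity, interruption]
  duration: 2h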

RunPod

Previously, the runpod backend only allowed the use of Docker images with /bin/bash or /bin/sh as the entrypoint. Thanks to a fix on RunPod's side, dstack now allows the use of any Docker image.

Additionally, the runpod backend now also supports spot instances.
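
For example, a run configuration can now explicitly request a spot instance in the runpod backend via the spot_policy property. A minimal sketch (the command is illustrative):

type: task

commands:
  - python train.py  # illustrative command

spot_policy: spot  # require a spot instance

resources:
  gpu: 24GB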

GCP

The gcp backend now also allows configuring VPCs:

projects:
  - name: main
    backends:
      - type: gcp

        project_id: my-awesome-project
        creds:
          type: default

        vpc_name: my-custom-vpc

The VPC should belong to the project specified by project_id. If you would like to use a shared VPC from another project, you can also specify vpc_project_id.
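
For a shared VPC, the configuration might look like this sketch (the project IDs and VPC name are illustrative):

projects:
  - name: main
    backends:
      - type: gcp

        project_id: my-awesome-project
        creds:
          type: default

        vpc_name: my-shared-vpc
        # the project that owns the shared VPC
        vpc_project_id: host-project-id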

AWS

Last but not least, for the aws backend, it is now possible to configure VPCs for selected regions and use the default VPC in other regions:

projects:
  - name: main
    backends:
      - type: aws
        creds:
          type: default

        vpc_ids:
          us-east-1: vpc-0a2b3c4d5e6f7g8h

        default_vpcs: true

Setting default_vpcs to true tells dstack to use the default VPC in any region not listed under vpc_ids.

Other changes

0.18.3rc1

05 Jun 09:37
Pre-release

OCI

With the new update, it is now possible to run workloads with your Oracle Cloud Infrastructure (OCI) account. The backend is called oci and can be configured as follows:

projects:
  - name: main
    backends:
      - type: oci
        creds:
          type: default

The supported credential types include default and client. If default is used, dstack automatically picks up the default OCI credentials from ~/.oci/config.

Warning

OCI support does not yet include spot instances, multi-node tasks, and gateways. These features will be added in upcoming updates.

Retry policy

We have reworked how to configure the retry policy and how it is applied to runs. Here's an example:

type: task

commands: 
  - python train.py

retry:
  on_events: [no-capacity]
  duration: 2h

Now, if you run such a task, dstack will keep trying to find capacity for up to 2 hours. Once capacity is found, dstack will run the task.

The on_events property also supports error (in case the run fails with an error) and interruption (if the run is using a spot instance and it was interrupted).

Previously, dstack only allowed retries when spot instances were interrupted.

VPC

GCP

The gcp backend now also allows configuring VPCs:

projects:
  - name: main
    backends:
      - type: gcp

        project_id: my-awesome-project
        creds:
          type: default

        vpc_name: my-custom-vpc

The VPC should belong to the project specified by project_id. If you would like to use a shared VPC from another project, you can also specify vpc_project_id.

AWS

Last but not least, for the aws backend, it is now possible to configure VPCs for selected regions and use the default VPC in other regions:

projects:
  - name: main
    backends:
      - type: aws
        creds:
          type: default

        vpc_ids:
          us-east-1: vpc-0a2b3c4d5e6f7g8h

        default_vpcs: true

Setting default_vpcs to true tells dstack to use the default VPC in any region not listed under vpc_ids.

Other changes

Full changelog: 0.18.2...0.18.3rc1

Warning

This is an RC build. Please report any bugs to the issue tracker. The final release is planned for later this week, and the official documentation and examples will be updated then.