
Releases: dstackai/dstack

0.18.8

01 Aug 10:02
fcbf4df

GCP volumes

#1477 added support for volumes in the gcp backend:

type: volume
name: my-gcp-volume
backend: gcp
region: europe-west1
size: 100GB

Previously, volumes were only supported in the aws and runpod backends.
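
Once created, the volume can be attached to a dev environment, task, or service the same way as other volumes. A minimal sketch (the mount path is illustrative):

type: dev-environment
ide: vscode

volumes:
  - name: my-gcp-volume
    path: /volume_data  # illustrative mount path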

Major bugfixes

#1486 fixed a major bug introduced in 0.18.7 that could lead to instances not being terminated in the cloud.

Other

New Contributors

Full changelog: 0.18.7...0.18.8

0.18.7

29 Jul 12:54
02decfe

Fleets

With fleets, you can now describe clusters declaratively and create them in both cloud and on-prem with a single command. Once a fleet is created, it can be used with dev environments, tasks, and services.

Cloud fleets

To provision a fleet in the cloud, specify the required resources, number of nodes, and other optional parameters.

type: fleet
name: my-fleet
placement: cluster
nodes: 2
resources:
  gpu: 24GB

On-prem fleets

To create a fleet from on-prem servers, specify their hosts along with the user, port, and SSH key for connection via SSH.

type: fleet
name: my-fleet
placement: cluster
ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 3.255.177.51
    - 3.255.177.52

To create or update the fleet, simply call the dstack apply command:

dstack apply -f examples/fleets/my-fleet.dstack.yml

Learn more about fleets in the documentation.
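
Once the fleet is provisioned, runs can reuse its instances. For example, a multi-node task might look like the sketch below (the name and command are illustrative; dstack matches the run against the fleet's instances):

type: task
name: train-distributed

# runs across two nodes of the cluster fleet
nodes: 2

commands:
  - python train.py  # illustrative command

resources:
  gpu: 24GB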

Deprecating dstack run

Now that we support dstack apply for gateways, volumes, and fleets, we have extended this support to dev environments, tasks, and services. Instead of using dstack run WORKING_DIR -f CONFIG_FILE, you can now use dstack apply -f CONFIG_FILE.
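
For example, assuming a configuration file .dstack.yml in the current directory:

# Before (deprecated)
dstack run . -f .dstack.yml

# Now
dstack apply -f .dstack.yml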

Also, it's now possible to specify a name for dev environments, tasks, and services, just like for gateways, volumes, and fleets.

type: dev-environment
name: my-ide

python: "3.11"

ide: vscode

resources:
  gpu: 80GB

This name is used as the run name and is more convenient than a random one. However, if you don't specify a name, dstack will assign a random name as before.

RunPod Volumes

In other news, we've added support for volumes in the runpod backend. Previously, they were only supported in the aws backend.

type: volume
name: my-new-volume

backend: runpod
region: ca-mtl-3
size: 100GB

A great feature of runpod volumes is their ability to attach to multiple instances simultaneously. This makes it possible to persist a cache across multiple service replicas or to support distributed training tasks.
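
For example, multiple replicas of a service could mount the same volume. A minimal sketch (the command and port are illustrative):

type: service
name: my-service

replicas: 2
commands:
  - python app.py  # illustrative command
port: 8000         # illustrative port

volumes:
  - name: my-new-volume
    path: /volume_data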

Major bugfixes

Important

This update fixes the kubernetes backend, which had been broken for the last few releases.

Other

New contributors

Full changelog: 0.18.6...0.18.7

0.18.7rc2

26 Jul 12:02
Pre-release

This is a preview build of the upcoming 0.18.7 update, bringing a few major new features and many bug fixes.

Fleets

Important

With fleets, you can now describe clusters declaratively and create them in both cloud and on-prem with a single command. Once a fleet is created, it can be used with dev environments, tasks, and services.

Cloud fleets

To provision a fleet in the cloud, specify the required resources, number of nodes, and other optional parameters.

type: fleet
name: my-fleet
placement: cluster
nodes: 2
resources:
  gpu: 24GB

On-prem fleets

To create a fleet from on-prem servers, specify their hosts along with the user, port, and SSH key for connection via SSH.

type: fleet
name: my-fleet
placement: cluster
ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 3.255.177.51
    - 3.255.177.52

To create or update the fleet, simply call the dstack apply command:

dstack apply -f examples/fleets/my-fleet.dstack.yml

Learn more about fleets in the documentation.

Deprecating dstack run

Important

Now that we support dstack apply for gateways, volumes, and fleets, we have extended this support to dev environments, tasks, and services. Instead of using dstack run WORKING_DIR -f CONFIG_FILE, you can now use dstack apply -f CONFIG_FILE.

Also, it's now possible to specify a name for dev environments, tasks, and services, just like for gateways, volumes, and fleets.

type: dev-environment
name: my-ide

python: "3.11"

ide: vscode

resources:
  gpu: 80GB

This name is used as the run name and is more convenient than a random one. However, if you don't specify a name, dstack will assign a random name as before.

RunPod Volumes

Important

In other news, we've added support for volumes in the runpod backend. Previously, they were only supported in the aws backend.

type: volume
name: my-new-volume

backend: runpod
region: ca-mtl-3
size: 100GB

A great feature of runpod volumes is their ability to attach to multiple instances simultaneously. This makes it possible to persist a cache across multiple service replicas or to support distributed training tasks.

Major bugfixes

Important

This update fixes the kubernetes backend, which had been broken for the last few releases.

Other

  • [UX] Make --gpu override YAML's gpu by @r4victor in #1455 (#1431)
  • [Performance] Speed up listing runs for Python API and CLI by @r4victor in #1430
  • [Performance] Speed up project loading by @r4victor in #1425
  • [Bugfix] Remove busy offers from the top of offers list by @jvstme in #1452
  • [Bugfix] Prioritize cheaper offers from the pool by @jvstme in #1453
  • [Bugfix] Fix spot offers suggested for on-demand dev envs by @jvstme in #1450
  • [Feature] Implement dstack volume delete by @r4victor in #1434
  • [UX] Instances were always shown as provisioning for container backends by @r4victor
  • [Docs] Fix typos by @jvstme in #1426
  • [Docs] Fix a bad link by @tamanobi in #1422
  • [Internal] Add DSTACK_SENTRY_PROFILES_SAMPLE_RATE by @r4victor in #1428
  • [Internal] Update ruff to 0.5.3 by @jvstme in #1421
  • [Internal] Update GitHub Actions dependencies by @jvstme in #1436

New contributors

Full changelog: 0.18.6...0.18.7rc2

0.18.6

18 Jul 14:44
50d6d41

Major fixes

  • Support for GitLab's authorization when the repo is using HTTP/HTTPS by @jvstme in #1412
  • Add a multi-node example to the Hugging Face Alignment Handbook example by @deep-diver in #1409
  • Fix the issue where idle instances weren't offered (occurred when a GPU name was in uppercase) by @jvstme in #1417
  • Fix the issue where an exception was thrown for non-standard Git repo host URLs using HTTP/HTTPS by @jvstme in #1410
  • Support H100 with the gcp backend by @jvstme in #1405

Warning

If you have idle instances in your pool, it is recommended to re-create them after upgrading to version 0.18.6. Otherwise, there is a risk that these instances won't be able to execute jobs.

Other

Full changelog: 0.18.5...0.18.6

0.18.5

12 Jul 13:09

Read below to learn about the new features and bug fixes in this release.

Volumes

When you run anything with dstack, you can configure the disk size. However, once the run finishes, all data on the disk is erased unless you've stored it in external storage. With 0.18.5, we're adding support for network volumes that allow data to persist across runs.

Once you've created a volume (e.g. named my-new-volume), you can attach it to a dev environment, task, or service.

type: dev-environment
ide: vscode
volumes:
  - name: my-new-volume
    path: /volume_data

The data stored in the volume will persist across runs.

dstack allows you to create new volumes and register existing ones. To learn more about how volumes work, check out the docs.
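
For reference, defining and creating a new aws volume might look like this sketch (the region and file name are illustrative):

type: volume
name: my-new-volume

backend: aws
region: eu-west-1  # illustrative region
size: 100GB

dstack apply -f volume.dstack.yml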

Important

Volumes are currently experimental and only work with the aws backend. Support for other backends is coming soon.

PostgreSQL

By default, dstack stores its state in ~/.dstack/server/data using SQLite. With this update, it's now possible to configure dstack to store its state in PostgreSQL. Just pass the DSTACK_DATABASE_URL environment variable.

DSTACK_DATABASE_URL="postgresql+asyncpg://myuser:mypassword@localhost:5432/mydatabase" dstack server

Important

Despite PostgreSQL support, dstack still requires that you run only one instance of the dstack server. However, this requirement will be lifted in a future update.

On-prem clusters

Previously, dstack didn't allow the use of on-prem clusters (added via dstack pool add-ssh) if there were no backends configured. This update fixes that bug. Now, you don't have to configure any backends if you only plan to use on-prem clusters.
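
For example, adding on-prem hosts to the pool looks something like this (the host address and key path are illustrative):

dstack pool add-ssh -i ~/.ssh/id_rsa ubuntu@3.255.177.51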

Supported GPUs

Previously, dstack didn't support L4 and H100 GPUs with the aws backend. Now you can use them.
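
For example, a dev environment can now request an H100 on AWS. A minimal sketch:

type: dev-environment
ide: vscode

resources:
  gpu: H100  # request the GPU by name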

Full changelog

See more: 0.18.4...0.18.5

0.18.5rc1

08 Jul 11:06
Pre-release

This is a release candidate build of the upcoming 0.18.5 release. Read below to learn about its new features and bug fixes.

Volumes

When you run anything with dstack, you can configure the disk size. However, once the run finishes, all data on the disk is erased unless you've stored it in external storage. With 0.18.5, we're adding support for network volumes that allow data to persist across runs.

Once you've created a volume (e.g. named my-new-volume), you can attach it to a dev environment, task, or service.

type: dev-environment
ide: vscode
volumes:
  - name: my-new-volume
    path: /volume_data

The data stored in the volume will persist across runs.

dstack allows you to create new volumes and register existing ones. To learn more about how volumes work, check out the docs.

Important

Volumes are currently experimental and only work with the aws backend. Support for other backends is coming soon.

PostgreSQL

By default, dstack stores its state in /root/.dstack/server/data using SQLite. With this update, it's now possible to configure dstack to store its state in PostgreSQL. Just pass the DSTACK_DATABASE_URL environment variable.

DSTACK_DATABASE_URL="postgresql+asyncpg://myuser:mypassword@localhost:5432/mydatabase" dstack server

Important

Despite PostgreSQL support, dstack still requires that you run only one instance of the dstack server. However, this requirement will be lifted in a future update.

On-prem clusters

Previously, dstack didn't allow the use of on-prem clusters (added via dstack pool add-ssh) if there were no backends configured. This update fixes that bug. Now, you don't have to configure any backends if you only plan to use on-prem clusters.

Supported GPUs

Previously, dstack didn't support L4 and H100 GPUs with the aws backend. Now you can use them.

Full changelog

See more: 0.18.4...0.18.5rc1

0.18.4

27 Jun 12:14
f6395c6

Google Cloud TPU

This update introduces initial support for Google Cloud TPU.

To request a TPU, specify the TPU architecture prefixed by tpu- (in gpu under resources):

type: task

python: "3.11"

commands:
  - pip install torch~=2.3.0 torch_xla[tpu]~=2.3.0 torchvision -f https://storage.googleapis.com/libtpu-releases/index.html
  - git clone --recursive https://github.com/pytorch/xla.git
  - python3 xla/test/test_train_mp_imagenet.py --fake_data --model=resnet50 --num_epochs=1

resources:
  gpu: tpu-v2-8

Important

Currently, you can only request 8 TPU cores (e.g. tpu-v2-8), which means only single TPU device workloads are supported. Support for multiple TPU devices is coming soon.

Private subnets with GCP

Additionally, the update allows configuring the gcp backend to use only private subnets. To achieve this, set public_ips to false.

projects:
  - name: main
    backends:
      - type: gcp
        creds:
          type: default

        public_ips: false

Major bug-fixes

Besides TPU, the update fixes a few important bugs.

Other

New contributors

Full changelog: 0.18.3...0.18.4

0.18.4rc3

26 Jun 14:49
3e89218
Pre-release

This is a preview build of the upcoming 0.18.4 release. See below for what's new.

TPU

One of the major new features in this update is the initial support for Google Cloud TPU.

To request a TPU, specify the required TPU architecture, prefixed with tpu-, in gpu under resources:

type: task

python: "3.11"

commands:
  - pip install torch~=2.3.0 torch_xla[tpu]~=2.3.0 torchvision -f https://storage.googleapis.com/libtpu-releases/index.html
  - git clone --recursive https://github.com/pytorch/xla.git
  - python3 xla/test/test_train_mp_imagenet.py --fake_data --model=resnet50 --num_epochs=1

resources:
  gpu: tpu-v2-8

Important

You cannot yet request multiple nodes for tasks (to run in parallel on multiple TPU devices). This feature is coming soon.

You're very welcome to try the initial support and share your feedback.

Major bug-fixes

Besides TPU, the update fixes a few important bugs.

Other

New contributors

Full changelog: 0.18.3...0.18.4rc3

0.18.3

06 Jun 10:55

Oracle Cloud Infrastructure

With the new update, it is now possible to run workloads with your Oracle Cloud Infrastructure (OCI) account. The backend is called oci and can be configured as follows:

projects:
  - name: main
    backends:
      - type: oci
        creds:
          type: default

The supported credential types include default and client. If default is used, dstack automatically picks up the default OCI credentials from ~/.oci/config.
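
For client credentials, the configuration might look like the sketch below. The field names follow the standard OCI SDK configuration and are assumptions here; check the reference page for the exact schema:

projects:
  - name: main
    backends:
      - type: oci
        creds:
          type: client
          # field names below are assumptions based on the standard OCI SDK config
          user: ocid1.user.oc1..example
          tenancy: ocid1.tenancy.oc1..example
          region: eu-frankfurt-1
          fingerprint: 00:11:22:33:44:55:66:77:88:99:aa:bb:cc:dd:ee:ff
          key_file: ~/.oci/private_key.pem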

Just like other backends, oci supports dev environments, tasks, and services.

Note

Support for spot instances, multi-node tasks, and gateways is coming soon.

Find more documentation on using Oracle Cloud Infrastructure on the reference page.

Retry policy

We have reworked how to configure the retry policy and how it is applied to runs. Here's an example:

type: task

commands: 
  - python train.py

retry:
  on_events: [no-capacity]
  duration: 2h

Now, if you run such a task, dstack will keep trying to find capacity for up to 2 hours. Once capacity is found, dstack will run the task.

The on_events property also supports error (in case the run fails with an error) and interruption (if the run is using a spot instance and it was interrupted).

Previously, dstack only allowed retries when spot instances were interrupted.
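
For a task running on spot instances, the events can be combined. A sketch:

type: task

commands:
  - python train.py

retry:
  # retry when there is no capacity and when a spot instance is interrupted
  on_events: [no-capacity, interruption]
  duration: 2h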

RunPod

Previously, the runpod backend only allowed the use of Docker images with /bin/bash or /bin/sh as the entrypoint. Thanks to a fix on RunPod's side, dstack now allows the use of any Docker image.

Additionally, the runpod backend now also supports spot instances.
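
For example, a run configuration can now explicitly request a spot instance in the runpod backend via the spot_policy property. A minimal sketch (the command is illustrative):

type: task

commands:
  - python train.py  # illustrative command

spot_policy: spot  # require a spot instance

resources:
  gpu: 24GB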

GCP

The gcp backend now also allows configuring VPCs:

projects:
  - name: main
    backends:
      - type: gcp

        project_id: my-awesome-project
        creds:
          type: default

        vpc_name: my-custom-vpc

The VPC should belong to the project specified by project_id. If you would like to use a shared VPC from another project, you can also specify vpc_project_id.
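
For a shared VPC, the configuration might look like this sketch (the project IDs and VPC name are illustrative):

projects:
  - name: main
    backends:
      - type: gcp

        project_id: my-awesome-project
        creds:
          type: default

        vpc_name: my-shared-vpc
        # the project that owns the shared VPC
        vpc_project_id: host-project-id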

AWS

Last but not least, for the aws backend, it is now possible to configure VPCs for selected regions and use the default VPC in other regions:

projects:
  - name: main
    backends:
      - type: aws
        creds:
          type: default

        vpc_ids:
          us-east-1: vpc-0a2b3c4d5e6f7g8h

        default_vpcs: true

Setting default_vpcs to true tells dstack to use the default VPC in any region not listed under vpc_ids.

Other changes

0.18.3rc1

05 Jun 09:37
Pre-release

OCI

With the new update, it is now possible to run workloads with your Oracle Cloud Infrastructure (OCI) account. The backend is called oci and can be configured as follows:

projects:
  - name: main
    backends:
      - type: oci
        creds:
          type: default

The supported credential types include default and client. If default is used, dstack automatically picks up the default OCI credentials from ~/.oci/config.

Warning

OCI support does not yet include spot instances, multi-node tasks, and gateways. These features will be added in upcoming updates.

Retry policy

We have reworked how to configure the retry policy and how it is applied to runs. Here's an example:

type: task

commands: 
  - python train.py

retry:
  on_events: [no-capacity]
  duration: 2h

Now, if you run such a task, dstack will keep trying to find capacity for up to 2 hours. Once capacity is found, dstack will run the task.

The on_events property also supports error (in case the run fails with an error) and interruption (if the run is using a spot instance and it was interrupted).

Previously, dstack only allowed retries when spot instances were interrupted.

VPC

GCP

The gcp backend now also allows configuring VPCs:

projects:
  - name: main
    backends:
      - type: gcp

        project_id: my-awesome-project
        creds:
          type: default

        vpc_name: my-custom-vpc

The VPC should belong to the project specified by project_id. If you would like to use a shared VPC from another project, you can also specify vpc_project_id.

AWS

Last but not least, for the aws backend, it is now possible to configure VPCs for selected regions and use the default VPC in other regions:

projects:
  - name: main
    backends:
      - type: aws
        creds:
          type: default

        vpc_ids:
          us-east-1: vpc-0a2b3c4d5e6f7g8h

        default_vpcs: true

Setting default_vpcs to true tells dstack to use the default VPC in any region not listed under vpc_ids.

Other changes

Full changelog: 0.18.2...0.18.3rc1

Warning

This is an RC build. Please report any bugs to the issue tracker. The final release is planned for later this week, and the official documentation and examples will be updated then.