Releases: dstackai/dstack-enterprise

0.20.10-v1

19 Feb 12:42

Services

Prefill-Decode disaggregation

dstack now supports disaggregated Prefill–Decode inference, allowing both Prefill and Decode worker types to run within a single service.

To define and run such a service, set pd_disaggregation to true under the router property (this requires the gateway to use the sglang router), and define separate replica groups for the Prefill and Decode worker types:

type: service
name: prefill-decode

env:
  - HF_TOKEN
  - MODEL_ID=zai-org/GLM-4.5-Air-FP8

image: lmsysorg/sglang:latest

replicas:
  - count: 1..4
    scaling:
      metric: rps
      target: 3
    commands:
      - |
          python -m sglang.launch_server \
            --model-path $MODEL_ID \
            --disaggregation-mode prefill \
            --disaggregation-transfer-backend mooncake \
            --host 0.0.0.0 \
            --port 8000 \
            --disaggregation-bootstrap-port 8998
    resources:
      gpu: H200

  - count: 1..8
    scaling:
      metric: rps
      target: 2
    commands:
      - |
          python -m sglang.launch_server \
            --model-path $MODEL_ID \
            --disaggregation-mode decode \
            --disaggregation-transfer-backend mooncake \
            --host 0.0.0.0 \
            --port 8000
    resources:
      gpu: H200

port: 8000
model: zai-org/GLM-4.5-Air-FP8

probes:
  - type: http
    url: /health_generate
    interval: 15s

router:
  type: sglang
  pd_disaggregation: true

Note

pd_disaggregation requires both the gateway and replicas to use the same cluster. With dstack, this can now be used with the aws, gcp, and kubernetes backends (as they support creating both clusters and gateways). Support for more backends (and eventually SSH fleets) is coming soon.

Currently, pd_disaggregation works only with SGLang. Support for vLLM is coming soon.

Support for additional scaling metrics, such as TTFT and ITL, is also coming soon to enable autoscaling of Prefill and Decode workers.

Model endpoint

Previously, if you configured the model property, dstack provided a global model endpoint at gateway.<gateway domain> (or /proxy/models/<project name>), allowing access to all models deployed in the project. This endpoint is now deprecated.

Now, any deployed model should be accessed via the service endpoint itself at <run name>.<gateway domain> (or /proxy/services/main/<service name>).

Note

If you configure the model property, dstack automatically enables CORS on the service endpoint. Future versions will allow you to disable or customize this behavior.

CLI

dstack apply

Previously, if you did not specify gpu, dstack treated it as 0..1 but did not display it in the run plan. Now, dstack properly displays this default. Additionally, if you do not specify image, dstack now defaults the GPU vendor to nvidia.

dstack apply -f dev.dstack.yml
 Project              peterschmidt85
 User                 peterschmidt85
 Type                 dev-environment
 Resources            cpu=2.. mem=8GB.. disk=100GB.. gpu=0..
 Spot policy          on-demand
 Max price            off
 Retry policy         off
 Idle duration        5m
 Max duration         off
 Inactivity duration  off

 #  BACKEND         RESOURCES                  INSTANCE TYPE  PRICE
 1  verda (FIN-01)  cpu=4 mem=16GB disk=100GB  CPU.4V.16G     $0.0279
 2  verda (FIN-02)  cpu=4 mem=16GB disk=100GB  CPU.4V.16G     $0.0279
 3  verda (FIN-03)  cpu=4 mem=16GB disk=100GB  CPU.4V.16G     $0.0279
    ...

Submit the run dev? [y/n]: 

This makes the run plan much more explicit and clear.

Full changelog: dstackai/dstack@0.20.8-v1...0.20.10-v1

0.20.8-v1

05 Feb 11:55


CLI

dstack event --watch

The dstack event command now supports a --watch option for real-time event tracking.

Event coverage has also been improved, with events for run in-place update and service registration now available.

dstack fleet

The dstack fleet command now includes fleet-level information such as nodes, resources, spot policy, and backend details, with individual instances listed underneath.

Skills

SKILL.md

If you're using agents such as Claude Code, Codex, Cursor, etc., it’s now possible to install dstack skills.

npx skills add dstackai/dstack

These skills make the agent fully aware of the configuration syntax and CLI commands.

Services

Probes

UI

The UI now displays probe statuses for services, helping monitor replica readiness and health.

until_ready

A new until_ready option for probes allows stopping probe execution once the ready_after threshold is reached. This is useful for resource-intensive probes that only need to run during startup:

probes:
  - type: http
    url: /health
    until_ready: true
    ready_after: 2

Model probes

Services that use the model property to declare a chat model with an OpenAI-compatible interface now receive an automatically configured probe that checks model availability by requesting /v1/chat/completions.
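
For reference, a minimal sketch of a service that benefits from this (the service name, image, and model are illustrative); declaring model is enough to get the automatic availability probe:

```yaml
# Sketch only: with `model` set, dstack automatically configures a probe
# that checks availability via /v1/chat/completions.
type: service
name: llama-8b

image: lmsysorg/sglang:latest
env:
  - MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-8B
commands:
  - python -m sglang.launch_server --model-path $MODEL_ID --port 8000

port: 8000
model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
```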

Backends

RunPod

Community Cloud

RunPod Community Cloud is now disabled by default to ensure a more reliable experience. You can still enable Community Cloud in the backend settings. dstack Sky users can enable Community Cloud only when using their own RunPod credentials.
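
If you manage backends via the server configuration file, re-enabling Community Cloud might look like the sketch below. Note that the community_cloud option name is an assumption here; verify it against the backend settings reference:

```yaml
# Sketch only: the `community_cloud` option name is an assumption —
# check the RunPod backend settings documentation.
projects:
  - name: main
    backends:
      - type: runpod
        creds:
          type: api_key
          api_key: <your RunPod API key>
        community_cloud: true
```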

CUDO

Due to CUDO Compute winding down its public on-demand offering, the cudo backend is now deprecated.

What's changed

Full changelog: dstackai/dstack@0.20.7...0.20.8

0.20.7-v1

28 Jan 16:59


Services

Replica groups

A service can now include multiple replica groups. Each group can define its own commands, resources, and scaling rules.

type: service
name: llama-8b-service

image: lmsysorg/sglang:latest
env:
  - MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-8B

replicas:
  - count: 1..2
    scaling:
      metric: rps
      target: 10
    commands:
      - |
        python -m sglang.launch_server \
          --model-path $MODEL_ID \
          --port 8000 \
          --trust-remote-code
    resources:
      gpu: 48GB

  - count: 1..4
    scaling:
      metric: rps
      target: 5
    commands:
      - |
        python -m sglang.launch_server \
          --model-path $MODEL_ID \
          --port 8000 \
          --trust-remote-code
    resources:
      gpu: 24GB

port: 8000
model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B

Note

Properties such as regions, port, image, and env, among others, cannot yet be configured per replica group. This support is coming soon.

Note

Native support for disaggregated prefill and decode, allowing both worker types to run within a single service, is coming soon.

Events

Events are now also supported for volumes, gateways, and secrets.

$ dstack event --target-gateway my-gateway
[2026-01-28 11:53:03] [👤admin] [gateway my-gateway] Gateway created. Status: SUBMITTED
[2026-01-28 11:53:32] [gateway my-gateway] Gateway status changed SUBMITTED -> PROVISIONING
[2026-01-28 11:54:46] [gateway my-gateway] Gateway status changed PROVISIONING -> RUNNING
[2026-01-28 11:55:08] [👤admin] [gateway my-gateway] Gateway set as default

Instance events now also include reachability and health events.

Finally, we have added Events under Concepts in the documentation.

CLI

dstack project

The dstack project and dstack project set-default commands now allow you to interactively select the default project when these commands are run without arguments.

dstack login

The dstack login command can now be run without arguments. In this case, it will interactively ask for the URL and provider if needed. If you want to use dstack Sky, you can simply press Enter without entering a URL or provider.

Also, if you have multiple projects, the command will prompt you to select the default project as well.

What's changed

Full changelog: dstackai/dstack@0.20.6...0.20.7

0.20.6-v1

21 Jan 13:31


Server deployment

Memory optimization

This release reduces peak server memory usage. Previously, memory grew with the total number of instances ever submitted; this is now fixed. We recommend upgrading if memory usage increases over time.

Logs storage

Fluent Bit + Elasticsearch/OpenSearch

Run logs can now be stored in your own log storage via Fluent Bit. At the same time, dstack can now read run logs from Elasticsearch/OpenSearch (to display them in the UI and CLI) if Fluent Bit ships the logs there.

See the docs for more details.

Fleets

Since 0.20, dstack requires at least one fleet to be created before you can submit any runs. To make this easier, we’ve simplified default fleet creation during project setup in the UI.

In addition, if your project doesn’t have a fleet, the UI will prompt you to create one.

What's changed

Full changelog: dstackai/dstack@0.20.3...0.20.6

0.20.5-v1

21 Jan 11:20


This is a hotfix release that fixes a bug in 0.20.4 with some UI pages not working (Users, Projects, Settings).

What's Changed

Full Changelog: dstackai/dstack@0.20.4...0.20.5

0.20.4-v1

21 Jan 10:39


What's changed

Full Changelog: dstackai/dstack@0.20.3...0.20.4

0.20.3-v1

08 Jan 18:08


Dev environments

Windsurf IDE

Dev environments now support Windsurf as a first-class IDE option alongside VSCode and Cursor.

type: dev-environment
ide: windsurf

repos:
- https://github.com/dstackai/dstack

resources:
  gpu: 24GB..:1

dstack provisions an instance for your dev environment and seamlessly connects your local Windsurf editor to it.

Troubleshooting

Runs/fleets/volumes/gateways JSON via CLI

You can now inspect the full JSON state of runs, fleets, volumes, and gateways using these CLI commands:

$ dstack run get <name> --json
$ dstack fleet get <name> --json
$ dstack volume get <name> --json
$ dstack gateway get <name> --json

Runs/fleets JSON via UI

The UI includes new "Inspect" tabs with read-only JSON viewers for runs and fleets, making it easier to debug and understand resource states.

What's changed

Full Changelog: dstackai/dstack@0.20.1...0.20.3

0.20.1-v1

25 Dec 11:44


CLI

No-fleets warning

Since the last major release, fleets are required before submitting runs. This update makes that requirement explicit in the CLI.

When a run is submitted for a project that has no fleets, the CLI now shows a dedicated warning. The run status has also been updated in both the CLI and UI to No fleets instead of No offers.

This removes ambiguity around failed runs that previously appeared as No offers.

dstack login

You can now authenticate the CLI using a new command, dstack login, instead of manually providing a token.

dstack Enterprise supports SSO with providers such as Okta, Microsoft Entra ID, and Google.

Services

Service configurations now support gateway: true.

For services that require gateway features (such as auto-scaling, custom domains, and WebSockets), this property makes the requirement explicit. When set, dstack ensures a default gateway is present.
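
A minimal sketch of a service declaring the gateway requirement explicitly (the name, image, and command are illustrative):

```yaml
type: service
name: my-service

image: my-org/my-app:latest   # illustrative image
commands:
  - python app.py
port: 8000

# Explicitly require gateway features; dstack ensures
# a default gateway is present
gateway: true
```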

dstack-shim

In addition to the dstack-runner auto-update mechanism introduced in 0.20.0, dstack-shim now also supports auto-updating.

See contributing/RUNNER-AND-SHIM.md for details.

What's Changed

Full Changelog: dstackai/dstack@0.20.0...0.20.1

0.20.0-v1

17 Dec 11:34


dstack 0.20 is a major release that brings significant improvements and introduces a number of breaking changes. Read below for the most important ones. For migration notes, please refer to the migration guide.

Fleets

dstack previously had two different ways to provision instances for runs: using a fleet configuration or using automatic fleet provisioning on run apply. To unify the UX, dstack no longer creates fleets automatically.

Fleets must now be created explicitly before submitting runs. This gives users full control over the provisioning lifecycle. If you don't need any limits on instance provisioning (as was the case with auto-created fleets), you can create a single elastic fleet for all runs:

type: fleet
name: default-fleet
nodes: 0..

Note that multi-node tasks require fleets with placement: cluster, which provides the best possible connectivity. You will need a separate fleet for each cluster.
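
Following the note above, a sketch of a cluster fleet for multi-node tasks (the name, backend, and GPU spec are illustrative):

```yaml
type: fleet
name: my-cluster-fleet

nodes: 4
# Required for multi-node tasks; provides the best possible connectivity
placement: cluster

backends: [aws]
resources:
  gpu: H100:8   # illustrative GPU spec
```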

Note

To keep the old behavior with auto-created fleets, set the DSTACK_FF_AUTOCREATED_FLEETS_ENABLED environment variable.

Runs

Working directory

Previously, the working_dir property had complicated semantics: it defaulted to /workflow, but for tasks and services without commands, the image's working directory was used instead.

This has now been simplified: working_dir always defaults to the image's working directory. The working directory of the default dstack images is now set to /dstack/run.
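
If a configuration relied on the old default, the previous behavior can be pinned explicitly; a sketch (the task itself is illustrative):

```yaml
type: task
name: my-task

image: python:3.12
# Pin the old default explicitly instead of inheriting
# the image's working directory
working_dir: /workflow
commands:
  - pwd
```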

Repo directory

Working with repos is now more explicit and intuitive. First, dstack now only sets up repos that are explicitly defined in run configurations via repos; repos initialized with dstack init are not set up unless specified:

type: dev-environment
ide: vscode
repos:
  # Clone the repo in the configuration's dir into `working_dir`
  - .

Second, repos[].path now defaults to working_dir (".") instead of /workflow.

Third, cloning a repo into a non-empty directory now raises an error so that mistakes are not silently ignored. The previous behavior of skipping cloning can be specified explicitly with if_exists: skip:

type: dev-environment
ide: vscode
repos:
  - local_path: .
    path: /my_volume/repo
    if_exists: skip

Events

dstack now stores important events—such as resource CRUD operations, status changes, and other information crucial for auditing and debugging. Users can view events using the dstack event CLI command or in the UI.

$ dstack event
[2025-12-11 15:05:20] [👤admin] [run clever-cheetah-1] Run submitted. Status: SUBMITTED
[2025-12-11 15:05:20] [job clever-cheetah-1-0-0] Job created on run submission. Status: SUBMITTED
[2025-12-11 15:05:26] [job clever-cheetah-1-0-0, instance cloud-fleet-0] Job assigned to instance. Instance status: BUSY (1/1 blocks busy)

CLI

JSON output

The dstack ps and dstack gateway commands now support --format json / --json arguments that print results in JSON instead of plaintext:

$ dstack ps --json
{
  "project": "main",
  "runs": [
    {
      "id": "5f2e08b5-2098-4064-86c7-0efe0eb84970",
      "project_name": "main",
      "user": "admin",
      "fleet": {
        "id": "9598d5db-67d8-4a2e-bdd2-842ab93b2f2e",
        "name": "cloud-fleet"
      },
      ...
    }
  ]
}

Verda (formerly Datacrunch)

The datacrunch backend has been renamed to verda, following the company's rebranding.

projects:
  - name: main
    backends:
      - type: verda
        creds:
          type: api_key
          client_id: xfaHBqYEsArqhKWX-e52x3HH7w8T
          client_secret: B5ZU5Qx9Nt8oGMlmMhNI3iglK8bjMhagTbylZy4WzncZe39995f7Vxh8

Gateways

Gateway configurations now support an optional instance_type property that allows overriding the default gateway instance type:

type: gateway
name: example-gateway

backend: aws
region: eu-west-1

instance_type: t3.large

domain: example.com

Currently instance_type is supported for aws and gcp backends.

All breaking changes

  • Fleets are no longer created automatically on run apply and have to be created explicitly before submitting runs.
  • The run's working_dir now always defaults to the image's working directory instead of /workflow. The working directory of dstack default images is now /dstack/run.
  • repos[].path now defaults to working_dir (".") instead of /workflow.
  • Dropped implicitly loaded repos; repos must be specified via repos configuration property.
  • Cloning a repo into a non-empty directory now raises an error. This can be changed by setting if_exists: skip.
  • Dropped CLI commands dstack config, dstack stats, and dstack gateway create.
  • Dropped Python API RunCollection methods RunCollection.get_plan(), RunCollection.exec_plan(), and RunCollection.submit().
  • Dropped local repos support: dstack init --local and dstack.api.LocalRepo.
  • Dropped Azure deprecated VM series Dsv3 and Esv4.
  • Dropped legacy server environment variables DSTACK_SERVER_METRICS_TTL_SECONDS and DSTACK_FORCE_BRIDGE_NETWORK.

Deprecations

  • Deprecated the API endpoint /api/project/{project_name}/fleets/create in favor of /api/project/{project_name}/fleets/apply.
  • Deprecated repo_dir argument in RunCollection.get_run_plan() in favor of repos[].path.

What's Changed


0.19.38-v1

21 Nov 08:40


Run plan

Since the 0.19.26 release, dstack provisions instances with respect to configured fleets, but run plan offers didn't reflect that, so you might not see the actual offers used for provisioning. This is now fixed: the run plan shows offers with respect to configured fleets. For example, you can create a fleet for provisioning spot GPU instances on AWS:

type: fleet
name: cloud-fleet
nodes: 0..
backends: [aws]
spot_policy: spot
resources: 
  gpu: 1..

and the runs respect that configuration:

✗ dstack apply                                                      
...
 #  BACKEND          RESOURCES                            INSTANCE TYPE  PRICE    
 1  aws (us-east-1)  cpu=4 mem=16GB disk=100GB T4:16GB:1  g4dn.xlarge    $0.526   
 2  aws (us-east-2)  cpu=4 mem=16GB disk=100GB T4:16GB:1  g4dn.xlarge    $0.526   
 3  aws (us-west-2)  cpu=4 mem=16GB disk=100GB T4:16GB:1  g4dn.xlarge    $0.526   
    ...                                                                           
 Shown 3 of 309 offers, $71.552 max

Gateways

dstack gateways now support SGLang Router, enabling inference request routing with policies such as cache_aware, power_of_two, round_robin, and random. Currently, the gateway supports sglang-router version 0.2.1. You can enable the SGLang router in your gateway configuration and select any of the available routing policies. Example configuration:

type: gateway
name: sglang-gateway

backend: aws
region: eu-west-1

domain: example.com
router:
  type: sglang
  policy: cache_aware

What's Changed

Full Changelog: dstackai/dstack@0.19.37...0.19.38