Add pd disaggregated inference by Bihan · Pull Request #3558 · dstackai/dstack

Bihan · 2026-02-10T08:08:32Z

Testing Steps

Create a CPU node in the K8s cluster.
Create a gateway on the CPU node using the config below:

type: gateway
name: bihan-gateway

backend: kubernetes
region: any

domain: bihan-gateway.dstack.ai
router: sglang

Create 3 GPU nodes (1 for prefill, 1 for decode, and 1 for testing scaling) in the same K8s cluster as the gateway node.
Note: See the design doc for why the gateway and workers must be on the same network.
Apply the prefill-decode service configuration below:

type: service
name: prefill-decode
image: lmsysorg/sglang:latest

env:
  - HF_TOKEN
  - MODEL_ID=meta-llama/Llama-3.2-1B-Instruct

replicas:
  - count: 1..2
    scaling:
      metric: rps
      target: 3
    commands:
      - |
          python -m sglang.launch_server \
            --model-path $MODEL_ID \
            --disaggregation-mode prefill \
            --disaggregation-transfer-backend mooncake \
            --host 0.0.0.0 \
            --port 8000 \
            --disaggregation-bootstrap-port 8998 \
            --log-level debug \
            > worker-server.log 2>&1
    resources:
      gpu: 1

  - count: 1
    commands:
      - |
          python -m sglang.launch_server \
            --model-path $MODEL_ID \
            --disaggregation-mode decode \
            --disaggregation-transfer-backend mooncake \
            --host 0.0.0.0 \
            --port 8000 \
            --log-level debug \
            > worker-server.log 2>&1
    resources:
      gpu: 1

port: 8000
model: meta-llama/Llama-3.2-1B-Instruct

probes:
  - type: http
    url: /health_generate
    interval: 15s

router:
  type: sglang
  policy: round_robin
  pd_disaggregation: true
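
The `router` block enables the SGLang router with `policy: round_robin` and `pd_disaggregation: true`. As a rough mental model only, the sketch below pairs one prefill worker with one decode worker per request in round-robin order. It is not the SGLang router's or dstack's code: `PrefillWorker`, `DecodeWorker`, `RoundRobinPDSelector`, and the internal IPs are hypothetical; only the bootstrap port 8998 and port 8000 come from the config above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PrefillWorker:
    url: str
    # Port passed via --disaggregation-bootstrap-port (8998 in the config above)
    bootstrap_port: Optional[int]


@dataclass(frozen=True)
class DecodeWorker:
    url: str


class RoundRobinPDSelector:
    """Hypothetical round-robin selection of a (prefill, decode) pair per request."""

    def __init__(self) -> None:
        self.prefill: List[PrefillWorker] = []
        self.decode: List[DecodeWorker] = []
        self._next_prefill = 0
        self._next_decode = 0

    def add_prefill(self, worker: PrefillWorker) -> None:
        self.prefill.append(worker)

    def add_decode(self, worker: DecodeWorker) -> None:
        self.decode.append(worker)

    def pick(self) -> Tuple[PrefillWorker, DecodeWorker]:
        # PD mode cannot serve a request without one worker of each kind.
        if not self.prefill or not self.decode:
            raise RuntimeError("need at least one prefill and one decode worker")
        # The two pools rotate independently; modulo keeps indices valid as pools grow.
        p = self.prefill[self._next_prefill % len(self.prefill)]
        d = self.decode[self._next_decode % len(self.decode)]
        self._next_prefill += 1
        self._next_decode += 1
        return p, d


# Usage: one prefill and one decode replica, then a second prefill replica after scale-up.
selector = RoundRobinPDSelector()
p1 = PrefillWorker("http://10.0.0.1:8000", 8998)  # hypothetical internal IPs
d1 = DecodeWorker("http://10.0.0.2:8000")
selector.add_prefill(p1)
selector.add_decode(d1)
first = selector.pick()  # (p1, d1)
p2 = PrefillWorker("http://10.0.0.3:8000", 8998)
selector.add_prefill(p2)
second = selector.pick()  # (p2, d1)
```

Because the selector indexes into live lists instead of a fixed iterator, a prefill worker registered after scale-up starts receiving requests on the next pick.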

Once the request rate reaches 3 RPS (rps >= 3), the prefill replica group scales from 1 to 2 replicas.
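
A minimal sketch of the scaling arithmetic, assuming a simple rule where each replica is expected to serve about `target` requests per second, so the desired count is `ceil(rps / target)` clamped to the `count: 1..2` range. This is an assumption, not dstack's autoscaler: it ignores averaging windows and scaling delays, and under this rule the second replica is requested once the rate goes above 3 RPS, so the exact trigger point may differ from the behavior described above.

```python
import math


def desired_replicas(rps: float, target: float, min_replicas: int, max_replicas: int) -> int:
    """Simplified RPS-based scaling rule (assumed, not dstack's implementation)."""
    if target <= 0:
        raise ValueError("target must be positive")
    # Replicas needed if each one handles roughly `target` requests per second.
    wanted = math.ceil(rps / target)
    # Never go below the minimum or above the maximum of the `count` range.
    return max(min_replicas, min(max_replicas, wanted))


# Prefill group from the config above: count 1..2, target 3 RPS.
print(desired_replicas(4.0, 3, 1, 2))
```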

Note: For testing, the gateway must be configured to use the wheel at https://bihan-test-bucket.s3.eu-west-1.amazonaws.com/dstack_gateway-0.0.1-py3-none-any.whl

Test2
Internal IP Test
Add worker with internal_ip
Check status and register
Add Status Ready Log
Add Prefill-Decode
Add PD to dstack
Test register worker without poll
Add router config in service config
Update remove worker
Clean Up router code
Clean Up
Further Cleanup

src/dstack/_internal/core/models/configurations.py

src/dstack/_internal/core/models/routers.py

src/dstack/_internal/server/services/services/__init__.py

src/dstack/_internal/proxy/gateway/services/model_routers/base.py

src/dstack/_internal/proxy/gateway/services/model_routers/sglang.py

src/dstack/_internal/proxy/gateway/services/nginx.py

src/dstack/_internal/proxy/gateway/services/registry.py

src/dstack/_internal/core/models/gateways.py

jvstme · 2026-02-17T22:32:53Z

src/dstack/_internal/core/models/configurations.py

+        Optional[AnyRouterConfig],
+        Field(
+            description=(
+                "Router configuration for the service. Requires a gateway with matching router enabled. "


(nit)

Suggested change

"Router configuration for the service. Requires a gateway with matching router enabled. "

"Router configuration for the service. Requires a gateway with matching router enabled"
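
The field description states that a service router requires a gateway with a matching router enabled (in the testing steps, `router: sglang` on the gateway and `router.type: sglang` on the service). A minimal sketch of such a check on plain dicts; `check_router_matches_gateway` is a hypothetical helper, not dstack's actual validation code.

```python
from typing import Any, Dict, Optional


def check_router_matches_gateway(
    service_conf: Dict[str, Any], gateway_router: Optional[str]
) -> None:
    """Hypothetical check: raise ValueError if the service's router type does not match the gateway's."""
    router = service_conf.get("router")
    if router is None:
        # The service does not request a router, so any gateway is acceptable.
        return
    router_type = router.get("type")
    if gateway_router is None:
        raise ValueError(
            f"service requests router {router_type!r}, but the gateway has no router enabled"
        )
    if router_type != gateway_router:
        raise ValueError(
            f"service router {router_type!r} does not match gateway router {gateway_router!r}"
        )


# Usage with values from the testing steps:
service = {
    "type": "service",
    "name": "prefill-decode",
    "router": {"type": "sglang", "policy": "round_robin", "pd_disaggregation": True},
}
check_router_matches_gateway(service, "sglang")  # passes silently
```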

Bihan Rana added 2 commits February 10, 2026 12:04

Add pd disaggregation service

a90173b

Bihan requested review from jvstme and peterschmidt85 February 10, 2026 08:45

Bihan mentioned this pull request Feb 10, 2026

Migrate service model base url #3560

Merged

jvstme reviewed Feb 11, 2026

jvstme reviewed Feb 12, 2026

Bihan Rana and others added 7 commits February 13, 2026 08:12

Move router configuration to service

5ec1f97

Resolve major comments

6e7dbe7

Merge branch 'master' into add_pd_disaggregated_inference

4fd74ba

Resolve Lint Error

860ea23

Minor Update

38eee94

Resolve Minor Comments

63ed75c

Update wheel url

8560bcb

jvstme reviewed Feb 16, 2026

Resolve backward incompatibility

56286c4

jvstme reviewed Feb 17, 2026

Bihan Rana and others added 5 commits February 18, 2026 06:18

Update RouterConfigs

b71080d

Merge branch 'master' into add_pd_disaggregated_inference

ebf7035

Resolve Lint Error

ac4b2a4

Update gateway wheel

b619657

Minor Update

f540943

jvstme approved these changes Feb 18, 2026

Bihan merged commit 3aae583 into dstackai:master Feb 18, 2026
28 checks passed

Bihan merged 15 commits into dstackai:master from Bihan:add_pd_disaggregated_inference
