You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
<p>If it has a <code>model</code> mapping, the model will be accessible
4137
-
at <code>https://gateway.<project name>.sky.dstack.ai</code> via the OpenAI compatible interface.</p>
4136
+
<p>The service endpoint will be accessible at <code>https://<run name>.<project name>.sky.dstack.ai</code> via the OpenAI compatible interface.</p>
If the service defines the [`model`](#model) property, the model can be accessed with
99
-
the global OpenAI-compatible endpoint at `<dstack server URL>/proxy/models/<project name>/`,
100
-
or via `dstack` UI.
98
+
If [authorization](#authorization) is not disabled, the service endpoint requires the `Authorization` header with `Bearer <dstack token>`.
101
99
102
-
If [authorization](#authorization) is not disabled, the service endpoint requires the `Authorization` header with
103
-
`Bearer <dstack token>`.
100
+
## Configuration options
104
101
105
-
??? info "Gateway"
106
-
Running services for development purposes doesn’t require setting up a [gateway](gateways.md).
102
+
<!-- !!! info "No commands"
103
+
If `commands` are not specified, `dstack` runs `image`’s entrypoint (or fails if none is set). -->
107
104
108
-
However, you'll need a gateway in the following cases:
105
+
### Gateway
109
106
110
-
* To use auto-scaling or rate limits
111
-
* To enable a support custom router, e.g. such as the [SGLang Model Gateway](https://docs.sglang.ai/advanced_features/router.html#)
112
-
* To enable HTTPS for the endpoint and map it to your domain
113
-
* If your service requires WebSockets
114
-
* If your service cannot work with a [path prefix](#path-prefix)
107
+
Here are cases where a service may need a [gateway](gateways.md):
115
108
116
-
<!-- Note, if you're using [dstack Sky](https://sky.dstack.ai),
117
-
a gateway is already pre-configured for you. -->
109
+
* To use [auto-scaling](#replicas-and-scaling) or [rate limits](#rate-limits)
110
+
* To enable a support custom router, e.g. such as the [SGLang Model Gateway](https://docs.sglang.ai/advanced_features/router.html#)
111
+
* To enable HTTPS for the endpoint and map it to your domain
112
+
* If your service requires WebSockets
113
+
* If your service cannot work with a [path prefix](#path-prefix)
118
114
119
-
If a [gateway](gateways.md) is configured, the service endpoint will be accessible at
120
-
`https://<run name>.<gateway domain>/`.
115
+
<!-- Note, if you're using [dstack Sky](https://sky.dstack.ai),
116
+
a gateway is already pre-configured for you. -->
121
117
122
-
If the service defines the `model` property, the model will be available via the global OpenAI-compatible endpoint
123
-
at `https://gateway.<gateway domain>/`.
118
+
If you want `dstack` to explicitly validate that a gateway is used, you can set the [`gateway`](../reference/dstack.yml/service.md#gateway) property in the service configuration to `true`. In this case, `dstack` will raise an error during `dstack apply` if a default gateway is not created.
124
119
125
-
## Configuration options
120
+
You can also set the `gateway` property to the name of a specific gateway, if required.
121
+
122
+
If you have a [gateway](gateways.md) created, the service endpoint will be accessible at `https://<run name>.<gateway domain>/`:
"content": "Compose a poem that explains the concept of recursion in programming."
136
+
}
137
+
]
138
+
}'
139
+
```
126
140
127
-
!!! info "No commands"
128
-
If `commands` are not specified, `dstack` runs `image`’s entrypoint (or fails if none is set).
141
+
</div>
129
142
130
143
### Replicas and scaling
131
144
@@ -220,12 +233,6 @@ Setting the minimum number of replicas to `0` allows the service to scale down t
220
233
??? info "Disaggregated serving"
221
234
Native support for disaggregated prefill and decode, allowing both worker types to run within a single service, is coming soon.
222
235
223
-
### Model
224
-
225
-
If the service is running a chat model with an OpenAI-compatible interface,
226
-
set the [`model`](#model) property to make the model accessible via `dstack`'s
227
-
global OpenAI-compatible endpoint, and also accessible via `dstack`'s UI.
228
-
229
236
### Authorization
230
237
231
238
By default, the service enables authorization, meaning the service endpoint requires a `dstack` user token.
@@ -364,7 +371,7 @@ set [`strip_prefix`](../reference/dstack.yml/service.md#strip_prefix) to `false`
364
371
If your app cannot be configured to work with a path prefix, you can host it
365
372
on a dedicated domain name by setting up a [gateway](gateways.md).
366
373
367
-
### Rate limits { #rate-limits }
374
+
### Rate limits
368
375
369
376
If you have a [gateway](gateways.md), you can configure rate limits for your service
370
377
using the [`rate_limits`](../reference/dstack.yml/service.md#rate_limits) property.
@@ -413,6 +420,11 @@ Limits apply to the whole service (all replicas) and per client (by IP). Clients
413
420
414
421
</div>
415
422
423
+
### Model
424
+
425
+
If the service runs a model with an OpenAI-compatible interface, you can set the [`model`](#model) property to make the model accessible through `dstack`'s chat UI on the `Models` page.
426
+
In this case, `dstack` will use the service's `/v1/chat/completions` service.
427
+
416
428
### Resources
417
429
418
430
If you specify memory size, you can either specify an explicit size (e.g. `24GB`) or a
@@ -4691,7 +4717,7 @@ <h2 id="apply-a-configuration">Apply a configuration<a class="headerlink" href="
4691
4717
</div>
4692
4718
4693
4719
<p><code>dstack apply</code> automatically provisions instances and runs the service.</p>
4694
-
<p>If a <ahref="../gateways/">gateway</a>is not configured, the service’s endpoint will be accessible at
4720
+
<p>If you do not have a <ahref="../gateways/">gateway</a>created, the service endpoint will be accessible at
4695
4721
<code><dstack server URL>/proxy/services/<project name>/<run name>/</code>.</p>
4696
4722
<divclass="termy">
4697
4723
@@ -4711,34 +4737,44 @@ <h2 id="apply-a-configuration">Apply a configuration<a class="headerlink" href="
4711
4737
4712
4738
</div>
4713
4739
4714
-
<p>If the service defines the <ahref="#model"><code>model</code></a> property, the model can be accessed with
4715
-
the global OpenAI-compatible endpoint at <code><dstack server URL>/proxy/models/<project name>/</code>,
4716
-
or via <code>dstack</code> UI.</p>
4717
-
<p>If <ahref="#authorization">authorization</a> is not disabled, the service endpoint requires the <code>Authorization</code> header with
4718
-
<code>Bearer <dstack token></code>.</p>
4719
-
<detailsclass="info">
4720
-
<summary>Gateway</summary>
4721
-
<p>Running services for development purposes doesn’t require setting up a <ahref="../gateways/">gateway</a>.</p>
4722
-
<p>However, you'll need a gateway in the following cases:</p>
4740
+
<p>If <ahref="#authorization">authorization</a> is not disabled, the service endpoint requires the <code>Authorization</code> header with <code>Bearer <dstack token></code>.</p>
<p>If <code>commands</code> are not specified, <code>dstack</code> runs <code>image</code>’s entrypoint (or fails if none is set).</p>
4756
+
4757
+
<p>If you want <code>dstack</code> to explicitly validate that a gateway is used, you can set the <ahref="../../reference/dstack.yml/service/#gateway"><code>gateway</code></a> property in the service configuration to <code>true</code>. In this case, <code>dstack</code> will raise an error during <code>dstack apply</code> if a default gateway is not created.</p>
4758
+
<p>You can also set the <code>gateway</code> property to the name of a specific gateway, if required.</p>
4759
+
<p>If you have a <ahref="../gateways/">gateway</a> created, the service endpoint will be accessible at <code>https://<run name>.<gateway domain>/</code>:</p>
<p>If the service runs a model with an OpenAI-compatible interface, you can set the <ahref="#model"><code>model</code></a> property to make the model accessible through <code>dstack</code>'s chat UI on the <code>Models</code> page.
5029
+
In this case, <code>dstack</code> will use the service's <code>/v1/chat/completions</code> service.</p>
When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
110
-
is available at `https://gateway.<gateway domain>/`.
108
+
When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the service endpoint will be available at `https://serve-distill-deepseek.<gateway domain>/`.
0 commit comments