Commit f03a33c

Deploying to gh-pages from @ dstackai/dstack@14ef341 🚀
1 parent b738431 commit f03a33c

22 files changed

+209
-174
lines changed

blog/dstack-sky/index.html

Lines changed: 2 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -4133,13 +4133,12 @@ <h2 id="what-is-dstack-sky">What is dstack Sky?<a class="headerlink" href="#what
41334133
</code></pre></div>
41344134
</div>
41354135

4136-
<p>If it has a <code>model</code> mapping, the model will be accessible
4137-
at <code>https://gateway.&lt;project name&gt;.sky.dstack.ai</code> via the OpenAI compatible interface.</p>
4136+
<p>The service endpoint will be accessible at <code>https://&lt;run name&gt;.&lt;project name&gt;.sky.dstack.ai</code> via the OpenAI compatible interface.</p>
41384137
<div class="highlight"><pre><span></span><code><span class="kn">from</span><span class="w"> </span><span class="nn">openai</span><span class="w"> </span><span class="kn">import</span> <span class="n">OpenAI</span>
41394138

41404139

41414140
<span class="n">client</span> <span class="o">=</span> <span class="n">OpenAI</span><span class="p">(</span>
4142-
<span class="n">base_url</span><span class="o">=</span><span class="s2">&quot;https://gateway.&lt;project name&gt;.sky.dstack.ai&quot;</span><span class="p">,</span>
4141+
<span class="n">base_url</span><span class="o">=</span><span class="s2">&quot;https://&lt;run name&gt;.&lt;project name&gt;.sky.dstack.ai/v1&quot;</span><span class="p">,</span>
41434142
<span class="n">api_key</span><span class="o">=</span><span class="s2">&quot;&lt;dstack token&gt;&quot;</span>
41444143
<span class="p">)</span>
41454144

blog/posts/dstack-sky.md

Lines changed: 2 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -121,15 +121,14 @@ model: mixtral
121121
```
122122
</div>
123123
124-
If it has a `model` mapping, the model will be accessible
125-
at `https://gateway.<project name>.sky.dstack.ai` via the OpenAI compatible interface.
124+
The service endpoint will be accessible at `https://<run name>.<project name>.sky.dstack.ai` via the OpenAI compatible interface.
126125

127126
```python
128127
from openai import OpenAI
129128
130129
131130
client = OpenAI(
132-
base_url="https://gateway.<project name>.sky.dstack.ai",
131+
base_url="https://<run name>.<project name>.sky.dstack.ai/v1",
133132
api_key="<dstack token>"
134133
)
135134
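
The hunk above moves the dstack Sky example from a shared `gateway.<project name>` host to a per-run host with a `/v1` suffix. As a minimal sketch of that URL scheme, the helper below builds the new base URL from a run name and a project name. The function name `sky_base_url` and the input checks are hypothetical and are not part of `dstack`.

```python
def sky_base_url(run_name: str, project_name: str) -> str:
    """Build the per-service base URL shown in the updated blog post.

    Hypothetical helper: the URL pattern comes from the diff above, the
    validation rules are an assumption added for illustration.
    """
    for label, value in (("run_name", run_name), ("project_name", project_name)):
        # Each value becomes one DNS label, so reject empty strings,
        # dots, and slashes that would change the host or path.
        if not value or "." in value or "/" in value:
            raise ValueError(f"{label} must be a single DNS label, got {value!r}")
    return f"https://{run_name}.{project_name}.sky.dstack.ai/v1"


if __name__ == "__main__":
    # Example values only; substitute your own run and project names.
    print(sky_base_url("llama31", "main"))
```

The returned string is what the updated snippet passes as `base_url` to `OpenAI(...)`, together with the `dstack` token as `api_key`.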

docs/concepts/services.md

Lines changed: 42 additions & 30 deletions
Original file line numberDiff line numberDiff line change
@@ -73,7 +73,7 @@ Model meta-llama/Meta-Llama-3.1-8B-Instruct is published at:
7373

7474
`dstack apply` automatically provisions instances and runs the service.
7575

76-
If a [gateway](gateways.md) is not configured, the service’s endpoint will be accessible at
76+
If you do not have a [gateway](gateways.md) created, the service endpoint will be accessible at
7777
`<dstack server URL>/proxy/services/<project name>/<run name>/`.
7878

7979
<div class="termy">
@@ -95,37 +95,50 @@ $ curl http://localhost:3000/proxy/services/main/llama31/v1/chat/completions \
9595

9696
</div>
9797

98-
If the service defines the [`model`](#model) property, the model can be accessed with
99-
the global OpenAI-compatible endpoint at `<dstack server URL>/proxy/models/<project name>/`,
100-
or via `dstack` UI.
98+
If [authorization](#authorization) is not disabled, the service endpoint requires the `Authorization` header with `Bearer <dstack token>`.
10199

102-
If [authorization](#authorization) is not disabled, the service endpoint requires the `Authorization` header with
103-
`Bearer <dstack token>`.
100+
## Configuration options
104101

105-
??? info "Gateway"
106-
Running services for development purposes doesn’t require setting up a [gateway](gateways.md).
102+
<!-- !!! info "No commands"
103+
If `commands` are not specified, `dstack` runs `image`’s entrypoint (or fails if none is set). -->
107104

108-
However, you'll need a gateway in the following cases:
105+
### Gateway
109106

110-
* To use auto-scaling or rate limits
111-
* To enable support for a custom router, such as the [SGLang Model Gateway](https://docs.sglang.ai/advanced_features/router.html#)
112-
* To enable HTTPS for the endpoint and map it to your domain
113-
* If your service requires WebSockets
114-
* If your service cannot work with a [path prefix](#path-prefix)
107+
Here are cases where a service may need a [gateway](gateways.md):
115108

116-
<!-- Note, if you're using [dstack Sky](https://sky.dstack.ai),
117-
a gateway is already pre-configured for you. -->
109+
* To use [auto-scaling](#replicas-and-scaling) or [rate limits](#rate-limits)
110+
* To enable support for a custom router, such as the [SGLang Model Gateway](https://docs.sglang.ai/advanced_features/router.html#)
111+
* To enable HTTPS for the endpoint and map it to your domain
112+
* If your service requires WebSockets
113+
* If your service cannot work with a [path prefix](#path-prefix)
118114

119-
If a [gateway](gateways.md) is configured, the service endpoint will be accessible at
120-
`https://<run name>.<gateway domain>/`.
115+
<!-- Note, if you're using [dstack Sky](https://sky.dstack.ai),
116+
a gateway is already pre-configured for you. -->
121117

122-
If the service defines the `model` property, the model will be available via the global OpenAI-compatible endpoint
123-
at `https://gateway.<gateway domain>/`.
118+
If you want `dstack` to explicitly validate that a gateway is used, you can set the [`gateway`](../reference/dstack.yml/service.md#gateway) property in the service configuration to `true`. In this case, `dstack` will raise an error during `dstack apply` if a default gateway is not created.
124119

125-
## Configuration options
120+
You can also set the `gateway` property to the name of a specific gateway, if required.
121+
122+
If you have a [gateway](gateways.md) created, the service endpoint will be accessible at `https://<run name>.<gateway domain>/`:
123+
124+
<div class="termy">
125+
126+
```shell
127+
$ curl https://llama31.example.com/v1/chat/completions \
128+
-H 'Content-Type: application/json' \
129+
-H 'Authorization: Bearer &lt;dstack token&gt;' \
130+
-d '{
131+
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
132+
"messages": [
133+
{
134+
"role": "user",
135+
"content": "Compose a poem that explains the concept of recursion in programming."
136+
}
137+
]
138+
}'
139+
```
126140

127-
!!! info "No commands"
128-
If `commands` are not specified, `dstack` runs `image`’s entrypoint (or fails if none is set).
141+
</div>
129142

130143
### Replicas and scaling
131144

@@ -220,12 +233,6 @@ Setting the minimum number of replicas to `0` allows the service to scale down t
220233
??? info "Disaggregated serving"
221234
Native support for disaggregated prefill and decode, allowing both worker types to run within a single service, is coming soon.
222235

223-
### Model
224-
225-
If the service is running a chat model with an OpenAI-compatible interface,
226-
set the [`model`](#model) property to make the model accessible via `dstack`'s
227-
global OpenAI-compatible endpoint, and also accessible via `dstack`'s UI.
228-
229236
### Authorization
230237

231238
By default, the service enables authorization, meaning the service endpoint requires a `dstack` user token.
@@ -364,7 +371,7 @@ set [`strip_prefix`](../reference/dstack.yml/service.md#strip_prefix) to `false`
364371
If your app cannot be configured to work with a path prefix, you can host it
365372
on a dedicated domain name by setting up a [gateway](gateways.md).
366373

367-
### Rate limits { #rate-limits }
374+
### Rate limits
368375

369376
If you have a [gateway](gateways.md), you can configure rate limits for your service
370377
using the [`rate_limits`](../reference/dstack.yml/service.md#rate_limits) property.
@@ -413,6 +420,11 @@ Limits apply to the whole service (all replicas) and per client (by IP). Clients
413420
414421
</div>
415422
423+
### Model
424+
425+
If the service runs a model with an OpenAI-compatible interface, you can set the [`model`](#model) property to make the model accessible through `dstack`'s chat UI on the `Models` page.
426+
In this case, `dstack` will use the service's `/v1/chat/completions` endpoint.
427+
416428
### Resources
417429

418430
If you specify memory size, you can either specify an explicit size (e.g. `24GB`) or a
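
The `services.md` changes above describe two endpoint shapes: `<dstack server URL>/proxy/services/<project name>/<run name>/` without a gateway, and `https://<run name>.<gateway domain>/` with one. The sketch below picks between them and prepares (but does not send) a chat completion request with the `Authorization: Bearer <dstack token>` header. The names `service_endpoint` and `chat_request` are hypothetical, and the `/v1/chat/completions` path is assumed from the `curl` examples in the diff.

```python
import json
import urllib.request
from typing import Optional


def service_endpoint(
    server_url: str,
    project_name: str,
    run_name: str,
    gateway_domain: Optional[str] = None,
) -> str:
    """Return the service endpoint for the gateway and no-gateway cases."""
    if gateway_domain:
        # With a gateway, the run gets its own subdomain over HTTPS.
        return f"https://{run_name}.{gateway_domain}/"
    # Without a gateway, the dstack server proxies the service under a path prefix.
    return f"{server_url.rstrip('/')}/proxy/services/{project_name}/{run_name}/"


def chat_request(endpoint: str, token: str, model: str, prompt: str) -> urllib.request.Request:
    """Prepare a POST request; nothing is sent until urlopen() is called."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        endpoint.rstrip("/") + "/v1/chat/completions",  # assumed OpenAI-compatible path
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Example values taken from the curl snippets in the diff.
    print(service_endpoint("http://localhost:3000", "main", "llama31"))
    print(service_endpoint("http://localhost:3000", "main", "llama31", "example.com"))
```

Passing the prepared request to `urllib.request.urlopen` would issue the call. If `auth` is set to `false` in the service configuration, the `Authorization` header is not required.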

docs/concepts/services/index.html

Lines changed: 68 additions & 33 deletions
Original file line numberDiff line numberDiff line change
@@ -934,11 +934,11 @@
934934
<ul class="md-nav__list">
935935

936936
<li class="md-nav__item">
937-
<a href="#replicas-and-scaling" class="md-nav__link">
937+
<a href="#gateway" class="md-nav__link">
938938
<span class="md-ellipsis">
939939

940940
<span class="md-typeset">
941-
Replicas and scaling
941+
Gateway
942942
</span>
943943

944944
</span>
@@ -947,11 +947,11 @@
947947
</li>
948948

949949
<li class="md-nav__item">
950-
<a href="#model" class="md-nav__link">
950+
<a href="#replicas-and-scaling" class="md-nav__link">
951951
<span class="md-ellipsis">
952952

953953
<span class="md-typeset">
954-
Model
954+
Replicas and scaling
955955
</span>
956956

957957
</span>
@@ -1009,6 +1009,19 @@
10091009
</span>
10101010
</a>
10111011

1012+
</li>
1013+
1014+
<li class="md-nav__item">
1015+
<a href="#model" class="md-nav__link">
1016+
<span class="md-ellipsis">
1017+
1018+
<span class="md-typeset">
1019+
Model
1020+
</span>
1021+
1022+
</span>
1023+
</a>
1024+
10121025
</li>
10131026

10141027
<li class="md-nav__item">
@@ -4282,11 +4295,11 @@
42824295
<ul class="md-nav__list">
42834296

42844297
<li class="md-nav__item">
4285-
<a href="#replicas-and-scaling" class="md-nav__link">
4298+
<a href="#gateway" class="md-nav__link">
42864299
<span class="md-ellipsis">
42874300

42884301
<span class="md-typeset">
4289-
Replicas and scaling
4302+
Gateway
42904303
</span>
42914304

42924305
</span>
@@ -4295,11 +4308,11 @@
42954308
</li>
42964309

42974310
<li class="md-nav__item">
4298-
<a href="#model" class="md-nav__link">
4311+
<a href="#replicas-and-scaling" class="md-nav__link">
42994312
<span class="md-ellipsis">
43004313

43014314
<span class="md-typeset">
4302-
Model
4315+
Replicas and scaling
43034316
</span>
43044317

43054318
</span>
@@ -4357,6 +4370,19 @@
43574370
</span>
43584371
</a>
43594372

4373+
</li>
4374+
4375+
<li class="md-nav__item">
4376+
<a href="#model" class="md-nav__link">
4377+
<span class="md-ellipsis">
4378+
4379+
<span class="md-typeset">
4380+
Model
4381+
</span>
4382+
4383+
</span>
4384+
</a>
4385+
43604386
</li>
43614387

43624388
<li class="md-nav__item">
@@ -4691,7 +4717,7 @@ <h2 id="apply-a-configuration">Apply a configuration<a class="headerlink" href="
46914717
</div>
46924718

46934719
<p><code>dstack apply</code> automatically provisions instances and runs the service.</p>
4694-
<p>If a <a href="../gateways/">gateway</a> is not configured, the service’s endpoint will be accessible at
4720+
<p>If you do not have a <a href="../gateways/">gateway</a> created, the service endpoint will be accessible at
46954721
<code>&lt;dstack server URL&gt;/proxy/services/&lt;project name&gt;/&lt;run name&gt;/</code>.</p>
46964722
<div class="termy">
46974723

@@ -4711,34 +4737,44 @@ <h2 id="apply-a-configuration">Apply a configuration<a class="headerlink" href="
47114737

47124738
</div>
47134739

4714-
<p>If the service defines the <a href="#model"><code>model</code></a> property, the model can be accessed with
4715-
the global OpenAI-compatible endpoint at <code>&lt;dstack server URL&gt;/proxy/models/&lt;project name&gt;/</code>,
4716-
or via <code>dstack</code> UI.</p>
4717-
<p>If <a href="#authorization">authorization</a> is not disabled, the service endpoint requires the <code>Authorization</code> header with
4718-
<code>Bearer &lt;dstack token&gt;</code>.</p>
4719-
<details class="info">
4720-
<summary>Gateway</summary>
4721-
<p>Running services for development purposes doesn’t require setting up a <a href="../gateways/">gateway</a>.</p>
4722-
<p>However, you'll need a gateway in the following cases:</p>
4740+
<p>If <a href="#authorization">authorization</a> is not disabled, the service endpoint requires the <code>Authorization</code> header with <code>Bearer &lt;dstack token&gt;</code>.</p>
4741+
<h2 id="configuration-options">Configuration options<a class="headerlink" href="#configuration-options" title="Permanent link">&para;</a></h2>
4742+
<!-- !!! info "No commands"
4743+
If `commands` are not specified, `dstack` runs `image`’s entrypoint (or fails if none is set). -->
4744+
4745+
<h3 id="gateway">Gateway<a class="headerlink" href="#gateway" title="Permanent link">&para;</a></h3>
4746+
<p>Here are cases where a service may need a <a href="../gateways/">gateway</a>:</p>
47234747
<ul>
4724-
<li>To use auto-scaling or rate limits</li>
4748+
<li>To use <a href="#replicas-and-scaling">auto-scaling</a> or <a href="#rate-limits">rate limits</a></li>
47254749
<li>To enable support for a custom router, such as the <a href="https://docs.sglang.ai/advanced_features/router.html#">SGLang Model Gateway</a></li>
47264750
<li>To enable HTTPS for the endpoint and map it to your domain</li>
47274751
<li>If your service requires WebSockets</li>
47284752
<li>If your service cannot work with a <a href="#path-prefix">path prefix</a></li>
47294753
</ul>
4730-
<!-- Note, if you're using <a href="https://sky.dstack.ai">dstack Sky</a>,
4754+
<!-- Note, if you're using [dstack Sky](https://sky.dstack.ai),
47314755
a gateway is already pre-configured for you. -->
4732-
<p>If a <a href="../gateways/">gateway</a> is configured, the service endpoint will be accessible at
4733-
<code>https://&lt;run name&gt;.&lt;gateway domain&gt;/</code>.</p>
4734-
<p>If the service defines the <code>model</code> property, the model will be available via the global OpenAI-compatible endpoint
4735-
at <code>https://gateway.&lt;gateway domain&gt;/</code>.</p>
4736-
</details>
4737-
<h2 id="configuration-options">Configuration options<a class="headerlink" href="#configuration-options" title="Permanent link">&para;</a></h2>
4738-
<div class="admonition info">
4739-
<p class="admonition-title">No commands</p>
4740-
<p>If <code>commands</code> are not specified, <code>dstack</code> runs <code>image</code>’s entrypoint (or fails if none is set).</p>
4756+
4757+
<p>If you want <code>dstack</code> to explicitly validate that a gateway is used, you can set the <a href="../../reference/dstack.yml/service/#gateway"><code>gateway</code></a> property in the service configuration to <code>true</code>. In this case, <code>dstack</code> will raise an error during <code>dstack apply</code> if a default gateway is not created.</p>
4758+
<p>You can also set the <code>gateway</code> property to the name of a specific gateway, if required.</p>
4759+
<p>If you have a <a href="../gateways/">gateway</a> created, the service endpoint will be accessible at <code>https://&lt;run name&gt;.&lt;gateway domain&gt;/</code>:</p>
4760+
<div class="termy">
4761+
4762+
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>curl<span class="w"> </span>https://llama31.example.com/v1/chat/completions<span class="w"> </span><span class="se">\</span>
4763+
<span class="w"> </span>-H<span class="w"> </span><span class="s1">&#39;Content-Type: application/json&#39;</span><span class="w"> </span><span class="se">\</span>
4764+
<span class="w"> </span>-H<span class="w"> </span><span class="s1">&#39;Authorization: Bearer &amp;lt;dstack token&amp;gt;&#39;</span><span class="w"> </span><span class="se">\</span>
4765+
<span class="w"> </span>-d<span class="w"> </span><span class="s1">&#39;{</span>
4766+
<span class="s1"> &quot;model&quot;: &quot;meta-llama/Meta-Llama-3.1-8B-Instruct&quot;,</span>
4767+
<span class="s1"> &quot;messages&quot;: [</span>
4768+
<span class="s1"> {</span>
4769+
<span class="s1"> &quot;role&quot;: &quot;user&quot;,</span>
4770+
<span class="s1"> &quot;content&quot;: &quot;Compose a poem that explains the concept of recursion in programming.&quot;</span>
4771+
<span class="s1"> }</span>
4772+
<span class="s1"> ]</span>
4773+
<span class="s1"> }&#39;</span>
4774+
</code></pre></div>
4775+
47414776
</div>
4777+
47424778
<h3 id="replicas-and-scaling">Replicas and scaling<a class="headerlink" href="#replicas-and-scaling" title="Permanent link">&para;</a></h3>
47434779
<p>By default, <code>dstack</code> runs a single replica of the service.
47444780
You can configure the number of replicas as well as the auto-scaling rules.</p>
@@ -4826,10 +4862,6 @@ <h3 id="replicas-and-scaling">Replicas and scaling<a class="headerlink" href="#r
48264862
<summary>Disaggregated serving</summary>
48274863
<p>Native support for disaggregated prefill and decode, allowing both worker types to run within a single service, is coming soon.</p>
48284864
</details>
4829-
<h3 id="model">Model<a class="headerlink" href="#model" title="Permanent link">&para;</a></h3>
4830-
<p>If the service is running a chat model with an OpenAI-compatible interface,
4831-
set the <a href="#model"><code>model</code></a> property to make the model accessible via <code>dstack</code>'s
4832-
global OpenAI-compatible endpoint, and also accessible via <code>dstack</code>'s UI.</p>
48334865
<h3 id="authorization">Authorization<a class="headerlink" href="#authorization" title="Permanent link">&para;</a></h3>
48344866
<p>By default, the service enables authorization, meaning the service endpoint requires a <code>dstack</code> user token.
48354867
This can be disabled by setting <code>auth</code> to <code>false</code>.</p>
@@ -4992,6 +5024,9 @@ <h3 id="rate-limits">Rate limits<a class="headerlink" href="#rate-limits" title=
49925024
</code></pre></div>
49935025
</div>
49945026
</details>
5027+
<h3 id="model">Model<a class="headerlink" href="#model" title="Permanent link">&para;</a></h3>
5028+
<p>If the service runs a model with an OpenAI-compatible interface, you can set the <a href="#model"><code>model</code></a> property to make the model accessible through <code>dstack</code>'s chat UI on the <code>Models</code> page.
5029+
In this case, <code>dstack</code> will use the service's <code>/v1/chat/completions</code> endpoint.</p>
49955030
<h3 id="resources">Resources<a class="headerlink" href="#resources" title="Permanent link">&para;</a></h3>
49965031
<p>If you specify memory size, you can either specify an explicit size (e.g. <code>24GB</code>) or a
49975032
range (e.g. <code>24GB..</code>, or <code>24GB..80GB</code>, or <code>..80GB</code>).</p>

examples/accelerators/tenstorrent.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -96,7 +96,7 @@ at `<dstack server URL>/proxy/services/<project name>/<run name>/`.
9696
<div class="termy">
9797

9898
```shell
99-
$ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
99+
$ curl http://127.0.0.1:3000/proxy/services/main/tt-inference-server/v1/chat/completions \
100100
-X POST \
101101
-H 'Authorization: Bearer &lt;dstack token&gt;' \
102102
-H 'Content-Type: application/json' \

examples/accelerators/tenstorrent/index.html

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4195,7 +4195,7 @@ <h2 id="services">Services<a class="headerlink" href="#services" title="Permanen
41954195
at <code>&lt;dstack server URL&gt;/proxy/services/&lt;project name&gt;/&lt;run name&gt;/</code>.</p>
41964196
<div class="termy">
41974197

4198-
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>curl<span class="w"> </span>http://127.0.0.1:3000/proxy/models/main/chat/completions<span class="w"> </span><span class="se">\</span>
4198+
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>curl<span class="w"> </span>http://127.0.0.1:3000/proxy/services/main/tt-inference-server/v1/chat/completions<span class="w"> </span><span class="se">\</span>
41994199
<span class="w"> </span>-X<span class="w"> </span>POST<span class="w"> </span><span class="se">\</span>
42004200
<span class="w"> </span>-H<span class="w"> </span><span class="s1">&#39;Authorization: Bearer &amp;lt;dstack token&amp;gt;&#39;</span><span class="w"> </span><span class="se">\</span>
42014201
<span class="w"> </span>-H<span class="w"> </span><span class="s1">&#39;Content-Type: application/json&#39;</span><span class="w"> </span><span class="se">\</span>

examples/inference/nim.md

Lines changed: 3 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -78,13 +78,12 @@ Provisioning...
7878
```
7979
</div>
8080

81-
If no gateway is created, the model will be available via the OpenAI-compatible endpoint
82-
at `<dstack server URL>/proxy/models/<project name>/`.
81+
If no gateway is created, the service endpoint will be available at `<dstack server URL>/proxy/services/<project name>/<run name>/`.
8382

8483
<div class="termy">
8584

8685
```shell
87-
$ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
86+
$ curl http://127.0.0.1:3000/proxy/services/main/serve-distill-deepseek/v1/chat/completions \
8887
-X POST \
8988
-H 'Authorization: Bearer &lt;dstack token&gt;' \
9089
-H 'Content-Type: application/json' \
@@ -106,8 +105,7 @@ $ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
106105

107106
</div>
108107

109-
When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the OpenAI-compatible endpoint
110-
is available at `https://gateway.<gateway domain>/`.
108+
When a [gateway](https://dstack.ai/docs/concepts/gateways/) is configured, the service endpoint will be available at `https://serve-distill-deepseek.<gateway domain>/`.
111109

112110
## Source code
113111
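
The example hunks above replace calls to the project-wide `/proxy/models/<project name>/` endpoint with calls to the per-run `/proxy/services/<project name>/<run name>/v1/` endpoint. For scripts that still use the old form, the sketch below rewrites such a URL. The function `migrate_models_url` is hypothetical, and it assumes the service exposes its OpenAI-compatible API under `/v1`, as in the updated `curl` commands.

```python
from urllib.parse import urlsplit, urlunsplit


def migrate_models_url(old_url: str, run_name: str) -> str:
    """Rewrite /proxy/models/<project>/... to /proxy/services/<project>/<run>/v1/..."""
    parts = urlsplit(old_url)
    segments = parts.path.strip("/").split("/")
    if len(segments) < 3 or segments[:2] != ["proxy", "models"]:
        raise ValueError(f"not a /proxy/models/<project>/ URL: {old_url!r}")
    project, rest = segments[2], segments[3:]
    # The run name must now be given explicitly; the old URL did not contain it.
    new_path = "/" + "/".join(["proxy", "services", project, run_name, "v1", *rest])
    return urlunsplit((parts.scheme, parts.netloc, new_path, parts.query, parts.fragment))


if __name__ == "__main__":
    # Values taken from the NIM example in the diff.
    print(
        migrate_models_url(
            "http://127.0.0.1:3000/proxy/models/main/chat/completions",
            "serve-distill-deepseek",
        )
    )
```

The old endpoint selected the target by the `model` field in the request body, so the run name has to be supplied by the caller when migrating.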
