Commit 2de20bf

Deploying to gh-pages from @ dstackai/dstack@fb2b53c 🚀
1 parent 3684467 commit 2de20bf

7 files changed: +201 -124 lines changed

assets/images/social/examples.png

-285 Bytes

assets/images/social/partners.png

-92 Bytes

docs/guides/metrics/index.html

Lines changed: 81 additions & 2 deletions
@@ -1099,6 +1099,32 @@
 </label>
 <ul class="md-nav__list" data-md-component="toc" data-md-scrollfix>
 
+<li class="md-nav__item">
+<a href="#ui" class="md-nav__link">
+<span class="md-ellipsis">
+
+<span class="md-typeset">
+UI
+</span>
+
+</span>
+</a>
+
+</li>
+
+<li class="md-nav__item">
+<a href="#cli" class="md-nav__link">
+<span class="md-ellipsis">
+
+<span class="md-typeset">
+CLI
+</span>
+
+</span>
+</a>
+
+</li>
+
 <li class="md-nav__item">
 <a href="#prometheus" class="md-nav__link">
 <span class="md-ellipsis">
@@ -3960,6 +3986,32 @@
 </label>
 <ul class="md-nav__list" data-md-component="toc" data-md-scrollfix>
 
+<li class="md-nav__item">
+<a href="#ui" class="md-nav__link">
+<span class="md-ellipsis">
+
+<span class="md-typeset">
+UI
+</span>
+
+</span>
+</a>
+
+</li>
+
+<li class="md-nav__item">
+<a href="#cli" class="md-nav__link">
+<span class="md-ellipsis">
+
+<span class="md-typeset">
+CLI
+</span>
+
+</span>
+</a>
+
+</li>
+
 <li class="md-nav__item">
 <a href="#prometheus" class="md-nav__link">
 <span class="md-ellipsis">
@@ -4101,10 +4153,37 @@
 
 
 <h1 id="metrics">Metrics<a class="headerlink" href="#metrics" title="Permanent link">&para;</a></h1>
+<p><code>dstack</code> automatically tracks essential metrics, which you can access via the CLI and UI.
+You can also configure the <code>dstack</code> server to export metrics to Prometheus—this is required to access advanced metrics such as those from DCGM.</p>
+<h2 id="ui">UI<a class="headerlink" href="#ui" title="Permanent link">&para;</a></h2>
+<p>To access metrics via the UI, open the page of the corresponding run or job and switch to the <code>Metrics</code> tab:</p>
+<p><img alt="" src="https://dstack.ai/static-assets/static-assets/images/dstack-newsletter-metrics.png" width="800" /></p>
+<p>This tab displays key CPU, memory, and GPU metrics collected during the last hour of the run or job.</p>
+<h2 id="cli">CLI<a class="headerlink" href="#cli" title="Permanent link">&para;</a></h2>
+<p>As an alternative to the UI, you can track real-time essential metrics via the CLI.
+The <code>dstack metrics</code> command displays the most recently tracked CPU, memory, and GPU metrics.</p>
+<div class="termy">
+
+<div class="highlight"><pre><span></span><code>dstack<span class="w"> </span>metrics<span class="w"> </span>gentle-mayfly-1
+
+<span class="w"> </span>NAME<span class="w"> </span>STATUS<span class="w"> </span>CPU<span class="w"> </span>MEMORY<span class="w"> </span>GPU
+<span class="w"> </span>gentle-mayfly-1<span class="w"> </span><span class="k">done</span><span class="w"> </span><span class="m">0</span>%<span class="w"> </span><span class="m">16</span>.27GB/2000GB<span class="w"> </span><span class="nv">gpu</span><span class="o">=</span><span class="m">0</span><span class="w"> </span><span class="nv">mem</span><span class="o">=</span><span class="m">72</span>.48GB/80GB<span class="w"> </span><span class="nv">util</span><span class="o">=</span><span class="m">0</span>%
+<span class="w"> </span><span class="nv">gpu</span><span class="o">=</span><span class="m">1</span><span class="w"> </span><span class="nv">mem</span><span class="o">=</span><span class="m">64</span>.99GB/80GB<span class="w"> </span><span class="nv">util</span><span class="o">=</span><span class="m">0</span>%
+<span class="w"> </span><span class="nv">gpu</span><span class="o">=</span><span class="m">2</span><span class="w"> </span><span class="nv">mem</span><span class="o">=</span>580MB/80GB<span class="w"> </span><span class="nv">util</span><span class="o">=</span><span class="m">0</span>%
+<span class="w"> </span><span class="nv">gpu</span><span class="o">=</span><span class="m">3</span><span class="w"> </span><span class="nv">mem</span><span class="o">=</span>4MB/80GB<span class="w"> </span><span class="nv">util</span><span class="o">=</span><span class="m">0</span>%
+<span class="w"> </span><span class="nv">gpu</span><span class="o">=</span><span class="m">4</span><span class="w"> </span><span class="nv">mem</span><span class="o">=</span>4MB/80GB<span class="w"> </span><span class="nv">util</span><span class="o">=</span><span class="m">0</span>%
+<span class="w"> </span><span class="nv">gpu</span><span class="o">=</span><span class="m">5</span><span class="w"> </span><span class="nv">mem</span><span class="o">=</span>4MB/80GB<span class="w"> </span><span class="nv">util</span><span class="o">=</span><span class="m">0</span>%
+<span class="w"> </span><span class="nv">gpu</span><span class="o">=</span><span class="m">6</span><span class="w"> </span><span class="nv">mem</span><span class="o">=</span>4MB/80GB<span class="w"> </span><span class="nv">util</span><span class="o">=</span><span class="m">0</span>%
+<span class="w"> </span><span class="nv">gpu</span><span class="o">=</span><span class="m">7</span><span class="w"> </span><span class="nv">mem</span><span class="o">=</span>292MB/80GB<span class="w"> </span><span class="nv">util</span><span class="o">=</span><span class="m">0</span>%
+</code></pre></div>
+
+</div>
+
 <h2 id="prometheus">Prometheus<a class="headerlink" href="#prometheus" title="Permanent link">&para;</a></h2>
-<p>To collect and export fleet and run metrics to Prometheus, enable the
-<code>DSTACK_ENABLE_PROMETHEUS_METRICS</code> environment variable and configure Prometheus to fetch metrics from
+<p>To enable exporting metrics to Prometheus, set the
+<code>DSTACK_ENABLE_PROMETHEUS_METRICS</code> environment variable and configure Prometheus to scrape metrics from
 <code>&lt;dstack server URL&gt;/metrics</code>.</p>
+<p>In addition to the essential metrics available via the CLI and UI, <code>dstack</code> exports additional metrics to Prometheus, including data on fleets, runs, jobs, and DCGM metrics.</p>
 <details class="info">
 <summary>NVIDIA DCGM</summary>

docs/guides/protips/index.html

Lines changed: 1 addition & 3 deletions
@@ -4756,9 +4756,7 @@ <h2 id="offers">Offers<a class="headerlink" href="#offers" title="Permanent link
 </div>
 
 <h2 id="metrics">Metrics<a class="headerlink" href="#metrics" title="Permanent link">&para;</a></h2>
-<p>While <code>dstack</code> allows the use of any third-party monitoring tools (e.g., Weights and Biases), you can also
-monitor container metrics such as CPU, memory, and GPU usage using the <a href="../../../blog/dstack-metrics/">built-in
-<code>dstack metrics</code> CLI command</a> or the corresponding API.</p>
+<p><code>dstack</code> tracks essential metrics accessible via the CLI and UI. To access advanced metrics like DCGM, configure the server to export metrics to Prometheus. See <a href="../metrics/">Metrics</a> for details.</p>
 <h2 id="service-quotas">Service quotas<a class="headerlink" href="#service-quotas" title="Permanent link">&para;</a></h2>
 <p>If you're using your own AWS, GCP, Azure, or OCI accounts, before you can use GPUs or spot instances, you have to request the
corresponding service quotas for each type of instance in each region.</p>

search/search_index.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.
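
Note on the Prometheus section added in docs/guides/metrics/index.html: one rough way to confirm that the export is working is to read `<dstack server URL>/metrics` directly and list the metric names it returns. The Python sketch below assumes the endpoint serves the standard Prometheus text exposition format and is reachable without extra authentication; neither is stated in this commit. The helper names (`metrics_url`, `metric_names`, `fetch_metric_names`) are hypothetical and not part of `dstack`.

```python
from urllib.request import urlopen


def metrics_url(server_url: str) -> str:
    """Build the scrape address described in the docs: <dstack server URL>/metrics."""
    return server_url.rstrip("/") + "/metrics"


def metric_names(exposition_text: str) -> set[str]:
    """Collect metric names from Prometheus text-format output.

    Comment lines (# HELP, # TYPE) and blank lines are skipped. A sample line
    looks like `name{label="x"} 1.0` or `name 1.0`, so the name is whatever
    precedes the first `{` or space.
    """
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def fetch_metric_names(server_url: str, timeout: float = 5.0) -> set[str]:
    """Fetch the endpoint and return the metric names it exposes.

    Assumption: no authentication header is needed; adjust if your server
    requires one.
    """
    with urlopen(metrics_url(server_url), timeout=timeout) as resp:
        return metric_names(resp.read().decode("utf-8"))
```

Calling `fetch_metric_names` with the address of a running server that has `DSTACK_ENABLE_PROMETHEUS_METRICS` set should return a non-empty set; an empty set or an HTTP error suggests the export is not enabled or the URL is wrong.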
