<p><code>dstack</code> is an <a href="https://github.com/dstackai/dstack" target="_blank">open-source <span class="twemoji external"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="m11.93 5 2.83 2.83L5 17.59 6.42 19l9.76-9.75L19 12.07V5z"/></svg></span></a> control plane for orchestrating GPU workloads. It can provision cloud VMs, run on top of Kubernetes, or manage on-prem clusters. If you don’t want to self-host, you can use <a href="https://sky.dstack.ai" target="_blank">dstack Sky <span class="twemoji external"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="m11.93 5 2.83 2.83L5 17.59 6.42 19l9.76-9.75L19 12.07V5z"/></svg></span></a>, the managed version of <code>dstack</code> that also provides access to cloud GPUs via its marketplace.</p>
<p>With our latest release, we’re excited to announce that <a href="https://nebius.com/" target="_blank">Nebius <span class="twemoji external"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="m11.93 5 2.83 2.83L5 17.59 6.42 19l9.76-9.75L19 12.07V5z"/></svg></span></a>, a purpose-built AI cloud for large-scale training and inference, has joined the <code>dstack</code> Sky marketplace to offer on-demand and spot GPUs, including clusters.</p>
<h2 id="exporting-gpu-cost-and-other-metrics-to-prometheus"><a class="toclink" href="../prometheus/">Exporting GPU, cost, and other metrics to Prometheus</a></h2>
<p>Effective AI infrastructure management requires full visibility into compute performance and costs. AI researchers need detailed insights into container- and GPU-level performance, while managers rely on cost metrics to track resource usage across projects.</p>
<p>While <code>dstack</code> provides key metrics through its UI and the <a href="../dstack-metrics/"><code>dstack metrics</code></a> CLI, teams often need more granular data and prefer to use their own monitoring tools. To support this, we’ve introduced a new endpoint that exports all collected metrics, covering both fleets and runs, directly to Prometheus in real time.</p>
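<p>For example, pointing a Prometheus scrape job at the <code>dstack</code> server could look roughly like the sketch below. The endpoint path and server address are assumptions, not confirmed values, so check the <code>dstack</code> documentation for the exact configuration.</p>

```yaml
# Hypothetical Prometheus scrape configuration for the new dstack
# metrics endpoint; metrics_path and the target address are assumptions.
scrape_configs:
  - job_name: "dstack"
    metrics_path: /metrics            # assumed path of the exporting endpoint
    static_configs:
      - targets: ["localhost:3000"]   # assumed dstack server host:port
```

<p>Once scraped, the fleet- and run-level series can be queried and dashboarded with the team's existing Prometheus and Grafana setup.</p>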
<h2 id="using-volumes-to-optimize-cold-starts-on-runpod"><a class="toclink" href="../../../volumes-on-runpod/">Using volumes to optimize cold starts on RunPod</a></h2>
<p>Deploying custom models in the cloud often faces the challenge of cold start times, including the time to provision a new instance and download the model. This is especially relevant for services with autoscaling, when new model replicas need to be provisioned quickly.</p>
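<p>One common mitigation, which <code>dstack</code> supports via volumes, is to cache model weights on a network volume that outlives individual instances, so new replicas skip the download. A rough sketch of such a configuration follows; the volume name, region, size, and mount path are illustrative assumptions rather than verbatim syntax.</p>

```yaml
# Illustrative sketch of a dstack volume definition; all names and
# values below are assumptions, not verbatim configuration.
type: volume
name: model-cache        # hypothetical volume name
backend: runpod
region: EU-RO-1          # assumed region identifier
size: 100GB              # sized to hold the cached model weights
```

<p>A service configuration would then mount this volume, for example at a hypothetical <code>/data</code> path, so that newly provisioned replicas load weights from the cache instead of re-downloading the model.</p>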
<p>Let's explore how <code>dstack</code> optimizes this process using volumes, with an example of