@tycho (Member) commented Jan 13, 2026

This is necessary for a couple of reasons:

  • to be able to run GPU workloads without first launching a --privileged one to tickle the GPU (e.g. with nvidia-smi)
  • to enable persistence mode and keep the GPU ready to start work (latency improvement)

The current workflow for launching a GPU zone is a little painful:

  • create zone with GPU attached
  • create --privileged workload with nvidia-smi in it (e.g. nvidia/cuda:13.1.0-devel-ubuntu24.04)
  • run nvidia-smi in workload
  • destroy workload
  • create non-privileged workloads for real work

This change adds a hook that runs nvidia-smi -pm 1 in the zone before workloads launch, which eliminates the middle three steps above.
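
For readers unfamiliar with what such a hook does, here is a minimal sketch in Python of the sequence described above: load the driver modules, then enable persistence mode. It is not the code from this PR. The function names, the injectable runner, and the default module list are all hypothetical, chosen only for illustration; the only commands taken from the description are nvidia-smi -pm 1 and the general idea of loading modules in the zone.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes an argv list and executes it. Injecting it keeps the
# sketch testable on machines without a GPU.
Runner = Callable[[Sequence[str]], None]


def _run(cmd: Sequence[str]) -> None:
    # check=True raises CalledProcessError if the command exits non-zero.
    subprocess.run(list(cmd), check=True)


def build_gpu_hook_commands(modules: Sequence[str] = ("nvidia",)) -> list[list[str]]:
    """Return the commands a pre-workload GPU hook might run, in order.

    The default module list is an assumption, not taken from the PR.
    """
    commands = [["modprobe", name] for name in modules]
    # Enable persistence mode so the GPU stays ready between workloads.
    commands.append(["nvidia-smi", "-pm", "1"])
    return commands


def run_gpu_hook(run: Runner = _run, modules: Sequence[str] = ("nvidia",)) -> None:
    """Run every hook command in order, stopping at the first failure."""
    for cmd in build_gpu_hook_commands(modules):
        run(cmd)
```

Passing a list's append method as the runner (for example run_gpu_hook(run=calls.append)) records the commands instead of executing them, which is a convenient way to inspect the ordering without touching real hardware.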

@tycho requested a review from azenla January 13, 2026 22:10
@tycho force-pushed the steven/nvidia-persistence-mode branch from b788854 to 807a94d January 29, 2026 02:46
Works for 'setup' and the recently added 'hotplug'. Takes care of
loading the modules in the zone and enabling persistence mode so the GPU
stays awake.

Signed-off-by: Steven Noonan <steven@edera.dev>
@tycho force-pushed the steven/nvidia-persistence-mode branch from 807a94d to 1dcba1f January 29, 2026 06:39
@tycho merged commit 1dcba1f into main Jan 29, 2026
5 checks passed
@tycho deleted the steven/nvidia-persistence-mode branch January 29, 2026 10:59

3 participants