@hiroTamada (Contributor) commented Jan 23, 2026

Summary

  • Replace the fixed 500ms sleep with device polling for faster boot (~400-490ms savings)
  • Add skip_kernel_headers API parameter to skip kernel headers extraction (~2s savings)
  • Add skip_guest_agent API parameter to skip guest-agent copy (~50-100ms savings)
  • Include a lazy copy optimization for guest-agent (skips the copy if the binary already exists)

New API Parameters

```json
{
  "name": "my-instance",
  "image": "node:20",
  "skip_kernel_headers": true,
  "skip_guest_agent": false
}
```
| Parameter | Default | Effect when `true` |
|---|---|---|
| `skip_kernel_headers` | `false` | Skips the ~2s kernel headers extraction. Disables DKMS. |
| `skip_guest_agent` | `false` | Skips the guest-agent copy. Disables the exec/stat API. |
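Both flags are optional booleans that default to `false`, so existing clients keep the current behavior. A minimal Go sketch of how such a request body decodes, assuming a hand-written struct (`createInstanceRequest` and `decodeCreateRequest` are hypothetical; the generated type in lib/oapi may use different names or pointer fields):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// createInstanceRequest is a hypothetical, trimmed-down stand-in for the
// generated CreateInstance request type; the real type may differ.
type createInstanceRequest struct {
	Name  string `json:"name"`
	Image string `json:"image"`
	// Omitted flags decode to Go's zero value (false), matching the
	// documented defaults: headers are extracted and the agent is installed.
	SkipKernelHeaders bool `json:"skip_kernel_headers,omitempty"`
	SkipGuestAgent    bool `json:"skip_guest_agent,omitempty"`
}

// decodeCreateRequest parses a JSON request body into the sketch type.
func decodeCreateRequest(body []byte) (createInstanceRequest, error) {
	var req createInstanceRequest
	err := json.Unmarshal(body, &req)
	return req, err
}

func main() {
	// skip_guest_agent is omitted here, so it falls back to false.
	req, err := decodeCreateRequest([]byte(`{"name":"my-instance","image":"node:20","skip_kernel_headers":true}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("skip_kernel_headers=%v skip_guest_agent=%v\n", req.SkipKernelHeaders, req.SkipGuestAgent)
}
```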

Test plan

  • Verified device polling works (mount happens immediately, no 500ms delay)
  • Verified that skip_kernel_headers: true logs that kernel headers setup is skipped
  • Verified skip_guest_agent: false copies guest-agent normally
  • Verified guest-agent starts and listens on vsock
  • Verified that the code builds with go build ./... and passes go vet ./...

Note

Introduces boot-time optimizations and optional skips for guest setup, wired end-to-end from API to guest init.

  • Adds skip_kernel_headers and skip_guest_agent to CreateInstance API (openapi.yaml, lib/oapi, cmd/api/...) and plumbs through domain types (lib/instances/types.go, create.go) to guest config (lib/instances/configdisk.go, lib/vmconfig).
  • Guest init honors options: skips kernel headers setup and agent install/start; implements lazy agent copy and service injection skipping (lib/system/init/*).
  • Replaces the fixed 500ms sleep with device polling (waitForDevice) for /dev/vda and /dev/vdb and for the config disk (/dev/vdc) to speed up boot.
  • Regenerates embedded Swagger spec.

Notes: Skipping kernel headers disables DKMS; skipping guest agent disables exec/stat APIs for the instance.
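To show how the two flags gate guest init, here is a hedged sketch that only computes which optional steps would run. `guestConfig`, `initSteps`, and the step names are hypothetical; the real logic lives in lib/system/init and may be structured differently. The `headersTarball` parameter reflects the commit message's note that the ~2s saving applies only when the tarball is present.

```go
package main

import "fmt"

// guestConfig is a hypothetical subset of the config that reaches guest init.
type guestConfig struct {
	SkipKernelHeaders bool
	SkipGuestAgent    bool
}

// initSteps lists the setup work guest init would run for cfg (illustrative
// step names). headersTarball reports whether a headers tarball is present.
func initSteps(cfg guestConfig, headersTarball bool) []string {
	steps := []string{"wait for block devices", "mount filesystems"}
	if !cfg.SkipKernelHeaders && headersTarball {
		steps = append(steps, "extract kernel headers") // required for DKMS
	}
	if !cfg.SkipGuestAgent {
		// Copy and start are skipped together, otherwise init would try to
		// run a binary that was never copied.
		steps = append(steps, "copy guest-agent", "start guest-agent")
	}
	return steps
}

func main() {
	fmt.Println(initSteps(guestConfig{SkipKernelHeaders: true}, true))
}
```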

Written by Cursor Bugbot for commit f7e5933.

This PR adds three optimizations to reduce VM boot time:

1. **Device polling instead of fixed sleep**: Replace the hardcoded 500ms
   sleep with a polling loop that checks for /dev/vda and /dev/vdb every
   10ms, with a 2s timeout. Polling typically completes in 10-50ms.

2. **Skip kernel headers via API parameter**: Add `skip_kernel_headers`
   boolean to CreateInstanceRequest. When true, skips the ~2s kernel
   headers extraction. Safe for workloads that don't need DKMS.

3. **Skip guest-agent via API parameter**: Add `skip_guest_agent` boolean
   to CreateInstanceRequest. When true, skips guest-agent copy (disables
   exec/stat API). Also includes lazy copy optimization that skips if
   the agent already exists in the overlay.

New API usage:
```json
{
  "name": "my-instance",
  "image": "node:20",
  "skip_kernel_headers": true,
  "skip_guest_agent": false
}
```

Estimated savings:
- Device polling: 400-490ms
- Skip kernel headers: ~2s (when tarball present)
- Skip guest-agent: 50-100ms

github-actions bot commented Jan 23, 2026

✱ Stainless preview builds

This PR will update the hypeman SDKs with the following commit message.

feat: add boot time optimizations for faster VM startup
⚠️ hypeman-typescript

There was a regression in your SDK.
generate ⚠️ build ✅ lint ✅ test ✅

npm install https://pkg.stainless.com/s/hypeman-typescript/20a98ea15491dc399f57ea5cc5d10f2549d1c4bd/dist.tar.gz
⚠️ hypeman-go

There was a regression in your SDK.
generate ⚠️ lint ✅ test ✅

go get github.com/stainless-sdks/hypeman-go@3992761e3ad8ebb0cc22fb7408199b068e9d8013
hypeman-cli

Code was not generated because there was a fatal error.


This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
If you push custom code to the preview branch, re-run this workflow to update the comment.
Last updated: 2026-01-23 21:02:11 UTC

cursor[bot]

This comment was marked as outdated.

When skip_guest_agent is true, the guest-agent binary is not copied,
but the systemd service was still being injected. This caused systemd
to repeatedly try to start the non-existent binary every 3 seconds,
resulting in continuous log spam.

Now the service injection is skipped when skip_guest_agent is true.
cursor[bot]

This comment was marked as outdated.

@sjmiller609 (Collaborator) left a comment

nice

When skip_guest_agent is true, the guest-agent binary is not copied,
but exec mode was still unconditionally trying to start it. This caused
an error log on every boot with skip_guest_agent enabled.

Now the guest-agent start is skipped when skip_guest_agent is true,
consistent with the systemd mode behavior.
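In exec mode, skipping the start means the agent command handle may be nil, which is why the later wait needs a nil guard. A minimal sketch with hypothetical helpers `startGuestAgent` and `agentRunning`; the guard expression matches the one in the review diff:

```go
package main

import (
	"fmt"
	"os/exec"
)

// startGuestAgent starts the agent, or returns a nil *exec.Cmd when
// skip_guest_agent is set. Name and signature are hypothetical.
func startGuestAgent(path string, skip bool) (*exec.Cmd, error) {
	if skip {
		return nil, nil // binary was never copied, so don't try to run it
	}
	cmd := exec.Command(path)
	if err := cmd.Start(); err != nil {
		return nil, fmt.Errorf("start guest-agent: %w", err)
	}
	return cmd, nil
}

// agentRunning applies the nil guard used before waiting on the agent:
// agentCmd != nil && agentCmd.Process != nil.
func agentRunning(cmd *exec.Cmd) bool {
	return cmd != nil && cmd.Process != nil
}

func main() {
	cmd, err := startGuestAgent("/usr/local/bin/guest-agent", true)
	fmt.Println(agentRunning(cmd), err)
}
```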
@cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.


```diff
 // The guest-agent runs forever, so this effectively keeps the VM alive
 // until it's explicitly terminated
-if agentCmd.Process != nil {
+if agentCmd != nil && agentCmd.Process != nil {
```

VM terminates unexpectedly when skip_guest_agent is enabled

Medium Severity

When skip_guest_agent is true in exec mode, the VM terminates as soon as the application exits, because nothing keeps PID 1 alive. The code comment explicitly notes that waiting on the guest-agent "keeps init alive, prevents kernel panic," but when the agent is skipped, syscall.Exit(exitCode) runs immediately after the app finishes, and the kernel panics when PID 1 exits. This contradicts the OpenAPI documentation, which states "The instance will still run, but remote command execution will be unavailable."


Hiro Tamada added 2 commits January 23, 2026 14:57
The device polling optimization waited only for /dev/vda and /dev/vdb,
but /dev/vdc (config disk) was mounted without polling. This could cause
mount failures if vdc appeared slightly after vdb.

Now readConfig() polls for /dev/vdc before attempting to mount it,
consistent with how we handle the other block devices.
Resolved conflict in lib/oapi/oapi.go by regenerating from merged openapi.yaml.
@hiroTamada hiroTamada merged commit 200e25d into main Jan 23, 2026
3 of 4 checks passed
@hiroTamada hiroTamada deleted the feat/boot-time-optimizations branch January 23, 2026 21:00