feat: add boot time optimizations for faster VM startup #68
This PR adds three optimizations to reduce VM boot time:
1. **Device polling instead of fixed sleep**: Replace the 500ms hardcoded
sleep with a polling mechanism that waits for /dev/vda and /dev/vdb
with 10ms intervals and 2s timeout. Typically completes in 10-50ms.
2. **Skip kernel headers via API parameter**: Add `skip_kernel_headers`
boolean to CreateInstanceRequest. When true, skips the ~2s kernel
headers extraction. Safe for workloads that don't need DKMS.
3. **Skip guest-agent via API parameter**: Add `skip_guest_agent` boolean
to CreateInstanceRequest. When true, skips guest-agent copy (disables
exec/stat API). Also includes lazy copy optimization that skips if
the agent already exists in the overlay.
New API usage:
```json
{
"name": "my-instance",
"image": "node:20",
"skip_kernel_headers": true,
"skip_guest_agent": false
}
```
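On the server side, a request like the one above might be decoded as in the sketch below. The struct is hypothetical: it mirrors only the four fields shown in the example, and the generated type in `lib/oapi` may well differ (for instance by using pointer fields for optional booleans).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CreateInstanceRequest is an illustrative stand-in for the generated API type.
// Only the fields from the PR's example payload are included.
type CreateInstanceRequest struct {
	Name              string `json:"name"`
	Image             string `json:"image"`
	SkipKernelHeaders bool   `json:"skip_kernel_headers,omitempty"`
	SkipGuestAgent    bool   `json:"skip_guest_agent,omitempty"`
}

// parseRequest decodes a JSON body into the request struct.
func parseRequest(body []byte) (CreateInstanceRequest, error) {
	var req CreateInstanceRequest
	err := json.Unmarshal(body, &req)
	return req, err
}

func main() {
	// skip_guest_agent is omitted here, so it keeps Go's zero value (false).
	body := []byte(`{"name":"my-instance","image":"node:20","skip_kernel_headers":true}`)
	req, err := parseRequest(body)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", req)
}
```

With plain `bool` fields, a client that omits either flag gets the existing behavior, which keeps the new parameters backward compatible in this sketch.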
Estimated savings:
- Device polling: 400-490ms
- Skip kernel headers: ~2s (when tarball present)
- Skip guest-agent: 50-100ms
When skip_guest_agent is true, the guest-agent binary is not copied, but the systemd service was still being injected. This caused systemd to repeatedly try to start the non-existent binary every 3 seconds, resulting in continuous log spam. Now the service injection is skipped when skip_guest_agent is true.
sjmiller609
left a comment
nice
When skip_guest_agent is true, the guest-agent binary is not copied, but exec mode was still unconditionally trying to start it. This caused an error log on every boot with skip_guest_agent enabled. Now the guest-agent start is skipped when skip_guest_agent is true, consistent with the systemd mode behavior.
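The two fixes above bring both boot modes under the same rule: if the agent is skipped, nothing that depends on the binary should run. A hypothetical sketch of that gating follows; the type, function, and step names are invented for illustration and do not come from `lib/system/init`.

```go
package main

import "fmt"

// bootMode is a hypothetical enum for the two init modes mentioned in the PR.
type bootMode string

const (
	modeSystemd bootMode = "systemd"
	modeExec    bootMode = "exec"
)

// guestAgentSteps lists the agent-related steps init would perform.
// When skipGuestAgent is true, every step is dropped, not only the binary copy.
func guestAgentSteps(mode bootMode, skipGuestAgent bool) []string {
	if skipGuestAgent {
		return nil
	}
	steps := []string{"copy guest-agent binary"}
	switch mode {
	case modeSystemd:
		steps = append(steps, "inject guest-agent systemd unit")
	case modeExec:
		steps = append(steps, "start guest-agent process")
	}
	return steps
}

func main() {
	fmt.Println(guestAgentSteps(modeSystemd, false))
	fmt.Println(guestAgentSteps(modeExec, true))
}
```

Deriving every dependent step from one flag check avoids the mismatch described in these commits, where the copy was skipped but the service unit or process start was not.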
Cursor Bugbot has reviewed your changes and found 2 potential issues.
  // The guest-agent runs forever, so this effectively keeps the VM alive
  // until it's explicitly terminated
- if agentCmd.Process != nil {
+ if agentCmd != nil && agentCmd.Process != nil {
VM terminates unexpectedly when skip_guest_agent is enabled
Medium Severity
When skip_guest_agent is true in exec mode, the VM terminates when the application exits because nothing keeps PID 1 alive. The code comment explicitly notes that the guest-agent wait "keeps init alive, prevents kernel panic," but when skipped, syscall.Exit(exitCode) is called immediately after the app finishes. This causes a kernel panic when PID 1 exits. However, the OpenAPI documentation states "The instance will still run, but remote command execution will be unavailable," which contradicts the actual behavior.
The device polling optimization waited only for /dev/vda and /dev/vdb, but /dev/vdc (config disk) was mounted without polling. This could cause mount failures if vdc appeared slightly after vdb. Now readConfig() polls for /dev/vdc before attempting to mount it, consistent with how we handle the other block devices.
Resolved conflict in lib/oapi/oapi.go by regenerating from merged openapi.yaml.
Summary
- `skip_kernel_headers` API parameter to skip kernel headers extraction (~2s savings)
- `skip_guest_agent` API parameter to skip guest-agent copy (~50-100ms savings)

New API Parameters

`{ "name": "my-instance", "image": "node:20", "skip_kernel_headers": true, "skip_guest_agent": false }`

- `skip_kernel_headers`: default `false`
- `skip_guest_agent`: default `false`

Test plan

- `skip_kernel_headers: true` shows `skipping kernel headers setup` in logs
- `skip_guest_agent: false` copies guest-agent normally
- `go build ./...` and `go vet ./...`

Note
Introduces boot-time optimizations and optional skips for guest setup, wired end-to-end from API to guest init.
- Adds `skip_kernel_headers` and `skip_guest_agent` to the `CreateInstance` API (`openapi.yaml`, `lib/oapi`, `cmd/api/...`) and plumbs them through domain types (`lib/instances/types.go`, `create.go`) to guest config (`lib/instances/configdisk.go`, `lib/vmconfig`).
- … (`lib/system/init/*`).
- … (`waitForDevice`) for `/dev/vda`/`/dev/vdb` and the config disk (`/dev/vdc`) to speed boot.

Notes: Skipping kernel headers disables DKMS; skipping guest agent disables exec/stat APIs for the instance.
Written by Cursor Bugbot for commit f7e5933.