Danny/kernel 742 create yutori n1 computer use cli templates (ts/python) #89
base: main
Add new CLI templates for Yutori's n1 computer use model, enabling users to quickly scaffold browser automation projects using Kernel's infrastructure.
Templates (TypeScript & Python):
- Agentic sampling loop with n1's OpenAI-compatible API
- Computer tool mapping n1 actions (click, type, scroll, drag, etc.) to Kernel's Computer Controls API
- Coordinate scaling from n1's 1000x1000 relative space to the actual viewport
- Session management with replay recording support
- read_texts_and_links action using the Playwright execution API (with fallback)
Key implementation details:
- n1 requires screenshots to be sent with role 'observation' (not 'user')
- Model: n1-preview-2025-11 outputs coordinates in 1000x1000 space
- Viewport: 1200x800 at 25Hz (closest to Yutori's recommended 1280x800)
- Navigation actions (refresh, go_back, goto_url) use keyboard shortcuts via Computer Controls, since n1 doesn't use Playwright directly
Also updated:
- .gitignore: Added qa-* to exclude QA testing directories
- pkg/create/templates.go: Registered new yutori-computer-use templates
- .cursor/commands/qa.md: Added Yutori templates to QA testing matrix
Closes KERNEL-742
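A minimal Python sketch of three details listed above: scaling n1's 1000x1000 coordinates to the 1200x800 viewport, wrapping a screenshot in an `observation` message, and expressing navigation actions as key presses. The helper names, the OpenAI-style `image_url` content part, and the specific keys (`F5`, `Alt+Left`, `Ctrl+L`) are assumptions for illustration, not taken from the templates.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

N1_COORD_SPACE = 1000  # n1 reports x and y in a 0..1000 relative space


@dataclass(frozen=True)
class Viewport:
    width: int = 1200
    height: int = 800


def scale_to_viewport(x: float, y: float, vp: Viewport) -> tuple[int, int]:
    """Hypothetical helper: map n1 coordinates to viewport pixels."""
    # Clamp so out-of-range model output cannot land off-screen.
    x = min(max(x, 0), N1_COORD_SPACE)
    y = min(max(y, 0), N1_COORD_SPACE)
    px = round(x * vp.width / N1_COORD_SPACE)
    py = round(y * vp.height / N1_COORD_SPACE)
    # 1000 maps to width/height, one past the last pixel, so cap it.
    return min(px, vp.width - 1), min(py, vp.height - 1)


def observation_message(screenshot_b64: str) -> dict:
    """Hypothetical helper: only role="observation" comes from the PR."""
    # The OpenAI-style image_url content part is an assumption.
    return {
        "role": "observation",
        "content": [
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{screenshot_b64}"},
            }
        ],
    }


# Assumed Chromium shortcuts; the PR says only that keyboard shortcuts are used.
NAV_SHORTCUTS = {"refresh": "F5", "go_back": "Alt+Left"}


def navigation_steps(action: str, url: Optional[str] = None) -> list[tuple[str, str]]:
    """Hypothetical helper: express a navigation action as press/type steps."""
    if action == "goto_url":
        if not url:
            raise ValueError("goto_url requires a url")
        # Focus the address bar, type the URL, then submit it.
        return [("press", "Ctrl+L"), ("type", url), ("press", "Return")]
    if action in NAV_SHORTCUTS:
        return [("press", NAV_SHORTCUTS[action])]
    raise ValueError(f"not a navigation action: {action!r}")
```

Clamping before scaling keeps a slightly out-of-range prediction clickable instead of failing the action.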
Replace page.accessibility.snapshot() with page._snapshotForAI(), which is specifically designed for AI agents and documented in Kernel's MCP server. The previous implementation used the experimental/deprecated accessibility API, which failed silently and fell back to screenshot-only mode. _snapshotForAI() returns a structured representation of the page optimized for LLM consumption, including visible text, interactive elements (links, buttons, inputs), and page structure, which is exactly what n1 needs for reading text and saving URLs for citation.
Add PlaywrightComputerTool adapter that connects via CDP WebSocket for
browser-only screenshots, optimized for Yutori n1's training data per
their documentation recommendations.
Changes:
- Add PlaywrightComputerTool class (TS + Python) using CDP connection
- Add 'mode' parameter to sampling loop ('computer_use' | 'playwright')
- Default to 'computer_use' mode (stable); 'playwright' is opt-in
- Add configurable viewport dimensions (1200x800)
- Expose cdp_ws_url from session for Playwright connection
- Add playwright-core (TS) and playwright (Python) dependencies
The playwright mode provides viewport-only screenshots without OS UI or
browser chrome, improving n1 model performance per Yutori's docs:
https://docs.yutori.com/reference/n1#screenshot-requirements
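A hedged sketch of how the `mode` parameter might select a tool, defaulting to `computer_use` and requiring `cdp_ws_url` only for `playwright`. `make_tool`, `ComputerControlsTool`, and `PlaywrightCdpTool` are hypothetical stand-ins; a real Playwright adapter would attach with Playwright's `chromium.connect_over_cdp()` rather than just storing the URL.

```python
from __future__ import annotations

from typing import Literal, Optional, Union

Mode = Literal["computer_use", "playwright"]


class ComputerControlsTool:
    """Hypothetical stand-in: acts via Computer Controls, full VM screenshots."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id


class PlaywrightCdpTool:
    """Hypothetical stand-in: viewport-only screenshots over a CDP connection.

    A real adapter would attach with Playwright, e.g.
    browser = await playwright.chromium.connect_over_cdp(cdp_ws_url)
    """

    def __init__(self, cdp_ws_url: str, width: int, height: int) -> None:
        self.cdp_ws_url = cdp_ws_url
        self.width = width
        self.height = height


def make_tool(
    session_id: str,
    cdp_ws_url: Optional[str] = None,
    mode: Mode = "computer_use",  # stable default; playwright is opt-in
    width: int = 1200,
    height: int = 800,
) -> Union[ComputerControlsTool, PlaywrightCdpTool]:
    """Pick the tool implementation for the requested screenshot mode."""
    if mode == "computer_use":
        return ComputerControlsTool(session_id)
    if mode == "playwright":
        if not cdp_ws_url:
            raise ValueError("playwright mode needs the session's cdp_ws_url")
        return PlaywrightCdpTool(cdp_ws_url, width, height)
    raise ValueError(f"unknown mode: {mode!r}")
```

Failing fast on a missing `cdp_ws_url` surfaces a misconfigured session before the sampling loop starts.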
Add templates + modes for Yutori to QA file
Fix drag operations, which previously did not work properly in Playwright mode.
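The PR does not show the drag fix itself. A common pattern with Playwright's `page.mouse` (`move`, `down`, `up`, with `steps` for interpolated moves) is sketched below; `drag` and `RecordingMouse` are hypothetical, and the default of 10 steps is an arbitrary assumption.

```python
from __future__ import annotations

import asyncio
from typing import Protocol


class Mouse(Protocol):
    """Subset of Playwright's async Mouse API used here."""

    async def move(self, x: float, y: float, *, steps: int = 1) -> None: ...
    async def down(self) -> None: ...
    async def up(self) -> None: ...


async def drag(
    mouse: Mouse,
    start: tuple[float, float],
    end: tuple[float, float],
    steps: int = 10,  # arbitrary; more steps means a smoother, slower drag
) -> None:
    """Hypothetical drag: press at start, move in steps, release at end."""
    # Position the pointer first so the press lands on the drag source.
    await mouse.move(start[0], start[1])
    await mouse.down()
    # Interpolated moves emit intermediate mousemove events that many
    # drag handlers listen for; a single jump may be ignored.
    await mouse.move(end[0], end[1], steps=steps)
    await mouse.up()


class RecordingMouse:
    """Test double that records calls instead of driving a browser."""

    def __init__(self) -> None:
        self.calls: list[tuple] = []

    async def move(self, x: float, y: float, *, steps: int = 1) -> None:
        self.calls.append(("move", x, y, steps))

    async def down(self) -> None:
        self.calls.append(("down",))

    async def up(self) -> None:
        self.calls.append(("up",))


if __name__ == "__main__":
    mouse = RecordingMouse()
    asyncio.run(drag(mouse, (10, 20), (110, 220)))
    print(mouse.calls)
```

With a real browser, `page.mouse` can be passed in place of `RecordingMouse`.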
Use ariaSnapshot instead of the existing method, since ariaSnapshot is stable and available in both the Python and TypeScript versions.
Issue: The ComputerTool.screenshot() method was a synchronous function, but:
- The N1ComputerToolProtocol expected it to be async
- The PlaywrightComputerTool.screenshot() was async
- The loop.py code tried to await it
Fix:
- Changed def screenshot() to async def screenshot()
- Updated all handler methods to use return await self.screenshot() instead of return self.screenshot()
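The async fix can be sketched as follows. `ComputerTool`, `click`, `observe`, and the fake base64 payload are simplified stand-ins, and the protocol shape is assumed; the point is that once `screenshot()` is async, a handler returning `self.screenshot()` without `await` would hand the loop a coroutine object rather than image data.

```python
from __future__ import annotations

import asyncio
import base64
from typing import Protocol


class N1ComputerToolProtocol(Protocol):
    """Assumed minimal shape: only the async screenshot() matters here."""

    async def screenshot(self) -> str: ...


class ComputerTool:
    """Simplified stand-in for the template's ComputerTool."""

    def __init__(self) -> None:
        self.actions: list[str] = []

    async def screenshot(self) -> str:
        # Placeholder capture; a real tool would fetch a PNG from the browser.
        await asyncio.sleep(0)
        return base64.b64encode(b"fake-png").decode("ascii")

    async def click(self, x: int, y: int) -> str:
        self.actions.append(f"click {x},{y}")
        # With an async screenshot(), a bare `return self.screenshot()` would
        # return an un-awaited coroutine; awaiting yields the base64 string.
        return await self.screenshot()


async def observe(tool: N1ComputerToolProtocol) -> str:
    # The loop awaits screenshot(); a sync method returning str would raise
    # TypeError here because a str cannot be awaited.
    return await tool.screenshot()
```

Making both tools share an async `screenshot()` lets the loop treat `computer_use` and `playwright` modes uniformly.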
Update default delays for actions and screenshots
… moving. Clarified instructions for both computer_use and playwright modes to enhance user understanding and execution accuracy.
The cleanup removed ~300 lines of redundant inline comments and verbose method docstrings while keeping the useful class-level documentation you restored. The templates now match the minimal-comment style of the existing anthropic/openai templates in the codebase.
#88)
This PR updates the Go SDK to cee2050be3f8136505d41c20c2903dfca2cbc479 and adds CLI commands for new SDK methods.
## SDK Update
- Updated kernel-go-sdk to cee2050be3f8136505d41c20c2903dfca2cbc479
## Coverage Analysis
This PR was generated by performing a full enumeration of SDK methods and CLI commands.
## New Commands
- `kernel credential-providers list` - List configured external credential providers
- `kernel credential-providers get <id>` - Get a credential provider by ID
- `kernel credential-providers create` - Create a new credential provider (supports 1Password)
- `kernel credential-providers update <id>` - Update a credential provider's configuration
- `kernel credential-providers delete <id>` - Delete a credential provider
- `kernel credential-providers test <id>` - Test a credential provider connection
## Breaking Changes Fixed
- Fixed `browsers.Get()` calls to pass new required `BrowserGetParams` parameter
Triggered by: kernel/kernel-go-sdk@cee2050
Reviewer: @masnwilliams
---
> [!NOTE]
> Introduces new CLI surfaces and updates for latest SDK.
>
> - **Agent Auth CLI**: `kernel agents auth` with `create/get/list/delete`, `invocations {create/get/exchange/submit}`, and end‑to‑end `run` flow (auto field submission, TOTP, optional live view); docs and examples added to `README.md`.
> - **Credential Providers CLI**: `kernel credential-providers {list/get/create/update/delete/test}` (supports 1Password), wired into root.
> - **Browsers API updates**: adapt to SDK breaking change (`browsers.Get` now requires `BrowserGetParams`); add `process resize` and filesystem watch (`fs watch start/stop/events`) commands; tests updated accordingly.
> - **Dependencies**: bump `kernel-go-sdk` to cee2050… and add `pquerna/otp`; regenerate `go.sum`.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 0b27df6. This will update automatically on new commits.</sup>
---------
Co-authored-by: Mason Williams <43387599+masnwilliams@users.noreply.github.com>
Co-authored-by: Cursor Agent <cursor-agent@kernel.sh>
Co-authored-by: Cursor Agent <cursor-agent@onkernel.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
…se-cli-templates-typescript
Working on fixing comments from Bugbot, then will request review.
… and remove unused dependencies from Python and TypeScript templates.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
pkg/templates/typescript/yutori-computer-use/tools/playwright-computer.ts
@Sayan- no rush, but could use a quick review on these templates when possible 🙏 Running
Sayan- left a comment
great stuff!

    recordReplay?: boolean;
    /** Grace period in seconds before stopping replay */
    replayGracePeriod?: number;
    /** Viewport width (default: 1280 per Yutori recommendation) */
nit: the comment says "default: 1280", but the actual default on line 40 is 1200. Consider updating it to match the other comments (e.g., "default: 1200, closest to Yutori's 1280 recommendation").
Add Yutori n1 Computer Use CLI Templates
This PR adds new CLI templates for Yutori's n1 computer use model, enabling users to quickly scaffold browser automation projects using Kernel's infrastructure.
New Templates
`kernel create --template ts-yutori-cua`
`kernel create --template python-yutori-cua`
Features
Both templates include:
- Computer tool mapping n1 actions (`click`, `type`, `scroll`, `drag`, `hover`, `key_press`, `wait`, `refresh`, `go_back`, `goto_url`, `stop`) to Kernel's Computer Controls API
Dual Screenshot Modes
- `computer_use` (default)
- `playwright`
Implementation Details
- `n1-preview-2025-11` outputs coordinates in 1000×1000 space
With Playwright Mode for viewport-only screenshots
`kernel invoke ts-yutori-cua cua-task --payload '{"query": "...", "mode": "playwright"}'`
Files Changed
- `pkg/templates/typescript/yutori-computer-use/` - TypeScript template
- `pkg/templates/python/yutori-computer-use/` - Python template
- `pkg/create/templates.go` - Template registration
Closes KERNEL-742
Note
Introduces Yutori n1 templates to quickly scaffold browser-automation apps using Kernel and Yutori’s OpenAI-compatible API.
- `typescript/yutori-computer-use` and `python/yutori-computer-use` with `index.ts`/`main.py`, sampling loops, computer tools, Playwright tools, and session managers (replay support)
- `computer_use` (full VM screenshots) and `playwright` (viewport-only via CDP) with coordinate scaling from 1000×1000 to viewport
- `yutori-computer-use` in `pkg/create/templates.go`, adds invoke/deploy commands, ordering, and env requirements (`YUTORI_API_KEY`)
- `qa-*` in `.gitignore`
Written by Cursor Bugbot for commit a0ae834. This will update automatically on new commits.