Merged
Commits
20 commits
d10e131
feat(templates): implement Gemini Computer Use for TypeScript and Python
dprevoznik Jan 23, 2026
f1f01dc
refactor(templates): cleanup Gemini Computer Use templates
dprevoznik Jan 23, 2026
370132e
Update qa.md
dprevoznik Jan 23, 2026
8c64504
refactor(templates): remove AI slop from Gemini Computer Use templates
dprevoznik Jan 23, 2026
16f2700
fix(session): update viewport dimensions and improve session info han…
dprevoznik Jan 23, 2026
614f005
fix(loop): update screenshot handling in sampling loop
dprevoznik Jan 23, 2026
021ba51
Update python viewport
dprevoznik Jan 23, 2026
ed94644
fix(templates): sync DEFAULT_SCREEN_SIZE with viewport dimensions
dprevoznik Jan 23, 2026
fa9acba
refactor(templates): centralize viewport dimensions in DEFAULT_SCREEN…
dprevoznik Jan 23, 2026
a3837e0
refactor(loop): centralize system prompt generation with current date
dprevoznik Jan 23, 2026
08ff1aa
refactor(tests): remove unused Gemini Computer Use template test case
dprevoznik Jan 23, 2026
9899b61
fix(docs): update API key link and resource references in README files
dprevoznik Jan 23, 2026
25b484a
fix(tools): use nullish coalescing for default magnitude in ComputerTool
dprevoznik Jan 23, 2026
efa5520
docs(templates): update README files for Gemini Computer Use actions
dprevoznik Jan 23, 2026
ae2fd9d
refactor(tools): remove screenshot attribute from ToolResult and Comp…
dprevoznik Jan 23, 2026
ebd52fa
Merge origin/main into danny/kernel-870-gemini-cua-templates
dprevoznik Jan 25, 2026
4265d3e
Enhance error handling in Gemini computer use templates
dprevoznik Jan 25, 2026
3dbb329
Remove vertex config vars from python template setup
dprevoznik Jan 25, 2026
c875170
Refactor GeminiAction usage in templates to derive predefined functio…
dprevoznik Jan 25, 2026
5e859dc
Enhance error logging in Gemini computer use templates + local execut…
dprevoznik Jan 25, 2026
21 changes: 17 additions & 4 deletions .cursor/commands/qa.md
@@ -63,6 +63,7 @@ Here are all valid language + template combinations:
> **Note:** The `yutori-computer-use` template supports two modes: `computer_use` (default, full VM screenshots) and `playwright` (viewport-only screenshots via CDP). Both modes should be tested.

| python | sample-app | py-sample-app | python-basic | No | - |
| python | gemini-computer-use | py-gemini-cua | python-gemini-cua | Yes | GOOGLE_API_KEY |
| python | captcha-solver | py-captcha-solver | python-captcha-solver | No | - |
| python | browser-use | py-browser-use | python-bu | Yes | OPENAI_API_KEY |
| python | anthropic-computer-use | py-anthropic-cua | python-anthropic-cua | Yes | ANTHROPIC_API_KEY |
@@ -99,6 +100,7 @@ Run each of these (they are non-interactive when all flags are provided):
../bin/kernel create -n py-openai-cua -l python -t openai-computer-use
../bin/kernel create -n py-openagi-cua -l python -t openagi-computer-use
../bin/kernel create -n py-claude-agent-sdk -l python -t claude-agent-sdk
../bin/kernel create -n py-gemini-cua -l python -t gemini-computer-use
../bin/kernel create -n py-yutori-cua -l python -t yutori-computer-use
```

@@ -241,6 +243,15 @@ echo "ANTHROPIC_API_KEY=<value from human>" > .env
cd ..
```

**py-gemini-cua** (needs GOOGLE_API_KEY):

```bash
cd py-gemini-cua
echo "GOOGLE_API_KEY=<value from human>" > .env
../bin/kernel deploy main.py --env-file .env
cd ..
```

**py-yutori-cua** (needs YUTORI_API_KEY):

```bash
@@ -262,7 +273,7 @@ kernel invoke ts-stagehand teamsize-task --payload '{"company": "Kernel"}'
kernel invoke ts-anthropic-cua cua-task --payload '{"query": "Go to http://magnitasks.com, Click the Tasks option in the left-side bar, and move the 5 items in the To Do and In Progress items to the Done section of the Kanban board. You are done successfully when the items are moved.", "record_replay": true}'
kernel invoke ts-magnitude mag-url-extract --payload '{"url": "https://en.wikipedia.org/wiki/Special:Random"}'
kernel invoke ts-openai-cua cua-task --payload '{"task": "Go to https://news.ycombinator.com and get the top 5 articles"}'
-kernel invoke ts-gemini-cua gemini-cua-task --payload '{"startingUrl": "https://www.magnitasks.com/", "instruction": "Click the Tasks option in the left-side bar, and move the 5 items in the To Do and In Progress items to the Done section of the Kanban board? You are done successfully when the items are moved."}'
+kernel invoke ts-gemini-cua cua-task --payload '{"query": "Go to http://magnitasks.com, Click the Tasks option in the left-side bar, and move the 5 items in the To Do and In Progress items to the Done section of the Kanban board. You are done successfully when the items are moved.", "record_replay": true}'
kernel invoke ts-claude-agent-sdk agent-task --payload '{"task": "Go to https://news.ycombinator.com and get the top 3 stories"}'
kernel invoke ts-yutori-cua cua-task --payload '{"query": "Go to http://magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items.", "record_replay": true, "mode": "computer_use"}'
kernel invoke ts-yutori-cua cua-task --payload '{"query": "Go to http://magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items.", "record_replay": true, "mode": "playwright"}'
@@ -275,13 +286,14 @@ kernel invoke python-anthropic-cua cua-task --payload '{"query": "Go to http://m
kernel invoke python-openai-cua cua-task --payload '{"task": "Go to https://news.ycombinator.com and get the top 5 articles"}'
kernel invoke python-openagi-cua openagi-default-task -p '{"instruction": "Navigate to https://agiopen.org and click the What is Computer Use? button"}'
kernel invoke py-claude-agent-sdk agent-task --payload '{"task": "Go to https://news.ycombinator.com and get the top 3 stories"}'
kernel invoke python-gemini-cua cua-task --payload '{"query": "Go to http://magnitasks.com, Click the Tasks option in the left-side bar, and move the 5 items in the To Do and In Progress items to the Done section of the Kanban board. You are done successfully when the items are moved.", "record_replay": true}'
kernel invoke python-yutori-cua cua-task --payload '{"query": "Go to http://magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items.", "record_replay": true, "mode": "computer_use"}'
kernel invoke python-yutori-cua cua-task --payload '{"query": "Go to http://magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items.", "record_replay": true, "mode": "playwright"}'
```

## Step 7: Automated Runtime Testing (Optional)

-**STOP and ask the human:** "Would you like me to automatically invoke all 19 test cases and report back on their runtime status?"
+**STOP and ask the human:** "Would you like me to automatically invoke all 21 test cases and report back on their runtime status?"

If the human agrees, invoke each template use the Kernel CLI and collect results. Present findings in this format:

@@ -310,6 +322,7 @@ If the human agrees, invoke each template use the Kernel CLI and collect results
| py-openai-cua | python-openai-cua | | |
| py-openagi-cua | python-openagi-cua | | |
| py-claude-agent-sdk | py-claude-agent-sdk | | |
| py-gemini-cua | python-gemini-cua | | |
| py-yutori-cua | python-yutori-cua | | mode: computer_use |
| py-yutori-cua | python-yutori-cua | | mode: playwright |
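
The automated pass described in Step 7 could be scripted roughly as follows. This is a minimal sketch, not part of the QA command itself: `run_case` and `format_row` are hypothetical helpers, the `PASS`/`FAIL` labels are illustrative, and treating exit code 0 as success is an assumption about the CLI. The third and fourth table columns are assumed to hold status and notes.

```python
import subprocess


def run_case(command: list[str], timeout_s: int = 600) -> tuple[str, str]:
    """Run one invoke command and return (status, note)."""
    try:
        proc = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return "FAIL", f"timed out after {timeout_s}s"
    except FileNotFoundError:
        return "FAIL", f"command not found: {command[0]}"
    # Assumption: a zero exit code means the invocation succeeded.
    if proc.returncode == 0:
        return "PASS", "completed"
    # Keep only the last non-empty stderr line as a brief error message.
    lines = [ln.strip() for ln in proc.stderr.splitlines() if ln.strip()]
    note = lines[-1] if lines else f"exit code {proc.returncode}"
    return "FAIL", note


def format_row(template: str, app: str, status: str, note: str) -> str:
    """Format one result as a row of the four-column results table."""
    return f"| {template} | {app} | {status} | {note} |"
```

A caller would pass each invoke command as an argument list (for example `["kernel", "invoke", "python-gemini-cua", "cua-task", "--payload", payload_json]`) and print one `format_row(...)` line per test case.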

@@ -324,9 +337,9 @@ Notes should include brief error messages for failures or confirmation of succes
- [ ] Built CLI with `make build`
- [ ] Created QA directory
- [ ] Got KERNEL_API_KEY from human
-- [ ] Created all 17 template variations
+- [ ] Created all 18 template variations
- [ ] Got required API keys from human (OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY, OAGI_API_KEY, YUTORI_API_KEY)
-- [ ] Deployed all 17 apps
+- [ ] Deployed all 18 apps
- [ ] Provided invoke commands to human for manual testing
- [ ] (Optional) Ran automated runtime testing and reviewed results

7 changes: 0 additions & 7 deletions cmd/create_test.go
@@ -439,12 +439,6 @@ func TestCreateCommand_InvalidLanguageTemplateCombinations(t *testing.T) {
template: create.TemplateMagnitude,
errContains: "template not found: python/magnitude",
},
{
name: "gemini-computer-use not available for python",
language: create.LanguagePython,
template: create.TemplateGeminiComputerUse,
errContains: "template not found: python/gemini-computer-use",
},
{
name: "invalid language",
language: "ruby",
@@ -558,7 +552,6 @@ func TestCreateCommand_TemplateNotAvailableForLanguage(t *testing.T) {
create.TemplateBrowserUse: {create.LanguageTypeScript},
create.TemplateStagehand: {create.LanguagePython},
create.TemplateMagnitude: {create.LanguagePython},
create.TemplateGeminiComputerUse: {create.LanguagePython},
}

for template, unavailableLanguages := range unavailableCombinations {
9 changes: 7 additions & 2 deletions pkg/create/templates.go
@@ -63,7 +63,7 @@ var Templates = map[string]TemplateInfo{
TemplateGeminiComputerUse: {
Name: "Gemini Computer Use",
Description: "Implements a Gemini computer use agent",
-Languages: []string{LanguageTypeScript},
+Languages: []string{LanguageTypeScript, LanguagePython},
},
TemplateBrowserUse: {
Name: "Browser Use",
@@ -201,7 +201,7 @@ var Commands = map[string]map[string]DeployConfig{
TemplateGeminiComputerUse: {
EntryPoint: "index.ts",
NeedsEnvFile: true,
-InvokeCommand: "kernel invoke ts-gemini-cua gemini-cua-task",
+InvokeCommand: `kernel invoke ts-gemini-cua cua-task --payload '{"query": "Navigate to http://magnitasks.com and click on Tasks in the sidebar"}'`,
},
TemplateClaudeAgentSDK: {
EntryPoint: "index.ts",
@@ -250,6 +250,11 @@ var Commands = map[string]map[string]DeployConfig{
NeedsEnvFile: true,
InvokeCommand: `kernel invoke py-claude-agent-sdk agent-task --payload '{"task": "Go to https://news.ycombinator.com and get the top 3 stories"}'`,
},
TemplateGeminiComputerUse: {
EntryPoint: "main.py",
NeedsEnvFile: true,
InvokeCommand: `kernel invoke python-gemini-cua cua-task --payload '{"query": "Navigate to http://magnitasks.com and click on Tasks in the sidebar"}'`,
},
TemplateYutoriComputerUse: {
EntryPoint: "main.py",
NeedsEnvFile: true,
59 changes: 59 additions & 0 deletions pkg/templates/python/gemini-computer-use/README.md
@@ -0,0 +1,59 @@
# Kernel Python Sample App - Gemini Computer Use

This is a Kernel application that implements a prompt loop using Google's Gemini Computer Use model with Kernel's Computer Controls API.

## Setup

1. Get your API keys:
- **Kernel**: [dashboard.onkernel.com](https://dashboard.onkernel.com)
- **Google AI**: [aistudio.google.com/api-keys](https://aistudio.google.com/api-keys)

2. Deploy the app:
```bash
kernel login
cp .env.example .env # Add your GOOGLE_API_KEY
kernel deploy main.py --env-file .env
```

## Usage

```bash
kernel invoke python-gemini-cua cua-task --payload '{"query": "Navigate to https://example.com and describe the page"}'
```

## Recording Replays

> **Note:** Replay recording is only available to Kernel users on paid plans.

Add `"record_replay": true` to your payload to capture a video of the browser session:

```bash
kernel invoke python-gemini-cua cua-task --payload '{"query": "Navigate to https://example.com", "record_replay": true}'
```

When enabled, the response will include a `replay_url` field with a link to view the recorded session.
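
If you build the payload programmatically, quoting the JSON by hand is easy to get wrong. The sketch below shows one way to assemble the same command line with the Python standard library. `build_invoke_command` is a hypothetical helper for illustration, not something the template ships.

```python
import json
import shlex


def build_invoke_command(app: str, action: str, payload: dict) -> str:
    """Return a shell-safe `kernel invoke` command line for the payload."""
    # json.dumps renders the payload as JSON text; shlex.quote wraps it so
    # the shell passes it to the CLI as a single argument.
    payload_json = json.dumps(payload)
    parts = [
        "kernel",
        "invoke",
        shlex.quote(app),
        shlex.quote(action),
        "--payload",
        shlex.quote(payload_json),
    ]
    return " ".join(parts)


command = build_invoke_command(
    "python-gemini-cua",
    "cua-task",
    {"query": "Navigate to https://example.com", "record_replay": True},
)
print(command)
```

Note that Python's `True` is serialized as JSON `true`, matching the payload shown above.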

## Gemini Computer Use Actions

The Gemini model can execute the following browser actions:

| Action | Description |
|--------|-------------|
| `open_web_browser` | Returns a screenshot (browser is already running) |
| `click_at` | Click at coordinates (x, y) |
| `hover_at` | Move mouse to coordinates (x, y) |
| `type_text_at` | Click and type text at coordinates |
| `scroll_document` | Scroll the page (up/down/left/right) |
| `scroll_at` | Scroll at specific coordinates |
| `search` | Focus the browser URL bar |
| `navigate` | Navigate to a URL |
| `go_back` | Go back in browser history |
| `go_forward` | Go forward in browser history |
| `key_combination` | Press key combination (e.g., "ctrl+c") |
| `drag_and_drop` | Drag from one point to another |
| `wait_5_seconds` | Wait for 5 seconds |
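
As an illustration of how an agent loop might guard against unexpected action names before dispatching them, here is a minimal sketch. The action names come from the table above; the `GEMINI_ACTIONS` registry and `validate_action` helper are hypothetical and are not the template's own code.

```python
# Action names are copied from the table above. The summaries are shortened,
# and this registry is illustrative only.
GEMINI_ACTIONS = {
    "open_web_browser": "return a screenshot",
    "click_at": "click at coordinates (x, y)",
    "hover_at": "move mouse to coordinates (x, y)",
    "type_text_at": "click and type text at coordinates",
    "scroll_document": "scroll the page",
    "scroll_at": "scroll at specific coordinates",
    "search": "focus the browser URL bar",
    "navigate": "navigate to a URL",
    "go_back": "go back in browser history",
    "go_forward": "go forward in browser history",
    "key_combination": "press a key combination",
    "drag_and_drop": "drag from one point to another",
    "wait_5_seconds": "wait for 5 seconds",
}


def validate_action(name: str) -> str:
    """Return the summary for a known action, or raise ValueError."""
    if name not in GEMINI_ACTIONS:
        known = ", ".join(sorted(GEMINI_ACTIONS))
        raise ValueError(f"unsupported action {name!r}; expected one of: {known}")
    return GEMINI_ACTIONS[name]
```

Rejecting unknown names early keeps a malformed model response from reaching the browser controls.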

## Resources

- [Google Gemini Computer Use Documentation](https://ai.google.dev/gemini-api/docs/computer-use)
- [Kernel Computer Controls](https://www.kernel.sh/docs/browsers/computer-controls)
6 changes: 6 additions & 0 deletions pkg/templates/python/gemini-computer-use/_gitignore
@@ -0,0 +1,6 @@
.venv/
__pycache__/
*.pyc
.env
.env.local
uv.lock