12 changes: 12 additions & 0 deletions .env.example
@@ -0,0 +1,12 @@
# Copy this file to .env and fill in your API keys
# .env is gitignored and will not be committed

# OpenRouter API Key (default provider)
OPENROUTER_API_KEY=your-openrouter-api-key-here

# Optional: Use OpenAI directly instead of OpenRouter
# LLM_PROVIDER=openai
# OPENAI_API_KEY=your-openai-api-key-here

# Optional: Override the default model
# LLM_MODEL=openai/gpt-4o
9 changes: 3 additions & 6 deletions .gitignore
@@ -1,11 +1,8 @@
__pycache__/
*.xml
.env
venv/venv/
__pycache__/
*.xml
.DS_Store
.env
venv/
venv/
venv/
myenv/
logs/
window_dump.xml
14 changes: 11 additions & 3 deletions README.md
@@ -157,7 +157,7 @@ Browser agents can't reach these. Desktop agents don't fit. **Android Use is the
- Python 3.10+
- Android device or emulator (USB debugging enabled)
- ADB (Android Debug Bridge)
- OpenAI API key
- OpenRouter API key (default) **or** OpenAI API key

### Installation

@@ -176,13 +176,21 @@ brew install android-platform-tools # macOS
# 4. Connect device & verify
adb devices

# 5. Set API key
export OPENAI_API_KEY="sk-..."
# 5. Set API key (OpenRouter is the default provider)
export OPENROUTER_API_KEY="sk-or-..."

# 6. Run your first agent
python kernel.py
```

### Alternative: Use OpenAI Directly

```bash
# Override to use OpenAI instead of OpenRouter
export LLM_PROVIDER=openai
export OPENAI_API_KEY="sk-..."
```

### Try It: Logistics Example

```python
62 changes: 62 additions & 0 deletions docs/IMPLEMENTATION_PLAN.md
@@ -0,0 +1,62 @@
# Implementation Plan: OpenRouter Default (GPT-4o via OpenRouter)

## Goal
Make **OpenRouter** the default LLM provider while preserving the current agent loop as documented in `README.md` and implemented in `kernel.py`:

- Perception: dump Android accessibility tree via `uiautomator` and sanitize it
- Reasoning: ask an LLM for the next action as **a single JSON object**
- Action: execute via ADB (`tap`, `type`, `home`, `back`, `wait`, `done`)

## Non-Goals
- Changing the agent UX (still `python kernel.py` → prompts for goal)
- Adding new actions/tool calling
- Rewriting the sanitizer logic

## Default Provider Decision
- Default provider: **OpenRouter**
- Default model via OpenRouter: **`openai/gpt-4o`**

## New Configuration (env vars)
- `OPENROUTER_API_KEY` (required by default)
- `LLM_PROVIDER` (optional override; values: `openrouter`, `openai`)
- `LLM_MODEL` (optional override; default depends on provider)
- `OPENAI_API_KEY` (only required if `LLM_PROVIDER=openai`)

## Work Breakdown (milestones)

### Milestone 1 — Add docs-first implementation instructions
- Create docs structure:
- `docs/features/openrouter-default.md`
- `docs/bugs/kernel-known-bugs.md`
- Ensure instructions are atomic and include “why” for each step.

### Milestone 2 — Implement provider abstraction (small refactor)
- Add a small “LLM client factory” that chooses:
- OpenRouter client (default)
- OpenAI client (opt-in)
- Keep the call site `client.chat.completions.create(...)` unchanged.

### Milestone 3 — Preserve JSON-action contract across models/providers
- Keep `response_format={"type":"json_object"}`.
- Add parse/validation + 1 retry if output is invalid JSON.

### Milestone 4 — Fix correctness bugs discovered during review
- Fix issues documented in `docs/bugs/kernel-known-bugs.md`.

### Milestone 5 — Update README and do a smoke test
- Update `README.md` Quick Start to prefer OpenRouter.
- Manual smoke test:
- Run `python kernel.py` with a simple goal (e.g. “go home”).
- Confirm ADB commands work and the model returns valid JSON actions.

## Acceptance Criteria
- Running with **only** `OPENROUTER_API_KEY` set works (OpenRouter default).
- Setting `LLM_PROVIDER=openai` with `OPENAI_API_KEY` works.
- Actions returned by the model are validated (no crashes on missing fields).
- Key ADB actions (`home`, `back`) use correct keycodes.

## Rollback Plan
- If OpenRouter routing/model output is unstable, keep OpenRouter default but allow fallback:
- `LLM_PROVIDER=openai`
- `LLM_MODEL=gpt-4o`

109 changes: 109 additions & 0 deletions docs/bugs/kernel-known-bugs.md
@@ -0,0 +1,109 @@
# Bugs: Known Issues in `kernel.py` (and Proposed Fixes)

This document lists bugs discovered during review that will impact correctness and/or stability. Each bug includes a proposed fix and the reason it matters.

## 1) Missing import: `List` used but not imported
**Where**
- `kernel.py`: `def run_adb_command(command: List[str]):`

**Problem**
- `List` is not imported from `typing`, so Python raises a `NameError` as soon as the `def` line is evaluated at import time.

**Proposed Fix**
- Change typing import to include `List`:
- `from typing import Dict, Any, List`

**Why it matters**
- This prevents the script from running at all.

## 2) Wrong ADB keyevent constants for Home/Back
**Where**
- `kernel.py`:
- `KEYWORDS_HOME`
- `KEYWORDS_BACK`

**Problem**
- The Android keyevent constants are `KEYCODE_HOME` and `KEYCODE_BACK`.
- Current constants will cause ADB to fail (or do nothing) when trying to go home/back.

**Proposed Fix**
- Replace with:
- `KEYCODE_HOME`
- `KEYCODE_BACK`

**Why it matters**
- Navigation actions are core to the agent loop.
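
For reference, a minimal sketch of the corrected ADB invocations (in practice these would go through `run_adb_command()`):

```python
import subprocess

# Correct Android keyevent constants for the navigation actions.
subprocess.run(["adb", "shell", "input", "keyevent", "KEYCODE_HOME"], check=True)
subprocess.run(["adb", "shell", "input", "keyevent", "KEYCODE_BACK"], check=True)
```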

## 3) Potential crash: `tap` coordinates unpacking without validation
**Where**
- `execute_action()`:
- `x, y = action.get("coordinates")`

**Problem**
- If `coordinates` is missing or malformed, unpacking throws an exception.

**Proposed Fix**
- Validate the action schema before executing:
- Ensure `coordinates` exists
- Ensure it is a 2-item list/tuple
- Ensure each value can be converted to int

**Why it matters**
- LLMs occasionally return malformed payloads; the agent should fail gracefully.
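
A minimal sketch of the defensive unpacking (the helper name is illustrative, not from `kernel.py`):

```python
def get_tap_coordinates(action: dict):
    """Return (x, y) as ints, or None if the payload is malformed."""
    coords = action.get("coordinates")
    if not isinstance(coords, (list, tuple)) or len(coords) != 2:
        return None
    try:
        return int(coords[0]), int(coords[1])
    except (TypeError, ValueError):
        return None
```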

## 4) Potential crash: `type` action assumes `text` exists
**Where**
- `execute_action()`:
- `text = action.get("text").replace(" ", "%s")`

**Problem**
- If `text` is missing, `action.get("text")` returns `None` and `.replace(...)` crashes.

**Proposed Fix**
- Validate `text` exists and is a string before calling `.replace`.

**Why it matters**
- Prevents agent from crashing mid-run.

## 5) Hard exit inside library function (`exit(0)`) reduces reusability
**Where**
- `execute_action()` on `done`:
- `exit(0)`

**Problem**
- If `run_agent()` is imported and used by another module, `exit(0)` will terminate the entire host process.

**Proposed Fix**
- Prefer returning a sentinel (e.g. `True` for completed) or raising a specific exception that `run_agent()` catches.

**Why it matters**
- Enables embedding this library into other tools/services without unexpected process termination.
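
One possible shape for the exception-based approach (`AgentDone` is an illustrative name):

```python
class AgentDone(Exception):
    """Raised when the model returns the `done` action."""

def execute_action(action: dict) -> None:
    if action.get("action") == "done":
        # Let run_agent() decide how to finish instead of killing the process.
        raise AgentDone()
    # ... handle tap / type / home / back / wait as before ...
```

`run_agent()` would then wrap `execute_action()` in a try/except and break out of its loop when `AgentDone` is raised.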

## 6) ADB error detection is brittle
**Where**
- `run_adb_command()`:
- checks `if result.stderr and "error" in result.stderr.lower()`

**Problem**
- Many ADB failures show up in stdout or return codes.
- Ignoring `returncode` can hide failures.

**Proposed Fix**
- Check `result.returncode != 0` and include both stdout/stderr in the error message.

**Why it matters**
- Makes debugging device connectivity and ADB issues far easier.
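
A sketch of the hardened helper, assuming the current signature and that callers only need stdout:

```python
import subprocess
from typing import List

def run_adb_command(command: List[str]) -> str:
    """Run an ADB command, failing loudly on any non-zero exit code."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"ADB command failed ({result.returncode}): {' '.join(command)}\n"
            f"stdout: {result.stdout}\nstderr: {result.stderr}"
        )
    return result.stdout
```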

## 7) Ambiguous `focus` usage in sanitizer (minor)
**Where**
- `sanitizer.py`:
- `is_editable = node.attrib.get("focus") == "true" or node.attrib.get("focusable") == "true"`

**Problem**
- `focus/focusable` is not the same as "editable".

**Proposed Fix**
- (Optional) Use attributes like `class` (`EditText`) or `long-clickable`/`enabled` to identify text fields more accurately.

**Why it matters**
- Better context improves LLM decision quality; not required for OpenRouter switch.
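
A possible refinement inside the sanitizer's per-node loop, assuming the attribute names produced by `uiautomator` dumps:

```python
# Treat a node as an editable text field only when its widget class says so,
# rather than inferring it from focus/focusable.
is_editable = "EditText" in node.attrib.get("class", "")
```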
121 changes: 121 additions & 0 deletions docs/features/openrouter-default.md
@@ -0,0 +1,121 @@
# Feature: Make OpenRouter the Default LLM Provider (GPT-4o)

## Summary
Refactor `kernel.py` so the default LLM provider is **OpenRouter**, using model **`openai/gpt-4o`**, while keeping the current agent loop and JSON action contract.

## Target Behavior
- Running `python kernel.py` should work with only:
- `OPENROUTER_API_KEY` set
- OpenAI remains available as an override:
- `LLM_PROVIDER=openai` + `OPENAI_API_KEY`

## Atomic Steps (with “Why”)

### 1) Decide and document env var contract
**Do**
- Define these env vars:
- `OPENROUTER_API_KEY` (required by default)
- `LLM_PROVIDER` (optional; default `openrouter`)
- `LLM_MODEL` (optional; default depends on provider)
- `OPENAI_API_KEY` (only required if `LLM_PROVIDER=openai`)

**Why**
- A junior engineer needs a single source of truth for configuration.
- Keeping OpenAI as opt-in reduces risk and makes debugging easier.

### 2) Replace the global `MODEL` constant with provider-aware defaults
**Do**
- Introduce a provider-aware model selection:
- If provider is `openrouter`: default `openai/gpt-4o`
- If provider is `openai`: default `gpt-4o`
- Allow `LLM_MODEL` to override in both cases.

**Why**
- OpenRouter uses namespaced model IDs; OpenAI does not.
- This prevents confusing “model not found” errors.

### 3) Create a tiny “LLM client factory” in `kernel.py`
**Do**
- Add a function, e.g. `get_llm_client_and_model()` that returns:
- `client`
- `model`
- Build the OpenAI SDK client like:
- OpenRouter default:
- `OpenAI(api_key=OPENROUTER_API_KEY, base_url="https://openrouter.ai/api/v1")`
- OpenAI override:
- `OpenAI(api_key=OPENAI_API_KEY)`

**Why**
- Centralizes provider logic.
- Avoids littering conditionals across `get_llm_decision()`.
- Makes future provider additions (Claude/Gemini via OpenRouter, etc.) straightforward.
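
A minimal sketch of the factory, assuming the OpenAI Python SDK (v1-style client) already used by `kernel.py`:

```python
import os
from openai import OpenAI

def get_llm_client_and_model():
    """Return a (client, model) pair based on LLM_PROVIDER / LLM_MODEL."""
    provider = os.getenv("LLM_PROVIDER", "openrouter").lower()
    if provider == "openai":
        client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
        model = os.getenv("LLM_MODEL", "gpt-4o")
    else:  # default: route through OpenRouter
        client = OpenAI(
            api_key=os.environ["OPENROUTER_API_KEY"],
            base_url="https://openrouter.ai/api/v1",
        )
        model = os.getenv("LLM_MODEL", "openai/gpt-4o")
    return client, model
```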

### 4) Add OpenRouter optional headers (non-blocking)
**Do**
- If the OpenAI SDK version in this repo supports default headers:
- Add `HTTP-Referer` and `X-Title` for OpenRouter requests.
- If it does not, skip this step.

**Why**
- OpenRouter recommends these headers for attribution/analytics.
- Not required for correctness; keep it optional to reduce implementation risk.
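
If the installed SDK version accepts a `default_headers` argument, the OpenRouter branch of the factory could pass the headers at construction time (the referer URL below is a placeholder):

```python
# Only if the installed openai SDK supports default_headers; otherwise skip.
client = OpenAI(
    api_key=os.environ["OPENROUTER_API_KEY"],
    base_url="https://openrouter.ai/api/v1",
    default_headers={
        "HTTP-Referer": "https://example.com/android-use",  # placeholder URL
        "X-Title": "Android Use",
    },
)
```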

### 5) Keep JSON response mode, but add a fallback parsing strategy
**Do**
- Keep `response_format={"type": "json_object"}`.
- Wrap JSON parsing in a try/except.
- If parsing fails:
- Retry once with a stricter prompt (still requiring only JSON output)
- If it still fails, raise a clear error that includes the raw response text.

**Why**
- Different routed models can be slightly less strict about JSON-only output.
- A single retry often fixes transient “formatting drift” without changing the UX.
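
A sketch of the parse-and-retry wrapper; `call_llm(messages)` stands in for the existing `client.chat.completions.create(...)` call and is a hypothetical helper:

```python
import json

def get_json_decision(call_llm, messages):
    """Parse the model reply as JSON; retry once with a stricter reminder."""
    raw = call_llm(messages)
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        stricter = messages + [
            {"role": "system", "content": "Respond with a single valid JSON object and nothing else."}
        ]
        raw = call_llm(stricter)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            raise ValueError(f"Model did not return valid JSON: {raw!r}") from exc
```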

### 6) Validate the returned action schema before executing
**Do**
- Before `execute_action(decision)`:
- Validate `decision["action"]` is one of:
- `tap`, `type`, `home`, `back`, `wait`, `done`
- If `tap`, require `coordinates` as a 2-item list of ints.
- If `type`, require `text` as a non-empty string.

**Why**
- Prevents crashes and device misclicks.
- Makes the behavior consistent even when the LLM is imperfect.
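
One way the pre-execution check could look (`validate_action` is an illustrative name):

```python
VALID_ACTIONS = {"tap", "type", "home", "back", "wait", "done"}

def validate_action(decision: dict) -> None:
    """Raise ValueError if the decision violates the JSON action contract."""
    action = decision.get("action")
    if action not in VALID_ACTIONS:
        raise ValueError(f"Unknown action: {action!r}")
    if action == "tap":
        coords = decision.get("coordinates")
        if not (isinstance(coords, (list, tuple)) and len(coords) == 2
                and all(isinstance(c, int) for c in coords)):
            raise ValueError(f"'tap' requires a 2-item list of ints, got: {coords!r}")
    if action == "type":
        text = decision.get("text")
        if not isinstance(text, str) or not text:
            raise ValueError(f"'type' requires a non-empty string, got: {text!r}")
```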

### 7) Update README “Quick Start” to prefer OpenRouter
**Do**
- Replace or augment the existing OpenAI setup section with:
- `export OPENROUTER_API_KEY="..."`
- (optional) `export LLM_MODEL="openai/gpt-4o"`
- Add an “OpenAI override” snippet:
- `export LLM_PROVIDER=openai`
- `export OPENAI_API_KEY="..."`

**Why**
- Docs should match the new default so new users don’t get blocked.

### 8) Add a minimal manual smoke test checklist
**Do**
- Validate both modes:
- OpenRouter default
- OpenAI override
- Use a simple goal and verify at least one valid action executes.

**Why**
- Prevents regressions before merging.
- Junior engineers get confidence quickly with concrete steps.

## Expected Code Touch Points
- `kernel.py`
- Add provider config + client factory
- Update model constant usage
- Add JSON parsing fallback + action validation
- `README.md`
- Update environment variable setup instructions

## Definition of Done
- With `OPENROUTER_API_KEY` set, `python kernel.py` starts and makes LLM calls successfully.
- The LLM output is parsed into a JSON dict and validated.
- Actions execute without runtime exceptions for missing fields.