|
| 1 | +# Code Review: Browser-Use Parity Phase 2 Implementation |
| 2 | + |
| 3 | +**Date:** 2026-01-23 |
| 4 | +**Reviewer:** Claude |
| 5 | +**Branch:** parity_win2 |
| 6 | +**Files Reviewed:** 4 modified + 7 new files |
| 7 | +**Scope:** Phase 2 deliverables from BU_GAP_DELIVERABLES.md |
| 8 | + |
| 9 | +--- |
| 10 | + |
| 11 | +## Summary |
| 12 | + |
| 13 | +This PR implements all Phase 2 deliverables for Browser-Use parity: |
| 14 | + |
| 15 | +| Deliverable | Status | Notes | |
| 16 | +|-------------|--------|-------| |
| 17 | +| ToolRegistry + typed schemas | ✅ Complete | Pydantic-based, LLM-ready specs | |
| 18 | +| ToolContext/capability routing | ✅ Complete | `ctx.require()` pattern | |
| 19 | +| Tool execution tracing | ✅ Complete | `tool_call` events emitted | |
| 20 | +| Default tools registered | ✅ Complete | 7 core tools | |
| 21 | +| Filesystem tools (sandboxed) | ✅ Complete | read/write/append/replace | |
| 22 | +| Structured extraction | ✅ Complete | `extract(query, schema)` | |
| 23 | + |
| 24 | +Overall the implementation is well-designed and aligns with the design docs. A few issues need attention. |
| 25 | + |
| 26 | +--- |
| 27 | + |
| 28 | +## Design Doc Alignment |
| 29 | + |
| 30 | +### CAPABILITY_ROUTING_DESIGN.md Compliance |
| 31 | + |
| 32 | +| Requirement | Implementation | Status | |
| 33 | +|-------------|----------------|--------| |
| 34 | +| `BackendCapabilities` model | [context.py:14-19](sentience/tools/context.py#L14-L19) | ✅ | |
| 35 | +| `runtime.capabilities()` | [agent_runtime.py:378-389](sentience/agent_runtime.py#L378-L389) | ✅ | |
| 36 | +| `runtime.can(name)` | [agent_runtime.py:391-393](sentience/agent_runtime.py#L391-L393) | ✅ | |
| 37 | +| Structured "unsupported" errors | [context.py:43-45](sentience/tools/context.py#L43-L45) | ⚠️ Uses ValueError, not JSON | |
| 38 | + |
| 39 | +### JS_EVALUATE_TOOL_DESIGN.md Compliance |
| 40 | + |
| 41 | +The `evaluate_js` capability is flagged in `BackendCapabilities`, and the existing `runtime.evaluate_js()` from Phase 1 satisfies this requirement. The tool registry does **not** currently expose `evaluate_js` as a registered tool, which may be intentional. |
| 42 | + |
| 43 | +--- |
| 44 | + |
| 45 | +## Critical Issues |
| 46 | + |
| 47 | +### 1. `BackendCapabilities.downloads` and `filesystem_tools` missing |
| 48 | + |
| 49 | +**File:** [context.py:14-19](sentience/tools/context.py#L14-L19) |
| 50 | + |
| 51 | +The design doc specifies: |
| 52 | +```python |
| 53 | +class BackendCapabilities(BaseModel): |
| 54 | + tabs: bool = False |
| 55 | + evaluate_js: bool = True |
| 56 | + downloads: bool = True |
| 57 | + filesystem_tools: bool = False |
| 58 | +``` |
| 59 | + |
| 60 | +But the implementation only has: |
| 61 | +```python |
| 62 | +class BackendCapabilities(BaseModel): |
| 63 | + tabs: bool = False |
| 64 | + evaluate_js: bool = False |
| 65 | + keyboard: bool = False |
| 66 | +``` |
| 67 | + |
| 68 | +**Impact:** No way to check if filesystem tools are available before using them. |
| 69 | + |
| 70 | +**Fix:** Add missing capability flags per design doc. |
| 71 | + |
| 72 | +--- |
| 73 | + |
| 74 | +### 2. `ToolContext.require()` raises ValueError, not structured error |
| 75 | + |
| 76 | +**File:** [context.py:43-45](sentience/tools/context.py#L43-L45) |
| 77 | + |
| 78 | +```python |
| 79 | +def require(self, name: str) -> None: |
| 80 | + if not self.can(name): |
| 81 | + raise ValueError(f"unsupported_capability: {name}") |
| 82 | +``` |
| 83 | + |
| 84 | +The design doc (CAPABILITY_ROUTING_DESIGN.md) specifies structured errors: |
| 85 | +```json |
| 86 | +{ "error": "unsupported_capability", "detail": "tabs not supported by backend" } |
| 87 | +``` |
| 88 | + |
| 89 | +**Impact:** LLM tool calls cannot distinguish unsupported capabilities from validation errors. |
| 90 | + |
| 91 | +**Fix:** Create a custom `UnsupportedCapabilityError` exception or return a structured result. |
| 92 | + |
| 93 | +--- |
| 94 | + |
| 95 | +## Medium Issues |
| 96 | + |
| 97 | +### 3. Missing `evaluate_js` tool in default registry |
| 98 | + |
| 99 | +**File:** [defaults.py](sentience/tools/defaults.py) |
| 100 | + |
| 101 | +The registry registers 7 tools: `snapshot_state`, `click`, `type`, `scroll`, `scroll_to_element`, `click_rect`, `press`. However, `evaluate_js` is not registered despite being a key Phase 1/2 feature per the gap analysis. |
| 102 | + |
| 103 | +**Suggestion:** Add an `evaluate_js` tool that wraps `runtime.evaluate_js()`. |
| 104 | + |
| 105 | +--- |
| 106 | + |
| 107 | +### 4. `extract_async` does not use async LLM call |
| 108 | + |
| 109 | +**File:** [read.py:237-277](sentience/read.py#L237-L277) |
| 110 | + |
| 111 | +```python |
| 112 | +async def extract_async(...) -> ExtractResult: |
| 113 | + # ... |
| 114 | + response = llm.generate(system, user) # Sync call in async function |
| 115 | +``` |
| 116 | + |
| 117 | +The `LLMProvider.generate()` is synchronous, so `extract_async` doesn't provide true async benefits. |
| 118 | + |
| 119 | +**Suggestion:** Consider adding `generate_async()` to `LLMProvider` or document this limitation. |
| 120 | + |
| 121 | +--- |
| 122 | + |
| 123 | +### 5. `FileSandbox` path validation may have edge cases |
| 124 | + |
| 125 | +**File:** [filesystem.py:18-22](sentience/tools/filesystem.py#L18-L22) |
| 126 | + |
| 127 | +```python |
| 128 | +def _resolve(self, path: str) -> Path: |
| 129 | + candidate = (self.base_dir / path).resolve() |
| 130 | + if not str(candidate).startswith(str(self.base_dir)): |
| 131 | + raise ValueError("path escapes sandbox root") |
| 132 | + return candidate |
| 133 | +``` |
| 134 | + |
| 135 | +The string prefix check could fail on edge cases like: |
| 136 | +- `base_dir = /tmp/sandbox` and path resolves to `/tmp/sandbox2/file` (passes check incorrectly) |
| 137 | + |
| 138 | +**Fix:** Use `candidate.is_relative_to(self.base_dir)` (Python 3.9+) instead of string prefix. |
| 139 | + |
| 140 | +--- |
| 141 | + |
| 142 | +### 6. Duplicate capability logic in `AgentRuntime` and `ToolContext` |
| 143 | + |
| 144 | +**Files:** [agent_runtime.py:378-393](sentience/agent_runtime.py#L378-L393), [context.py:37-41](sentience/tools/context.py#L37-L41) |
| 145 | + |
| 146 | +Both classes implement `capabilities()` and `can()`. `ToolContext` delegates to `runtime`, which is correct, but the capability detection logic in `AgentRuntime.capabilities()` is complex: |
| 147 | + |
| 148 | +```python |
| 149 | +has_keyboard = hasattr(backend, "type_text") or bool( |
| 150 | + getattr(getattr(backend, "_page", None), "keyboard", None) |
| 151 | +) |
| 152 | +``` |
| 153 | + |
| 154 | +**Suggestion:** Consider moving capability detection to the backend protocol or making it more explicit. |
| 155 | + |
| 156 | +--- |
| 157 | + |
| 158 | +## Minor Issues |
| 159 | + |
| 160 | +### 7. `read.py` inconsistent return types fixed |
| 161 | + |
| 162 | +The diff shows fixing inconsistent `dict` vs `ReadResult` returns: |
| 163 | +```python |
| 164 | +-return { |
| 165 | +- "status": "success", |
| 166 | +- ... |
| 167 | +-} |
| 168 | ++return ReadResult( |
| 169 | ++ status="success", |
| 170 | ++ ... |
| 171 | ++) |
| 172 | +``` |
| 173 | + |
| 174 | +This is a good fix. |
| 175 | + |
| 176 | +--- |
| 177 | + |
| 178 | +### 8. Missing docstrings in new modules |
| 179 | + |
| 180 | +**Files:** `context.py`, `defaults.py`, `filesystem.py` |
| 181 | + |
| 182 | +The `ToolContext` and `FileSandbox` classes lack detailed docstrings explaining their purpose and usage patterns. |
| 183 | + |
| 184 | +--- |
| 185 | + |
| 186 | +### 9. `defaults.py` has unused `Snapshot` import warning potential |
| 187 | + |
| 188 | +**File:** [defaults.py:7](sentience/tools/defaults.py#L7) |
| 189 | + |
| 190 | +```python |
| 191 | +from ..models import ActionResult, BBox, Snapshot |
| 192 | +``` |
| 193 | + |
| 194 | +`Snapshot` is used as the output model for `snapshot_state`, so this is correct. No issue. |
| 195 | + |
| 196 | +--- |
| 197 | + |
| 198 | +## Test Coverage |
| 199 | + |
| 200 | +The tests are comprehensive and well-structured: |
| 201 | + |
| 202 | +| Test File | Coverage | |
| 203 | +|-----------|----------| |
| 204 | +| test_tool_registry.py | ✅ Registration, validation, LLM spec, execution, tracing, capability checks | |
| 205 | +| test_filesystem_tools.py | ✅ Sandbox traversal prevention, CRUD operations | |
| 206 | +| test_extract.py | ✅ Schema validation success/failure | |
| 207 | + |
| 208 | +### Missing Test Cases |
| 209 | + |
| 210 | +1. **`extract_async` test** - Only sync `extract` is tested |
| 211 | +2. **Edge case for sandbox path validation** - Test `/tmp/sandbox` vs `/tmp/sandbox2` scenario |
| 212 | +3. **`capabilities()` detection logic** - No test for `AgentRuntime.capabilities()` |
| 213 | + |
| 214 | +--- |
| 215 | + |
| 216 | +## Architecture Assessment |
| 217 | + |
| 218 | +The implementation follows a clean layered architecture: |
| 219 | + |
| 220 | +``` |
| 221 | +┌─────────────────────────────────────┐ |
| 222 | +│ ToolRegistry │ ← LLM-facing tool specs |
| 223 | +├─────────────────────────────────────┤ |
| 224 | +│ ToolContext │ ← Capability routing |
| 225 | +├─────────────────────────────────────┤ |
| 226 | +│ AgentRuntime │ ← Backend abstraction |
| 227 | +├─────────────────────────────────────┤ |
| 228 | +│ PlaywrightBackend / CDP │ ← Browser execution |
| 229 | +└─────────────────────────────────────┘ |
| 230 | +``` |
| 231 | + |
| 232 | +This aligns with the design goal: *"unified SDK surface through AgentRuntime regardless of backend"*. |
| 233 | + |
| 234 | +--- |
| 235 | + |
| 236 | +## Verdict |
| 237 | + |
| 238 | +**Approve with required changes:** |
| 239 | + |
| 240 | +1. **Must fix:** Add missing capability flags (`downloads`, `filesystem_tools`) per design doc |
| 241 | +2. **Must fix:** Fix `FileSandbox._resolve()` to use `is_relative_to()` instead of string prefix |
| 242 | +3. **Should fix:** Add `evaluate_js` tool to default registry |
| 243 | + |
| 244 | +The Phase 2 deliverables are functionally complete. The tool registry design is solid and the filesystem sandbox provides good security boundaries. |
| 245 | + |
| 246 | +--- |
| 247 | + |
| 248 | +## Appendix: Files Changed |
| 249 | + |
| 250 | +### Modified Files |
| 251 | + |
| 252 | +| File | Changes | |
| 253 | +|------|---------| |
| 254 | +| sentience/__init__.py | Export new tools module symbols | |
| 255 | +| sentience/agent_runtime.py | Add `tool_registry`, `capabilities()`, `can()` | |
| 256 | +| sentience/models.py | Add `ExtractResult` model | |
| 257 | +| sentience/read.py | Add `extract()` / `extract_async()`, fix return types | |
| 258 | + |
| 259 | +### New Files |
| 260 | + |
| 261 | +| File | Purpose | |
| 262 | +|------|---------| |
| 263 | +| sentience/tools/__init__.py | Package exports | |
| 264 | +| sentience/tools/registry.py | `ToolRegistry` and `ToolSpec` classes | |
| 265 | +| sentience/tools/context.py | `ToolContext` and `BackendCapabilities` | |
| 266 | +| sentience/tools/defaults.py | Default browser tool registrations | |
| 267 | +| sentience/tools/filesystem.py | Sandboxed filesystem tools | |
| 268 | +| tests/unit/test_tool_registry.py | Registry + execution tests | |
| 269 | +| tests/unit/test_filesystem_tools.py | Sandbox + CRUD tests | |
| 270 | +| tests/unit/test_extract.py | Extraction tests | |
0 commit comments