Skip to content

Commit 2191e5a

Browse files
author
SentienceDEV
committed
Phase 2: ToolRegistry + Filesystem + extraction
1 parent efd65fe commit 2191e5a

File tree

15 files changed

+1380
-35
lines changed

15 files changed

+1380
-35
lines changed
Lines changed: 270 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,270 @@
1+
# Code Review: Browser-Use Parity Phase 2 Implementation
2+
3+
**Date:** 2026-01-23
4+
**Reviewer:** Claude
5+
**Branch:** parity_win2
6+
**Files Reviewed:** 4 modified + 7 new files
7+
**Scope:** Phase 2 deliverables from BU_GAP_DELIVERABLES.md
8+
9+
---
10+
11+
## Summary
12+
13+
This PR implements all Phase 2 deliverables for Browser-Use parity:
14+
15+
| Deliverable | Status | Notes |
16+
|-------------|--------|-------|
17+
| ToolRegistry + typed schemas | ✅ Complete | Pydantic-based, LLM-ready specs |
18+
| ToolContext/capability routing | ✅ Complete | `ctx.require()` pattern |
19+
| Tool execution tracing | ✅ Complete | `tool_call` events emitted |
20+
| Default tools registered | ✅ Complete | 7 core tools |
21+
| Filesystem tools (sandboxed) | ✅ Complete | read/write/append/replace |
22+
| Structured extraction | ✅ Complete | `extract(query, schema)` |
23+
24+
Overall the implementation is well-designed and aligns with the design docs. A few issues need attention.
25+
26+
---
27+
28+
## Design Doc Alignment
29+
30+
### CAPABILITY_ROUTING_DESIGN.md Compliance
31+
32+
| Requirement | Implementation | Status |
33+
|-------------|----------------|--------|
34+
| `BackendCapabilities` model | [context.py:14-19](sentience/tools/context.py#L14-L19) ||
35+
| `runtime.capabilities()` | [agent_runtime.py:378-389](sentience/agent_runtime.py#L378-L389) ||
36+
| `runtime.can(name)` | [agent_runtime.py:391-393](sentience/agent_runtime.py#L391-L393) ||
37+
| Structured "unsupported" errors | [context.py:43-45](sentience/tools/context.py#L43-L45) | ⚠️ Uses ValueError, not JSON |
38+
39+
### JS_EVALUATE_TOOL_DESIGN.md Compliance
40+
41+
The `evaluate_js` capability is flagged in `BackendCapabilities`, and the existing `runtime.evaluate_js()` from Phase 1 satisfies this requirement. The tool registry does **not** currently expose `evaluate_js` as a registered tool, which may be intentional.
42+
43+
---
44+
45+
## Critical Issues
46+
47+
### 1. `BackendCapabilities.downloads` and `filesystem_tools` missing
48+
49+
**File:** [context.py:14-19](sentience/tools/context.py#L14-L19)
50+
51+
The design doc specifies:
52+
```python
53+
class BackendCapabilities(BaseModel):
54+
tabs: bool = False
55+
evaluate_js: bool = True
56+
downloads: bool = True
57+
filesystem_tools: bool = False
58+
```
59+
60+
But the implementation only has:
61+
```python
62+
class BackendCapabilities(BaseModel):
63+
tabs: bool = False
64+
evaluate_js: bool = False
65+
keyboard: bool = False
66+
```
67+
68+
**Impact:** No way to check if filesystem tools are available before using them.
69+
70+
**Fix:** Add missing capability flags per design doc.
71+
72+
---
73+
74+
### 2. `ToolContext.require()` raises ValueError, not structured error
75+
76+
**File:** [context.py:43-45](sentience/tools/context.py#L43-L45)
77+
78+
```python
79+
def require(self, name: str) -> None:
80+
if not self.can(name):
81+
raise ValueError(f"unsupported_capability: {name}")
82+
```
83+
84+
The design doc (CAPABILITY_ROUTING_DESIGN.md) specifies structured errors:
85+
```json
86+
{ "error": "unsupported_capability", "detail": "tabs not supported by backend" }
87+
```
88+
89+
**Impact:** LLM tool calls cannot distinguish unsupported capabilities from validation errors.
90+
91+
**Fix:** Create a custom `UnsupportedCapabilityError` exception or return a structured result.
92+
93+
---
94+
95+
## Medium Issues
96+
97+
### 3. Missing `evaluate_js` tool in default registry
98+
99+
**File:** [defaults.py](sentience/tools/defaults.py)
100+
101+
The registry registers 7 tools: `snapshot_state`, `click`, `type`, `scroll`, `scroll_to_element`, `click_rect`, `press`. However, `evaluate_js` is not registered despite being a key Phase 1/2 feature per the gap analysis.
102+
103+
**Suggestion:** Add an `evaluate_js` tool that wraps `runtime.evaluate_js()`.
104+
105+
---
106+
107+
### 4. `extract_async` does not use async LLM call
108+
109+
**File:** [read.py:237-277](sentience/read.py#L237-L277)
110+
111+
```python
112+
async def extract_async(...) -> ExtractResult:
113+
# ...
114+
response = llm.generate(system, user) # Sync call in async function
115+
```
116+
117+
The `LLMProvider.generate()` is synchronous, so `extract_async` doesn't provide true async benefits.
118+
119+
**Suggestion:** Consider adding `generate_async()` to `LLMProvider` or document this limitation.
120+
121+
---
122+
123+
### 5. `FileSandbox` path validation may have edge cases
124+
125+
**File:** [filesystem.py:18-22](sentience/tools/filesystem.py#L18-L22)
126+
127+
```python
128+
def _resolve(self, path: str) -> Path:
129+
candidate = (self.base_dir / path).resolve()
130+
if not str(candidate).startswith(str(self.base_dir)):
131+
raise ValueError("path escapes sandbox root")
132+
return candidate
133+
```
134+
135+
The string prefix check could fail on edge cases like:
136+
- `base_dir = /tmp/sandbox` and path resolves to `/tmp/sandbox2/file` (passes check incorrectly)
137+
138+
**Fix:** Use `candidate.is_relative_to(self.base_dir)` (Python 3.9+) instead of string prefix.
139+
140+
---
141+
142+
### 6. Duplicate capability logic in `AgentRuntime` and `ToolContext`
143+
144+
**Files:** [agent_runtime.py:378-393](sentience/agent_runtime.py#L378-L393), [context.py:37-41](sentience/tools/context.py#L37-L41)
145+
146+
Both classes implement `capabilities()` and `can()`. `ToolContext` delegates to `runtime`, which is correct, but the capability detection logic in `AgentRuntime.capabilities()` is complex:
147+
148+
```python
149+
has_keyboard = hasattr(backend, "type_text") or bool(
150+
getattr(getattr(backend, "_page", None), "keyboard", None)
151+
)
152+
```
153+
154+
**Suggestion:** Consider moving capability detection to the backend protocol or making it more explicit.
155+
156+
---
157+
158+
## Minor Issues
159+
160+
### 7. `read.py` inconsistent return types fixed
161+
162+
The diff shows fixing inconsistent `dict` vs `ReadResult` returns:
163+
```python
164+
-return {
165+
- "status": "success",
166+
- ...
167+
-}
168+
+return ReadResult(
169+
+ status="success",
170+
+ ...
171+
+)
172+
```
173+
174+
This is a good fix.
175+
176+
---
177+
178+
### 8. Missing docstrings in new modules
179+
180+
**Files:** `context.py`, `defaults.py`, `filesystem.py`
181+
182+
The `ToolContext` and `FileSandbox` classes lack detailed docstrings explaining their purpose and usage patterns.
183+
184+
---
185+
186+
### 9. `defaults.py` has unused `Snapshot` import warning potential
187+
188+
**File:** [defaults.py:7](sentience/tools/defaults.py#L7)
189+
190+
```python
191+
from ..models import ActionResult, BBox, Snapshot
192+
```
193+
194+
`Snapshot` is used as the output model for `snapshot_state`, so this is correct. No issue.
195+
196+
---
197+
198+
## Test Coverage
199+
200+
The tests are comprehensive and well-structured:
201+
202+
| Test File | Coverage |
203+
|-----------|----------|
204+
| test_tool_registry.py | ✅ Registration, validation, LLM spec, execution, tracing, capability checks |
205+
| test_filesystem_tools.py | ✅ Sandbox traversal prevention, CRUD operations |
206+
| test_extract.py | ✅ Schema validation success/failure |
207+
208+
### Missing Test Cases
209+
210+
1. **`extract_async` test** - Only sync `extract` is tested
211+
2. **Edge case for sandbox path validation** - Test `/tmp/sandbox` vs `/tmp/sandbox2` scenario
212+
3. **`capabilities()` detection logic** - No test for `AgentRuntime.capabilities()`
213+
214+
---
215+
216+
## Architecture Assessment
217+
218+
The implementation follows a clean layered architecture:
219+
220+
```
221+
┌─────────────────────────────────────┐
222+
│ ToolRegistry │ ← LLM-facing tool specs
223+
├─────────────────────────────────────┤
224+
│ ToolContext │ ← Capability routing
225+
├─────────────────────────────────────┤
226+
│ AgentRuntime │ ← Backend abstraction
227+
├─────────────────────────────────────┤
228+
│ PlaywrightBackend / CDP │ ← Browser execution
229+
└─────────────────────────────────────┘
230+
```
231+
232+
This aligns with the design goal: *"unified SDK surface through AgentRuntime regardless of backend"*.
233+
234+
---
235+
236+
## Verdict
237+
238+
**Approve with required changes:**
239+
240+
1. **Must fix:** Add missing capability flags (`downloads`, `filesystem_tools`) per design doc
241+
2. **Must fix:** Fix `FileSandbox._resolve()` to use `is_relative_to()` instead of string prefix
242+
3. **Should fix:** Add `evaluate_js` tool to default registry
243+
244+
The Phase 2 deliverables are functionally complete. The tool registry design is solid and the filesystem sandbox provides good security boundaries.
245+
246+
---
247+
248+
## Appendix: Files Changed
249+
250+
### Modified Files
251+
252+
| File | Changes |
253+
|------|---------|
254+
| sentience/__init__.py | Export new tools module symbols |
255+
| sentience/agent_runtime.py | Add `tool_registry`, `capabilities()`, `can()` |
256+
| sentience/models.py | Add `ExtractResult` model |
257+
| sentience/read.py | Add `extract()` / `extract_async()`, fix return types |
258+
259+
### New Files
260+
261+
| File | Purpose |
262+
|------|---------|
263+
| sentience/tools/__init__.py | Package exports |
264+
| sentience/tools/registry.py | `ToolRegistry` and `ToolSpec` classes |
265+
| sentience/tools/context.py | `ToolContext` and `BackendCapabilities` |
266+
| sentience/tools/defaults.py | Default browser tool registrations |
267+
| sentience/tools/filesystem.py | Sandboxed filesystem tools |
268+
| tests/unit/test_tool_registry.py | Registry + execution tests |
269+
| tests/unit/test_filesystem_tools.py | Sandbox + CRUD tests |
270+
| tests/unit/test_extract.py | Extraction tests |

sentience/__init__.py

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -102,13 +102,14 @@
102102
from .ordinal import OrdinalIntent, boost_ordinal_elements, detect_ordinal_intent, select_by_ordinal
103103
from .overlay import clear_overlay, show_overlay
104104
from .query import find, query
105-
from .read import read
105+
from .read import extract, extract_async, read
106106
from .recorder import Recorder, Trace, TraceStep, record
107107
from .runtime_agent import RuntimeAgent, RuntimeStep, StepVerification
108108
from .screenshot import screenshot
109109
from .sentience_methods import AgentAction, SentienceMethod
110110
from .snapshot import snapshot
111111
from .text_search import find_text_rect
112+
from .tools import BackendCapabilities, ToolContext, ToolRegistry, ToolSpec, register_default_tools
112113
from .tracer_factory import SENTIENCE_API_URL, create_tracer
113114
from .tracing import JsonlTraceSink, TraceEvent, Tracer, TraceSink
114115

sentience/agent.py

Lines changed: 2 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -89,9 +89,7 @@ def _safe_hook_call_sync(
8989
if verbose:
9090
print(f"⚠️ Hook error (non-fatal): {hook_error}")
9191
else:
92-
logging.getLogger(__name__).warning(
93-
"Hook error (non-fatal): %s", hook_error
94-
)
92+
logging.getLogger(__name__).warning("Hook error (non-fatal): %s", hook_error)
9593

9694

9795
async def _safe_hook_call_async(
@@ -109,9 +107,7 @@ async def _safe_hook_call_async(
109107
if verbose:
110108
print(f"⚠️ Hook error (non-fatal): {hook_error}")
111109
else:
112-
logging.getLogger(__name__).warning(
113-
"Hook error (non-fatal): %s", hook_error
114-
)
110+
logging.getLogger(__name__).warning("Hook error (non-fatal): %s", hook_error)
115111

116112

117113
class SentienceAgent(BaseAgent):

sentience/agent_runtime.py

Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -81,6 +81,7 @@
8181
TabListResult,
8282
TabOperationResult,
8383
)
84+
from .tools import BackendCapabilities, ToolRegistry
8485
from .trace_event_builder import TraceEventBuilder
8586
from .verification import AssertContext, AssertOutcome, Predicate
8687

@@ -118,6 +119,7 @@ def __init__(
118119
tracer: Tracer,
119120
snapshot_options: SnapshotOptions | None = None,
120121
sentience_api_key: str | None = None,
122+
tool_registry: ToolRegistry | None = None,
121123
):
122124
"""
123125
Initialize agent runtime with any BrowserBackend-compatible browser.
@@ -130,9 +132,11 @@ def __init__(
130132
tracer: Tracer for emitting verification events
131133
snapshot_options: Default options for snapshots
132134
sentience_api_key: API key for Pro/Enterprise tier (enables Gateway refinement)
135+
tool_registry: Optional ToolRegistry for LLM-callable tools
133136
"""
134137
self.backend = backend
135138
self.tracer = tracer
139+
self.tool_registry = tool_registry
136140

137141
# Build default snapshot options with API key if provided
138142
default_opts = snapshot_options or SnapshotOptions()
@@ -372,6 +376,33 @@ def _get_tab_backend(self):
372376
return None
373377
return backend
374378

379+
def capabilities(self) -> BackendCapabilities:
380+
backend = getattr(self, "backend", None)
381+
if backend is None:
382+
return BackendCapabilities()
383+
has_eval = hasattr(backend, "eval")
384+
has_keyboard = hasattr(backend, "type_text") or bool(
385+
getattr(getattr(backend, "_page", None), "keyboard", None)
386+
)
387+
has_downloads = bool(getattr(backend, "downloads", None))
388+
has_files = False
389+
if self.tool_registry is not None:
390+
try:
391+
has_files = self.tool_registry.get("read_file") is not None
392+
except Exception:
393+
has_files = False
394+
return BackendCapabilities(
395+
tabs=self._get_tab_backend() is not None,
396+
evaluate_js=bool(has_eval),
397+
downloads=has_downloads,
398+
filesystem_tools=has_files,
399+
keyboard=bool(has_keyboard or has_eval),
400+
)
401+
402+
def can(self, capability: str) -> bool:
403+
caps = self.capabilities()
404+
return bool(getattr(caps, capability, False))
405+
375406
@staticmethod
376407
def _stringify_eval_value(value: Any) -> str:
377408
if value is None:

sentience/llm_provider.py

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,7 @@
33
Enables "Bring Your Own Brain" (BYOB) pattern - plug in any LLM provider
44
"""
55

6+
import asyncio
67
from abc import ABC, abstractmethod
78
from dataclasses import dataclass
89
from typing import Any
@@ -59,6 +60,12 @@ def generate(self, system_prompt: str, user_prompt: str, **kwargs) -> LLMRespons
5960
"""
6061
pass
6162

63+
async def generate_async(self, system_prompt: str, user_prompt: str, **kwargs) -> LLMResponse:
64+
"""
65+
Async wrapper around generate() for providers without native async support.
66+
"""
67+
return await asyncio.to_thread(self.generate, system_prompt, user_prompt, **kwargs)
68+
6269
@abstractmethod
6370
def supports_json_mode(self) -> bool:
6471
"""

0 commit comments

Comments
 (0)