117 changes: 117 additions & 0 deletions openspec/changes/add-opencode-log-streaming/design.md
@@ -0,0 +1,117 @@
## Context
AgentV currently supports several “agentic” providers (e.g. Codex, Pi coding agent, VS Code subagent) that can execute multi-step work with tool calls.

OpenCode is an agent runtime that exposes a local HTTP API plus Server-Sent Events (SSE) for streaming events. It also has a first-party TypeScript SDK (`@opencode-ai/sdk`) that can spawn a local `opencode serve` process and provides typed client methods.

This change expands the existing OpenSpec proposal from “OpenCode stream logs” to a full OpenCode provider integration for AgentV.

## Goals / Non-Goals

Goals:
- Add a new AgentV provider kind: `opencode`.
- Support running OpenCode against a per-eval-case working directory (AgentV temp workspace) so the agent can read/write files.
- Produce a `ProviderResponse.outputMessages` trace that captures:
  - The final assistant message text.
  - Tool calls (name + input + output) in a deterministic shape suitable for trace-based evaluators like `tool_trajectory`.
- Provide optional per-run streaming log artifacts on disk and publish log paths so the CLI can show them early (Codex/Pi pattern).

Non-Goals:
- Full UI/interactive experiences (OpenCode TUI, rich streaming token output in AgentV terminal).
- Implementing every OpenCode event type as a first-class AgentV trace event.
- Distributed / remote OpenCode deployments that require auth beyond local process execution.

## Key Decisions

- **Use OpenCode’s first-party SDK v2** (`@opencode-ai/sdk/v2`) rather than implementing a custom HTTP + SSE client.
  - Rationale: typed API surface, server lifecycle helper, fewer protocol footguns.

- **Primary completion signal:** use `client.session.prompt(...)` to run the request and treat its response as authoritative for the final assistant message and parts.
  - Streaming SSE is used for logs and (optionally) richer incremental trace capture.

- **Working directory isolation:** execute each eval case attempt in its own filesystem directory (AgentV temp workspace). The OpenCode client MUST include the directory context so OpenCode operates within that directory.
  - Rationale: reproducibility, parallelism, and preventing cross-contamination between eval cases.

## Provider Lifecycle

### Initialization
- Resolve target settings (binary/executable path, server config, model selection, permissions behavior, logging options).
- Start a local OpenCode server if no `baseUrl` is configured.
  - Prefer a per-process server instance (shared by provider invocations) to reduce spawn overhead.
  - The provider MUST avoid port collisions under parallel workers (either choose an ephemeral port, or allocate from a safe range).
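
The "per-process server instance" idea can be sketched as a memoized start, so that parallel provider invocations share one spawn. This is a minimal sketch under stated assumptions: `ServerHandle`, `StartServer`, and `createSharedServer` are hypothetical names introduced for illustration, and the actual spawn call (and its port selection) is injected rather than shown.

```typescript
// Hypothetical handle for a running OpenCode server; field names are illustrative.
interface ServerHandle {
  url: string;
  close(): void;
}

// Injected start function (assumption): in practice this would wrap whatever
// helper actually spawns `opencode serve` on a collision-free port.
type StartServer = () => Promise<ServerHandle>;

// Returns a getter that starts the server at most once per process and
// hands the same pending promise to every concurrent caller.
function createSharedServer(start: StartServer): () => Promise<ServerHandle> {
  let pending: Promise<ServerHandle> | undefined;
  return () => {
    if (!pending) {
      pending = start().catch((err) => {
        // Reset so a later invocation can retry after a failed startup.
        pending = undefined;
        throw err;
      });
    }
    return pending;
  };
}
```

Caching the promise rather than the resolved handle means two workers that call the getter before startup finishes still trigger only one spawn.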

### Per-eval invocation
For each `ProviderRequest`:
1. Create/resolve the eval-case work directory (temp workspace).
2. Create or reuse an OpenCode `sessionID` scoped to that directory.
3. If streaming logs are enabled, open the stream log file, subscribe via `client.event.subscribe({ directory })`, and write each event as a JSONL line.
4. Send the prompt using `client.session.prompt({ sessionID, directory, system, parts, model?, agent?, tools? })`.
5. Build `ProviderResponse` from the returned `parts` (and optionally from gathered SSE events).
6. Tear down the SSE subscription for this invocation; keep the server alive for other requests.
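
Steps 3 to 6 above can be sketched as a single function. The `SketchClient` interface below is a hypothetical, narrowed stand-in written for this example; it borrows the method names from the steps but does not claim to match the SDK's real types (in particular, the callback-style `subscribe` is an assumption).

```typescript
// Hypothetical, narrowed view of the client used only for this sketch.
interface SketchPart {
  type: string;
  text?: string;
}

interface PromptArgs {
  sessionID: string;
  directory: string;
  system?: string;
  parts: SketchPart[];
}

interface SketchClient {
  // Assumption: returns a function that cancels the subscription.
  subscribe(args: { directory: string }, onEvent: (event: unknown) => void): () => void;
  prompt(args: PromptArgs): Promise<{ parts: SketchPart[] }>;
}

interface InvokeOptions {
  client: SketchClient;
  sessionID: string;
  directory: string;
  system?: string;
  question: string;
  // Left undefined when stream logging is disabled.
  writeLogLine?: (line: string) => void;
}

async function invokeOnce(opts: InvokeOptions): Promise<{ parts: SketchPart[] }> {
  const { client, sessionID, directory, system, question, writeLogLine } = opts;

  // Step 3: subscribe only when stream logging is enabled.
  const unsubscribe = writeLogLine
    ? client.subscribe({ directory }, (event) => writeLogLine(JSON.stringify(event)))
    : undefined;

  try {
    // Steps 4-5: the prompt response is treated as authoritative.
    return await client.prompt({
      sessionID,
      directory,
      system,
      parts: [{ type: "text", text: question }],
    });
  } finally {
    // Step 6: release the subscription even if the prompt throws;
    // the server itself stays alive for other requests.
    unsubscribe?.();
  }
}
```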

### Shutdown
- Ensure spawned server processes are terminated on completion or abort.

## Prompt & Message Mapping

### Inputs
AgentV provides:
- `question` (formatted question string)
- optional `systemPrompt`
- optional `guidelines` (unwrapped content for non-agent providers)
- optional `guideline_files` / `input_files` (paths, often represented as `file://` links for agent providers)
- optional `chatPrompt` (multi-message)

Mapping approach:
- Prefer using `chatPrompt` when present.
- Convert AgentV roles into OpenCode `system` + `parts`.
- Include the final user query as a `text` part.
- For filesystem-capable agent providers (including OpenCode), prefer referencing guideline and attachment files as file links rather than embedding large inline content.
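
One possible reading of the mapping approach above, as a minimal sketch. All type names here are hypothetical. Two choices in it are assumptions rather than decisions made in this design: system messages inside `chatPrompt` take precedence over `systemPrompt`, and non-system turns are flattened into `text` parts without their role.

```typescript
// Hypothetical shapes; AgentV's and OpenCode's real types may differ.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface TextPart {
  type: "text";
  text: string;
}

interface PromptInput {
  question: string;
  systemPrompt?: string;
  chatPrompt?: ChatMessage[];
}

interface MappedPrompt {
  system?: string;
  parts: TextPart[];
}

function mapPrompt(input: PromptInput): MappedPrompt {
  const chat = input.chatPrompt ?? [];

  // Assumption: system messages in chatPrompt win over systemPrompt.
  const systemChunks = chat.filter((m) => m.role === "system").map((m) => m.content);
  if (systemChunks.length === 0 && input.systemPrompt) {
    systemChunks.push(input.systemPrompt);
  }

  // Prefer chatPrompt when it carries any non-system turns; otherwise
  // fall back to the formatted question as the single text part.
  const turns = chat.filter((m) => m.role !== "system");
  const parts: TextPart[] =
    turns.length > 0
      ? turns.map((m): TextPart => ({ type: "text", text: m.content }))
      : [{ type: "text", text: input.question }];

  const system = systemChunks.join("\n\n");
  return system ? { system, parts } : { parts };
}
```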

### Outputs
OpenCode returns an assistant message with `parts` including:
- `text` (assistant text)
- `reasoning` (optional)
- `tool` parts with `callID`, `tool`, and `state` (pending/running/completed/error)

AgentV output mapping:
- Construct a single final `OutputMessage` with:
  - `role: "assistant"`
  - `content: <concatenated assistant text parts>`
  - `toolCalls: ToolCall[]` derived from `tool` parts:
    - `id` = OpenCode `callID`
    - `tool` = OpenCode `tool`
    - `input` = `state.input` when present
    - `output` = `state.output` when present (for completed)

Optionally (future): emit separate `OutputMessage` entries for tool results, reasoning, or step boundaries. This is not required for initial tool-trajectory support.
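
The output mapping described in this section can be sketched as a pure function. The types below are simplified, hypothetical versions of the shapes named above; `state.status` as the field holding pending/running/completed/error is an assumption, as is joining text parts with no separator.

```typescript
// Simplified, hypothetical part shapes based on the fields listed above.
interface ToolState {
  status: "pending" | "running" | "completed" | "error"; // field name assumed
  input?: unknown;
  output?: unknown;
}

type Part =
  | { type: "text"; text: string }
  | { type: "reasoning"; text: string }
  | { type: "tool"; callID: string; tool: string; state: ToolState };

interface ToolCall {
  id: string;
  tool: string;
  input?: unknown;
  output?: unknown;
}

interface OutputMessage {
  role: "assistant";
  content: string;
  toolCalls: ToolCall[];
}

function toOutputMessage(parts: Part[]): OutputMessage {
  const texts: string[] = [];
  const toolCalls: ToolCall[] = [];

  for (const part of parts) {
    if (part.type === "text") {
      texts.push(part.text);
    } else if (part.type === "tool") {
      const call: ToolCall = { id: part.callID, tool: part.tool };
      if (part.state.input !== undefined) call.input = part.state.input;
      // Output is only copied for completed calls.
      if (part.state.status === "completed" && part.state.output !== undefined) {
        call.output = part.state.output;
      }
      toolCalls.push(call);
    }
    // Reasoning parts are skipped in this initial mapping.
  }

  return { role: "assistant", content: texts.join(""), toolCalls };
}
```

Because tool calls are emitted in the order the parts arrive, the resulting `toolCalls` array preserves call order for trajectory-style comparison.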

## Streaming Logs

### Log content
- Default format: JSONL where each line is a single OpenCode SSE event object.
- MAY additionally include human-readable “summary” lines, but JSON objects MUST be preserved to keep tooling stable.
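
A minimal sketch of the JSONL convention above. `toJsonlLine` and `parseJsonl` are hypothetical helper names; the reader assumes that any human-readable summary line does not begin with `{`, which this design does not specify.

```typescript
// Serialize one event as a single JSONL line. JSON.stringify escapes
// newlines inside strings, so the result never spans multiple lines.
function toJsonlLine(event: unknown): string {
  return JSON.stringify(event) + "\n";
}

// Read back only the JSON object lines, skipping blank lines and any
// human-readable summary lines (assumed not to start with "{").
function parseJsonl(text: string): unknown[] {
  return text
    .split("\n")
    .filter((line) => line.trim().startsWith("{"))
    .map((line) => JSON.parse(line) as unknown);
}
```

Keeping one JSON object per line lets downstream tooling ignore lines it does not understand without having to parse the whole file.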

### Log path publication
- When the provider selects a log file path, it publishes `{ filePath, targetName, evalCaseId?, attempt? }` to a process-local tracker.
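
A process-local tracker along these lines could look like the sketch below. The entry shape follows the fields listed above; `createLogTracker` and its `record`/`consume`/`subscribe` methods are hypothetical names, and the field types (`string`, `number`) are assumptions.

```typescript
interface OpenCodeLogEntry {
  filePath: string;
  targetName: string;
  evalCaseId?: string;
  attempt?: number;
}

type LogListener = (entry: OpenCodeLogEntry) => void;

interface LogTracker {
  record(entry: OpenCodeLogEntry): void;
  consume(): OpenCodeLogEntry[];
  subscribe(listener: LogListener): () => void;
}

function createLogTracker(): LogTracker {
  const entries: OpenCodeLogEntry[] = [];
  const listeners = new Set<LogListener>();

  return {
    // Called by the provider once it has chosen a log file path.
    record(entry) {
      entries.push(entry);
      for (const listener of listeners) listener(entry);
    },
    // Drains and returns everything recorded so far.
    consume() {
      return entries.splice(0, entries.length);
    },
    // Returns an unsubscribe function.
    subscribe(listener) {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
  };
}
```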

## Permissions

OpenCode can emit `permission.asked` events (e.g., filesystem writes, command execution).

Initial policy:
- Provide a target option to auto-approve permissions (`once` or `always`) or reject.
- Default SHOULD be conservative (reject) unless explicitly enabled.
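
The initial policy can be sketched as a small decision function. This is illustrative only: `PermissionRequest` and its fields are hypothetical, and how the decision is sent back to OpenCode is out of scope here.

```typescript
type PermissionPolicy = "reject" | "once" | "always";

// Hypothetical summary of a `permission.asked` event.
interface PermissionRequest {
  id: string;
  description: string;
}

type PermissionDecision =
  | { response: "once" | "always" }
  | { response: "reject"; error: string };

// Defaults to "reject" when the target does not opt in to auto-approval.
function decidePermission(
  request: PermissionRequest,
  policy: PermissionPolicy = "reject",
): PermissionDecision {
  if (policy === "reject") {
    return {
      response: "reject",
      error: `Permission blocked (${request.id}): ${request.description}`,
    };
  }
  return { response: policy };
}
```

Returning the error text alongside the rejection keeps the failure message tied to the specific permission that was blocked.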

## Risks / Trade-offs
- **Port management / concurrency:** shared server improves performance but requires careful port selection and isolation.
- **Trace fidelity:** relying on final `parts` is deterministic but may omit some intermediate streaming deltas.
- **Permission behavior:** auto-approval increases convenience but raises safety risk; default should remain restrictive.

## Migration Plan
- Non-breaking addition: new provider kind and target schema fields are additive.
- Existing targets remain valid.

## Open Questions
- Should AgentV support connecting to an externally-running OpenCode server (`baseUrl`) in addition to spawning a local server?
- Should OpenCode be treated as an `AGENT_PROVIDER_KIND` (filesystem access) by default?
- Which OpenCode “tools” should be enabled/disabled by default when running evals?
38 changes: 38 additions & 0 deletions openspec/changes/add-opencode-log-streaming/proposal.md
@@ -0,0 +1,38 @@
# Change: Add OpenCode provider support (with stream log artifacts)

## Why
AgentV currently supports agentic providers that operate over a filesystem and emit tool calls (e.g. Codex, Pi coding agent, VS Code subagent). OpenCode is another popular agent runtime with a well-defined event model (SSE) and structured tool lifecycle.

To evaluate agentic behavior (especially with deterministic evaluators like `tool_trajectory`), AgentV needs:
- A first-class `opencode` provider kind that can run OpenCode in an isolated per-eval workspace.
- A stable mapping from OpenCode tool parts into AgentV `outputMessages/toolCalls`.
- Debug visibility during execution, ideally via per-run stream logs that the CLI can surface early (Codex/Pi pattern).

## What Changes
- Add a new provider kind: `opencode`.
- Define required/optional target configuration for OpenCode in `targets.yaml`.
- Define how the OpenCode provider constructs prompts (system + parts) and executes within a per-eval-case work directory.
- Define the mapping from OpenCode message parts (especially `tool` parts) into AgentV `ProviderResponse.outputMessages` and `ToolCall` fields.
- Add a standard mechanism for an OpenCode provider to write per-run “stream logs” to disk (under `.agentv/logs/opencode/` by default).
- Add a lightweight “log tracker” so the `agentv eval` CLI can surface OpenCode log file paths immediately (same pattern as Codex/Pi).
- Define the expected log content at a high level (raw event JSONL is the default) so tooling remains stable even if OpenCode’s internal event structure evolves.

## Non-Goals
- Rich streaming UX in the AgentV terminal (token-by-token output).
- OpenCode TUI integration.
- Advanced OpenCode orchestration features beyond single-request evaluation (e.g., long-lived interactive sessions shared across eval cases).
- New CLI flags or UI features beyond listing log file paths.

## Impact
- Affected specs:
  - `evaluation` (OpenCode provider behavior, output mapping, and logging expectations)
  - `validation` (targets schema updates for `provider: opencode`)
  - `eval-cli` (CLI surfacing of provider log file paths)
- Affected code (planned follow-up implementation):
  - Core: OpenCode provider implementation, target schema updates, log tracker + exports
  - CLI: subscribe and display OpenCode log paths

## Compatibility
- Non-breaking. Existing targets and providers are unaffected.
- Logging remains optional (providers may omit log streaming when disabled or when directories cannot be created).

14 changes: 14 additions & 0 deletions openspec/changes/add-opencode-log-streaming/specs/eval-cli/spec.md
@@ -0,0 +1,14 @@
## ADDED Requirements

### Requirement: Surface OpenCode provider log paths

The CLI SHALL surface OpenCode provider log paths when they become available.

#### Scenario: Print OpenCode log path when discovered
- **WHEN** an OpenCode provider publishes a new log entry `{ filePath, targetName, evalCaseId?, attempt? }`
- **THEN** the CLI prints the log file path in a dedicated “OpenCode logs” section
- **AND** does not print the same log path more than once

#### Scenario: Continue printing progress while logs are emitted
- **WHEN** OpenCode logs are printed while eval cases are running
- **THEN** the CLI continues to print per-eval progress lines without requiring interactive cursor control
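
The two scenarios above could be satisfied by an append-only printer that remembers which paths it has already shown. This is a sketch under stated assumptions: `createLogPathPrinter` is a hypothetical name, and the exact output format (header text, indentation) is illustrative.

```typescript
interface LogPathEntry {
  filePath: string;
  targetName: string;
}

// Returns a handler suitable for subscribing to the log tracker.
// It only appends lines, so no interactive cursor control is needed.
function createLogPathPrinter(print: (line: string) => void): (entry: LogPathEntry) => void {
  const seen = new Set<string>();
  let headerPrinted = false;

  return (entry) => {
    // Skip paths that were already printed.
    if (seen.has(entry.filePath)) return;
    seen.add(entry.filePath);

    if (!headerPrinted) {
      print("OpenCode logs:");
      headerPrinted = true;
    }
    print(`  ${entry.targetName}: ${entry.filePath}`);
  };
}
```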
@@ -0,0 +1,78 @@
## ADDED Requirements

### Requirement: OpenCode provider execution

The system SHALL support an OpenCode-backed provider when a target is configured with `provider: opencode`.

#### Scenario: Execute an eval case with OpenCode
- **WHEN** a target is configured with `provider: opencode`
- **AND** an eval case is executed for that target
- **THEN** the system invokes OpenCode to generate an assistant response
- **AND** runs OpenCode within an isolated per-eval-case working directory
- **AND** returns a `ProviderResponse` with `outputMessages` populated

#### Scenario: Provider fails cleanly when OpenCode is unavailable
- **WHEN** an OpenCode target is selected
- **AND** the OpenCode runtime cannot be started or reached (missing executable, failed server startup, unreachable base URL)
- **THEN** the eval case attempt fails with an actionable error message
- **AND** other eval cases continue when running in parallel

### Requirement: OpenCode provider log streaming artifacts

When an OpenCode-based provider run is executed, the system SHALL support writing a per-run stream log file and surfacing its path for debugging.

#### Scenario: Provider creates an OpenCode stream log file
- **WHEN** a provider run begins for an OpenCode-backed target
- **THEN** the provider writes a log file under `.agentv/logs/opencode/` by default (or a configured override)
- **AND** the provider appends progress entries as the agent executes

#### Scenario: Provider disables OpenCode stream logging
- **WHEN** OpenCode stream logging is disabled via configuration or environment
- **THEN** the provider does not create a log file
- **AND** evaluation continues normally

#### Scenario: Provider cannot create the OpenCode log directory
- **WHEN** the provider cannot create the configured log directory (permissions, invalid path)
- **THEN** the provider continues without stream logs
- **AND** emits a warning in verbose mode only

### Requirement: OpenCode log path publication

The system SHALL provide a mechanism to publish OpenCode log file paths so the CLI can present them to the user as soon as they are created.

#### Scenario: Publish OpenCode log path at run start
- **WHEN** the provider decides on a log file path for an OpenCode run
- **THEN** it publishes `{ filePath, targetName, evalCaseId?, attempt? }` to a process-local log tracker
- **AND** downstream consumers MAY subscribe to this tracker to display the log path

### Requirement: OpenCode tool-call trace mapping

The OpenCode provider SHALL map OpenCode tool lifecycle parts into AgentV tool calls so deterministic evaluators can operate on the trace.

#### Scenario: Tool parts become toolCalls
- **WHEN** OpenCode returns a response containing one or more `tool` parts
- **THEN** the provider emits `ProviderResponse.outputMessages` containing `toolCalls`
- **AND** each tool call includes `tool` name and `input` arguments when available
- **AND** completed tool calls include `output` when available
- **AND** tool call identifiers are stable across retries within an attempt when OpenCode provides them

#### Scenario: Tool error parts are surfaced
- **WHEN** OpenCode returns a `tool` part with error state
- **THEN** the provider includes the tool call in `toolCalls`
- **AND** includes the error information in a provider-specific metadata field or in `output` with a structured error payload

### Requirement: OpenCode permission handling

The OpenCode provider SHALL handle OpenCode permission requests deterministically based on target configuration.

#### Scenario: Default permission policy is conservative
- **WHEN** OpenCode emits a permission request during an eval case
- **AND** the target does not explicitly enable auto-approval
- **THEN** the provider rejects the request
- **AND** the eval attempt fails with a clear error describing the blocked permission

#### Scenario: Auto-approve permissions when configured
- **WHEN** OpenCode emits a permission request during an eval case
- **AND** the target is configured to auto-approve permissions
- **THEN** the provider approves the request according to the configured policy (e.g., once or always)
- **AND** execution continues normally
@@ -0,0 +1,20 @@
## ADDED Requirements

### Requirement: Validate OpenCode targets

The system SHALL validate OpenCode provider targets in `targets.yaml` using Zod schemas, rejecting unknown properties and accepting both snake_case and camelCase forms.

#### Scenario: Accept a valid OpenCode target
- **WHEN** a targets file contains a target with `provider: opencode`
- **THEN** the configuration is accepted
- **AND** the resolved config normalizes to camelCase

#### Scenario: Reject unknown OpenCode target properties
- **WHEN** an OpenCode target contains an unrecognized property (e.g., `streamlog_dir` instead of `stream_log_dir`)
- **THEN** validation fails with an error identifying the unknown property path

#### Scenario: Accept snake_case and camelCase equivalence for OpenCode settings
- **WHEN** an OpenCode target uses `stream_logs` (snake_case)
- **OR** uses `streamLogs` (camelCase)
- **THEN** both are accepted as equivalent
- **AND** the resolved config normalizes to `streamLogs`
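
To make the expected behavior concrete, here is a dependency-free sketch of the normalization and unknown-key rejection described in these scenarios. It deliberately uses plain TypeScript instead of Zod, so it illustrates the intended outcome rather than the actual schema. The set of known keys is an assumption: only `provider`, `stream_logs`, and `stream_log_dir` appear in this spec, and `name` is added for illustration.

```typescript
// Assumed key set, expressed in camelCase (the normalized form).
const KNOWN_KEYS = new Set(["name", "provider", "streamLogs", "streamLogDir"]);

function snakeToCamel(key: string): string {
  return key.replace(/_([a-z0-9])/g, (_match: string, ch: string) => ch.toUpperCase());
}

function normalizeOpenCodeTarget(raw: Record<string, unknown>): Record<string, unknown> {
  const resolved: Record<string, unknown> = {};
  const seen = new Set<string>();

  for (const [key, value] of Object.entries(raw)) {
    const camel = snakeToCamel(key);
    if (!KNOWN_KEYS.has(camel)) {
      // Report the key exactly as written so the user can find it.
      throw new Error(`Unknown property: ${key}`);
    }
    if (seen.has(camel)) {
      throw new Error(`Property given in both snake_case and camelCase: ${camel}`);
    }
    seen.add(camel);
    resolved[camel] = value;
  }
  return resolved;
}
```

With this sketch, `stream_logs` and `streamLogs` both resolve to `streamLogs`, while the misspelled `streamlog_dir` converts to `streamlogDir` and is rejected as unknown.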
18 changes: 18 additions & 0 deletions openspec/changes/add-opencode-log-streaming/tasks.md
@@ -0,0 +1,18 @@
## 1. Implementation
- [ ] 1.1 Add new provider kind `opencode` (core provider registry + aliases)
- [ ] 1.2 Extend targets schema to support `provider: opencode` and validate settings
- [ ] 1.3 Implement OpenCode provider invocation (server lifecycle, per-eval-case directory, prompt execution)
- [ ] 1.4 Map OpenCode `tool` parts into AgentV `outputMessages/toolCalls` for trace-based evaluators
- [ ] 1.5 Add OpenCode stream log writer (JSONL) and log path tracker (record/consume/subscribe)
- [ ] 1.6 Export OpenCode log tracker functions from provider index
- [ ] 1.7 Update `agentv eval` CLI to subscribe and print OpenCode log paths (no duplicates)

## 2. Validation
- [ ] 2.1 Run `openspec validate add-opencode-log-streaming --strict`
- [ ] 2.2 Add/update unit tests for:
  - [ ] targets schema parsing for `opencode` targets
  - [ ] tool-call mapping from OpenCode parts → AgentV `ToolCall`
  - [ ] log tracker dedupe behavior (CLI subscriber)

## 3. Documentation
- [ ] 3.1 Update any relevant skill/docs (if the project uses them for provider setup)