From aec33be4248be211dee0a5e42d266c7a4b6618e9 Mon Sep 17 00:00:00 2001 From: Christopher Tso Date: Fri, 9 Jan 2026 13:02:48 +1100 Subject: [PATCH 1/2] feat(openspec): proposal for opencode stream logs --- .../add-opencode-log-streaming/proposal.md | 25 ++++++++++++++++ .../specs/eval-cli/spec.md | 14 +++++++++ .../specs/evaluation/spec.md | 29 +++++++++++++++++++ .../add-opencode-log-streaming/tasks.md | 13 +++++++++ 4 files changed, 81 insertions(+) create mode 100644 openspec/changes/add-opencode-log-streaming/proposal.md create mode 100644 openspec/changes/add-opencode-log-streaming/specs/eval-cli/spec.md create mode 100644 openspec/changes/add-opencode-log-streaming/specs/evaluation/spec.md create mode 100644 openspec/changes/add-opencode-log-streaming/tasks.md diff --git a/openspec/changes/add-opencode-log-streaming/proposal.md b/openspec/changes/add-opencode-log-streaming/proposal.md new file mode 100644 index 0000000..6bfb6d0 --- /dev/null +++ b/openspec/changes/add-opencode-log-streaming/proposal.md @@ -0,0 +1,25 @@ +# Change: Add OpenCode log streaming artifacts (Codex-style) + +## Why +When evaluating agentic providers, users need visibility into what the agent is doing while the run is in-progress. AgentV currently exposes this for some agent CLIs (e.g. Codex/Pi) by writing a per-run log file and printing its path, but OpenCode support has no equivalent yet. + +## What Changes +- Add a standard mechanism for an OpenCode provider to write per-run “stream logs” to disk (under `.agentv/logs/opencode/` by default). +- Add a lightweight “log tracker” so the `agentv eval` CLI can surface OpenCode log file paths immediately (same pattern as Codex/Pi). +- Define the expected log content at a high level (raw event JSONL or summarized lines) so tooling remains stable even if OpenCode’s internal event structure evolves. + +## Non-Goals +- Implement full OpenCode provider execution in this change (this proposal only establishes logging + CLI surfacing conventions). 
+- Add new CLI flags or UI features beyond listing log file paths. + +## Impact +- Affected specs: + - `evaluation` (provider integration expectations for OpenCode logging) + - `eval-cli` (CLI surfacing of provider log file paths) +- Affected code (planned follow-up implementation): + - Core: provider log tracker + exports + - CLI: subscribe and display OpenCode log paths + +## Compatibility +- Non-breaking. Existing targets and providers are unaffected. +- Logging remains optional (providers may omit log streaming when disabled or when directories cannot be created). diff --git a/openspec/changes/add-opencode-log-streaming/specs/eval-cli/spec.md b/openspec/changes/add-opencode-log-streaming/specs/eval-cli/spec.md new file mode 100644 index 0000000..c3c716d --- /dev/null +++ b/openspec/changes/add-opencode-log-streaming/specs/eval-cli/spec.md @@ -0,0 +1,14 @@ +## ADDED Requirements + +### Requirement: Surface OpenCode provider log paths + +The CLI SHALL surface OpenCode provider log paths when they become available. + +#### Scenario: Print OpenCode log path when discovered +- **WHEN** an OpenCode provider publishes a new log entry `{ filePath, targetName, evalCaseId?, attempt? 
}` +- **THEN** the CLI prints the log file path in a dedicated “OpenCode logs” section +- **AND** does not print the same log path more than once + +#### Scenario: Continue printing progress while logs are emitted +- **WHEN** OpenCode logs are printed while eval cases are running +- **THEN** the CLI continues to print per-eval progress lines without requiring interactive cursor control diff --git a/openspec/changes/add-opencode-log-streaming/specs/evaluation/spec.md b/openspec/changes/add-opencode-log-streaming/specs/evaluation/spec.md new file mode 100644 index 0000000..7f60b07 --- /dev/null +++ b/openspec/changes/add-opencode-log-streaming/specs/evaluation/spec.md @@ -0,0 +1,29 @@ +## ADDED Requirements + +### Requirement: OpenCode provider log streaming artifacts + +When an OpenCode-based provider run is executed, the system SHALL support writing a per-run stream log file and surfacing its path for debugging. + +#### Scenario: Provider creates an OpenCode stream log file +- **WHEN** a provider run begins for an OpenCode-backed target +- **THEN** the provider writes a log file under `.agentv/logs/opencode/` by default (or a configured override) +- **AND** the provider appends progress entries as the agent executes + +#### Scenario: Provider disables OpenCode stream logging +- **WHEN** OpenCode stream logging is disabled via configuration or environment +- **THEN** the provider does not create a log file +- **AND** evaluation continues normally + +#### Scenario: Provider cannot create the OpenCode log directory +- **WHEN** the provider cannot create the configured log directory (permissions, invalid path) +- **THEN** the provider continues without stream logs +- **AND** emits a warning in verbose mode only + +### Requirement: OpenCode log path publication + +The system SHALL provide a mechanism to publish OpenCode log file paths so the CLI can present them to the user as soon as they are created. 
+ +#### Scenario: Publish OpenCode log path at run start +- **WHEN** the provider decides on a log file path for an OpenCode run +- **THEN** it publishes `{ filePath, targetName, evalCaseId?, attempt? }` to a process-local log tracker +- **AND** downstream consumers MAY subscribe to this tracker to display the log path diff --git a/openspec/changes/add-opencode-log-streaming/tasks.md b/openspec/changes/add-opencode-log-streaming/tasks.md new file mode 100644 index 0000000..8b8e588 --- /dev/null +++ b/openspec/changes/add-opencode-log-streaming/tasks.md @@ -0,0 +1,13 @@ +## 1. Implementation +- [ ] 1.1 Add `opencode-log-tracker` module (record/consume/subscribe) +- [ ] 1.2 Export OpenCode log tracker functions from provider index +- [ ] 1.3 Update `agentv eval` CLI to subscribe and print OpenCode log paths +- [ ] 1.4 Update progress display labels to include OpenCode +- [ ] 1.5 Wire OpenCode provider (future PR) to call `recordOpencodeLogEntry()` and write log lines + +## 2. Validation +- [ ] 2.1 Run `openspec validate add-opencode-log-streaming --strict` +- [ ] 2.2 Add/update unit tests if new runtime logic is introduced + +## 3. 
Documentation +- [ ] 3.1 Update any relevant skill/docs (if the project uses them for provider setup) From 01baa8bea1566531758eb6b1e1f0783e4b2514a5 Mon Sep 17 00:00:00 2001 From: Christopher Tso Date: Fri, 9 Jan 2026 13:32:40 +1100 Subject: [PATCH 2/2] feat(openspec): expand opencode change to provider --- .../add-opencode-log-streaming/design.md | 117 ++++++++++++++++++ .../add-opencode-log-streaming/proposal.md | 27 ++-- .../specs/evaluation/spec.md | 49 ++++++++ .../specs/validation/spec.md | 20 +++ .../add-opencode-log-streaming/tasks.md | 17 ++- 5 files changed, 217 insertions(+), 13 deletions(-) create mode 100644 openspec/changes/add-opencode-log-streaming/design.md create mode 100644 openspec/changes/add-opencode-log-streaming/specs/validation/spec.md diff --git a/openspec/changes/add-opencode-log-streaming/design.md b/openspec/changes/add-opencode-log-streaming/design.md new file mode 100644 index 0000000..78ee812 --- /dev/null +++ b/openspec/changes/add-opencode-log-streaming/design.md @@ -0,0 +1,117 @@ +## Context +AgentV currently supports several “agentic” providers (e.g. Codex, Pi coding agent, VS Code subagent) that can execute multi-step work with tool calls. + +OpenCode is an agent runtime that exposes a local HTTP API plus Server-Sent Events (SSE) for streaming events. It also has a first-party TypeScript SDK (`@opencode-ai/sdk`) that can spawn a local `opencode serve` process and provides typed client methods. + +This change expands the existing OpenSpec proposal from “OpenCode stream logs” to a full OpenCode provider integration for AgentV. + +## Goals / Non-Goals + +Goals: +- Add a new AgentV provider kind: `opencode`. +- Support running OpenCode against a per-eval-case working directory (AgentV temp workspace) so the agent can read/write files. +- Produce a `ProviderResponse.outputMessages` trace that captures: + - The final assistant message text. 
+ - Tool calls (name + input + output) in a deterministic shape suitable for trace-based evaluators like `tool_trajectory`. +- Provide optional per-run streaming log artifacts on disk and publish log paths so the CLI can show them early (Codex/Pi pattern). + +Non-Goals: +- Full UI/interactive experiences (OpenCode TUI, rich streaming token output in AgentV terminal). +- Implementing every OpenCode event type as a first-class AgentV trace event. +- Distributed / remote OpenCode deployments that require auth beyond local process execution. + +## Key Decisions + +- **Use OpenCode’s first-party SDK v2** (`@opencode-ai/sdk/v2`) rather than implementing a custom HTTP + SSE client. + - Rationale: typed API surface, server lifecycle helper, fewer protocol footguns. + +- **Primary completion signal:** use `client.session.prompt(...)` to run the request and treat its response as authoritative for the final assistant message and parts. + - Streaming SSE is used for logs and (optionally) richer incremental trace capture. + +- **Working directory isolation:** execute each eval case attempt in its own filesystem directory (AgentV temp workspace). The OpenCode client MUST include the directory context so OpenCode operates within that directory. + - Rationale: reproducibility, parallelism, and preventing cross-contamination between eval cases. + +## Provider Lifecycle + +### Initialization +- Resolve target settings (binary/executable path, server config, model selection, permissions behavior, logging options). +- Start a local OpenCode server if no `baseUrl` is configured. + - Prefer a per-process server instance (shared by provider invocations) to reduce spawn overhead. + - The provider MUST avoid port collisions under parallel workers (either choose an ephemeral port, or allocate from a safe range). + +### Per-eval invocation +For each `ProviderRequest`: +1. Create/resolve the eval-case work directory (temp workspace). +2. 
Create or reuse an OpenCode `sessionID` scoped to that directory. +3. If streaming logs are enabled, open the stream log file, subscribe via `client.event.subscribe({ directory })`, and write each event as a JSONL line. +4. Send the prompt using `client.session.prompt({ sessionID, directory, system, parts, model?, agent?, tools? })`. +5. Build `ProviderResponse` from the returned `parts` (and optionally from gathered SSE events). +6. Tear down the SSE subscription for this invocation; keep the server alive for other requests. + +### Shutdown +- Ensure spawned server processes are terminated on completion or abort. + +## Prompt & Message Mapping + +### Inputs +AgentV provides: +- `question` (formatted question string) +- optional `systemPrompt` +- optional `guidelines` (unwrapped content for non-agent providers) +- optional `guideline_files` / `input_files` (paths, often represented as `file://` links for agent providers) +- optional `chatPrompt` (multi-message) + +Mapping approach: +- Prefer using `chatPrompt` when present. + - Convert AgentV roles into OpenCode `system` + `parts`. + - Include the final user query as a `text` part. +- For filesystem-capable agent providers (including OpenCode), prefer referencing guideline and attachment files as file links rather than embedding large inline content. + +### Outputs +OpenCode returns an assistant message with `parts` including: +- `text` (assistant text) +- `reasoning` (optional) +- `tool` parts with `callID`, `tool`, and `state` (pending/running/completed/error) + +AgentV output mapping: +- Construct a single final `OutputMessage` with: + - `role: "assistant"` + - `content: ` + - `toolCalls: ToolCall[]` derived from `tool` parts: + - `id` = OpenCode `callID` + - `tool` = OpenCode `tool` + - `input` = `state.input` when present + - `output` = `state.output` when present (for completed) + +Optionally (future): emit separate `OutputMessage` entries for tool results, reasoning, or step boundaries. 
This is not required for initial tool-trajectory support. + +## Streaming Logs + +### Log content +- Default format: JSONL where each line is a single OpenCode SSE event object. +- MAY additionally include human-readable “summary” lines, but JSON objects MUST be preserved to keep tooling stable. + +### Log path publication +- When the provider selects a log file path, it publishes `{ filePath, targetName, evalCaseId?, attempt? }` to a process-local tracker. + +## Permissions + +OpenCode can emit `permission.asked` events (e.g., filesystem writes, command execution). + +Initial policy: +- Provide a target option to auto-approve permissions (`once` or `always`) or reject. +- Default SHOULD be conservative (reject) unless explicitly enabled. + +## Risks / Trade-offs +- **Port management / concurrency:** shared server improves performance but requires careful port selection and isolation. +- **Trace fidelity:** relying on final `parts` is deterministic but may omit some intermediate streaming deltas. +- **Permission behavior:** auto-approval increases convenience but raises safety risk; default should remain restrictive. + +## Migration Plan +- Non-breaking addition: new provider kind and target schema fields are additive. +- Existing targets remain valid. + +## Open Questions +- Should AgentV support connecting to an externally-running OpenCode server (`baseUrl`) in addition to spawning a local server? +- Should OpenCode be treated as an `AGENT_PROVIDER_KIND` (filesystem access) by default? +- Which OpenCode “tools” should be enabled/disabled by default when running evals? 
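Reviewer sketch (not part of the patch): the "Outputs" mapping in `design.md` could look roughly like the TypeScript below. The `Part`, `ToolState`, `ToolCall`, and `OutputMessage` types are hypothetical local stand-ins, not the real `@opencode-ai/sdk` or AgentV definitions. The `state.error` field name, the choice to concatenate `text` parts into `content`, and the `{ error }` payload shape are all assumptions made for illustration.

```typescript
// Hypothetical local types; the real SDK and AgentV types may differ.
type ToolState = {
  status: "pending" | "running" | "completed" | "error";
  input?: unknown;
  output?: unknown;
  error?: string; // assumed field name for error details
};

type Part =
  | { type: "text"; text: string }
  | { type: "reasoning"; text: string }
  | { type: "tool"; callID: string; tool: string; state: ToolState };

interface ToolCall {
  id: string;
  tool: string;
  input?: unknown;
  output?: unknown;
}

interface OutputMessage {
  role: "assistant";
  content: string;
  toolCalls: ToolCall[];
}

// Build a single final assistant message from the returned parts.
function toOutputMessage(parts: Part[]): OutputMessage {
  // Assumption: assistant text is the concatenation of all `text` parts.
  const content = parts
    .filter((p): p is Extract<Part, { type: "text" }> => p.type === "text")
    .map((p) => p.text)
    .join("");

  const toolCalls: ToolCall[] = [];
  for (const part of parts) {
    if (part.type !== "tool") continue;
    const call: ToolCall = { id: part.callID, tool: part.tool };
    // `input` is copied only when present.
    if (part.state.input !== undefined) call.input = part.state.input;
    if (part.state.status === "completed" && part.state.output !== undefined) {
      // `output` is copied only for completed calls.
      call.output = part.state.output;
    } else if (part.state.status === "error") {
      // One of the two options the spec allows: a structured error payload in `output`.
      call.output = { error: part.state.error ?? "unknown error" };
    }
    toolCalls.push(call);
  }
  return { role: "assistant", content, toolCalls };
}
```

Pending and running tool parts still appear in `toolCalls` (without `output`), so a `tool_trajectory`-style evaluator sees every call OpenCode reported, in the order it was returned.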
diff --git a/openspec/changes/add-opencode-log-streaming/proposal.md b/openspec/changes/add-opencode-log-streaming/proposal.md index 6bfb6d0..a2c9a4a 100644 --- a/openspec/changes/add-opencode-log-streaming/proposal.md +++ b/openspec/changes/add-opencode-log-streaming/proposal.md @@ -1,25 +1,38 @@ -# Change: Add OpenCode log streaming artifacts (Codex-style) +# Change: Add OpenCode provider support (with stream log artifacts) ## Why -When evaluating agentic providers, users need visibility into what the agent is doing while the run is in-progress. AgentV currently exposes this for some agent CLIs (e.g. Codex/Pi) by writing a per-run log file and printing its path, but OpenCode support has no equivalent yet. +AgentV currently supports agentic providers that operate over a filesystem and emit tool calls (e.g. Codex, Pi coding agent, VS Code subagent). OpenCode is another popular agent runtime with a well-defined event model (SSE) and structured tool lifecycle. + +To evaluate agentic behavior (especially with deterministic evaluators like `tool_trajectory`) AgentV needs: +- A first-class `opencode` provider kind that can run OpenCode in an isolated per-eval workspace. +- A stable mapping from OpenCode tool parts into AgentV `outputMessages/toolCalls`. +- Debug visibility during execution, ideally via per-run stream logs that the CLI can surface early (Codex/Pi pattern). ## What Changes +- Add a new provider kind: `opencode`. +- Define required/optional target configuration for OpenCode in `targets.yaml`. +- Define how the OpenCode provider constructs prompts (system + parts) and executes within a per-eval-case work directory. +- Define the mapping from OpenCode message parts (especially `tool` parts) into AgentV `ProviderResponse.outputMessages` and `ToolCall` fields. - Add a standard mechanism for an OpenCode provider to write per-run “stream logs” to disk (under `.agentv/logs/opencode/` by default). 
- Add a lightweight “log tracker” so the `agentv eval` CLI can surface OpenCode log file paths immediately (same pattern as Codex/Pi). -- Define the expected log content at a high level (raw event JSONL or summarized lines) so tooling remains stable even if OpenCode’s internal event structure evolves. +- Define the expected log content at a high level (raw event JSONL is the default) so tooling remains stable even if OpenCode’s internal event structure evolves. ## Non-Goals -- Implement full OpenCode provider execution in this change (this proposal only establishes logging + CLI surfacing conventions). -- Add new CLI flags or UI features beyond listing log file paths. +- Rich streaming UX in the AgentV terminal (token-by-token output). +- OpenCode TUI integration. +- Advanced OpenCode orchestration features beyond single-request evaluation (e.g., long-lived interactive sessions shared across eval cases). +- New CLI flags or UI features beyond listing log file paths. ## Impact - Affected specs: - - `evaluation` (provider integration expectations for OpenCode logging) + - `evaluation` (OpenCode provider behavior, output mapping, and logging expectations) + - `validation` (targets schema updates for `provider: opencode`) - `eval-cli` (CLI surfacing of provider log file paths) - Affected code (planned follow-up implementation): - - Core: provider log tracker + exports + - Core: OpenCode provider implementation, target schema updates, log tracker + exports - CLI: subscribe and display OpenCode log paths ## Compatibility - Non-breaking. Existing targets and providers are unaffected. - Logging remains optional (providers may omit log streaming when disabled or when directories cannot be created). 
+ diff --git a/openspec/changes/add-opencode-log-streaming/specs/evaluation/spec.md b/openspec/changes/add-opencode-log-streaming/specs/evaluation/spec.md index 7f60b07..88e13a7 100644 --- a/openspec/changes/add-opencode-log-streaming/specs/evaluation/spec.md +++ b/openspec/changes/add-opencode-log-streaming/specs/evaluation/spec.md @@ -1,5 +1,22 @@ ## ADDED Requirements +### Requirement: OpenCode provider execution + +The system SHALL support an OpenCode-backed provider when a target is configured with `provider: opencode`. + +#### Scenario: Execute an eval case with OpenCode +- **WHEN** a target is configured with `provider: opencode` +- **AND** an eval case is executed for that target +- **THEN** the system invokes OpenCode to generate an assistant response +- **AND** runs OpenCode within an isolated per-eval-case working directory +- **AND** returns a `ProviderResponse` with `outputMessages` populated + +#### Scenario: Provider fails cleanly when OpenCode is unavailable +- **WHEN** an OpenCode target is selected +- **AND** the OpenCode runtime cannot be started or reached (missing executable, failed server startup, unreachable base URL) +- **THEN** the eval case attempt fails with an actionable error message +- **AND** other eval cases continue when running in parallel + ### Requirement: OpenCode provider log streaming artifacts When an OpenCode-based provider run is executed, the system SHALL support writing a per-run stream log file and surfacing its path for debugging. @@ -27,3 +44,35 @@ The system SHALL provide a mechanism to publish OpenCode log file paths so the C - **WHEN** the provider decides on a log file path for an OpenCode run - **THEN** it publishes `{ filePath, targetName, evalCaseId?, attempt? 
}` to a process-local log tracker - **AND** downstream consumers MAY subscribe to this tracker to display the log path + +### Requirement: OpenCode tool-call trace mapping + +The OpenCode provider SHALL map OpenCode tool lifecycle parts into AgentV tool calls so deterministic evaluators can operate on the trace. + +#### Scenario: Tool parts become toolCalls +- **WHEN** OpenCode returns a response containing one or more `tool` parts +- **THEN** the provider emits `ProviderResponse.outputMessages` containing `toolCalls` +- **AND** each tool call includes `tool` name and `input` arguments when available +- **AND** completed tool calls include `output` when available +- **AND** tool call identifiers are stable across retries within an attempt when OpenCode provides them + +#### Scenario: Tool error parts are surfaced +- **WHEN** OpenCode returns a `tool` part with error state +- **THEN** the provider includes the tool call in `toolCalls` +- **AND** includes the error information in a provider-specific metadata field or in `output` with a structured error payload + +### Requirement: OpenCode permission handling + +The OpenCode provider SHALL handle OpenCode permission requests deterministically based on target configuration. 
+ +#### Scenario: Default permission policy is conservative +- **WHEN** OpenCode emits a permission request during an eval case +- **AND** the target does not explicitly enable auto-approval +- **THEN** the provider rejects the request +- **AND** the eval attempt fails with a clear error describing the blocked permission + +#### Scenario: Auto-approve permissions when configured +- **WHEN** OpenCode emits a permission request during an eval case +- **AND** the target is configured to auto-approve permissions +- **THEN** the provider approves the request according to the configured policy (e.g., once or always) +- **AND** execution continues normally diff --git a/openspec/changes/add-opencode-log-streaming/specs/validation/spec.md b/openspec/changes/add-opencode-log-streaming/specs/validation/spec.md new file mode 100644 index 0000000..b70597e --- /dev/null +++ b/openspec/changes/add-opencode-log-streaming/specs/validation/spec.md @@ -0,0 +1,20 @@ +## ADDED Requirements + +### Requirement: Validate OpenCode targets + +The system SHALL validate OpenCode provider targets in `targets.yaml` using Zod schemas, rejecting unknown properties and accepting both snake_case and camelCase forms. 
+ +#### Scenario: Accept a valid OpenCode target +- **WHEN** a targets file contains a target with `provider: opencode` +- **THEN** the configuration is accepted +- **AND** the resolved config normalizes to camelCase + +#### Scenario: Reject unknown OpenCode target properties +- **WHEN** an OpenCode target contains an unrecognized property (e.g., `streamlog_dir` instead of `stream_log_dir`) +- **THEN** validation fails with an error identifying the unknown property path + +#### Scenario: Accept snake_case and camelCase equivalence for OpenCode settings +- **WHEN** an OpenCode target uses `stream_logs` (snake_case) +- **OR** uses `streamLogs` (camelCase) +- **THEN** both are accepted as equivalent +- **AND** the resolved config normalizes to `streamLogs` diff --git a/openspec/changes/add-opencode-log-streaming/tasks.md b/openspec/changes/add-opencode-log-streaming/tasks.md index 8b8e588..37356ea 100644 --- a/openspec/changes/add-opencode-log-streaming/tasks.md +++ b/openspec/changes/add-opencode-log-streaming/tasks.md @@ -1,13 +1,18 @@ ## 1. 
Implementation -- [ ] 1.1 Add `opencode-log-tracker` module (record/consume/subscribe) -- [ ] 1.2 Export OpenCode log tracker functions from provider index -- [ ] 1.3 Update `agentv eval` CLI to subscribe and print OpenCode log paths -- [ ] 1.4 Update progress display labels to include OpenCode -- [ ] 1.5 Wire OpenCode provider (future PR) to call `recordOpencodeLogEntry()` and write log lines +- [ ] 1.1 Add new provider kind `opencode` (core provider registry + aliases) +- [ ] 1.2 Extend targets schema to support `provider: opencode` and validate settings +- [ ] 1.3 Implement OpenCode provider invocation (server lifecycle, per-eval-case directory, prompt execution) +- [ ] 1.4 Map OpenCode `tool` parts into AgentV `outputMessages/toolCalls` for trace-based evaluators +- [ ] 1.5 Add OpenCode stream log writer (JSONL) and log path tracker (record/consume/subscribe) +- [ ] 1.6 Export OpenCode log tracker functions from provider index +- [ ] 1.7 Update `agentv eval` CLI to subscribe and print OpenCode log paths (no duplicates) ## 2. Validation - [ ] 2.1 Run `openspec validate add-opencode-log-streaming --strict` -- [ ] 2.2 Add/update unit tests if new runtime logic is introduced +- [ ] 2.2 Add/update unit tests for: + - [ ] targets schema parsing for `opencode` targets + - [ ] tool-call mapping from OpenCode parts → AgentV `ToolCall` + - [ ] log tracker dedupe behavior (CLI subscriber) ## 3. Documentation - [ ] 3.1 Update any relevant skill/docs (if the project uses them for provider setup)
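Reviewer sketch (not part of the patch): tasks 1.5 to 1.7 describe a process-local log tracker with record/consume/subscribe operations and a CLI subscriber that prints each path once. A minimal TypeScript version might look like the following. Only `recordOpencodeLogEntry()` and the entry shape `{ filePath, targetName, evalCaseId?, attempt? }` come from the patch; `consumeOpencodeLogEntries`, `subscribeToOpencodeLogEntries`, `createLogPathPrinter`, and the field types are hypothetical names and assumptions chosen for illustration.

```typescript
// Entry shape from the spec; the concrete field types are assumptions.
interface OpencodeLogEntry {
  filePath: string;
  targetName: string;
  evalCaseId?: string;
  attempt?: number;
}

type LogListener = (entry: OpencodeLogEntry) => void;

// Process-local state: entries not yet consumed, plus live subscribers.
const pendingEntries: OpencodeLogEntry[] = [];
const logListeners = new Set<LogListener>();

// Called by the provider once it has chosen a log file path.
function recordOpencodeLogEntry(entry: OpencodeLogEntry): void {
  pendingEntries.push(entry);
  logListeners.forEach((listener) => listener(entry));
}

// Hypothetical: drain and return everything recorded since the last call.
function consumeOpencodeLogEntries(): OpencodeLogEntry[] {
  return pendingEntries.splice(0, pendingEntries.length);
}

// Hypothetical: subscribe to future entries; returns an unsubscribe function.
function subscribeToOpencodeLogEntries(listener: LogListener): () => void {
  logListeners.add(listener);
  return () => {
    logListeners.delete(listener);
  };
}

// Hypothetical CLI-side helper: prints each distinct file path at most once.
function createLogPathPrinter(print: (line: string) => void): LogListener {
  const seenPaths = new Set<string>();
  return (entry) => {
    if (seenPaths.has(entry.filePath)) return;
    seenPaths.add(entry.filePath);
    print(`${entry.targetName}: ${entry.filePath}`);
  };
}
```

Keeping deduplication in the subscriber rather than in the tracker leaves the tracker a plain event channel, which matches the test item "log tracker dedupe behavior (CLI subscriber)"; whether the real implementation places it there is not stated in the patch.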