Draft
87 changes: 87 additions & 0 deletions openspec/changes/add-copilot-cli-provider/design.md
@@ -0,0 +1,87 @@
## Context

AgentV supports multiple provider kinds:
- Cloud LLM providers (Azure OpenAI, Anthropic, Gemini)
- Agent-style providers that operate on a workspace (Codex CLI, VS Code Copilot, Claude Code, Pi)

GitHub Copilot provides a CLI package (`@github/copilot`) that can be invoked via `npx` and interacted with through stdin/stdout. AgentV can adopt this pattern for evaluation runs.

## Goals
- Add a built-in provider kind that runs GitHub Copilot CLI (`@github/copilot`) as an external process.
- Keep configuration minimal and consistent with existing CLI-style providers (especially `codex`).
- Ensure deterministic capture of the “candidate answer” with good error messages and artifacts.

## Proposed Provider Identity
- Canonical kind: `copilot-cli`
- Accepted aliases (to reduce user friction): `copilot`, `github-copilot`

Rationale: `copilot` alone is ambiguous with the VS Code Copilot provider; `copilot-cli` makes intent explicit.

## Invocation Strategy

### Base command
Default to invoking Copilot via npm:
- `npx -y @github/copilot@<pinnedVersion>`

Rationale:
- Avoid requiring a global install.
- Match vibe-kanban’s approach.
- Pin a version to reduce behavior drift across runs.

### Process I/O
- Write the rendered prompt to stdin, then close stdin.
- Capture stdout/stderr.

### Log directory
- Pass `--log-dir <path>` and `--log-level debug` when supported.
- Record the log directory path in the ProviderResponse metadata for debugging.
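The base command and log flags above can be assembled roughly as follows. This is a minimal sketch: `buildCopilotCommand` and its option names are hypothetical, and the version string is a placeholder rather than the version AgentV will actually pin.

```typescript
// Hypothetical helper; names are illustrative and not part of AgentV's API.
interface CopilotCommand {
  executable: string;
  args: string[];
}

function buildCopilotCommand(options: {
  pinnedVersion: string; // placeholder for whatever version AgentV pins
  logDir?: string; // only passed when the CLI supports --log-dir
  extraArgs?: string[]; // passthrough args from the target settings
}): CopilotCommand {
  // `npx -y` skips the install confirmation so the run stays non-interactive.
  const args = ["-y", `@github/copilot@${options.pinnedVersion}`];
  if (options.logDir) {
    args.push("--log-dir", options.logDir, "--log-level", "debug");
  }
  args.push(...(options.extraArgs ?? []));
  return { executable: "npx", args };
}
```

Keeping argument construction in a pure function like this would also make the "command argument construction" unit tests straightforward.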

### Timeout & cancellation
- Support a target-configured timeout (`settings.timeoutSeconds`, in seconds, consistent with AgentV conventions).
- Abort via `AbortSignal` if provided by the orchestrator.
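The process I/O, timeout, and cancellation behavior described above might look like the sketch below. It uses only Node's `child_process.spawn`; the function name `runWithStdin`, the `RunResult` shape, and the millisecond timeout option are assumptions made for illustration, not the final provider interface.

```typescript
import { spawn } from "node:child_process";

// Hypothetical result shape; not AgentV's ProviderResponse.
interface RunResult {
  stdout: string;
  stderr: string;
  exitCode: number | null;
  timedOut: boolean;
}

function runWithStdin(
  executable: string,
  args: string[],
  prompt: string,
  options: { timeoutMs?: number; signal?: AbortSignal; cwd?: string } = {},
): Promise<RunResult> {
  return new Promise((resolve, reject) => {
    // Passing `signal` lets the orchestrator abort the child process.
    const child = spawn(executable, args, { cwd: options.cwd, signal: options.signal });
    let stdout = "";
    let stderr = "";
    let timedOut = false;

    const timer =
      options.timeoutMs !== undefined
        ? setTimeout(() => {
            timedOut = true;
            child.kill(); // sends SIGTERM; a real provider may need to kill the process tree
          }, options.timeoutMs)
        : undefined;

    child.stdout.setEncoding("utf8").on("data", (chunk: string) => { stdout += chunk; });
    child.stderr.setEncoding("utf8").on("data", (chunk: string) => { stderr += chunk; });
    // Ignore EPIPE if the child exits before reading the whole prompt.
    child.stdin.on("error", () => {});

    child.on("error", (err) => {
      clearTimeout(timer);
      reject(err); // spawn failure or abort
    });
    child.on("close", (code) => {
      clearTimeout(timer);
      resolve({ stdout, stderr, exitCode: code, timedOut });
    });

    // Write the rendered prompt, then close stdin.
    child.stdin.end(prompt);
  });
}
```

Resolving with `timedOut` instead of rejecting keeps the captured stdout/stderr available so the caller can persist them as artifacts.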

## Prompt Construction
Copilot CLI runs in a workspace directory, so the provider should follow the same “agent provider preread” pattern used by `vscode` and `codex`:
- Include a preread section that links guideline and attachment files via `file://` URLs.
- Include the user query.
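As an illustration of the preread pattern, the sketch below renders `file://` links with Node's `pathToFileURL` and appends the query. The section wording and the function name `buildPrereadPrompt` are assumptions; the existing `vscode` and `codex` providers may phrase the preread differently.

```typescript
import { basename } from "node:path";
import { pathToFileURL } from "node:url";

// Hypothetical helper; the exact preread wording used by AgentV is not shown in this change.
function buildPrereadPrompt(
  query: string,
  guidelineFiles: string[],
  attachmentFiles: string[],
): string {
  // pathToFileURL handles escaping and resolves relative paths against the cwd.
  const toLink = (filePath: string): string =>
    `* [${basename(filePath)}](${pathToFileURL(filePath).href})`;

  const sections: string[] = [];
  const files = [...guidelineFiles, ...attachmentFiles];
  if (files.length > 0) {
    sections.push(["Read these files before answering:", ...files.map(toLink)].join("\n"));
  }
  sections.push(query.trim());
  return sections.join("\n\n");
}
```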

## Response Extraction
Copilot CLI’s stdout is expected to contain a mixture of progress text and the final assistant message.

Proposed minimal extraction algorithm:
- Strip ANSI escape sequences.
- Trim surrounding whitespace.
- Treat the remaining stdout content as the candidate answer.
- Preserve full stdout/stderr as artifacts on failures.
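The first three steps of this algorithm fit in a few lines. The sketch below is an assumption about how it could be written; the regular expression covers CSI sequences (colors, cursor movement) and OSC sequences (such as terminal titles), which may not be every escape form Copilot CLI emits.

```typescript
// CSI: ESC [ parameter bytes, intermediate bytes, final byte.
const CSI_PATTERN = /\u001b\[[0-?]*[ -/]*[@-~]/g;
// OSC: ESC ] ... terminated by BEL or ESC \.
const OSC_PATTERN = /\u001b\][^\u0007\u001b]*(?:\u0007|\u001b\\)/g;

// Hypothetical name; returns the cleaned stdout as the candidate answer.
function extractCandidateAnswer(stdout: string): string {
  return stdout.replace(OSC_PATTERN, "").replace(CSI_PATTERN, "").trim();
}
```

Because progress text is not filtered out, the candidate answer may still contain non-answer lines; that limitation is inherent to the minimal algorithm rather than to this sketch.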

If Copilot CLI later provides a stable, documented structured output mode, AgentV MAY add opt-in support in a future change.

## Target Configuration Surface
Keep this comparable to `codex`:
- `provider: copilot-cli`
- `settings.executable` (optional): defaults to `npx`
- `settings.args` (optional): appended args; default includes `-y @github/copilot@<version>` and flags
- `settings.cwd` (optional)
- `settings.timeoutSeconds` (optional)
- `settings.env` (optional)
- `settings.model` (optional)

Avoid overfitting to every Copilot CLI flag initially; allow passthrough args for advanced use.

## Security & Safety Notes
- Like other agent providers, Copilot CLI can read local files from the workspace.
- Any “allow all tools” behavior (if exposed) should be opt-in and clearly documented.
- Prefer defaulting to safer settings, consistent with existing providers.

## Testing Strategy (implementation stage)
- Unit tests for:
  - command argument construction
  - stdout parsing/extraction
  - timeout handling
- Integration-style tests (mock runner) that simulate Copilot CLI stdout/stderr.

## Alternatives Considered
- Use `gh copilot`:
  - Rejected: the request was explicitly to use `@github/copilot`, as vibe-kanban does.
- Implement as `cli` provider template:
  - Rejected: would push complexity to users and lose built-in prompt construction and artifacts.
36 changes: 36 additions & 0 deletions openspec/changes/add-copilot-cli-provider/proposal.md
@@ -0,0 +1,36 @@
# Change: Add GitHub Copilot CLI provider

## Why
AgentV currently supports running agent-style evaluations via `provider: vscode` (VS Code) and `provider: codex` (Codex CLI), but it does not support the GitHub Copilot CLI package (`@github/copilot`). Teams that standardize on Copilot CLI (often via `npx -y @github/copilot`) cannot evaluate the same prompts/tasks in AgentV without custom wrappers.

This change adds a first-class `copilot-cli` provider so AgentV can invoke Copilot CLI directly and capture responses for evaluation.

## What Changes
- Add a new target provider kind: `copilot-cli` (GitHub Copilot CLI via `@github/copilot`).
- Add target configuration fields for Copilot CLI execution (command/executable, args, model, timeout, cwd, env).
- Implement provider execution by spawning the Copilot CLI process, piping a constructed prompt to stdin, and capturing the final assistant response from stdout.
- Persist provider artifacts (stdout/stderr and optional log-dir files) for debugging on failures.
- Update documentation/templates so `agentv init` guidance includes Copilot CLI targets.

## Non-Goals
- Do not add `gh copilot` (GitHub CLI subcommand) support in this change.
- Do not add interactive “resume session” UX; evaluations run as independent invocations.
- Do not introduce a new plugin system; this remains a built-in provider like `codex`/`vscode`.

## Impact
- Affected specs:
  - `evaluation` (new provider invocation behavior)
  - `validation` (targets schema/validation for the new provider)
- Affected code (implementation stage):
  - `packages/core/src/evaluation/providers/*` (new provider + provider registry)
  - `packages/core/src/evaluation/validation/targets-validator.ts`
  - `apps/cli` docs/templates (provider list + examples)

## Compatibility
- Backwards compatible: existing targets continue to work unchanged.
- `provider: copilot-cli` is additive.

## Decisions
- Canonical provider kind: `copilot-cli`.
- Accepted provider aliases: `copilot` and `github-copilot`.
- Output contract: unless and until Copilot CLI exposes a stable machine-readable mode that AgentV supports, the provider treats Copilot CLI stdout as the candidate answer after stripping ANSI escapes and trimming surrounding whitespace.
12 changes: 12 additions & 0 deletions openspec/changes/add-copilot-cli-provider/specs/evaluation/spec.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
## MODIFIED Requirements

### Requirement: Provider Integration
The system SHALL integrate with supported providers using target configuration and optional retry settings.

#### Scenario: GitHub Copilot CLI provider
- **WHEN** a target uses `provider: copilot-cli` (or an accepted alias)
- **THEN** the system ensures the Copilot CLI launcher is available (defaulting to `npx` when not explicitly configured)
- **AND** builds a preread prompt document that links guideline and attachment files via `file://` URLs and includes the user query
- **AND** runs GitHub Copilot CLI via `@github/copilot` with a pinned version by default (configurable), piping the prompt via stdin
- **AND** captures stdout/stderr and extracts a single candidate answer text from the final assistant output
- **AND** on failure, the error includes exit code/timeout context and preserves stdout/stderr and any log artifacts for debugging
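One possible shape for the failure message required by the last clause is sketched below. The `FailureContext` fields and the function name `describeFailure` are hypothetical; the sketch only shows how exit code, timeout context, and artifact locations could be combined into one actionable error.

```typescript
// Hypothetical context object; field names are illustrative.
interface FailureContext {
  exitCode: number | null;
  timedOut: boolean;
  timeoutSeconds?: number;
  stderr: string;
  logDir?: string;
}

function describeFailure(ctx: FailureContext): string {
  const reason = ctx.timedOut
    ? `timed out after ${ctx.timeoutSeconds ?? "?"}s`
    : `exited with code ${ctx.exitCode ?? "unknown"}`;
  const lines = [`Copilot CLI ${reason}.`];

  // Include only the tail of stderr; the full output is kept as an artifact.
  const tail = ctx.stderr.trim().split("\n").slice(-5).join("\n");
  if (tail) lines.push(`stderr (last lines):\n${tail}`);
  if (ctx.logDir) lines.push(`Logs: ${ctx.logDir}`);
  return lines.join("\n");
}
```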
14 changes: 14 additions & 0 deletions openspec/changes/add-copilot-cli-provider/specs/validation/spec.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
## MODIFIED Requirements

### Requirement: Targets File Schema Validation
The system SHALL validate target configuration using Zod schemas that serve as both runtime validators and TypeScript type sources.

#### Scenario: Unknown Copilot CLI provider property rejected
- **WHEN** a targets file contains a Copilot CLI target with an unrecognized property
- **THEN** the system SHALL reject the configuration with an error identifying the unknown property

#### Scenario: Copilot CLI provider accepts snake_case and camelCase settings
- **WHEN** a targets file uses `provider: copilot-cli` (or an accepted alias)
- **AND** configures supported settings using either snake_case or camelCase
- **THEN** validation succeeds
- **AND** the resolved config normalizes to camelCase
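The two scenarios above can be illustrated without any dependencies. The requirement calls for Zod schemas, so the sketch below is a plain-TypeScript stand-in for the behavior rather than the intended implementation; the key list is taken from the design's configuration surface, and `normalizeCopilotSettings` is a hypothetical name.

```typescript
// Setting keys from the proposed configuration surface, in camelCase.
const SUPPORTED_KEYS = ["executable", "args", "cwd", "timeoutSeconds", "env", "model"] as const;

function snakeToCamel(key: string): string {
  return key.replace(/_([a-z0-9])/g, (_match, ch: string) => ch.toUpperCase());
}

// Hypothetical stand-in for the Zod-based validator described in the spec.
function normalizeCopilotSettings(raw: Record<string, unknown>): Record<string, unknown> {
  const allowed = new Set<string>(SUPPORTED_KEYS);
  const resolved: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(raw)) {
    const camel = snakeToCamel(key);
    if (!allowed.has(camel)) {
      // Name the offending key exactly as the user wrote it.
      throw new Error(`Unknown Copilot CLI setting: "${key}"`);
    }
    resolved[camel] = value;
  }
  return resolved;
}
```

For example, `normalizeCopilotSettings({ timeout_seconds: 30 })` resolves to `{ timeoutSeconds: 30 }`, while an unrecognized key raises an error that identifies it.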
24 changes: 24 additions & 0 deletions openspec/changes/add-copilot-cli-provider/tasks.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
## 1. Provider + Targets
- [ ] 1.1 Add `copilot-cli` to `ProviderKind`, `KNOWN_PROVIDERS`, and `AGENT_PROVIDER_KINDS` (and decide aliases).
- [ ] 1.2 Extend target parsing to recognize `provider: copilot-cli` (and chosen aliases) and resolve a typed Copilot config.
- [ ] 1.3 Extend `targets-validator` to accept the Copilot settings keys and reject unknown properties with actionable errors.

## 2. Execution
- [ ] 2.1 Implement `CopilotCliProvider` (mirroring patterns from `CodexProvider`): spawn process, write prompt to stdin, capture stdout/stderr, enforce timeout.
- [ ] 2.2 Implement prompt preread rendering consistent with other agent providers (file:// links for guidelines and attachments).
- [ ] 2.3 Implement robust stdout parsing to extract a single candidate answer; preserve raw artifacts on errors.
- [ ] 2.4 Register provider in provider factory/registry.

## 3. Docs + Templates
- [ ] 3.1 Update CLI docs to list `copilot-cli` as a supported provider and add a minimal `targets.yaml` example.
- [ ] 3.2 Update `apps/cli/src/templates/.claude/skills/agentv-eval-builder/` references so `agentv init` users get Copilot CLI guidance.

## 4. Tests
- [ ] 4.1 Add unit tests for config resolution and argument rendering.
- [ ] 4.2 Add provider tests using a mocked runner (no real Copilot CLI dependency) for success, invalid output, and timeout.

## 5. Validation
- [ ] 5.1 Run `bun run build`, `bun run typecheck`, `bun run lint`, `bun test`.

## 6. Release hygiene
- [ ] 6.1 Add a changeset if user-visible behavior changes should ship in the next release.