@ThomasK33
Member

Adds an interactive Harness-from-Plan workflow so that the Ralph loop runs with an explicit, user-reviewed harness.

What changed

  • Added a hidden harness-init agent (repo-aware, interactive harness authoring) with UI styling.
  • Restricted harness-init edits to .mux/harness/*.jsonc via tool-layer allowedEditPaths enforcement.
  • Restricted harness-init sub-agent spawning to read-only explore tasks.
  • Updated the Plan tool card’s Start Ralph loop button to switch to harness-init and request a harness proposal.
  • Added a propose_harness tool + UI card with Approve & Start to start the loop in Exec mode.

Validation

  • make static-check

📋 Implementation Plan

Interactive Harness-from-Plan (repo-aware + chat-first approval)

Goals

  • Let the Harness-from-Plan flow do repo-aware, read-only investigation (read files, rg, inspect CI config) before proposing checklist + gates.
  • Allow the agent to spawn read-only explore sub-agents to answer: “what parts of the repo are affected?” and “what gates/commands exist here?”
  • Make harness generation interactive (Plan-Mode-like):
    • you converse with the agent, it updates the harness, and proposes again
    • you manually approve the harness
    • approval automatically starts the Ralph loop in the parent workspace

Recommended approach — True inline “Harness Mode” (hidden harness-init agent)

Net LoC (product code): ~400–650

1) Add a hidden harness-init agent (interactive, repo-aware)

  • Add a new builtin agent ID (e.g. harness-init) that is:
    • ui.hidden: true (not selectable in the agent picker / command palette)
    • not runnable as a subagent (do not set subagent.runnable: true; add a denylist check in the task tool for defense-in-depth)
  • UI indicator:
    • Add --color-harness-init-mode (and optional hover/alpha variants) in src/browser/styles/globals.css.
    • Update src/browser/components/AgentModePicker.tsx to show Harness Init in that color.
    • Update src/browser/components/ChatInput/index.tsx so the Send button uses bg-harness-init-mode when the active agent is harness-init.
  • Update src/node/builtinAgents/harness-from-plan.md and the new harness-init prompt to share the same guidance:
    • Repo investigation (read-only):
      • detect task runners + CI entrypoints (e.g., Makefile, justfile, package.json scripts, .github/workflows/*)
      • map plan/change-scope to impacted subsystems by tracing imports/callsites (avoid overfitting to a single file)
    • Gate selection philosophy:
      • propose gates that balance coverage vs cost (correctness, robustness, time constraints)
      • include at least one broad-but-cheap gate that matches the repo’s tooling (typecheck/lint/build)
      • add targeted tests/gates for the impacted subsystems
      • if an expensive gate seems warranted, ask before choosing it and suggest cheaper alternatives
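
The read-only detection step above can be sketched as a pure function over a list of repo-relative paths. This is a minimal illustration, not the agent's actual logic: `detectEntryPoints`, `EntryPoint`, and the kind labels are hypothetical names, and only the four file patterns named in the plan are recognized.

```typescript
// Hypothetical helper: classify repo-relative paths into task-runner / CI entry points.
type EntryPointKind = "make" | "just" | "npm-scripts" | "github-workflow";

interface EntryPoint {
  kind: EntryPointKind;
  path: string;
}

function detectEntryPoints(repoFiles: readonly string[]): EntryPoint[] {
  const found: EntryPoint[] = [];
  for (const path of repoFiles) {
    // Basename of the path ("Makefile" for both "Makefile" and "tools/Makefile").
    const base = path.slice(path.lastIndexOf("/") + 1);
    if (base === "Makefile") {
      found.push({ kind: "make", path });
    } else if (base === "justfile") {
      found.push({ kind: "just", path });
    } else if (base === "package.json") {
      found.push({ kind: "npm-scripts", path });
    } else if (/^\.github\/workflows\/[^/]+\.ya?ml$/.test(path)) {
      // Only top-level workflow files under .github/workflows/ count as CI entry points.
      found.push({ kind: "github-workflow", path });
    }
  }
  return found;
}
```

The result only says where gates might be defined; choosing which ones to propose still follows the coverage-versus-cost guidance above.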

2) Constrain edits to harness files (but allow in-place diffs)

  • Allow file_edit_* tools for harness-init, but enforce a tool-layer allowlist so it can only edit .mux/harness/*.jsonc.
    • Implementation: extend ToolConfiguration with allowedEditPaths and enforce it in src/node/services/tools/fileCommon.ts.
    • Populate allowedEditPaths in src/node/services/aiService.ts when the active agent is harness-init.
  • Seed .mux/harness/<workspace>.jsonc when harness-init starts so the agent can make small diffs without re-outputting the whole file.
  • (Optional but recommended) In propose_harness, assert the working tree has no changes outside .mux/harness/*.jsonc to catch accidental edits (including via bash).
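
One way the tool-layer allowlist could work is sketched below. Only the `allowedEditPaths` field name comes from the plan; `normalizeRelative`, `globToRegExp`, and `isEditAllowed` are hypothetical helpers, and the glob handling assumes `*` matches within a single path segment.

```typescript
// Assumed shape: only the allowedEditPaths field is named in the plan.
interface ToolConfiguration {
  allowedEditPaths?: readonly string[]; // e.g. [".mux/harness/*.jsonc"]
}

// Normalize a workspace-relative path; returns null if it is absolute or escapes the root.
function normalizeRelative(p: string): string | null {
  const unified = p.replace(/\\/g, "/");
  if (unified.startsWith("/") || /^[A-Za-z]:/.test(unified)) return null;
  const out: string[] = [];
  for (const seg of unified.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      if (out.length === 0) return null; // would climb above the workspace root
      out.pop();
    } else {
      out.push(seg);
    }
  }
  return out.join("/");
}

// Convert a simple glob ("*" matches any characters except "/") into an anchored RegExp.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, "[^/]*");
  return new RegExp(`^${escaped}$`);
}

function isEditAllowed(config: ToolConfiguration, filePath: string): boolean {
  if (config.allowedEditPaths === undefined) return true; // no allowlist: unrestricted
  const normalized = normalizeRelative(filePath);
  if (normalized === null) return false;
  return config.allowedEditPaths.some((glob) => globToRegExp(glob).test(normalized));
}
```

Normalizing before matching means a path such as `.mux/harness/../../src/x.ts` is rejected rather than matched by prefix. A check like this only covers the file_edit_* tools, which is why the optional working-tree assertion in propose_harness is still useful for edits made via bash.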

3) Enable explore subagents safely

  • Allow task + task_await for harness-init.
  • Tighten src/node/services/tools/task.ts so harness-init can only spawn agentId: "explore".
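
The spawn restriction, combined with the defense-in-depth denylist from step 1, might look roughly like this. `SPAWN_ALLOWLIST`, `NON_RUNNABLE_SUBAGENTS`, and `checkSpawn` are hypothetical names for illustration; the actual check in task.ts may be structured differently.

```typescript
// Hypothetical: which sub-agent IDs a given parent agent may spawn via the task tool.
// Parents without an entry are left unrestricted by this check.
const SPAWN_ALLOWLIST: ReadonlyMap<string, readonly string[]> = new Map<string, readonly string[]>([
  ["harness-init", ["explore"]],
]);

// Agents that must never run as a sub-agent, regardless of who asks.
const NON_RUNNABLE_SUBAGENTS: ReadonlySet<string> = new Set(["harness-init"]);

type SpawnCheck = { ok: true } | { ok: false; reason: string };

function checkSpawn(parentAgentId: string, requestedAgentId: string): SpawnCheck {
  if (NON_RUNNABLE_SUBAGENTS.has(requestedAgentId)) {
    return { ok: false, reason: `${requestedAgentId} cannot run as a sub-agent` };
  }
  const allowed = SPAWN_ALLOWLIST.get(parentAgentId);
  if (allowed !== undefined && !allowed.includes(requestedAgentId)) {
    return { ok: false, reason: `${parentAgentId} may only spawn: ${allowed.join(", ")}` };
  }
  return { ok: true };
}
```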

4) Wire “Start Ralph Loop” to switch agent + send an initiating message

  • In src/browser/components/tools/ProposePlanToolCall.tsx:
    • Replace the direct api.workspace.loop.startFromPlan() call with a Plan-Mode-like transition:
      1. updatePersistedState(getAgentIdKey(workspaceId), "harness-init")
      2. api.workspace.sendMessage({ message: "Generate a Ralph harness from the current plan and propose it" })
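
The two-step transition can be sketched with the dependencies injected, so the ordering is explicit and testable. `HarnessInitDeps` and `startHarnessInit` are hypothetical; in the real component the two calls would be `updatePersistedState(...)` and `api.workspace.sendMessage(...)`, whose exact signatures are not shown here.

```typescript
// Hypothetical dependency surface; the real signatures in the codebase may differ.
interface HarnessInitDeps {
  setActiveAgent(workspaceId: string, agentId: string): void;
  sendMessage(workspaceId: string, message: string): Promise<void>;
}

const HARNESS_INIT_PROMPT = "Generate a Ralph harness from the current plan and propose it";

async function startHarnessInit(deps: HarnessInitDeps, workspaceId: string): Promise<void> {
  // Switch the agent first so the initiating message is handled by harness-init.
  deps.setActiveAgent(workspaceId, "harness-init");
  await deps.sendMessage(workspaceId, HARNESS_INIT_PROMPT);
}
```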

5) Add propose_harness tool + approval UI (mirrors propose_plan)

  • Backend tool propose_harness:
    • validate harness exists + is parseable
    • call recordFileState so the UI can detect out-of-band edits
  • Frontend tool card (e.g. ProposeHarnessToolCall.tsx):
    • fetch harness config (prefer api.workspace.harness.get if it already returns enough data; otherwise add a dedicated getHarnessContent endpoint)
    • show checklist + gates
    • provide “Approve & Start Ralph Loop”:
      1. switch agent back to exec
      2. start loop using the existing harness (add a workspace.loop.start endpoint if one doesn’t already exist)
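
The "exists + is parseable" validation in the backend tool could be approximated as follows. This is a sketch under assumptions: the top-level `checklist` and `gates` array fields are a guessed schema, `stripJsonComments` and `validateHarness` are hypothetical helpers, and the comment stripper does not handle trailing commas, which a real JSONC parser would.

```typescript
// Remove // and /* */ comments from JSONC text while leaving string contents untouched.
function stripJsonComments(text: string): string {
  let out = "";
  let i = 0;
  let inString = false;
  while (i < text.length) {
    const ch = text[i];
    const next = text[i + 1];
    if (inString) {
      out += ch;
      if (ch === "\\" && i + 1 < text.length) {
        out += next; // copy the escaped character verbatim
        i += 2;
        continue;
      }
      if (ch === '"') inString = false;
      i += 1;
    } else if (ch === '"') {
      inString = true;
      out += ch;
      i += 1;
    } else if (ch === "/" && next === "/") {
      while (i < text.length && text[i] !== "\n") i += 1; // skip to end of line
    } else if (ch === "/" && next === "*") {
      const end = text.indexOf("*/", i + 2);
      i = end === -1 ? text.length : end + 2;
    } else {
      out += ch;
      i += 1;
    }
  }
  return out;
}

type HarnessValidation =
  | { ok: true; checklistCount: number; gateCount: number }
  | { ok: false; error: string };

// raw is the harness file content, or undefined when the file does not exist.
function validateHarness(raw: string | undefined): HarnessValidation {
  if (raw === undefined) return { ok: false, error: "harness file not found" };
  let parsed: unknown;
  try {
    parsed = JSON.parse(stripJsonComments(raw));
  } catch (err) {
    return { ok: false, error: `harness is not parseable: ${String(err)}` };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { ok: false, error: "harness must be a JSON object" };
  }
  // Assumed schema: top-level "checklist" and "gates" arrays.
  const { checklist, gates } = parsed as { checklist?: unknown; gates?: unknown };
  if (!Array.isArray(checklist) || !Array.isArray(gates)) {
    return { ok: false, error: "harness must contain checklist and gates arrays" };
  }
  return { ok: true, checklistCount: checklist.length, gateCount: gates.length };
}
```

Returning counts rather than the full config keeps the tool result small; the frontend card fetches the harness itself to render the checklist and gates.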

6) Tests

  • Unit tests:
    • allowedEditPaths enforcement (can edit harness; cannot edit other files)
    • harness gate allowlist + normalization stays stable
  • Minimal API test for propose_harness validation.

Alternative (less refactor)

Option — Dedicated “Harness Review” child workspace

Net LoC: ~300–500

  • Spawn a child harness-review workspace and make the user switch into it to answer questions.
  • Lower change surface, but worse UX (workspace switching) and harder to make it feel Plan-Mode-like.

Generated with mux • Model: openai:gpt-5.2 • Thinking: high • Cost: $60.54

Adds workspace-local harness config (checklist + gates) and an opt-in Ralph loop runner.

- Backend services: WorkspaceHarnessService, GateRunnerService, GitCheckpointService, LoopRunnerService
- ORPC: workspace.harness + workspace.loop endpoints
- UI: RightSidebar Harness tab + command palette actions for gates/checkpoint/loop

Signed-off-by: Thomas Kosiewski <tk@coder.com>

---
_Generated with `mux` • Model: `openai:gpt-5.2` • Thinking: `high` • Cost: $0.17_

Change-Id: I99428a620b0bd65e9b9a2bb9023b9dd9e0843bc1
Change-Id: I15d81ab1136b5437df531ba6cb3e23cf84c321a0
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: Ide9e2ac1fa93252310350441843ae4d7eaa0ad25
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: I0f684cca69decbe2756577ec54c321ea0e13b182
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: Iebbcc21aaa8a919be5e1217c0d44b6cee070d782
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Include the workspace plan file path in harness reset/loop bearings summaries.

Signed-off-by: Thomas Kosiewski <tk@coder.com>

---
_Generated with `mux` • Model: `openai:gpt-5.2` • Thinking: `xhigh` • Cost: $47.81_

Change-Id: I89cf61ac2e147042882b58297d0bf9dde49835fd
Change-Id: Icf5963d92a65300117de0c264272f8ca3952c4e0
Signed-off-by: Thomas Kosiewski <tk@coder.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 446b377437

Change-Id: Ie569d9a08cf122c8d7dce626003d1620a6e37bf9
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: I88bf5879b908141790c6119d99f93983071a6b5e
Signed-off-by: Thomas Kosiewski <tk@coder.com>
@ThomasK33
Member Author

Follow-ups pushed:

  • Loop runner now re-reads the latest harness config when marking checklist items doing/done (avoids clobbering concurrent edits).
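
The re-read-before-write pattern from the first follow-up can be illustrated like this. All names here (`HarnessStore`, `markItem`, the `todo` status, the item fields) are assumptions for the sketch; only the doing/done statuses and the idea of re-reading the latest config come from the note above.

```typescript
type ItemStatus = "todo" | "doing" | "done"; // "todo" is an assumed initial state

interface ChecklistItem {
  id: string;
  title: string;
  status: ItemStatus;
}

interface HarnessConfig {
  checklist: ChecklistItem[];
}

// Hypothetical persistence interface standing in for the harness service.
interface HarnessStore {
  read(): HarnessConfig;
  write(config: HarnessConfig): void;
}

// Update one item's status against the latest stored config, not a snapshot
// captured when the loop started, so edits made in the meantime survive.
function markItem(store: HarnessStore, itemId: string, status: ItemStatus): boolean {
  const latest = store.read();
  if (!latest.checklist.some((item) => item.id === itemId)) return false;
  const checklist = latest.checklist.map((item) =>
    item.id === itemId ? { ...item, status } : item,
  );
  store.write({ ...latest, checklist });
  return true;
}
```

This narrows the window for clobbering but is not a lock: an edit landing between `read` and `write` could still be lost, so a stricter design would need versioning or file locking.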
  • Updated sidebar layout + IPC tests to account for the Harness tab and to reduce Windows timing flakiness.

Note: Chromatic "UI Review" / "UI Tests" are still pending (require baseline acceptance).
