Add multi-round multi-LLM debate PR review workflow #52
Conversation
<!-- claude-code-review -->
## Multi-LLM Debate Review Workflow - Code Review

### Summary

This PR adds a sophisticated multi-round debate system using multiple LLM providers (Claude, OpenAI, Gemini) to collaboratively review pull requests. The workflow orchestrates parallel AI reviews, aggregates feedback through debate rounds, and enforces a convergence gate before allowing merges.

### 🟢 Strengths
### 🔴 Critical Issues

**1. Blocking Merge Gate Is Too Strict (High Severity)**

Location: the final enforcement step.

Problem: it blocks ALL PRs where the debate does not converge or any API call fails, which halts merges for network issues, API downtime, rate limits, and legitimate complex changes where the LLMs disagree.

Impact: significant workflow friction; manual intervention is required even for low-risk changes.

Recommendation: only block when there is actual high risk:

```yaml
if: |
  steps.diff.outputs.skip == 'false' &&
  steps.debate.outcome == 'success' &&
  steps.debate.outputs.final_risk == 'high' &&
  steps.debate.outputs.converged != 'true'
```

**2. Sensitive Data Exposure Risk (High Severity)**

Location: the diff sanitization patterns are incomplete.
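
A sketch of sanitization patterns covering the gaps called out here; the function name and the regexes are illustrative, not taken from the PR, and should be tuned to the repository's actual secret formats:

```shell
#!/usr/bin/env bash
# sanitize_diff: mask common secret shapes in a PR diff before it is sent to
# any LLM provider. Patterns below are a sketch, not an exhaustive list.
sanitize_diff() {
  sed -E \
    -e '/-----BEGIN [A-Z ]*PRIVATE KEY-----/,/-----END [A-Z ]*PRIVATE KEY-----/s/.*/[REDACTED_PRIVATE_KEY]/' \
    -e 's/AKIA[0-9A-Z]{16}/[REDACTED_AWS_KEY]/g' \
    -e 's@(postgres|mysql|mongodb)://[^[:space:]]+@[REDACTED_CONNECTION_STRING]@g' \
    -e 's/0x[0-9a-fA-F]{16}/[REDACTED_ADDRESS]/g'
}
```

The last pattern masks Flow-style 8-byte hex addresses; it is deliberately broad and will also hit ordinary 16-hex-digit literals, which is the safer failure mode here.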
Impact: contract changes containing test keys or sentinel addresses could reach third-party LLM APIs unredacted.

Recommendation: add patterns for private keys, AWS keys, Flow addresses, and connection strings.

**3. API Key Validation Missing (Medium Severity)**

Location: the workflow only checks that keys exist, not that they are valid or have quota, so the first failure happens mid-debate, after a timeout.

Recommendation: add pre-flight validation that sends a minimal test prompt to each API endpoint.

### 🟡 Medium Priority Issues

**4. No Retry Logic for API Failures**

API calls have a 90s timeout but no retry on transient failures (429 rate limits, 503 unavailable).

Recommendation: implement exponential backoff with 3 retry attempts.

**5. Hardcoded Model Versions**
Recommendation: use stable model aliases, or handle `model_not_found` errors gracefully.

**6. Race Condition in Transcript Updates**

Parallel provider calls could race when appending to the shared transcript.

Recommendation: use file locking, or serialize the appends after the parallel calls complete.

### 🟢 Minor Improvements
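
The exponential backoff recommended in issue 4 could be sketched as follows; the function name, attempt budget, and delays are illustrative, not from the PR:

```shell
#!/usr/bin/env bash
# retry_api_call CMD [ARGS...]: run CMD, retrying on failure with exponential
# backoff. Intended for transient errors such as 429 rate limits or 503s.
retry_api_call() {
  local attempt=1 max_attempts=3 delay=2
  while true; do
    if "$@"; then
      return 0                          # call succeeded
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${max_attempts} attempts" >&2
      return 1                          # retry budget exhausted
    fi
    echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))                # 2s -> 4s -> 8s
  done
}
```

Wrapping each provider call in `retry_api_call curl ...` would absorb most transient failures without touching the convergence logic.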
### 🔵 Questions
### 📊 Security Assessment
### 🎯 Overall Assessment

Code Quality: 8/10

Recommendation: Issue #1 (blocking gate) and #2 (sanitization) must be addressed before merge. Others can be follow-ups.

### 🛠️ Suggested Follow-ups
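
One such follow-up, the pre-flight key validation from issue 3, might look like the sketch below; the probe URL, header, and function name are assumptions, since real providers differ (Anthropic, for example, uses an `x-api-key` header rather than a bearer token):

```shell
#!/usr/bin/env bash
# preflight_check NAME KEY URL: fail fast if a provider key is missing or
# rejected, instead of timing out mid-debate. Endpoint details are assumed.
preflight_check() {
  local name="$1" key="$2" url="$3"
  if [ -z "$key" ]; then
    echo "::error::${name} API key is not set" >&2
    return 1
  fi
  # A cheap probe is enough to surface 401/403 before the debate starts.
  local status
  status=$(curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer ${key}" "$url")
  case "$status" in
    2*) return 0 ;;
    *)  echo "::error::${name} pre-flight failed with HTTP ${status}" >&2
        return 1 ;;
  esac
}
```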
Summary
This PR adds a new multi-round, multi-LLM debate review workflow for PRs in FlowYieldVaultsEVM. The workflow runs multiple models on the same PR diff, lets them cross-review each other across rounds, and stops only when they converge (or fails safe to manual review if they do not).
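
The round structure described above could be sketched as follows; this is control flow only, and the helper functions (`review_with`, `verdicts_converged`) are hypothetical stand-ins for the script's actual logic:

```shell
#!/usr/bin/env bash
# Debate loop skeleton: each round, every provider reviews the diff plus the
# running transcript; stop once past the minimum round count when verdicts
# agree, or fail safe to manual review after the maximum round count.
run_debate() {
  local round=1 min_rounds="${MIN_ROUNDS:-2}" max_rounds="${MAX_ROUNDS:-4}"
  while [ "$round" -le "$max_rounds" ]; do
    for provider in $PROVIDERS; do
      review_with "$provider" "$round"   # appends this provider's verdict
    done
    if [ "$round" -ge "$min_rounds" ] && verdicts_converged; then
      echo "converged=true"
      return 0
    fi
    round=$((round + 1))
  done
  echo "converged=false"                 # fail safe: require manual review
  return 0
}
```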
What’s Added
- `.github/workflows/multi-llm-debate-review.yml`
- `scripts/multi-llm-debate-review.sh`

Behavior
- Triggers on `pull_request` events (`opened`, `synchronize`, `ready_for_review`).
- Providers: `claude`, `openai`, and optionally `gemini`.

Configuration
Repository Secrets:
- `CLAUDE_API_KEY` (or `ANTHROPIC_API_KEY`)
- `OPENAI_API_KEY`
- `GEMINI_API_KEY` (optional)

Repository Variables:
- `AI_DEBATE_PROVIDERS` (default: `claude,openai`)
- `AI_DEBATE_MAX_DIFF_SIZE` (default: `150000`)
- `AI_DEBATE_MAX_ROUNDS` (default: `4`)
- `AI_DEBATE_MIN_ROUNDS` (default: `2`)
- `AI_REVIEW_RISK_TOLERANCE` (default: `moderate`)

Reference Implementation Sources
This PR was informed by patterns in the Dapper `actions` repository:

Notes
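
For illustration, the repository variables above map naturally onto shell parameter defaults inside the script; this is a sketch of one plausible shape, not the actual implementation:

```shell
#!/usr/bin/env bash
# Resolve debate configuration from repository variables, falling back to the
# documented defaults whenever a variable is unset or empty.
PROVIDERS="${AI_DEBATE_PROVIDERS:-claude,openai}"
MAX_DIFF_SIZE="${AI_DEBATE_MAX_DIFF_SIZE:-150000}"
MAX_ROUNDS="${AI_DEBATE_MAX_ROUNDS:-4}"
MIN_ROUNDS="${AI_DEBATE_MIN_ROUNDS:-2}"
RISK_TOLERANCE="${AI_REVIEW_RISK_TOLERANCE:-moderate}"
echo "providers=${PROVIDERS} rounds=${MIN_ROUNDS}-${MAX_ROUNDS}"
```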