import EvidenceBasedDebug from '@site/shared-prompts/_evidence-based-debug.mdx';
Debugging with AI agents isn't about describing symptoms and hoping for solutions. It's about requiring evidence at every step. The core principle: **never accept a fix without reproducible proof it works**.
## Always Require Evidence
**Production pattern:** Provide reproduction steps, give the agent access to diagnostic tools, and require before/after evidence.
<EvidenceBasedDebug />
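The shape of such a prompt, sketched in illustrative wording (the shared template above is the canonical version):

```
Reproduce the failure first: run the failing scenario and paste the exact output.

Investigate the root cause. For every hypothesis, cite the file and line that
supports it.

Only then propose a fix. After applying it, re-run the same scenario and paste
the before/after output as proof the failure is gone.
```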
## Code Inspection: Understanding Before Fixing
Before diving into logs or reproduction, have the agent explain the architecture and execution flow. Use conversational analysis to identify mismatches between your mental model and actual system behavior. Ask the agent to trace request paths, explain data flow, and identify potential failure points based on code structure.
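For example, a conversational analysis prompt along these lines (the wording is illustrative, not a canonical template):

```
Trace a request from the public API endpoint to the database write.
List each module it passes through and what each one is responsible for.
Where could a malformed payload slip past validation?
Do not propose fixes yet - explain the flow and point me to the exact files and functions.
```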
import GenerateAgentsMD from '@site/shared-prompts/_generate-agents-md.mdx';
When you join a new project, the first week is brutal. You're swimming in unfamiliar architecture, tech stack decisions, tribal knowledge buried in Slack threads, and that one critical bash script everyone runs but nobody documented.
**The meta-move: Apply lessons 3-5 to generate context files automatically.** Instead of manually drafting `AGENTS.md` or `CLAUDE.md`, use the four-phase workflow ([Lesson 3](/docs/methodology/lesson-3-high-level-methodology)) to let agents bootstrap their own context. **Research phase:** Use ChunkHound's `code_research()` tool to understand your project's architecture, patterns, and conventions—query for architecture, coding style, module responsibilities, testing conventions, and so on to build a comprehensive architectural understanding. Use ArguSeek's `research_iteratively()` and `fetch_url()` to retrieve framework documentation, best practices, and security guidelines relevant to your tech stack. **Plan phase:** The agent synthesizes codebase insights (from ChunkHound) and domain knowledge (from ArguSeek) into a structured context file plan. **Execute phase:** Generate the context file using prompt optimization techniques specific to your model. **Validate phase:** Test the generated context with a typical task and iterate based on gaps.
**Concrete example prompt:**
```
Generate AGENTS.md for this project.
Use the code research tool to learn the project architecture, tech stack,
how auth works, testing conventions, coding style, and deployment process.
Use ArguSeek to fetch current best practices for the tech stack used and the
latest security guidelines.

Create a concise file (≤500 lines) with sections:
- Tech Stack
- Development Commands (modified for non-interactive execution)
- Architecture (high-level structure)
- Coding Conventions and Style
- Critical Constraints
- Common Pitfalls (if found)

Do NOT duplicate information already in README or code comments—instead, focus
exclusively on AI-specific operations: environment variables, non-obvious
dependencies, and commands requiring modification for agents.
```
<GenerateAgentsMD />
This prompt demonstrates grounding ([Lesson 5](/docs/methodology/lesson-5-grounding)): ChunkHound provides codebase-specific context, ArguSeek provides current ecosystem knowledge, and structured Chain-of-Thought ensures the agent follows a methodical path. The result: production-ready context files generated in one iteration, not manually curated over weeks. Add tribal knowledge manually afterward—production incidents, team conventions, non-obvious gotchas that only humans know.
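What gets appended by hand might look something like this (the entries below are hypothetical placeholders, not recommendations):

```markdown
<!-- Hypothetical examples - replace with your team's actual gotchas -->
## Tribal Knowledge (human-maintained)

- Staging shares its database snapshot with the nightly analytics job; avoid running migrations while it is active.
- The payments module is frozen pending an external audit - route any changes through the payments channel first.
- Two legacy customers still require the old auth flag; check the related incident notes before removing it.
```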
:::tip Reference
See the complete prompt template with validation guidance and adaptations: [Generate AGENTS.md](/prompts/onboarding/generate-agents-md)
:::

`website/docs/practical-techniques/lesson-8-tests-as-guardrails.md`
---
title: 'Lesson 8: Tests as Guardrails'
---
import ThreeContextWorkflow from '@site/src/components/VisualElements/ThreeContextWorkflow';
import EdgeCaseDiscovery from '@site/shared-prompts/_edge-case-discovery.mdx';
import TestFailureDiagnosis from '@site/shared-prompts/_test-failure-diagnosis.mdx';
AI agents can refactor half your codebase in minutes. They'll rename functions, restructure modules, and update dozens of files—all while you grab coffee. This velocity is powerful, but dangerous. Small logic errors compound fast when changes happen at scale.
**Prompt pattern for edge case discovery:**
```
How does validateUser() work? What edge cases exist in the current implementation?
What special handling exists for different auth providers?
Search for related tests and analyze what they cover.
```
<EdgeCaseDiscovery />
The agent searches for the function, reads implementation, finds existing tests, and synthesizes findings from Step 1. This loads concrete constraints into context: OAuth users skip email verification, admin users bypass rate limits, deleted users are rejected. Step 2 analyzes the implementation against your questions and identifies untested paths. You now have a grounded list of edge cases derived from actual code, not generic testing advice.
**Follow up to identify gaps:**
```
Based on the implementation you found, what edge cases are NOT covered by tests?
What happens with:
- Null or undefined inputs
- Users mid-registration (incomplete profile)
- Concurrent validation requests
```
:::tip Reference
See the complete prompt template with additional examples and adaptations: [Edge Case Discovery](/prompts/testing/edge-case-discovery)
:::
### Closed Loop: Evolve Code Alongside Tests
This diagnostic prompt applies techniques from [Lesson 4](../methodology/lesson-4-prompting-101.md): [Chain-of-Thought](../methodology/lesson-4-prompting-101.md#chain-of-thought-paving-a-clear-path) sequential steps, [constraints](../methodology/lesson-4-prompting-101.md#constraints-as-guardrails) requiring evidence, and [structured format](../methodology/lesson-4-prompting-101.md#applying-structure-to-prompts). Understanding why each element exists lets you adapt this pattern for other diagnostic tasks.
````markdown title="Diagnostic Prompt for Test Failures"
```
$FAILURE_DESCRIPTION
```

Use the code research tool to analyze the test failure above.

DIAGNOSE:

1. Examine the test code and its assertions.
2. Understand and clearly explain the intention and reasoning of the test - what is it testing?
3. Compare against the implementation code being tested.
4. Identify the root cause of the failure.

DETERMINE:
Is this a test that needs updating or a real bug in the implementation?

Provide your conclusion with evidence.
````
<TestFailureDiagnosis />
**Why this works:**
You can adapt this for performance issues, security vulnerabilities, or deployment failures by changing the diagnostic steps while preserving the structure: sequential CoT → constrained decision → evidence requirement.
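For instance, a performance-focused variant might keep the same skeleton while swapping the diagnostic steps (wording is illustrative; `$SLOW_ENDPOINT_PROFILE` is a hypothetical placeholder for your profiler output):

```
$SLOW_ENDPOINT_PROFILE

Use the code research tool to analyze the slowdown above.

DIAGNOSE:

1. Examine the profile and identify the dominant cost.
2. Trace the hot path through the implementation.
3. Compare against the previous version of the code.
4. Identify the root cause of the regression.

DETERMINE:
Is this an algorithmic regression, redundant I/O, or expected cost from a new feature?

Provide your conclusion with evidence (profile numbers and file:line references).
```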
:::tip Reference
See the complete prompt template with detailed usage examples and adaptations: [Test Failure Diagnosis](/prompts/testing/test-failure-diagnosis)
:::
## Key Takeaways
- **Tests are documentation agents actually read** - They learn intent, edge cases, and constraints from test names, assertions, and comments. Write tests that explain the "why," not just verify the "what."
import DualOptimizedPR from '@site/shared-prompts/_dual-optimized-pr.mdx';
import AIAssistedReview from '@site/shared-prompts/_ai-assisted-review.mdx';
You've completed the implementation. Tests pass. The agent executed your plan successfully. Now comes the critical question: is it actually correct?
This is the **Validate** phase from [Lesson 3's four-phase workflow](../methodology/lesson-3-high-level-methodology.md)—the systematic quality gate before shipping. Code review catches the probabilistic errors that agents inevitably introduce: subtle logic bugs, architectural mismatches, edge cases handled incorrectly, patterns that don't quite fit your codebase.
After implementing code ([Lesson 7](./lesson-7-planning-execution.md)), writing tests ([Lesson 8](./lesson-8-tests-as-guardrails.md)), and making everything pass, this review step catches what the iterative development process left behind—the final quality gate before committing.
:::tip Reference
See the complete prompt template with iterative review guidance: [Comprehensive Code Review](/prompts/code-review/comprehensive-review)
:::
### Iterative Review: Repeat Until Green or Diminishing Returns
Code review is rarely one-pass—first review finds issues, you fix them, re-run tests ([Lesson 8](./lesson-8-tests-as-guardrails.md)) to catch regressions, then review again in a fresh context (not the same conversation where the agent will defend its prior decisions). Continue this cycle: review in fresh context, fix issues, validate with tests, repeat.
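A minimal sketch of the fresh-context re-review prompt (illustrative wording, not the full template):

```
You have no memory of implementing these changes. Review the diff on this branch
relative to main as if a colleague wrote it.

Focus on logic errors, architectural mismatches, and patterns that deviate from
the rest of the codebase.

DO NOT EDIT ANYTHING - only review. List issues by severity with file:line references.
```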
The dual-optimized PR prompt below demonstrates multiple techniques from [Lesson 4 (Prompting 101)](../methodology/lesson-4-prompting-101.md), [Lesson 5 (Grounding)](../methodology/lesson-5-grounding.md), and [Lesson 7 (Planning & Execution)](./lesson-7-planning-execution.md):
```markdown
You are a contributor to {PROJECT_NAME} creating a GitHub pull request for the current branch.
Using the sub-task tool to conserve context, explore the changes in the git history relative to main.
Summarize and explain them like you would to a fellow co-worker:

- Direct and concise
- Professional but conversational
- Assume competence and intelligence
- Skip obvious explanations

The intent of the changes is:
{CHANGES_DESC}

Building upon this, draft two markdown files: one for a human reviewer/maintainer of the project
and a complementary one that's optimized for the reviewer's agent. Explain:

- What was done and the reasoning behind it
- Breaking changes, if any exist
- What value the changes add to the project

Constraints:

- The human-optimized markdown file should be 1-3 paragraphs max
- The agent-optimized markdown should focus on explaining the changes efficiently

Use ArguSeek to learn how to explain and optimize for both humans and LLMs.
Use the code research tool to learn the overall architecture, module responsibilities, and coding style.
```
<DualOptimizedPR />
### Mechanisms at Work
**Evidence requirements ([Lesson 7](./lesson-7-planning-execution.md#require-evidence-to-force-grounding)):** The prompt forces grounding through "explore the changes" and "learn the architecture"—the agent cannot draft accurate descriptions without reading actual commits and code.
:::tip Reference
See the complete prompt template with workflow integration tips: [Dual-Optimized PR Description](/prompts/pull-requests/dual-optimized-pr)
:::
### Reviewing PRs with AI Assistants
When you're on the receiving end of a PR with dual-optimized descriptions, you have structured context for both human understanding and AI-assisted review. This section shows how to leverage both descriptions effectively.
When reviewing a PR with dual-optimized descriptions, use this pattern with your AI assistant:
````markdown
You are {PROJECT_NAME}'s maintainer reviewing {PR_LINK}. Ensure code quality, prevent technical debt, and maintain architectural consistency.

Context from the PR author:
{PASTE_AI_OPTIMIZED_DESCRIPTION}

# Review Process

1. Use GitHub CLI to read the PR discussions, comments, and related issues
2. Think step by step, but only keep a minimum draft for each thinking step, with 5 words at most. End the assessment with a separator ####.
3. Never speculate about code you haven't read - investigate files before commenting

# Critical Checks

Before approving, verify:

- Can existing code be extended instead of creating new code?
- Does this respect module boundaries and responsibilities?
- Are there similar patterns elsewhere? Search the codebase.
- Is this introducing duplication?

# Output Format

```markdown
**Summary**: [One sentence verdict]
**Strengths**: [2-3 items]
**Issues**: [By severity: Critical/Major/Minor with file:line refs]
```
````

<AIAssistedReview />