167 changes: 167 additions & 0 deletions .claude/commands/dev-checklists.md
@@ -0,0 +1,167 @@
---
description: Development checklists for code changes (params, methodology, warnings, reviews, bugs)
argument-hint: "[checklist-name]"
---

# Development Checklists

## Adding a New Parameter to Estimators

When adding a new `__init__` parameter that should be available across estimators:

1. **Implementation** (for each affected estimator; see the sketch after this list):
- [ ] Add to `__init__` signature with default value
- [ ] Store as `self.param_name`
- [ ] Add to `get_params()` return dict
- [ ] Handle in `set_params()` (usually automatic via `hasattr`)

2. **Consistency** - apply to all applicable estimators per the **Estimator inheritance** map in CLAUDE.md

3. **Testing**:
- [ ] Test `get_params()` includes new param
- [ ] Test parameter affects estimator behavior
- [ ] Test with non-default value

4. **Downstream tracing**:
- [ ] Before implementing: `grep -rn "self\.<param>" diff_diff/<module>.py` to find ALL downstream paths
- [ ] Parameter handled in ALL aggregation methods (simple, event_study, group)
- [ ] Parameter handled in bootstrap inference paths

5. **Documentation**:
- [ ] Update docstring in all affected classes
- [ ] Update CLAUDE.md if it's a key design pattern
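
A minimal sketch of the pattern, assuming a sklearn-like estimator. `ToyEstimator` and `new_param` are hypothetical names for illustration, not diff_diff API:

```python
class ToyEstimator:
    def __init__(self, alpha=0.05, new_param=False):
        self.alpha = alpha
        self.new_param = new_param  # stored under the same name as the argument

    def get_params(self):
        # The new parameter must appear here so set_params() and cloning round-trip.
        return {"alpha": self.alpha, "new_param": self.new_param}

    def set_params(self, **params):
        for key, value in params.items():
            if not hasattr(self, key):
                raise ValueError(f"Unknown parameter: {key}")
            setattr(self, key, value)  # the hasattr guard makes this automatic
        return self
```

A quick round-trip, `ToyEstimator().set_params(new_param=True).get_params()["new_param"]`, then covers the first and third testing items above.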

## Implementing Methodology-Critical Code

When implementing or modifying code that affects statistical methodology (estimators, SE calculation, inference, edge case handling):

1. **Before coding - consult the Methodology Registry**:
- [ ] Read the relevant estimator section in `docs/methodology/REGISTRY.md`
- [ ] Identify the reference implementation(s) listed
- [ ] Note the edge case handling requirements

2. **During implementation**:
- [ ] Follow the documented equations and formulas
- [ ] Match reference implementation behavior for standard cases
- [ ] For edge cases: either match reference OR document deviation

3. **When deviating from reference implementations**:
- [ ] Add a **Note** in the Methodology Registry explaining the deviation
- [ ] Include rationale (e.g., "defensive enhancement", "R errors here")
- [ ] Ensure the deviation is an improvement, not a bug

4. **Testing methodology-aligned behavior** (see the sketch after this list):
- [ ] Test that edge cases produce documented behavior (NaN, warning, etc.)
- [ ] Assert warnings are raised (not just captured)
- [ ] Assert the warned-about behavior actually occurred
- [ ] For NaN results: assert `np.isnan()`, don't just check "no exception"
- [ ] All inference fields computed via `safe_inference()` (not inline)
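
A self-contained sketch of the behavioral-test pattern. `toy_inference` is a hypothetical stand-in for a `safe_inference()`-style helper; its message and fields are assumptions for illustration:

```python
import warnings

import numpy as np
import pytest

def toy_inference(effect, se):
    # Degenerate SE: warn AND set the inference fields to NaN.
    if not se > 0:
        warnings.warn("SE is not positive; setting t-stat and p-value to NaN")
        return {"t_stat": np.nan, "p_value": np.nan}
    return {"t_stat": effect / se, "p_value": None}  # a real helper computes a p-value

def test_degenerate_se_sets_nan():
    # Assert the warning is raised...
    with pytest.warns(UserWarning, match="setting t-stat and p-value to NaN"):
        out = toy_inference(effect=1.0, se=0.0)
    # ...and that the warned-about behavior actually occurred.
    assert np.isnan(out["t_stat"])
    assert np.isnan(out["p_value"])
```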

## Adding Warning/Error/Fallback Handling

When adding code that emits warnings or handles errors:

1. **Consult Methodology Registry first**:
- [ ] Check if behavior is documented in edge cases section
- [ ] If not documented, add it before implementing

2. **Verify behavior matches message**:
- [ ] Manually trace the code path after warning/error
- [ ] Confirm the stated behavior actually occurs

3. **Write behavioral tests**:
- [ ] Don't just test "no exception raised"
- [ ] Assert the expected outcome occurred
- [ ] For fallbacks: verify fallback behavior was applied
- [ ] Example: If warning says "setting NaN", assert `np.any(np.isnan(result))`

4. **Protect arithmetic operations** (see the sketch after this list):
- [ ] Wrap ALL related operations in `np.errstate()`, not just the final one
- [ ] Include division, matrix multiplication, and any operation that can overflow/underflow
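
A toy sketch of the idea; the function and its arithmetic are hypothetical, not diff_diff code:

```python
import numpy as np

def toy_scaled_ratio(X, weights, denom):
    # Wrap the WHOLE chain: the matmul and the scaling can overflow just as
    # the final division can divide by zero or produce NaN.
    with np.errstate(divide="ignore", invalid="ignore", over="ignore", under="ignore"):
        weighted = X @ weights     # can overflow for extreme inputs
        scaled = weighted * 1e154  # can overflow to inf
        return scaled / denom      # can produce inf or NaN

ratio = toy_scaled_ratio(np.array([1e300, 1e300]), np.array([1e10, 1e10]), 0.0)
# -> inf, with no RuntimeWarning noise; the caller can then warn once, deliberately.
```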

## Reviewing New Features or Code Paths

When reviewing PRs that add new features, modes, or code paths (learned from PR #97 analysis):

1. **Edge Case Coverage**:
- [ ] Empty result sets (no matching data for a filter condition)
- [ ] NaN/Inf propagation through ALL inference fields (SE, t-stat, p-value, CI)
- [ ] Parameter interactions (e.g., new param × existing aggregation methods)
- [ ] Control/comparison group composition for all code paths

2. **Documentation Completeness**:
- [ ] All new parameters have docstrings with type, default, and description
- [ ] Methodology docs match implementation behavior (equations, edge cases)
- [ ] Edge cases documented in `docs/methodology/REGISTRY.md`

3. **Logic Audit for New Code Paths**:
- [ ] When adding new modes (like `base_period="varying"`), trace ALL downstream effects
- [ ] Check aggregation methods handle the new mode correctly
- [ ] Check bootstrap/inference methods handle the new mode correctly
- [ ] Explicitly test control group composition in new code paths (see the sketch after this list)

4. **Pattern Consistency**:
- [ ] Search for similar patterns in codebase (e.g., `t_stat = x / se if se > 0 else ...`)
- [ ] Ensure new code follows established patterns or updates ALL instances
- [ ] If fixing a pattern, grep for ALL occurrences first:
```bash
grep -n "if.*se.*> 0.*else" diff_diff/*.py
```
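
A self-contained sketch of explicitly testing control group composition on a toy panel; the column names and treatment encoding are assumptions for illustration, not diff_diff's schema:

```python
import pandas as pd

df = pd.DataFrame({
    "unit":          [1, 1, 2, 2, 3, 3],
    "period":        [1, 2, 1, 2, 1, 2],
    "first_treated": [2, 2, 0, 0, 2, 2],  # 0 = never treated
})
t = 2
# Valid controls at period t: never-treated or not-yet-treated units.
mask = (df["period"] == t) & ((df["first_treated"] == 0) | (df["first_treated"] > t))
controls = set(df.loc[mask, "unit"])
# Units 1 and 3 are already treated at t=2, so only unit 2 may serve as a control.
assert controls == {2}
```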

## Fixing Bugs Across Multiple Locations

When a bug fix involves a pattern that appears in multiple places (learned from PR #97 analysis):

1. **Find All Instances First**:
- [ ] Use grep/search to find ALL occurrences of the pattern before fixing
- [ ] Document the locations found (file:line)
- [ ] Example: `t_stat = effect / se if se > 0 else 0.0` appeared in 7 locations

2. **Fix Comprehensively in One Round**:
- [ ] Fix ALL instances in the same PR/commit
- [ ] Create a test that covers all locations
- [ ] Don't fix incrementally across multiple review rounds

3. **Regression Test the Fix**:
- [ ] Verify fix doesn't break other code paths
- [ ] For early-return fixes: ensure downstream code still runs when needed
- [ ] Example: Bootstrap early return must still compute per-effect SEs

4. **Common Patterns to Watch For** (see the sketch after this list):
- `if se > 0 else 0.0` -> should be `else np.nan` for undefined statistics
- `if len(data) > 0 else return` -> check what downstream code expects
- `mask = (condition)` -> verify mask logic for all parameter combinations
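
A before/after sketch of the first pattern; the normal reference distribution is used purely for illustration:

```python
import numpy as np
from scipy.stats import norm

effect, se = 0.4, 0.0  # degenerate SE

# Before: silently reports a zero statistic for an undefined quantity.
t_bad = effect / se if se > 0 else 0.0

# After: undefined statistics propagate as NaN through all inference fields.
t_stat = effect / se if se > 0 else np.nan
p_value = 2 * (1 - norm.cdf(abs(t_stat))) if np.isfinite(t_stat) else np.nan
```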

## Pre-Merge Review Checklist

Final checklist before approving a PR:

1. **Behavioral Completeness**:
- [ ] Happy path tested
- [ ] Edge cases tested (empty data, NaN inputs, boundary conditions)
- [ ] Error/warning paths tested with behavioral assertions

2. **Inference Field Consistency** (see the sketch after this list):
- [ ] If one inference field (SE, t-stat, p-value) can be NaN, all related fields handle it
- [ ] Aggregation methods propagate NaN correctly
- [ ] Bootstrap methods handle NaN in base estimates

3. **Documentation Sync**:
- [ ] Docstrings updated for all changed signatures
- [ ] README updated if user-facing behavior changes
- [ ] REGISTRY.md updated if methodology edge cases change
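
A toy illustration of NaN propagation; the aggregation rule here is a stand-in, not the package's actual formula:

```python
import numpy as np

effects = np.array([0.5, 0.8, 0.2])
ses = np.array([0.12, np.nan, 0.30])  # one group's SE is undefined

agg_effect = effects.mean()
agg_se = ses.mean()  # NaN propagates automatically through the mean
t_stat = agg_effect / agg_se if agg_se > 0 else np.nan  # NaN > 0 is False -> NaN
assert np.isnan(t_stat)  # every downstream inference field inherits the NaN
```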

## Quick Reference: Common Patterns to Check

Before submitting methodology changes, verify these patterns:

```bash
# Find potential NaN handling issues (should use np.nan, not 0.0)
grep -n "if.*se.*>.*0.*else 0" diff_diff/*.py

# Find all t_stat calculations to ensure consistency
grep -n "t_stat.*=" diff_diff/*.py

# Find all inference field assignments
grep -n "\(se\|t_stat\|p_value\|ci_lower\|ci_upper\).*=" diff_diff/*.py | head -30
```
2 changes: 1 addition & 1 deletion .claude/commands/docs-check.md
@@ -200,6 +200,6 @@ Summary: 15/18 checks passed, 3 issues found
## Notes

- This check is especially important after adding new estimators
- The CLAUDE.md file documents what documentation is required for new features
- The CONTRIBUTING.md file documents what documentation is required for new features
- Missing references should cite the original methodology paper, not textbooks
- When adding new estimators, update this skill's tables accordingly
15 changes: 8 additions & 7 deletions .claude/commands/review-plan.md
@@ -76,11 +76,12 @@ If no prior review is available from either source (conversation context or revi
### Step 2: Read CLAUDE.md for Project Context

Read the project's `CLAUDE.md` file to understand:
- Module structure and architecture
- Key design patterns (sklearn-like API, formula interface, results objects, etc.)
- Estimator inheritance map
- Development checklists (adding parameters, methodology-critical code, etc.)
- Test structure and conventions
- Design patterns (sklearn-like API, formula interface, results objects, etc.)
- Testing conventions
- Key reference file pointers (methodology registry, development checklists, etc.)

Also read `.claude/commands/dev-checklists.md` for development checklists.

If the plan modifies estimator math, standard error formulas, inference logic, or edge-case handling, also read `docs/methodology/REGISTRY.md` to understand the academic foundations and reference implementations for the affected estimator(s).

@@ -224,7 +225,7 @@ Check for **missing related changes**:
- Tests for new/changed functionality
- `__init__.py` export updates
- `get_params()` / `set_params()` updates for new parameters
- Documentation updates (README, RST, tutorials, CLAUDE.md)
- Documentation updates (README, RST, tutorials, CONTRIBUTING.md, CLAUDE.md if design patterns change)
- For bug fixes: did the plan grep for ALL occurrences of the pattern, or just the one reported?

Check for **unnecessary additions**:
@@ -354,9 +355,9 @@ Present the review in the following format. Number each issue sequentially withi

## Checklist Gaps

Cross-reference against the relevant CLAUDE.md checklists. List which checklist items are not addressed by the plan.
Cross-reference against the relevant development checklists in `.claude/commands/dev-checklists.md`. List which checklist items are not addressed by the plan.

[Identify which CLAUDE.md checklist applies (e.g., "Adding a New Parameter to Estimators", "Implementing Methodology-Critical Code", "Fixing Bugs Across Multiple Locations") and list any items from that checklist that the plan doesn't cover.]
[Identify which checklist applies (e.g., "Adding a New Parameter to Estimators", "Implementing Methodology-Critical Code", "Fixing Bugs Across Multiple Locations") and list any items from that checklist that the plan doesn't cover.]

**Registry Alignment** (if methodology files changed):
- [ ] Plan equations match REGISTRY.md (or deviations documented)
4 changes: 2 additions & 2 deletions .claude/hooks/check-plan-review.sh
@@ -18,7 +18,7 @@
# - A stale sentinel can approve an unreviewed newer plan. This occurs only when:
# (a) the sentinel points to an older plan with a valid review, AND
# (b) a newer plan was created without updating the sentinel.
# CLAUDE.md instructs sentinel updates on new plan creation (line 647), making (b)
# CLAUDE.md instructs sentinel updates on new plan creation (Plan Review section), making (b)
# unlikely in practice. A runtime guard was tested and removed (cb8d5e3) because it
# caused false denials in multi-plan workflows. Automated tests in
# test-check-plan-review.sh verify existing behavior; full mitigation would require
@@ -102,7 +102,7 @@ if [ -f "$REVIEW_FILE" ]; then
fi
exit 0 # Review exists and is fresh, allow
else
deny "No plan review found. Expected: $REVIEW_FILE. Follow the Plan Review Before Approval instructions in CLAUDE.md."
deny "No plan review found. Expected: $REVIEW_FILE. Run the plan review workflow: offer review via AskUserQuestion, spawn /review-plan if requested, or write a Skipped marker. See CLAUDE.md Plan Review section."
fi

# Manual verification checklist: