From a0844375aa91e2e62a7b27858dc64505cf879eeb Mon Sep 17 00:00:00 2001 From: Jeremy Eder Date: Fri, 21 Nov 2025 15:34:48 -0500 Subject: [PATCH 1/8] feat: Add Repomix integration for AI-friendly repository context generation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Implements comprehensive Repomix integration as specified in coldstart prompt 008-repomix-integration.md. This feature enables automated repository context generation for AI consumption and improves AI-assisted development workflows. ## Features Added ### Core Components - **RepomixService** (`src/agentready/services/repomix.py`) - Configuration generation (repomix.config.json) - Ignore file generation (.repomixignore) - Repomix execution wrapper with error handling - Freshness checking (7-day default staleness threshold) - Output file management and discovery - **CLI Command** (`src/agentready/cli/repomix.py`) - `agentready repomix-generate` - Main command - `--init` - Initialize Repomix configuration - `--format` - Output format selection (markdown/xml/json/plain) - `--check` - Verify output freshness without regeneration - `--max-age` - Configurable staleness threshold - **Bootstrap Integration** (`src/agentready/cli/bootstrap.py`) - Added `--repomix` flag to bootstrap command - Auto-generates repomix.config.json and .repomixignore - Creates GitHub Actions workflow for automation - **GitHub Actions Workflow** (`src/agentready/templates/bootstrap/workflows/repomix-update.yml.j2`) - Auto-regenerates on push to main and PRs - Weekly scheduled runs (Mondays 9 AM UTC) - Manual trigger support via workflow_dispatch - PR comments when Repomix output changes - **Repomix Assessor** (`src/agentready/assessors/repomix.py`) - Tier 3 attribute (weight: 0.02) - Checks for configuration file existence - Validates output freshness (< 7 days) - Provides detailed remediation guidance ### Testing - Comprehensive unit tests (21 test cases) - RepomixService tests with mocking - Installation detection - Config/ignore generation - Freshness checks - Command execution - RepomixConfigAssessor tests - Multiple assessment scenarios - Pass/fail/partial compliance ### Documentation - Updated repomix-output.md (1.8M, 420k tokens, 156 files) - AgentReady self-assessment: **80.0/100 (Gold)** πŸ₯‡ ## Technical Details ### Architecture - Follows existing AgentReady patterns - Strategy pattern for assessor - Service layer for business logic - Template-based workflow generation ### Integration Points - Registered in main CLI (`src/agentready/cli/main.py`) - Added to bootstrap generator (`src/agentready/services/bootstrap.py`) - Included in assessor list (Tier 3 Important) ### Configuration Management - Smart defaults for Python projects - Customizable ignore patterns - Aligned with existing .gitignore patterns - Security scanning enabled by default ## Use Cases ```bash # Initialize Repomix for repository agentready repomix-generate --init # Generate AI-friendly context agentready repomix-generate # Bootstrap new repo with Repomix agentready bootstrap --repomix # Check if output is fresh agentready repomix-generate --check ``` ## Related - Coldstart Prompt: `coldstart-prompts/08-repomix-integration.md` - Priority: P4 (Enhancement) - Category: AI-Assisted Development Tools πŸ€– Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude --- repomix-output.md | 58980 +++++++++++----- src/agentready/assessors/repomix.py | 138 + src/agentready/cli/bootstrap.py | 4 +- 
src/agentready/cli/repomix.py | 159 + src/agentready/reporters/markdown.py | 6 +- src/agentready/services/repomix.py | 269 + .../bootstrap/workflows/repomix-update.yml.j2 | 103 + tests/unit/test_repomix.py | 212 + tests/unit/test_security.py | 27 +- uv.lock | 2 +- 10 files changed, 42624 insertions(+), 17276 deletions(-) create mode 100644 src/agentready/assessors/repomix.py create mode 100644 src/agentready/cli/repomix.py create mode 100644 src/agentready/services/repomix.py create mode 100644 src/agentready/templates/bootstrap/workflows/repomix-update.yml.j2 create mode 100644 tests/unit/test_repomix.py diff --git a/repomix-output.md b/repomix-output.md index 1f1819a..24f8830 100644 --- a/repomix-output.md +++ b/repomix-output.md @@ -45,26 +45,21 @@ The content is organized as follows: speckit.specify.md speckit.tasks.md speckit.taskstoissues.md - settings.local.json .github/ - coldstart-prompts/ - 01-create-automated-demo.md - 02-fix-critical-security-logic-bugs-from-code-review.md - 03-bootstrap-agentready-repository-on-github.md - 04-report-header-with-repository-metadata.md - 05-improve-html-report-design-font-size-color-scheme.md - 06-report-schema-versioning.md - 07-research-report-generatorupdater-utility.md - 08-repomix-integration.md - 09-agentready-repository-agent.md - 10-customizable-html-report-themes.md - 11-fix-code-quality-issues-from-code-review.md - 12-improve-test-coverage-and-edge-case-handling.md - 13-add-security-quality-improvements-from-code-review.md - 14-align-subcommand-automated-remediation.md - 15-interactive-dashboard-with-automated-remediation.md - 16-github-app-integration-badge-status-checks.md - README.md + ISSUE_TEMPLATE/ + bug_report.md + feature_request.md + workflows/ + agentready-assessment.yml + claude-code-action.yml + docs-lint.yml + release.yml + security.yml + tests.yml + update-docs.yml + CODEOWNERS + dependabot.yml + PULL_REQUEST_TEMPLATE.md .specify/ memory/ constitution.md @@ -81,15 +76,55 @@ The content is organized as follows: plan-template.md spec-template.md tasks-template.md +coldstart-prompts/ + 01-create-automated-demo.md + 02-fix-critical-security-logic-bugs-from-code-review.md + 03-bootstrap-agentready-repository-on-github.md + 04-report-header-with-repository-metadata.md + 05-improve-html-report-design-font-size-color-scheme.md + 06-report-schema-versioning.md + 07-research-report-generatorupdater-utility.md + 08-repomix-integration.md + 09-agentready-repository-agent.md + 10-customizable-html-report-themes.md + 11-fix-code-quality-issues-from-code-review.md + 12-improve-test-coverage-and-edge-case-handling.md + 13-add-security-quality-improvements-from-code-review.md + 14-align-subcommand-automated-remediation.md + 15-interactive-dashboard-with-automated-remediation.md + 16-github-app-integration-badge-status-checks.md + 17-add-bootstrap-quickstart-to-readme.md + 18-setup-release-pipeline.md + 19-github-pages-linter-integration.md + README.md docs/ + _layouts/ + default.html + home.html + page.html + assets/ + css/ + style.css _config.yml + api-reference.md + attributes.md + DEPLOYMENT.md developer-guide.md + examples.md + Gemfile index.md + README.md + RELEASE_PROCESS.md + roadmaps.md + SETUP_SUMMARY.md user-guide.md examples/ self-assessment/ + assessment-20251121-035845.json assessment-20251121.json README.md + report-20251121-035845.html + report-20251121-035845.md report-20251121.html report-20251121.md scripts/ @@ -182,2996 +217,4102 @@ tests/ __init__.py .agentready-config.example.yaml .gitignore +.markdown-link-check.json 
+.pre-commit-config.yaml +.releaserc.json .repomixignore agent-ready-codebase-attributes.md BACKLOG.md +CHANGELOG.md CLAUDE.md +CODE_OF_CONDUCT.md +CONTRIBUTING.md GITHUB_ISSUES.md +LICENSE pyproject.toml README.md +repomix-output.xml repomix.config.json repos.txt ``` # Files -## File: .claude/settings.local.json -````json -{ - "permissions": { - "allow": [ - "Bash(tree:*)", - "Bash(agentready bootstrap:*)" - ], - "deny": [], - "ask": [] - } -} -```` - -## File: docs/developer-guide.md +## File: .claude/commands/speckit.analyze.md ````markdown --- -layout: page -title: Developer Guide +description: Perform a non-destructive cross-artifact consistency and quality analysis across spec.md, plan.md, and tasks.md after task generation. --- -# Developer Guide - -Comprehensive guide for contributors and developers extending AgentReady. - -## Table of Contents +## User Input -- [Getting Started](#getting-started) -- [Development Environment](#development-environment) -- [Architecture Overview](#architecture-overview) -- [Implementing New Assessors](#implementing-new-assessors) -- [Testing Guidelines](#testing-guidelines) -- [Code Quality Standards](#code-quality-standards) -- [Contributing Workflow](#contributing-workflow) -- [Release Process](#release-process) +```text +$ARGUMENTS +``` ---- +You **MUST** consider the user input before proceeding (if not empty). -## Getting Started +## Goal -### Prerequisites +Identify inconsistencies, duplications, ambiguities, and underspecified items across the three core artifacts (`spec.md`, `plan.md`, `tasks.md`) before implementation. This command MUST run only after `/speckit.tasks` has successfully produced a complete `tasks.md`. -- **Python 3.11 or 3.12** -- **Git** -- **uv** or **pip** (uv recommended for faster dependency management) -- **Make** (optional, for convenience commands) +## Operating Constraints -### Fork and Clone +**STRICTLY READ-ONLY**: Do **not** modify any files. Output a structured analysis report. Offer an optional remediation plan (user must explicitly approve before any follow-up editing commands would be invoked manually). -```bash -# Fork on GitHub first, then: -git clone https://github.com/YOUR_USERNAME/agentready.git -cd agentready +**Constitution Authority**: The project constitution (`.specify/memory/constitution.md`) is **non-negotiable** within this analysis scope. Constitution conflicts are automatically CRITICAL and require adjustment of the spec, plan, or tasksβ€”not dilution, reinterpretation, or silent ignoring of the principle. If a principle itself needs to change, that must occur in a separate, explicit constitution update outside `/speckit.analyze`. -# Add upstream remote -git remote add upstream https://github.com/yourusername/agentready.git -``` +## Execution Steps -### Install Development Dependencies +### 1. Initialize Analysis Context -```bash -# Create virtual environment -python3 -m venv .venv -source .venv/bin/activate # On Windows: .venv\Scripts\activate +Run `.specify/scripts/bash/check-prerequisites.sh --json --require-tasks --include-tasks` once from repo root and parse JSON for FEATURE_DIR and AVAILABLE_DOCS. Derive absolute paths: -# Install with development dependencies -uv pip install -e ".[dev]" +- SPEC = FEATURE_DIR/spec.md +- PLAN = FEATURE_DIR/plan.md +- TASKS = FEATURE_DIR/tasks.md -# Or using pip -pip install -e ".[dev]" +Abort with an error message if any required file is missing (instruct the user to run missing prerequisite command). 
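For illustration only, a minimal shell sketch of this step could look like the following; it assumes the script emits top-level `FEATURE_DIR` and `AVAILABLE_DOCS` keys as described above and that `jq` is available, so adapt it to the script's actual output.

```bash
# Minimal sketch of step 1 (assumes jq is installed and the JSON keys match the description above).
PREREQS_JSON="$(.specify/scripts/bash/check-prerequisites.sh --json --require-tasks --include-tasks)"
FEATURE_DIR="$(printf '%s' "$PREREQS_JSON" | jq -r '.FEATURE_DIR')"

# Derive absolute artifact paths from FEATURE_DIR.
SPEC="$FEATURE_DIR/spec.md"
PLAN="$FEATURE_DIR/plan.md"
TASKS="$FEATURE_DIR/tasks.md"

# Abort if any required artifact is missing.
for artifact in "$SPEC" "$PLAN" "$TASKS"; do
  if [ ! -f "$artifact" ]; then
    echo "Missing required file: $artifact. Run the prerequisite speckit command first." >&2
    exit 1
  fi
done
```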
+For single quotes in args like "I'm Groot", use escape syntax: e.g 'I'\''m Groot' (or double-quote if possible: "I'm Groot"). -# Verify installation -pytest --version -black --version -ruff --version -``` +### 2. Load Artifacts (Progressive Disclosure) ---- +Load only the minimal necessary context from each artifact: -## Development Environment +**From spec.md:** -### Project Structure +- Overview/Context +- Functional Requirements +- Non-Functional Requirements +- User Stories +- Edge Cases (if present) -``` -agentready/ -β”œβ”€β”€ src/agentready/ # Source code -β”‚ β”œβ”€β”€ cli/ # Click-based CLI -β”‚ β”‚ └── main.py # Entry point (assess, research-version, generate-config) -β”‚ β”œβ”€β”€ models/ # Data models -β”‚ β”‚ β”œβ”€β”€ repository.py # Repository representation -β”‚ β”‚ β”œβ”€β”€ attribute.py # Attribute definition -β”‚ β”‚ β”œβ”€β”€ finding.py # Assessment finding -β”‚ β”‚ └── assessment.py # Complete assessment result -β”‚ β”œβ”€β”€ services/ # Core business logic -β”‚ β”‚ β”œβ”€β”€ scanner.py # Assessment orchestration -β”‚ β”‚ β”œβ”€β”€ scorer.py # Score calculation -β”‚ β”‚ └── language_detector.py # Language detection via git -β”‚ β”œβ”€β”€ assessors/ # Attribute assessors -β”‚ β”‚ β”œβ”€β”€ base.py # BaseAssessor abstract class -β”‚ β”‚ β”œβ”€β”€ documentation.py # CLAUDE.md, README assessors -β”‚ β”‚ β”œβ”€β”€ code_quality.py # Type annotations, complexity -β”‚ β”‚ β”œβ”€β”€ testing.py # Test coverage, pre-commit hooks -β”‚ β”‚ β”œβ”€β”€ structure.py # Standard layout, gitignore -β”‚ β”‚ └── stub_assessors.py # 15 not-yet-implemented assessors -β”‚ β”œβ”€β”€ reporters/ # Report generators -β”‚ β”‚ β”œβ”€β”€ html.py # Interactive HTML with Jinja2 -β”‚ β”‚ β”œβ”€β”€ markdown.py # GitHub-Flavored Markdown -β”‚ β”‚ └── json.py # Machine-readable JSON -β”‚ β”œβ”€β”€ templates/ # Jinja2 templates -β”‚ β”‚ └── report.html.j2 # HTML report template -β”‚ └── data/ # Bundled data -β”‚ └── attributes.yaml # Attribute definitions -β”œβ”€β”€ tests/ # Test suite -β”‚ β”œβ”€β”€ unit/ # Unit tests (fast, isolated) -β”‚ β”‚ β”œβ”€β”€ test_models.py -β”‚ β”‚ β”œβ”€β”€ test_assessors_documentation.py -β”‚ β”‚ β”œβ”€β”€ test_assessors_code_quality.py -β”‚ β”‚ └── ... 
-β”‚ β”œβ”€β”€ integration/ # End-to-end tests -β”‚ β”‚ └── test_full_assessment.py -β”‚ └── fixtures/ # Test data -β”‚ └── sample_repos/ # Sample repositories for testing -β”œβ”€β”€ docs/ # GitHub Pages documentation -β”œβ”€β”€ examples/ # Example reports -β”‚ └── self-assessment/ # AgentReady's own assessment -β”œβ”€β”€ pyproject.toml # Python package configuration -β”œβ”€β”€ CLAUDE.md # Project context for AI agents -β”œβ”€β”€ README.md # User-facing documentation -└── BACKLOG.md # Feature backlog -``` +**From plan.md:** -### Development Tools +- Architecture/stack choices +- Data Model references +- Phases +- Technical constraints -AgentReady uses modern Python tooling: +**From tasks.md:** -| Tool | Purpose | Configuration | -|------|---------|---------------| -| **pytest** | Testing framework | `pyproject.toml` | -| **black** | Code formatter | `pyproject.toml` | -| **isort** | Import sorter | `pyproject.toml` | -| **ruff** | Fast linter | `pyproject.toml` | -| **mypy** | Type checker | `pyproject.toml` (future) | +- Task IDs +- Descriptions +- Phase grouping +- Parallel markers [P] +- Referenced file paths -### Running Tests +**From constitution:** -```bash -# Run all tests -pytest +- Load `.specify/memory/constitution.md` for principle validation -# Run with coverage -pytest --cov=src/agentready --cov-report=html +### 3. Build Semantic Models -# Run specific test file -pytest tests/unit/test_models.py -v +Create internal representations (do not include raw artifacts in output): -# Run tests matching pattern -pytest -k "test_claude_md" -v +- **Requirements inventory**: Each functional + non-functional requirement with a stable key (derive slug based on imperative phrase; e.g., "User can upload file" β†’ `user-can-upload-file`) +- **User story/action inventory**: Discrete user actions with acceptance criteria +- **Task coverage mapping**: Map each task to one or more requirements or stories (inference by keyword / explicit reference patterns like IDs or key phrases) +- **Constitution rule set**: Extract principle names and MUST/SHOULD normative statements -# Run with output (don't capture print statements) -pytest -s +### 4. Detection Passes (Token-Efficient Analysis) -# Fast fail (stop on first failure) -pytest -x -``` +Focus on high-signal findings. Limit to 50 findings total; aggregate remainder in overflow summary. -### Code Quality Checks +#### A. Duplication Detection -```bash -# Format code -black src/ tests/ +- Identify near-duplicate requirements +- Mark lower-quality phrasing for consolidation -# Sort imports -isort src/ tests/ +#### B. Ambiguity Detection -# Lint code -ruff check src/ tests/ +- Flag vague adjectives (fast, scalable, secure, intuitive, robust) lacking measurable criteria +- Flag unresolved placeholders (TODO, TKTK, ???, ``, etc.) -# Run all quality checks (recommended before committing) -black src/ tests/ && isort src/ tests/ && ruff check src/ tests/ -``` +#### C. Underspecification -### Pre-commit Hooks (Recommended) +- Requirements with verbs but missing object or measurable outcome +- User stories missing acceptance criteria alignment +- Tasks referencing files or components not defined in spec/plan -Install pre-commit hooks to automatically run quality checks: +#### D. Constitution Alignment -```bash -# Install pre-commit (if not already installed) -pip install pre-commit +- Any requirement or plan element conflicting with a MUST principle +- Missing mandated sections or quality gates from constitution -# Install git hooks -pre-commit install +#### E. 
Coverage Gaps -# Run manually on all files -pre-commit run --all-files -``` +- Requirements with zero associated tasks +- Tasks with no mapped requirement/story +- Non-functional requirements not reflected in tasks (e.g., performance, security) ---- +#### F. Inconsistency -## Architecture Overview +- Terminology drift (same concept named differently across files) +- Data entities referenced in plan but absent in spec (or vice versa) +- Task ordering contradictions (e.g., integration tasks before foundational setup tasks without dependency note) +- Conflicting requirements (e.g., one requires Next.js while other specifies Vue) -AgentReady follows a **library-first architecture** with clear separation of concerns. +### 5. Severity Assignment -### Data Flow +Use this heuristic to prioritize findings: -``` -Repository β†’ Scanner β†’ Assessors β†’ Findings β†’ Assessment β†’ Reporters β†’ Reports - ↓ - Language Detection - (git ls-files) -``` +- **CRITICAL**: Violates constitution MUST, missing core spec artifact, or requirement with zero coverage that blocks baseline functionality +- **HIGH**: Duplicate or conflicting requirement, ambiguous security/performance attribute, untestable acceptance criterion +- **MEDIUM**: Terminology drift, missing non-functional task coverage, underspecified edge case +- **LOW**: Style/wording improvements, minor redundancy not affecting execution order -### Core Components +### 6. Produce Compact Analysis Report -#### 1. Models (`models/`) +Output a Markdown report (no file writes) with the following structure: -Immutable data classes representing domain entities: +## Specification Analysis Report -- **Repository**: Path, name, detected languages -- **Attribute**: ID, name, tier, weight, description -- **Finding**: Attribute, status (pass/fail/skip), score, evidence, remediation -- **Assessment**: Repository, overall score, certification level, findings list +| ID | Category | Severity | Location(s) | Summary | Recommendation | +|----|----------|----------|-------------|---------|----------------| +| A1 | Duplication | HIGH | spec.md:L120-134 | Two similar requirements ... | Merge phrasing; keep clearer version | -**Design Principles**: -- Immutable (frozen dataclasses) -- Type-annotated -- No business logic (pure data) -- Factory methods for common patterns (`Finding.create_pass()`, etc.) +(Add one row per finding; generate stable IDs prefixed by category initial.) -#### 2. Services (`services/`) +**Coverage Summary Table:** -Orchestration and core algorithms: +| Requirement Key | Has Task? | Task IDs | Notes | +|-----------------|-----------|----------|-------| -- **Scanner**: Coordinates assessment flow, manages assessors -- **Scorer**: Calculates weighted scores, determines certification levels -- **LanguageDetector**: Detects repository languages via `git ls-files` +**Constitution Alignment Issues:** (if any) -**Design Principles**: -- Stateless (pure functions or stateless classes) -- Single responsibility -- No external dependencies (file I/O, network) -- Testable with mocks +**Unmapped Tasks:** (if any) -#### 3. Assessors (`assessors/`) +**Metrics:** -Strategy pattern implementations for each attribute: +- Total Requirements +- Total Tasks +- Coverage % (requirements with >=1 task) +- Ambiguity Count +- Duplication Count +- Critical Issues Count -- **BaseAssessor**: Abstract base class defining interface -- Concrete assessors: `CLAUDEmdAssessor`, `READMEAssessor`, etc. +### 7. 
Provide Next Actions -**Design Principles**: -- Each assessor is independent -- Inherit from `BaseAssessor` -- Implement `assess(repository)` method -- Return `Finding` object -- Fail gracefully (return "skipped" if tools missing, don't crash) +At end of report, output a concise Next Actions block: -#### 4. Reporters (`reporters/`) +- If CRITICAL issues exist: Recommend resolving before `/speckit.implement` +- If only LOW/MEDIUM: User may proceed, but provide improvement suggestions +- Provide explicit command suggestions: e.g., "Run /speckit.specify with refinement", "Run /speckit.plan to adjust architecture", "Manually edit tasks.md to add coverage for 'performance-metrics'" -Transform `Assessment` into report formats: +### 8. Offer Remediation -- **HTMLReporter**: Jinja2-based interactive report -- **MarkdownReporter**: GitHub-Flavored Markdown -- **JSONReporter**: Machine-readable JSON +Ask the user: "Would you like me to suggest concrete remediation edits for the top N issues?" (Do NOT apply them automatically.) -**Design Principles**: -- Take `Assessment` as input -- Return formatted string -- Self-contained (HTML has inline CSS/JS, no CDN) -- Idempotent (same input β†’ same output) +## Operating Principles -### Key Design Patterns +### Context Efficiency -#### Strategy Pattern (Assessors) +- **Minimal high-signal tokens**: Focus on actionable findings, not exhaustive documentation +- **Progressive disclosure**: Load artifacts incrementally; don't dump all content into analysis +- **Token-efficient output**: Limit findings table to 50 rows; summarize overflow +- **Deterministic results**: Rerunning without changes should produce consistent IDs and counts -Each assessor is a pluggable strategy implementing the same interface: +### Analysis Guidelines -```python -from abc import ABC, abstractmethod +- **NEVER modify files** (this is read-only analysis) +- **NEVER hallucinate missing sections** (if absent, report them accurately) +- **Prioritize constitution violations** (these are always CRITICAL) +- **Use examples over exhaustive rules** (cite specific instances, not generic patterns) +- **Report zero issues gracefully** (emit success report with coverage statistics) -class BaseAssessor(ABC): - @property - @abstractmethod - def attribute_id(self) -> str: - """Unique attribute identifier.""" - pass +## Context - @abstractmethod - def assess(self, repository: Repository) -> Finding: - """Assess repository for this attribute.""" - pass +$ARGUMENTS +```` - def is_applicable(self, repository: Repository) -> bool: - """Check if this assessor applies to the repository.""" - return True -``` +## File: .claude/commands/speckit.checklist.md +````markdown +--- +description: Generate a custom checklist for the current feature based on user requirements. +--- -#### Factory Pattern (Finding Creation) +## Checklist Purpose: "Unit Tests for English" -`Finding` class provides factory methods for common patterns: +**CRITICAL CONCEPT**: Checklists are **UNIT TESTS FOR REQUIREMENTS WRITING** - they validate the quality, clarity, and completeness of requirements in a given domain. 
-```python -# Pass with full score -finding = Finding.create_pass( - attribute=attribute, - evidence="Found CLAUDE.md at repository root", - remediation=None -) +**NOT for verification/testing**: -# Fail with zero score -finding = Finding.create_fail( - attribute=attribute, - evidence="No CLAUDE.md file found", - remediation=Remediation(steps=[...], tools=[...]) -) +- ❌ NOT "Verify the button clicks correctly" +- ❌ NOT "Test error handling works" +- ❌ NOT "Confirm the API returns 200" +- ❌ NOT checking if code/implementation matches the spec -# Skip (not applicable) -finding = Finding.create_skip( - attribute=attribute, - reason="Not implemented yet" -) -``` +**FOR requirements quality validation**: -#### Template Pattern (Reporters) +- βœ… "Are visual hierarchy requirements defined for all card types?" (completeness) +- βœ… "Is 'prominent display' quantified with specific sizing/positioning?" (clarity) +- βœ… "Are hover state requirements consistent across all interactive elements?" (consistency) +- βœ… "Are accessibility requirements defined for keyboard navigation?" (coverage) +- βœ… "Does the spec define what happens when logo image fails to load?" (edge cases) -Reporters use Jinja2 templates for HTML generation: +**Metaphor**: If your spec is code written in English, the checklist is its unit test suite. You're testing whether the requirements are well-written, complete, unambiguous, and ready for implementation - NOT whether the implementation works. -```python -from jinja2 import Environment, FileSystemLoader +## User Input -class HTMLReporter: - def generate(self, assessment: Assessment) -> str: - env = Environment(loader=FileSystemLoader('templates')) - template = env.get_template('report.html.j2') - return template.render(assessment=assessment) +```text +$ARGUMENTS ``` ---- - -## Implementing New Assessors +You **MUST** consider the user input before proceeding (if not empty). -Follow this step-by-step guide to add a new assessor. +## Execution Steps -### Step 1: Choose an Attribute +1. **Setup**: Run `.specify/scripts/bash/check-prerequisites.sh --json` from repo root and parse JSON for FEATURE_DIR and AVAILABLE_DOCS list. + - All file paths must be absolute. + - For single quotes in args like "I'm Groot", use escape syntax: e.g 'I'\''m Groot' (or double-quote if possible: "I'm Groot"). -Check `src/agentready/assessors/stub_assessors.py` for not-yet-implemented attributes: +2. **Clarify intent (dynamic)**: Derive up to THREE initial contextual clarifying questions (no pre-baked catalog). They MUST: + - Be generated from the user's phrasing + extracted signals from spec/plan/tasks + - Only ask about information that materially changes checklist content + - Be skipped individually if already unambiguous in `$ARGUMENTS` + - Prefer precision over breadth -```python -# Example stub assessor -class InlineDocumentationAssessor(BaseAssessor): - @property - def attribute_id(self) -> str: - return "inline_documentation" + Generation algorithm: + 1. Extract signals: feature domain keywords (e.g., auth, latency, UX, API), risk indicators ("critical", "must", "compliance"), stakeholder hints ("QA", "review", "security team"), and explicit deliverables ("a11y", "rollback", "contracts"). + 2. Cluster signals into candidate focus areas (max 4) ranked by relevance. + 3. Identify probable audience & timing (author, reviewer, QA, release) if not explicit. + 4. Detect missing dimensions: scope breadth, depth/rigor, risk emphasis, exclusion boundaries, measurable acceptance criteria. + 5. 
Formulate questions chosen from these archetypes: + - Scope refinement (e.g., "Should this include integration touchpoints with X and Y or stay limited to local module correctness?") + - Risk prioritization (e.g., "Which of these potential risk areas should receive mandatory gating checks?") + - Depth calibration (e.g., "Is this a lightweight pre-commit sanity list or a formal release gate?") + - Audience framing (e.g., "Will this be used by the author only or peers during PR review?") + - Boundary exclusion (e.g., "Should we explicitly exclude performance tuning items this round?") + - Scenario class gap (e.g., "No recovery flows detectedβ€”are rollback / partial failure paths in scope?") - def assess(self, repository: Repository) -> Finding: - # TODO: Implement actual assessment logic - return Finding.create_skip( - self.attribute, - reason="Assessor not yet implemented" - ) -``` + Question formatting rules: + - If presenting options, generate a compact table with columns: Option | Candidate | Why It Matters + - Limit to A–E options maximum; omit table if a free-form answer is clearer + - Never ask the user to restate what they already said + - Avoid speculative categories (no hallucination). If uncertain, ask explicitly: "Confirm whether X belongs in scope." -### Step 2: Create Assessor Class + Defaults when interaction impossible: + - Depth: Standard + - Audience: Reviewer (PR) if code-related; Author otherwise + - Focus: Top 2 relevance clusters -Create a new file or expand existing category file in `src/agentready/assessors/`: + Output the questions (label Q1/Q2/Q3). After answers: if β‰₯2 scenario classes (Alternate / Exception / Recovery / Non-Functional domain) remain unclear, you MAY ask up to TWO more targeted follow‑ups (Q4/Q5) with a one-line justification each (e.g., "Unresolved recovery path risk"). Do not exceed five total questions. Skip escalation if user explicitly declines more. -```python -# src/agentready/assessors/documentation.py +3. **Understand user request**: Combine `$ARGUMENTS` + clarifying answers: + - Derive checklist theme (e.g., security, review, deploy, ux) + - Consolidate explicit must-have items mentioned by user + - Map focus selections to category scaffolding + - Infer any missing context from spec/plan/tasks (do NOT hallucinate) -from agentready.models import Repository, Finding, Attribute, Remediation -from agentready.assessors.base import BaseAssessor +4. **Load feature context**: Read from FEATURE_DIR: + - spec.md: Feature requirements and scope + - plan.md (if exists): Technical details, dependencies + - tasks.md (if exists): Implementation tasks -class InlineDocumentationAssessor(BaseAssessor): - @property - def attribute_id(self) -> str: - return "inline_documentation" + **Context Loading Strategy**: + - Load only necessary portions relevant to active focus areas (avoid full-file dumping) + - Prefer summarizing long sections into concise scenario/requirement bullets + - Use progressive disclosure: add follow-on retrieval only if gaps detected + - If source docs are large, generate interim summary items instead of embedding raw text - def assess(self, repository: Repository) -> Finding: - """ - Assess inline documentation coverage (docstrings/JSDoc). +5. 
**Generate checklist** - Create "Unit Tests for Requirements": + - Create `FEATURE_DIR/checklists/` directory if it doesn't exist + - Generate unique checklist filename: + - Use short, descriptive name based on domain (e.g., `ux.md`, `api.md`, `security.md`) + - Format: `[domain].md` + - If file exists, append to existing file + - Number items sequentially starting from CHK001 + - Each `/speckit.checklist` run creates a NEW file (never overwrites existing checklists) - Checks: - - Python: Presence of docstrings in .py files - - JavaScript/TypeScript: JSDoc comments - - Coverage: >80% of public functions documented - """ - # Implement assessment logic here - pass -``` + **CORE PRINCIPLE - Test the Requirements, Not the Implementation**: + Every checklist item MUST evaluate the REQUIREMENTS THEMSELVES for: + - **Completeness**: Are all necessary requirements present? + - **Clarity**: Are requirements unambiguous and specific? + - **Consistency**: Do requirements align with each other? + - **Measurability**: Can requirements be objectively verified? + - **Coverage**: Are all scenarios/edge cases addressed? -### Step 3: Implement Assessment Logic + **Category Structure** - Group items by requirement quality dimensions: + - **Requirement Completeness** (Are all necessary requirements documented?) + - **Requirement Clarity** (Are requirements specific and unambiguous?) + - **Requirement Consistency** (Do requirements align without conflicts?) + - **Acceptance Criteria Quality** (Are success criteria measurable?) + - **Scenario Coverage** (Are all flows/cases addressed?) + - **Edge Case Coverage** (Are boundary conditions defined?) + - **Non-Functional Requirements** (Performance, Security, Accessibility, etc. - are they specified?) + - **Dependencies & Assumptions** (Are they documented and validated?) + - **Ambiguities & Conflicts** (What needs clarification?) -Use the `calculate_proportional_score()` helper for partial compliance: + **HOW TO WRITE CHECKLIST ITEMS - "Unit Tests for English"**: -```python -def assess(self, repository: Repository) -> Finding: - # Example: Check Python docstrings - if "Python" not in repository.languages: - return Finding.create_skip( - self.attribute, - reason="No Python files detected" - ) + ❌ **WRONG** (Testing implementation): + - "Verify landing page displays 3 episode cards" + - "Test hover states work on desktop" + - "Confirm logo click navigates home" - # Count functions and docstrings - total_functions = self._count_functions(repository) - documented_functions = self._count_documented_functions(repository) + βœ… **CORRECT** (Testing requirements quality): + - "Are the exact number and layout of featured episodes specified?" [Completeness] + - "Is 'prominent display' quantified with specific sizing/positioning?" [Clarity] + - "Are hover state requirements consistent across all interactive elements?" [Consistency] + - "Are keyboard navigation requirements defined for all interactive UI?" [Coverage] + - "Is the fallback behavior specified when logo image fails to load?" [Edge Cases] + - "Are loading states defined for asynchronous episode data?" [Completeness] + - "Does the spec define visual hierarchy for competing UI elements?" 
[Clarity] - if total_functions == 0: - return Finding.create_skip( - self.attribute, - reason="No functions found" - ) + **ITEM STRUCTURE**: + Each item should follow this pattern: + - Question format asking about requirement quality + - Focus on what's WRITTEN (or not written) in the spec/plan + - Include quality dimension in brackets [Completeness/Clarity/Consistency/etc.] + - Reference spec section `[Spec Β§X.Y]` when checking existing requirements + - Use `[Gap]` marker when checking for missing requirements - # Calculate coverage - coverage = documented_functions / total_functions - score = self.calculate_proportional_score(coverage, 0.80) + **EXAMPLES BY QUALITY DIMENSION**: - if score >= 80: # Passes if >= 80% of target - return Finding.create_pass( - self.attribute, - evidence=f"Documented {documented_functions}/{total_functions} functions ({coverage:.1%})", - remediation=None - ) - else: - return Finding.create_fail( - self.attribute, - evidence=f"Only {documented_functions}/{total_functions} functions documented ({coverage:.1%})", - remediation=self._create_remediation(coverage) - ) + Completeness: + - "Are error handling requirements defined for all API failure modes? [Gap]" + - "Are accessibility requirements specified for all interactive elements? [Completeness]" + - "Are mobile breakpoint requirements defined for responsive layouts? [Gap]" -def _count_functions(self, repository: Repository) -> int: - """Count total functions in Python files.""" - # Implementation using ast or grep - pass + Clarity: + - "Is 'fast loading' quantified with specific timing thresholds? [Clarity, Spec Β§NFR-2]" + - "Are 'related episodes' selection criteria explicitly defined? [Clarity, Spec Β§FR-5]" + - "Is 'prominent' defined with measurable visual properties? [Ambiguity, Spec Β§FR-4]" -def _count_documented_functions(self, repository: Repository) -> int: - """Count functions with docstrings.""" - # Implementation using ast - pass + Consistency: + - "Do navigation requirements align across all pages? [Consistency, Spec Β§FR-10]" + - "Are card component requirements consistent between landing and detail pages? [Consistency]" -def _create_remediation(self, current_coverage: float) -> Remediation: - """Generate remediation guidance.""" - return Remediation( - steps=[ - "Install pydocstyle: `pip install pydocstyle`", - "Run docstring linter: `pydocstyle src/`", - "Add docstrings to flagged functions", - f"Target: {(0.80 - current_coverage) * 100:.0f}% more functions need documentation" - ], - tools=["pydocstyle", "pylint"], - commands=[ - "pydocstyle src/", - "pylint --disable=all --enable=missing-docstring src/" - ], - examples=[ - '''def calculate_total(items: List[Item]) -> float: - """ - Calculate total price of items. + Coverage: + - "Are requirements defined for zero-state scenarios (no episodes)? [Coverage, Edge Case]" + - "Are concurrent user interaction scenarios addressed? [Coverage, Gap]" + - "Are requirements specified for partial data loading failures? [Coverage, Exception Flow]" - Args: - items: List of items to sum + Measurability: + - "Are visual hierarchy requirements measurable/testable? [Acceptance Criteria, Spec Β§FR-1]" + - "Can 'balanced visual weight' be objectively verified? 
[Measurability, Spec Β§FR-2]" - Returns: - Total price in USD + **Scenario Classification & Coverage** (Requirements Quality Focus): + - Check if requirements exist for: Primary, Alternate, Exception/Error, Recovery, Non-Functional scenarios + - For each scenario class, ask: "Are [scenario type] requirements complete, clear, and consistent?" + - If scenario class missing: "Are [scenario type] requirements intentionally excluded or missing? [Gap]" + - Include resilience/rollback when state mutation occurs: "Are rollback requirements defined for migration failures? [Gap]" - Example: - >>> calculate_total([Item(5.0), Item(3.0)]) - 8.0 - """ - return sum(item.price for item in items)''' - ], - citations=[ - "PEP 257 - Docstring Conventions", - "Google Python Style Guide" - ] - ) -``` + **Traceability Requirements**: + - MINIMUM: β‰₯80% of items MUST include at least one traceability reference + - Each item should reference: spec section `[Spec Β§X.Y]`, or use markers: `[Gap]`, `[Ambiguity]`, `[Conflict]`, `[Assumption]` + - If no ID system exists: "Is a requirement & acceptance criteria ID scheme established? [Traceability]" -### Step 4: Register Assessor + **Surface & Resolve Issues** (Requirements Quality Problems): + Ask questions about the requirements themselves: + - Ambiguities: "Is the term 'fast' quantified with specific metrics? [Ambiguity, Spec Β§NFR-1]" + - Conflicts: "Do navigation requirements conflict between Β§FR-10 and Β§FR-10a? [Conflict]" + - Assumptions: "Is the assumption of 'always available podcast API' validated? [Assumption]" + - Dependencies: "Are external podcast API requirements documented? [Dependency, Gap]" + - Missing definitions: "Is 'visual hierarchy' defined with measurable criteria? [Gap]" -Add to scanner's assessor list in `src/agentready/services/scanner.py`: + **Content Consolidation**: + - Soft cap: If raw candidate items > 40, prioritize by risk/impact + - Merge near-duplicates checking the same requirement aspect + - If >5 low-impact edge cases, create one item: "Are edge cases X, Y, Z addressed in requirements? [Coverage]" -```python -def __init__(self): - self.assessors = [ - # Existing assessors... - InlineDocumentationAssessor(), - ] -``` + **🚫 ABSOLUTELY PROHIBITED** - These make it an implementation test, not a requirements test: + - ❌ Any item starting with "Verify", "Test", "Confirm", "Check" + implementation behavior + - ❌ References to code execution, user actions, system behavior + - ❌ "Displays correctly", "works properly", "functions as expected" + - ❌ "Click", "navigate", "render", "load", "execute" + - ❌ Test cases, test plans, QA procedures + - ❌ Implementation details (frameworks, APIs, algorithms) -### Step 5: Write Tests + **βœ… REQUIRED PATTERNS** - These test requirements quality: + - βœ… "Are [requirement type] defined/specified/documented for [scenario]?" + - βœ… "Is [vague term] quantified/clarified with specific criteria?" + - βœ… "Are requirements consistent between [section A] and [section B]?" + - βœ… "Can [requirement] be objectively measured/verified?" + - βœ… "Are [edge cases/scenarios] addressed in requirements?" + - βœ… "Does the spec define [missing aspect]?" -Create comprehensive unit tests in `tests/unit/test_assessors_documentation.py`: +6. **Structure Reference**: Generate the checklist following the canonical template in `.specify/templates/checklist-template.md` for title, meta section, category headings, and ID formatting. 
If template is unavailable, use: H1 title, purpose/created meta lines, `##` category sections containing `- [ ] CHK### ` lines with globally incrementing IDs starting at CHK001. -```python -import pytest -from agentready.models import Repository -from agentready.assessors.documentation import InlineDocumentationAssessor +7. **Report**: Output full path to created checklist, item count, and remind user that each run creates a new file. Summarize: + - Focus areas selected + - Depth level + - Actor/timing + - Any explicit user-specified must-have items incorporated -class TestInlineDocumentationAssessor: - def test_python_well_documented_passes(self, tmp_path): - """Well-documented Python code should pass.""" - # Create test repository - repo_path = tmp_path / "test_repo" - repo_path.mkdir() - (repo_path / ".git").mkdir() +**Important**: Each `/speckit.checklist` command invocation creates a checklist file using short, descriptive names unless file already exists. This allows: - # Create Python file with docstrings - code = ''' -def add(a: int, b: int) -> int: - """Add two numbers.""" - return a + b +- Multiple checklists of different types (e.g., `ux.md`, `test.md`, `security.md`) +- Simple, memorable filenames that indicate checklist purpose +- Easy identification and navigation in the `checklists/` folder -def subtract(a: int, b: int) -> int: - """Subtract b from a.""" - return a - b -''' - (repo_path / "main.py").write_text(code) +To avoid clutter, use descriptive types and clean up obsolete checklists when done. - # Create repository object - repo = Repository( - path=str(repo_path), - name="test_repo", - languages={"Python": 1} - ) +## Example Checklist Types & Sample Items - # Run assessment - assessor = InlineDocumentationAssessor() - finding = assessor.assess(repo) +**UX Requirements Quality:** `ux.md` - # Verify result - assert finding.status == "pass" - assert finding.score == 100 - assert "2/2 functions" in finding.evidence +Sample items (testing the requirements, NOT the implementation): - def test_python_poorly_documented_fails(self, tmp_path): - """Poorly documented Python code should fail.""" - # Create test repository - repo_path = tmp_path / "test_repo" - repo_path.mkdir() - (repo_path / ".git").mkdir() +- "Are visual hierarchy requirements defined with measurable criteria? [Clarity, Spec Β§FR-1]" +- "Is the number and positioning of UI elements explicitly specified? [Completeness, Spec Β§FR-1]" +- "Are interaction state requirements (hover, focus, active) consistently defined? [Consistency]" +- "Are accessibility requirements specified for all interactive elements? [Coverage, Gap]" +- "Is fallback behavior defined when images fail to load? [Edge Case, Gap]" +- "Can 'prominent display' be objectively measured? [Measurability, Spec Β§FR-4]" - # Create Python file with no docstrings - code = ''' -def add(a, b): - return a + b +**API Requirements Quality:** `api.md` -def subtract(a, b): - return a - b -''' - (repo_path / "main.py").write_text(code) +Sample items: - repo = Repository( - path=str(repo_path), - name="test_repo", - languages={"Python": 1} - ) +- "Are error response formats specified for all failure scenarios? [Completeness]" +- "Are rate limiting requirements quantified with specific thresholds? [Clarity]" +- "Are authentication requirements consistent across all endpoints? [Consistency]" +- "Are retry/timeout requirements defined for external dependencies? [Coverage, Gap]" +- "Is versioning strategy documented in requirements? 
[Gap]" - assessor = InlineDocumentationAssessor() - finding = assessor.assess(repo) +**Performance Requirements Quality:** `performance.md` - assert finding.status == "fail" - assert finding.score < 80 - assert "0/2 functions" in finding.evidence - assert finding.remediation is not None - assert "pydocstyle" in finding.remediation.tools +Sample items: - def test_non_python_skips(self, tmp_path): - """Non-Python repositories should skip.""" - repo = Repository( - path=str(tmp_path), - name="test_repo", - languages={"JavaScript": 10} - ) +- "Are performance requirements quantified with specific metrics? [Clarity]" +- "Are performance targets defined for all critical user journeys? [Coverage]" +- "Are performance requirements under different load conditions specified? [Completeness]" +- "Can performance requirements be objectively measured? [Measurability]" +- "Are degradation requirements defined for high-load scenarios? [Edge Case, Gap]" - assessor = InlineDocumentationAssessor() - finding = assessor.assess(repo) +**Security Requirements Quality:** `security.md` - assert finding.status == "skipped" - assert "No Python files" in finding.reason -``` +Sample items: -### Step 6: Test Manually +- "Are authentication requirements specified for all protected resources? [Coverage]" +- "Are data protection requirements defined for sensitive information? [Completeness]" +- "Is the threat model documented and requirements aligned to it? [Traceability]" +- "Are security requirements consistent with compliance obligations? [Consistency]" +- "Are security failure/breach response requirements defined? [Gap, Exception Flow]" -```bash -# Run your new tests -pytest tests/unit/test_assessors_documentation.py -v +## Anti-Examples: What NOT To Do -# Run full assessment on AgentReady itself -agentready assess . --verbose +**❌ WRONG - These test implementation, not requirements:** -# Verify your assessor appears in output +```markdown +- [ ] CHK001 - Verify landing page displays 3 episode cards [Spec Β§FR-001] +- [ ] CHK002 - Test hover states work correctly on desktop [Spec Β§FR-003] +- [ ] CHK003 - Confirm logo click navigates to home page [Spec Β§FR-010] +- [ ] CHK004 - Check that related episodes section shows 3-5 items [Spec Β§FR-005] ``` -### Best Practices for Assessors +**βœ… CORRECT - These test requirements quality:** -1. **Fail Gracefully**: Return "skipped" if required tools missing, don't crash -2. **Provide Rich Remediation**: Include steps, tools, commands, examples, citations -3. **Use Proportional Scoring**: `calculate_proportional_score()` for partial compliance -4. **Language-Specific Logic**: Check `repository.languages` before assessing -5. **Avoid External Dependencies**: Use stdlib when possible (ast, re, pathlib) -6. **Performance**: Keep assessments fast (<1 second per assessor) -7. **Idempotent**: Same repository β†’ same result -8. **Evidence**: Provide specific, actionable evidence (file paths, counts, examples) +```markdown +- [ ] CHK001 - Are the number and layout of featured episodes explicitly specified? [Completeness, Spec Β§FR-001] +- [ ] CHK002 - Are hover state requirements consistently defined for all interactive elements? [Consistency, Spec Β§FR-003] +- [ ] CHK003 - Are navigation requirements clear for all clickable brand elements? [Clarity, Spec Β§FR-010] +- [ ] CHK004 - Is the selection criteria for related episodes documented? [Gap, Spec Β§FR-005] +- [ ] CHK005 - Are loading state requirements defined for asynchronous episode data? 
[Gap] +- [ ] CHK006 - Can "visual hierarchy" requirements be objectively measured? [Measurability, Spec Β§FR-001] +``` ---- +**Key Differences:** -## Testing Guidelines +- Wrong: Tests if the system works correctly +- Correct: Tests if the requirements are written correctly +- Wrong: Verification of behavior +- Correct: Validation of requirement quality +- Wrong: "Does it do X?" +- Correct: "Is X clearly specified?" +```` -AgentReady maintains high test quality standards. +## File: .claude/commands/speckit.clarify.md +````markdown +--- +description: Identify underspecified areas in the current feature spec by asking up to 5 highly targeted clarification questions and encoding answers back into the spec. +handoffs: + - label: Build Technical Plan + agent: speckit.plan + prompt: Create a plan for the spec. I am building with... +--- -### Test Organization +## User Input +```text +$ARGUMENTS ``` -tests/ -β”œβ”€β”€ unit/ # Fast, isolated tests -β”‚ β”œβ”€β”€ test_models.py -β”‚ β”œβ”€β”€ test_assessors_*.py -β”‚ └── test_reporters.py -β”œβ”€β”€ integration/ # End-to-end tests -β”‚ └── test_full_assessment.py -└── fixtures/ # Shared test data - └── sample_repos/ -``` - -### Test Types - -#### Unit Tests - -- **Purpose**: Test individual components in isolation -- **Speed**: Very fast (<1s total) -- **Coverage**: Models, assessors, services, reporters -- **Mocking**: Use `pytest` fixtures and mocks -#### Integration Tests +You **MUST** consider the user input before proceeding (if not empty). -- **Purpose**: Test complete workflows end-to-end -- **Speed**: Slower (acceptable up to 10s total) -- **Coverage**: Full assessment pipeline -- **Real Data**: Use fixture repositories +## Outline -### Writing Good Tests +Goal: Detect and reduce ambiguity or missing decision points in the active feature specification and record the clarifications directly in the spec file. -#### Test Naming +Note: This clarification workflow is expected to run (and be completed) BEFORE invoking `/speckit.plan`. If the user explicitly states they are skipping clarification (e.g., exploratory spike), you may proceed, but must warn that downstream rework risk increases. -Use descriptive names following pattern: `test___` +Execution steps: -```python -# Good -def test_claude_md_assessor_with_existing_file_passes(): - pass +1. Run `.specify/scripts/bash/check-prerequisites.sh --json --paths-only` from repo root **once** (combined `--json --paths-only` mode / `-Json -PathsOnly`). Parse minimal JSON payload fields: + - `FEATURE_DIR` + - `FEATURE_SPEC` + - (Optionally capture `IMPL_PLAN`, `TASKS` for future chained flows.) + - If JSON parsing fails, abort and instruct user to re-run `/speckit.specify` or verify feature branch environment. + - For single quotes in args like "I'm Groot", use escape syntax: e.g 'I'\''m Groot' (or double-quote if possible: "I'm Groot"). -def test_readme_assessor_missing_quick_start_fails(): - pass +2. Load the current spec file. Perform a structured ambiguity & coverage scan using this taxonomy. For each category, mark status: Clear / Partial / Missing. Produce an internal coverage map used for prioritization (do not output raw map unless no questions will be asked). 
-def test_type_annotations_assessor_javascript_repo_skips(): - pass + Functional Scope & Behavior: + - Core user goals & success criteria + - Explicit out-of-scope declarations + - User roles / personas differentiation -# Bad -def test_assessor(): - pass + Domain & Data Model: + - Entities, attributes, relationships + - Identity & uniqueness rules + - Lifecycle/state transitions + - Data volume / scale assumptions -def test_pass_case(): - pass -``` + Interaction & UX Flow: + - Critical user journeys / sequences + - Error/empty/loading states + - Accessibility or localization notes -#### Arrange-Act-Assert Pattern + Non-Functional Quality Attributes: + - Performance (latency, throughput targets) + - Scalability (horizontal/vertical, limits) + - Reliability & availability (uptime, recovery expectations) + - Observability (logging, metrics, tracing signals) + - Security & privacy (authN/Z, data protection, threat assumptions) + - Compliance / regulatory constraints (if any) -```python -def test_finding_create_pass_sets_correct_attributes(): - # Arrange - attribute = Attribute( - id="test_attr", - name="Test Attribute", - tier=1, - weight=0.10 - ) + Integration & External Dependencies: + - External services/APIs and failure modes + - Data import/export formats + - Protocol/versioning assumptions - # Act - finding = Finding.create_pass( - attribute=attribute, - evidence="Test evidence", - remediation=None - ) + Edge Cases & Failure Handling: + - Negative scenarios + - Rate limiting / throttling + - Conflict resolution (e.g., concurrent edits) - # Assert - assert finding.status == "pass" - assert finding.score == 100 - assert finding.evidence == "Test evidence" - assert finding.remediation is None -``` + Constraints & Tradeoffs: + - Technical constraints (language, storage, hosting) + - Explicit tradeoffs or rejected alternatives -#### Use Fixtures + Terminology & Consistency: + - Canonical glossary terms + - Avoided synonyms / deprecated terms -```python -@pytest.fixture -def sample_repository(tmp_path): - """Create a sample repository for testing.""" - repo_path = tmp_path / "sample_repo" - repo_path.mkdir() - (repo_path / ".git").mkdir() + Completion Signals: + - Acceptance criteria testability + - Measurable Definition of Done style indicators - # Add files - (repo_path / "README.md").write_text("# Sample Repo") - (repo_path / "CLAUDE.md").write_text("# Tech Stack") + Misc / Placeholders: + - TODO markers / unresolved decisions + - Ambiguous adjectives ("robust", "intuitive") lacking quantification - return Repository( - path=str(repo_path), - name="sample_repo", - languages={"Python": 5} - ) + For each category with Partial or Missing status, add a candidate question opportunity unless: + - Clarification would not materially change implementation or validation strategy + - Information is better deferred to planning phase (note internally) -def test_with_fixture(sample_repository): - assert sample_repository.name == "sample_repo" -``` +3. Generate (internally) a prioritized queue of candidate clarification questions (maximum 5). Do NOT output them all at once. Apply these constraints: + - Maximum of 10 total questions across the whole session. + - Each question must be answerable with EITHER: + - A short multiple‑choice selection (2–5 distinct, mutually exclusive options), OR + - A one-word / short‑phrase answer (explicitly constrain: "Answer in <=5 words"). 
+ - Only include questions whose answers materially impact architecture, data modeling, task decomposition, test design, UX behavior, operational readiness, or compliance validation. + - Ensure category coverage balance: attempt to cover the highest impact unresolved categories first; avoid asking two low-impact questions when a single high-impact area (e.g., security posture) is unresolved. + - Exclude questions already answered, trivial stylistic preferences, or plan-level execution details (unless blocking correctness). + - Favor clarifications that reduce downstream rework risk or prevent misaligned acceptance tests. + - If more than 5 categories remain unresolved, select the top 5 by (Impact * Uncertainty) heuristic. -### Coverage Requirements +4. Sequential questioning loop (interactive): + - Present EXACTLY ONE question at a time. + - For multiple‑choice questions: + - **Analyze all options** and determine the **most suitable option** based on: + - Best practices for the project type + - Common patterns in similar implementations + - Risk reduction (security, performance, maintainability) + - Alignment with any explicit project goals or constraints visible in the spec + - Present your **recommended option prominently** at the top with clear reasoning (1-2 sentences explaining why this is the best choice). + - Format as: `**Recommended:** Option [X] - ` + - Then render all options as a Markdown table: -- **Target**: >80% line coverage for new code -- **Minimum**: >70% overall coverage -- **Critical Paths**: 100% coverage (scoring algorithm, finding creation) + | Option | Description | + |--------|-------------| + | A |