diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 00000000..b7b2950d
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,77 @@
+
+
+# Agent Guidelines for Mellea Contributors
+
+> **Which guide?** Modifying `mellea/`, `cli/`, or `test/` → this file. Writing code that imports Mellea → [`docs/AGENTS_TEMPLATE.md`](docs/AGENTS_TEMPLATE.md).
+
+## 1. Quick Reference
+```bash
+pre-commit install                            # Required: install git hooks
+uv sync --all-extras --all-groups             # Install all deps (required for tests)
+ollama serve                                  # Start Ollama (required for most tests)
+uv run pytest -m "not qualitative"            # Skips LLM quality tests (~2 min)
+uv run pytest                                 # Full suite (includes LLM quality tests)
+uv run ruff format . && uv run ruff check .   # Lint & format
+```
+**Branches**: `feat/topic`, `fix/issue-id`, `docs/topic`
+
+## 2. Directory Structure
+| Path | Contents |
+|------|----------|
+| `mellea/stdlib` | Core: Sessions, Genslots, Requirements, Sampling, Context |
+| `mellea/backends` | Providers: HF, OpenAI, Ollama, Watsonx, LiteLLM |
+| `mellea/helpers` | Utilities, logging, model ID tables |
+| `cli/` | CLI commands (`m serve`, `m alora`, `m decompose`, `m eval`) |
+| `test/` | All tests (run from repo root) |
+| `scratchpad/` | Experiments (git-ignored) |
+
+## 3. Test Markers
+- `@pytest.mark.qualitative` — LLM output quality tests (skipped in CI via `CICD=1`)
+- **Unmarked** — Unit tests (may still require Ollama running locally)
+
+⚠️ Don't add `qualitative` to trivial tests—keep the fast loop fast.
+
+## 4. Coding Standards
+- **Types required** on all core functions
+- **Docstrings are prompts** — be specific, the LLM reads them
+- **Google-style docstrings**
+- **Ruff** for linting/formatting
+- Use `...` in `@generative` function bodies
+- Prefer primitives over classes
+
+## 5. Commits & Hooks
+[Angular format](https://github.com/angular/angular/blob/main/CONTRIBUTING.md#commit): `feat:`, `fix:`, `docs:`, `test:`, `refactor:`, `release:`
+
+Pre-commit runs: ruff, mypy, uv-lock, codespell
+
+## 6. Timing
+> **Don't cancel**: `pytest` (full) and `pre-commit --all-files` may take minutes. Canceling mid-run can corrupt state.
+
+## 7. Common Issues
+| Problem | Fix |
+|---------|-----|
+| `ComponentParseError` | Add examples to docstring |
+| `uv.lock` out of sync | Run `uv sync` |
+| Ollama refused | Run `ollama serve` |
+
+## 8. Self-Review (before notifying user)
+1. `uv run pytest -m "not qualitative"` passes?
+2. `ruff format` and `ruff check` clean?
+3. New functions typed with concise docstrings?
+4. Unit tests added for new functionality?
+5. Avoided over-engineering?
+
+## 9. Writing Tests
+- Place tests in `test/` mirroring source structure
+- Name files `test_*.py` (required for pydocstyle)
+- Use `gh_run` fixture for CI-aware tests (see `test/conftest.py`)
+- Mark tests checking LLM output quality with `@pytest.mark.qualitative`
+- If a test fails, fix the **code**, not the test (unless the test was wrong)
+
+## 10. Feedback Loop
+Found a bug, workaround, or pattern? Update the docs:
+- **Issue/workaround?** → Add to Section 7 (Common Issues) in this file
+- **Usage pattern?** → Add to [`docs/AGENTS_TEMPLATE.md`](docs/AGENTS_TEMPLATE.md)
+- **New pitfall?** → Add warning near relevant section
diff --git a/docs/AGENTS_TEMPLATE.md b/docs/AGENTS_TEMPLATE.md
new file mode 100644
index 00000000..a1543c1e
--- /dev/null
+++ b/docs/AGENTS_TEMPLATE.md
@@ -0,0 +1,183 @@
+
+
+# Mellea Usage Guidelines
+
+> **This file**: For code that *imports* Mellea. For Mellea internals, see [`../AGENTS.md`](../AGENTS.md).
+
+Copy below into your `AGENTS.md` or system prompt.
+
+---
+
+### Library: Mellea
+Use `mellea` for LLM interactions. No direct OpenAI/Anthropic calls or LangChain OutputParsers.
+
+**Prerequisites**: `pip install mellea` · [Docs](https://mellea.ai) · [Repo](https://github.com/generative-computing/mellea)
+
+#### 1. The `@generative` Pattern
+**Don't** write prompt templates or regex parsers:
+```python
+# BAD - don't do this
+response = openai.chat.completions.create(...)
+age = int(re.search(r"\d+", response).group())
+```
+**Do** use typed function signatures:
+```python
+from mellea import generative, start_session
+
+@generative
+def extract_age(text: str) -> int:
+    """Extract the user's age from text."""
+    ...
+
+m = start_session()
+age = extract_age(m, text="Alice is 30")  # Returns int(30)
+```
+
+#### 2. Complex Types
+```python
+from pydantic import BaseModel
+from mellea import generative
+
+class UserProfile(BaseModel):
+    name: str
+    age: int
+    interests: list[str]
+
+@generative
+def parse_profile(bio: str) -> UserProfile: ...
+```
+
+#### 3. Chain-of-Thought
+Add `reasoning` field to force the LLM to "think" before answering:
+```python
+from typing import Literal
+from pydantic import BaseModel, Field
+
+class AnalysisResult(BaseModel):
+    reasoning: str  # LLM fills first
+    conclusion: Literal["approve", "reject"]
+    confidence: float = Field(ge=0.0, le=1.0)
+
+@generative
+def analyze_document(doc: str) -> AnalysisResult: ...
+```
+
+#### 4. Control Flow
+Use Python `if/for/while`. No graph frameworks needed:
+```python
+if analyze_sentiment(m, email) == "negative":
+    draft = draft_apology(m, email)
+else:
+    draft = draft_response(m, email)
+```
+
+#### 5. Instruct-Validate-Repair
+For strict requirements, use `m.instruct()`:
+```python
+from mellea.stdlib.requirements import req, simple_validate
+from mellea.stdlib.sampling import RejectionSamplingStrategy
+
+email = m.instruct(
+    "Write an invite for {{name}}",
+    requirements=[
+        req("Must be formal"),
+        req("Lowercase only", validation_fn=simple_validate(lambda x: x.islower()))
+    ],
+    strategy=RejectionSamplingStrategy(loop_budget=3),
+    user_variables={"name": "Alice"}
+)
+```
+
+#### 6. Small Model Fix
+Small models (1B-8B) can't calculate. Extract params with LLM, compute in Python:
+```python
+from pydantic import BaseModel
+
+class PhysicsParams(BaseModel):
+    speed_a: float
+    speed_b: float
+    delay_hours: float
+
+@generative
+def extract_params(text: str) -> PhysicsParams:
+    """EXTRACT numbers only. Do not calculate."""
+    ...
+
+def calculate_gap(p: PhysicsParams) -> float:
+    return p.speed_a * p.delay_hours
+```
+
+#### 7. One-Shot Examples
+If model struggles, add examples to docstring:
+```python
+@generative
+def identify_fruit(text: str) -> str | None:
+    """
+    Extract fruit from text, or None if none mentioned.
+    Ex: "I ate an apple" -> "apple"
+    Ex: "The sky is blue" -> None
+    """
+    ...
+```
+
+#### 8. Backend Config
+```python
+from mellea import start_session
+from mellea.backends.model_options import ModelOption
+
+m = start_session(
+    model_id="granite3.3:8b",
+    model_options={ModelOption.TEMPERATURE: 0.0, ModelOption.MAX_NEW_TOKENS: 500}
+)
+```
+Options: `TEMPERATURE`, `MAX_NEW_TOKENS`, `SYSTEM_PROMPT`, `SEED`, `TOOLS`, `CONTEXT_WINDOW`, `THINKING`, `STREAM`
+
+#### 9. Async
+```python
+@generative
+async def extract_age(text: str) -> int:
+    """Extract age."""
+    ...
+
+result = await extract_age(m, text="Alice is 30")
+```
+Session methods: `ainstruct`, `achat`, `aact`, `avalidate`, `aquery`, `atransform`
+
+#### 10. Auth
+- **Ollama**: `start_session()` (no setup)
+- **OpenAI**: `export OPENAI_API_KEY="..."`
+- **Watsonx**: `export WATSONX_API_KEY="..."`, `WATSONX_URL`, `WATSONX_PROJECT_ID`
+
+**Never hardcode API keys.**
+
+#### 11. Anti-Patterns
+- **Don't** retry `@generative` calls — Mellea handles retries internally
+- **Don't** use `json.loads()` — use typed returns
+- **Don't** wrap single functions in classes
+- **Do** use `try/except` at app boundaries for network errors
+
+#### 12. Debugging
+```python
+from mellea.core import FancyLogger
+FancyLogger.get_logger().setLevel("DEBUG")
+```
+- `m.last_prompt()` — see exact prompt sent
+
+#### 13. Common Errors
+| Error | Fix |
+|-------|-----|
+| `ComponentParseError` | LLM output didn't match type—add docstring examples |
+| `TypeError: missing positional argument` | First arg must be session `m` |
+| `ConnectionRefusedError` | Run `ollama serve` |
+| Output wrong/None | Model too small—try larger or add `reasoning` field |
+
+#### 14. Testing
+```bash
+uv run pytest -m "not qualitative"  # Fast loop
+uv run pytest                       # Full (verify prompts work)
+```
+
+#### 15. Feedback
+Found a workaround or pattern? Add it to Section 13 (Common Errors) above, or update this file with new guidance.
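
Note on Section 6 of `docs/AGENTS_TEMPLATE.md` (Small Model Fix): the "compute in Python" half of that pattern can be checked without a model or an Ollama server. The sketch below is illustrative only. It assumes a plain dataclass can stand in for the Pydantic `PhysicsParams` model, that the "gap" means the distance covered at `speed_a` during the delay, and that the sample values (hypothetical) are what `extract_params` would return.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicsParams:
    """Stand-in for the Pydantic model in Section 6 (assumption: same three fields)."""

    speed_a: float
    speed_b: float
    delay_hours: float


def calculate_gap(p: PhysicsParams) -> float:
    """Distance covered at speed_a during the delay (assumed meaning of 'gap')."""
    return p.speed_a * p.delay_hours


# Hypothetical values, as if an LLM extraction step had already produced them.
params = PhysicsParams(speed_a=60.0, speed_b=80.0, delay_hours=2.0)
gap = calculate_gap(params)
print(gap)  # 120.0
```

Keeping the arithmetic in an ordinary function like this means it can be covered by an unmarked unit test, leaving only the extraction step for tests marked `qualitative`.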