diff --git a/.github/workflows/copilot-setup-steps.yml b/.github/workflows/copilot-setup-steps.yml new file mode 100644 index 00000000..f0e11f84 --- /dev/null +++ b/.github/workflows/copilot-setup-steps.yml @@ -0,0 +1,36 @@ +name: Copilot Setup Steps + +on: + workflow_dispatch: + push: + paths: + - .github/workflows/copilot-setup-steps.yml + pull_request: + paths: + - .github/workflows/copilot-setup-steps.yml + +jobs: + copilot-setup-steps: + runs-on: ubuntu-latest + + permissions: + contents: read + + steps: + - name: Checkout code + uses: actions/checkout@v4 + + - name: Set up Python + uses: actions/setup-python@v5 + with: + python-version: "3.11" + cache: pip + + - name: Install uv + run: pip install uv + + - name: Install pre-commit and pre-commit-uv + run: pip install pre-commit pre-commit-uv + + - name: Install tox and tox-uv + run: pip install tox tox-uv diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 00000000..7e49651b --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,454 @@ +# AGENTS.md + +This file provides guidance for AI coding agents working on the **markdown-it-py** repository. + +## Project Overview + +markdown-it-py is a Python port of [markdown-it](https://github.com/markdown-it/markdown-it), the JavaScript Markdown parser. It provides: + +- A Markdown parser following the [CommonMark spec](https://commonmark.org/) +- Configurable syntax: you can add new rules and even replace existing ones +- Pluggable architecture with support for syntax extensions (see [mdit-py-plugins](https://github.com/executablebooks/mdit-py-plugins)) +- High performance with efficient parsing algorithms +- Safe by default with configurable HTML handling + +markdown-it-py is designed as a foundation for projects requiring robust Markdown parsing in Python, with the same design principles as the original JavaScript implementation. 
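A minimal usage sketch of the public API described above (the preset name passed to the constructor selects the active rule set):

```python
from markdown_it import MarkdownIt

# "commonmark" is the default preset; "gfm-like" and "zero" presets also ship
md = MarkdownIt("commonmark")

html = md.render("# Hello\n\nSome *emphasis* and `code`.")
# render() returns an HTML string; use parse() to get the token stream instead
print(html)
```

`render()` is a convenience wrapper around `parse()` plus the HTML renderer; most plugins and custom rules operate on the token stream in between.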
+ +## Repository Structure + +``` +pyproject.toml # Project configuration and dependencies (flit) +tox.ini # Tox test environment configuration (use with tox-uv for faster env creation) + +markdown_it/ # Main source code +├── __init__.py # Package init +├── main.py # MarkdownIt main class +├── token.py # Token dataclass +├── ruler.py # Ruler class for managing rules +├── tree.py # SyntaxTreeNode for AST representation +├── renderer.py # RendererHTML and RendererProtocol +├── parser_core.py # ParserCore - top-level rules executor +├── parser_block.py # ParserBlock - block-level tokenizer +├── parser_inline.py # ParserInline - inline tokenizer +├── utils.py # Utility types (OptionsType, PresetType, etc.) +├── common/ # Common utilities +├── helpers/ # Helper functions +├── presets/ # Configuration presets (commonmark, gfm-like, zero, etc.) +├── rules_core/ # Core parsing rules +├── rules_block/ # Block-level parsing rules +├── rules_inline/ # Inline parsing rules +├── cli/ # Command-line interface +└── py.typed # PEP 561 marker + +tests/ # Test suite +├── test_api/ # API tests +├── test_cmark_spec/ # CommonMark spec compliance tests +├── test_port/ # Port-specific tests +├── test_tree/ # SyntaxTreeNode tests +├── fuzz/ # Fuzzing tests for OSS-Fuzz +├── test_cli.py # CLI tests +├── test_linkify.py # Linkify tests +└── test_tree.py # Tree tests + +docs/ # Documentation source +├── conf.py # Sphinx configuration +├── index.md # Documentation index +├── architecture.md # Design principles +├── using.md # Usage guide +├── plugins.md # Plugin documentation +├── contributing.md # Contributing guide +├── performance.md # Performance benchmarks +└── security.md # Security considerations + +benchmarking/ # Performance benchmarking +scripts/ # Utility scripts +``` + +## Development Commands + +All commands should be run via [`tox`](https://tox.wiki) for consistency. The project uses `tox-uv` for faster environment creation. 
+ +### Testing + +```bash +# Run all tests +tox + +# Run tests with specific Python version +tox -e py311 + +# Run tests with plugins +tox -e py311-plugins + +# Run a specific test file +tox -- tests/test_api/test_main.py + +# Run a specific test function +tox -- tests/test_api/test_main.py::test_get_rules + +# Run tests with coverage +tox -- --cov=markdown_it --cov-report=html +``` + +### Documentation + +```bash +# Build docs (clean) +tox -e docs-clean + +# Build docs (incremental) +tox -e docs-update + +# Specific builder (e.g., linkcheck) +BUILDER=linkcheck tox -e docs-update +``` + +### Benchmarking and Profiling + +```bash +# Run core benchmarks +tox -e py311-bench-core + +# Run package comparison benchmarks +tox -e py311-bench-packages + +# Run profiler +tox -e profile +``` + +### Fuzzing + +```bash +# Run fuzzer on testcase file +tox -e fuzz path/to/testcase +``` + +### Code Quality + +```bash +# Run pre-commit hooks on all files +pre-commit run --all-files + +# Type checking (via pre-commit) +pre-commit run mypy --all-files + +# Linting and formatting (via pre-commit) +pre-commit run ruff --all-files +pre-commit run ruff-format --all-files +``` + +## Code Style Guidelines + +- **Formatter/Linter**: Ruff (configured in `pyproject.toml`) +- **Type Checking**: Mypy with strict settings (configured in `pyproject.toml`) +- **Pre-commit**: Use pre-commit hooks for consistent code style (`.pre-commit-config.yaml`) + +### Best Practices + +- **Type annotations**: Use complete type annotations for all function signatures. The codebase uses strict mypy settings. +- **Docstrings**: Use Google-style or Sphinx-style docstrings. Types are not required in docstrings as they should be in type hints. +- **Pure functions**: Where possible, write pure functions without side effects. +- **Immutability**: Prefer immutable data structures. The `Token` class uses dataclass with appropriate mutability. +- **Testing**: Write tests for all new functionality. 
Use `pytest-regressions` for output comparison tests. + +### Type Annotation Example + +```python +from __future__ import annotations + +from markdown_it.rules_block import StateBlock + +def parse_blocks( + state: StateBlock, + start_line: int, + end_line: int, + silent: bool = False +) -> bool: + """Parse block-level content. + + :param state: The parser state object + :param start_line: Starting line number + :param end_line: Ending line number + :param silent: If True, only validate without generating tokens + :return: True if parsing succeeded + """ + ... +``` + +## Architecture Overview + +### Parsing Pipeline + +markdown-it-py follows a multi-stage parsing pipeline: + +``` +Markdown → Tokens → HTML +``` + +The parsing happens through three nested chains: + +1. **Core Chain** (`parser_core.py`): Top-level rules that orchestrate the parsing +2. **Block Chain** (`parser_block.py`): Parse block-level content (headings, lists, code blocks, etc.) +3. **Inline Chain** (`parser_inline.py`): Parse inline content (emphasis, links, code spans, etc.)
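The chain split is visible directly in the token stream; a short sketch (token type names reflect current behaviour, but treat the exact values as illustrative):

```python
from markdown_it import MarkdownIt

md = MarkdownIt()
tokens = md.parse("*hello*")

# The block chain wraps the text in paragraph_open / inline / paragraph_close;
# the inline chain then fills the middle token's .children with emphasis tokens
print([t.type for t in tokens])
print([c.type for c in tokens[1].children])  # em_open, text, em_close
```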
+ +### Token Stream + +Instead of a traditional AST, markdown-it-py uses a **token stream** representation: + +- Tokens are a simple sequence (list) +- Opening and closing tags are separate tokens +- Inline containers have nested tokens in their `.children` property +- This design follows the KISS principle and allows easy manipulation + +### Key Components + +#### MarkdownIt Class (`main.py`) + +The main entry point for parsing: + +- `parse()`: Parse markdown and return token stream +- `render()`: Parse and render to HTML +- `use()`: Add plugins +- `enable()` / `disable()`: Control rules +- `set()`: Set options + +#### Ruler Class (`ruler.py`) + +Manages parsing rules: + +- Rules can be enabled/disabled by name +- Rules can be inserted at specific positions +- Each parser (core/block/inline) has its own Ruler instance + +#### Token Class (`token.py`) + +Represents a single token in the stream: + +- `type`: Token type (e.g., "paragraph_open", "text", "heading_close") +- `tag`: HTML tag to use for rendering +- `attrs`: Attributes for the HTML tag +- `content`: Raw content +- `children`: Nested tokens for inline containers +- `level`: Nesting level + +#### Renderer (`renderer.py`) + +Converts token stream to HTML: + +- `render()`: Convert full token stream to HTML +- `renderToken()`: Render a single token +- Custom render rules can be added via `add_render_rule()` + +### Data Flow + +``` +Input Markdown + ↓ +Core Rules (normalize, etc.) + ↓ +Block Parser → Block Tokens + ↓ +Core Rules (intermediate) + ↓ +Inline Parser → Inline Tokens (for each block token with "inline" type) + ↓ +Core Rules (final: abbreviations, footnotes, linkify, etc.) 
+ ↓ +Token Stream + ↓ +Renderer + ↓ +HTML Output +``` + +## Testing Guidelines + +### Test Structure + +- Tests use `pytest` with fixtures from `conftest.py` files +- CommonMark spec tests are in `tests/test_cmark_spec/` +- Port-specific tests verify JavaScript markdown-it parity +- Regression testing uses `pytest-regressions` for output comparison +- Fuzzing tests are in `tests/fuzz/` for integration with OSS-Fuzz + +### Writing Tests + +1. For API tests, add to appropriate file in `tests/test_api/` +2. For new syntax/rules, add test cases to `tests/test_port/` +3. For CommonMark compliance, run the spec test updater +4. Use `file_regression` fixture for comparing output against stored fixtures +5. Use parameterization for multiple test scenarios + +### Test Best Practices + +- **Test coverage**: Write tests for all new functionality and bug fixes +- **Isolation**: Each test should be independent +- **Descriptive names**: Test function names should describe what is being tested +- **Regression testing**: Use `file_regression.check()` for complex output comparisons +- **Parametrization**: Use `@pytest.mark.parametrize` for multiple test scenarios + +### Example Test Pattern + +```python +import pytest +from markdown_it import MarkdownIt + +def test_basic_parsing(): + md = MarkdownIt() + result = md.render("# Heading\n\nParagraph") + assert "<h1>Heading</h1>" in result + assert "<p>Paragraph</p>" in result + +@pytest.mark.parametrize( + "input_text,expected", + [ + ("**bold**", "<strong>bold</strong>"), + ("*italic*", "<em>italic</em>"), + ] +) +def test_emphasis(input_text, expected): + md = MarkdownIt() + result = md.render(input_text) + assert expected in result +``` + +## Commit Message Format + +Use this format: + +```