12 changes: 12 additions & 0 deletions .flake8
@@ -0,0 +1,12 @@
[flake8]
max-line-length = 120
max-complexity = 12
select = E,F,W,C90
extend-ignore = F403,F405
exclude =
    .git,
    __pycache__,
    venv,
    build,
    dist,
    sdiff.egg-info
37 changes: 37 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,37 @@
name: CI

on:
  workflow_dispatch:
  pull_request:
    types: [opened, synchronize, reopened, ready_for_review]
  push:
    branches: [master]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: "pip"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install .[tests]

      - name: Format check
        run: python -m autopep8 --exit-code --diff --max-line-length 120 -r sdiff tests

      - name: Lint
        run: python -m flake8 --config .flake8 sdiff tests

      - name: Test
        run: python -m coverage run -m pytest -s --durations=3 --durations-min=0.005

      - name: Coverage report
        run: python -m coverage report -m
5 changes: 5 additions & 0 deletions .husky/pre-commit
@@ -0,0 +1,5 @@
#!/usr/bin/env sh
. "$(dirname -- "$0")/_/husky.sh"

python -m autopep8 --exit-code --diff --max-line-length 120 -r sdiff tests
python -m flake8 --config .flake8 sdiff tests
11 changes: 0 additions & 11 deletions .travis.yml

This file was deleted.

32 changes: 32 additions & 0 deletions AGENTS.md
@@ -0,0 +1,32 @@
# Repository Guidelines

## Project Structure & Module Organization
The core library lives in `sdiff/` (parser, comparer, renderer, and models). Tests are in `tests/`, with shared fixtures in `tests/fixtures/`. Reference PDFs sit in `docs/`. Packaging and tooling are defined in `setup.py`, `setup.cfg`, and the `Makefile`; `CHANGELOG` tracks releases.

## Build, Test, and Development Commands
- `make env` creates the local `venv/` (Python 3.11+).
- `make dev` installs the package plus test/dev extras (`.[tests,devtools]`) into the venv.
- `make test` runs linting and the full pytest suite with coverage.
- `make vtest` runs pytest verbosely.
- `make flake` runs the autopep8 format check and flake8 on `sdiff/` and `tests/`.
- `make format` applies autopep8 formatting to `sdiff/` and `tests/`.
- `make cov` prints the coverage report.
- `make clean` removes build artifacts and the venv.
- `make hooks` installs Husky git hooks (requires Node/npm; skipped with a notice when `npm` is not found). `make dev` runs this automatically.

Lint parity: CI and the Husky pre-commit hook both run the same checks as `make flake` (autopep8 check + flake8). Run `make flake` or `make test` locally to mirror CI.

Example flow:
```sh
make dev
make test
```

## Coding Style & Naming Conventions
Use standard Python conventions: 4-space indentation, `snake_case` for modules/functions/variables, and `PascalCase` for classes. Flake8 enforces a 120-character line limit and a maximum complexity of 12 (see `.flake8`). Run `make format` to apply `autopep8` formatting. Keep new modules in `sdiff/` and new tests in `tests/` with filenames like `test_<area>.py`.

## Testing Guidelines
The suite uses `pytest` with `coverage`. Coverage is expected to stay high (current config fails under 96%). Add or update tests for behavior changes, and prefer small, focused unit tests. Place reusable data in `tests/fixtures/`. Run `make test` before submitting changes.

## Commit & Pull Request Guidelines
Commit messages in this repo are short and often use a type prefix (e.g., `chore: ...`, `fixes: ...`, `hotfix: ...`, `refactors: ...`). Follow that pattern where practical, and keep the summary concise. For PRs, include a brief description, list tests run (e.g., `make test`), and link related issues or tickets when available.
15 changes: 14 additions & 1 deletion Makefile
@@ -19,6 +19,7 @@ env:

dev: env update
	$(PIP) install .[tests,devtools]
	@$(MAKE) hooks

install: env update

@@ -28,8 +29,20 @@ publish:
	$(TWINE) upload --verbose --sign --username developer --repository-url http://$(PYPICLOUD_HOST)/simple/ dist/*.whl

flake:
	$(PYTHON) -m autopep8 --exit-code --diff --max-line-length 120 -r sdiff tests
	$(FLAKE) sdiff tests

format:
	$(PYTHON) -m autopep8 --in-place --max-line-length 120 -r sdiff tests

hooks:
	@if command -v npm >/dev/null 2>&1; then \
		npm install --no-package-lock --silent; \
		npm run --silent prepare; \
	else \
		echo "npm not found; skipping husky install"; \
	fi

test: flake
	$(COVERAGE) run -m pytest $(TEST_RUNNER_FLAGS)

@@ -57,4 +70,4 @@ clean:
	rm -rf venv


-.PHONY: all build env linux run pep test vtest testloop cov clean
+.PHONY: all build env linux run pep test vtest testloop cov clean hooks format
40 changes: 39 additions & 1 deletion README.md
@@ -1,2 +1,40 @@
# md-sdiff
Diffs to markdown texts only based on their structure. Ignores content. Helpful to diff 2 files that contain the same content in different languages.

Structural diffs for Markdown. The library parses two Markdown inputs into a lightweight tree and compares the *shape* (headings, lists, paragraphs, links, etc.) instead of the text content. This is useful when you expect the same document structure across translations or when you want to validate formatting consistency without caring about the wording.

## What it does
- Parses Markdown into an AST-like node tree using `mistune`.
- Compares trees node-by-node and flags insertions/deletions in structure.
- Returns a rendered view of each document plus a list of structural errors.
- Supports a Zendesk-specific parser (`ZendeskHelpMdParser`) for `<callout>`, `<steps>`, and `<tabs>` blocks.
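
The idea behind the compare step can be sketched without the library. This is only an illustration under stated assumptions, not sdiff's actual implementation: `Node`, `shape`, and `structural_errors` are hypothetical names, and the standard library's `difflib` stands in for the real tree comparison. Each document is reduced to a sequence of structural symbols, and the two sequences are then diffed while the text is ignored.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical structural node: a symbol such as 'h', 'p', 'l', 'li' plus children."""
    symbol: str
    children: list["Node"] = field(default_factory=list)


def shape(node):
    """Flatten a tree into a pre-order list of symbols; no text is kept."""
    symbols = [node.symbol]
    for child in node.children:
        symbols.extend(shape(child))
    return symbols


def structural_errors(left, right):
    """List symbols that appear on one side but have no counterpart on the other."""
    left_shape, right_shape = shape(left), shape(right)
    matcher = difflib.SequenceMatcher(a=left_shape, b=right_shape, autojunk=False)
    errors = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            errors += [f"right is missing `{s}`" for s in left_shape[i1:i2]]
        if tag in ("insert", "replace"):
            errors += [f"left is missing `{s}`" for s in right_shape[j1:j2]]
    return errors


# Same heading, but the right-hand list has one extra item.
left = Node("root", [Node("h"), Node("l", [Node("li"), Node("li")])])
right = Node("root", [Node("h"), Node("l", [Node("li"), Node("li"), Node("li")])])

print(structural_errors(left, right))  # ['left is missing `li`']
```

Because only symbols are compared, two translations of the same document produce an empty error list as long as their headings, lists, and paragraphs line up.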

## Example usage
```python
from sdiff import diff, TextRenderer, MdParser

left = "# Title\n\n- One\n- Two"
right = "# Title\n\n- One\n- Two\n- Three"

rendered_left, rendered_right, errors = diff(left, right, renderer=TextRenderer(), parser_cls=MdParser)
print(errors[0]) # "There is a missing element `li`."
```

## Renderers
`TextRenderer` returns the original Markdown structure as text. `HtmlRenderer` wraps the output and marks structural insertions/deletions with `<ins>` and `<del>`.

## One-off usage
```sh
python - <<'PY'
from sdiff import diff, TextRenderer

left = open("left.md", "r", encoding="utf-8").read()
right = open("right.md", "r", encoding="utf-8").read()
_, _, errors = diff(left, right, renderer=TextRenderer())

for err in errors:
    print(err)
PY
```

## Notes
This project is a library (no CLI). If you need different token handling, you can provide a custom parser class that extends `MdParser`.
10 changes: 10 additions & 0 deletions package.json
@@ -0,0 +1,10 @@
{
  "name": "html-structure-diff",
  "private": true,
  "devDependencies": {
    "husky": "^9.0.0"
  },
  "scripts": {
    "prepare": "husky"
  }
}
2 changes: 1 addition & 1 deletion requirements.txt
@@ -1 +1 @@
-mistune==0.8.1
+mistune==3.2.0
14 changes: 11 additions & 3 deletions sdiff/__init__.py
@@ -4,13 +4,21 @@


def diff(md1, md2, renderer=TextRenderer(), parser_cls: type[MdParser] = MdParser):
    """Compare two Markdown strings by structure and return rendered outputs + errors.

    Args:
        md1: Left Markdown string.
        md2: Right Markdown string.
        renderer: Renderer instance used to format the output (TextRenderer by default).
        parser_cls: Parser class to use (MdParser by default).

    Returns:
        (rendered_left, rendered_right, errors)
    """
    tree1 = parse(md1, parser_cls)
    tree2 = parse(md2, parser_cls)

    tree1, tree2, struct_errors = diff_struct(tree1, tree2)
    # tree1, tree2, links_errors = diff_links(tree1, tree2)

    # errors = struct_errors + links_errors
    errors = struct_errors

    return renderer.render(tree1), renderer.render(tree2), errors
2 changes: 2 additions & 0 deletions sdiff/compare.py
@@ -44,8 +44,10 @@ def _diff(tree1, tree2, include_symbols=None, exclude_symbols=None):


def diff_links(tree1, tree2):
    """Diff only link-relevant structure (paragraphs/headers/lists/links)."""
    return _diff(tree1, tree2, include_symbols=['p', 'h', 'l', 'a'])


def diff_struct(tree1, tree2):
    """Diff overall structure, ignoring link and image content."""
    return _diff(tree1, tree2, exclude_symbols=['a', 'i'])