@filipchristiansen
Contributor

Previously, `atomics/**/*.md` and similar patterns never matched, because `_should_include` treated all patterns as fixed-depth globs.

This patch adds dedicated logic that distinguishes recursive (`**`) from non-recursive patterns and introduces directory-aware checks so we only descend into directories that *could* contain matches.
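To make the recursive vs. fixed-depth distinction concrete, here is a minimal sketch of one way such a matcher could work. It is not the code from this PR: `glob_to_regex` and `matches` are hypothetical helpers, and the sketch assumes `/`-separated relative paths.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a slash-separated glob into a regex (illustrative helper, not gitingest's API)."""
    segments = pattern.split("/")
    pieces = []
    for index, segment in enumerate(segments):
        is_last = index == len(segments) - 1
        if segment == "**":
            # Recursive wildcard: zero or more whole directory levels.
            pieces.append(".*" if is_last else "(?:.*/)?")
            continue
        # Within a segment, `*` and `?` never cross a directory separator.
        body = "".join(
            "[^/]*" if ch == "*" else "[^/]" if ch == "?" else re.escape(ch)
            for ch in segment
        )
        pieces.append(body if is_last else body + "/")
    return re.compile("".join(pieces) + r"\Z")


def matches(path: str, pattern: str) -> bool:
    """Return True if the relative path matches the glob pattern."""
    return glob_to_regex(pattern).match(path) is not None
```

With this sketch, `matches("atomics/T1003.003/T1003.003.md", "atomics/**/*.md")` is true at any depth below `atomics/`, while the fixed-depth pattern `atomics/*.md` only matches files directly inside `atomics/`.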

Fixes: #275

⚠️ Known issue: the matcher now collects the correct `.md` files, but the directory-structure output still lists the directories themselves even when they contain no matched files (e.g. `atomics/Indexes/Attack-Navigator-Layers/`, `atomics/T1003.003/src/`). A follow-up PR will tackle this.
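One possible direction for that follow-up, sketched under assumptions: if the directory structure were held as a nested mapping, empty directories could be dropped in a post-processing pass. The tree shape used here (a `dict` whose values are `None` for a matched file or another `dict` for a directory) and the name `prune_empty_dirs` are hypothetical and do not reflect gitingest's actual data model.

```python
from typing import Optional


def prune_empty_dirs(tree: dict) -> dict:
    """Return a copy of `tree` without directories that contain no matched files.

    Assumed (hypothetical) shape: name -> None for a file, name -> dict for a directory.
    """
    pruned: dict = {}
    for name, child in tree.items():
        if child is None:
            # A matched file is always kept.
            pruned[name] = None
            continue
        # Recurse first, then keep the directory only if something survived.
        sub_tree = prune_empty_dirs(child)
        if sub_tree:
            pruned[name] = sub_tree
    return pruned


# Illustrative input; the file name is made up for the example.
example_tree = {
    "atomics": {
        "Indexes": {"Attack-Navigator-Layers": {}},
        "T1003.003": {"T1003.003.md": None, "src": {}},
    }
}
pruned_tree = prune_empty_dirs(example_tree)
```

Pruning bottom-up means a directory such as `atomics/Indexes/` disappears once its only child, `Attack-Navigator-Layers/`, has been removed.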


Copilot AI left a comment


Pull Request Overview

This PR updates the glob-matching logic in `_should_include` to properly handle recursive (`**`) patterns and avoid descending into directories that cannot match, while also updating development documentation and bumping hook versions.

  • Distinguishes between recursive and non-recursive glob patterns and adjusts directory traversal accordingly
  • Adds a `ValueError` for non-recursive patterns when directory depth is exceeded
  • Updates `CONTRIBUTING.md` instructions with `--reload` and bumps pre-commit hook versions
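The directory-traversal point above can be illustrated with a small sketch. It assumes `/`-separated relative paths and a hypothetical helper `may_contain_match`. Unlike the PR, which is described as raising a `ValueError` when the depth of a non-recursive pattern is exceeded, this simplified version just returns `False`.

```python
from fnmatch import fnmatchcase


def may_contain_match(dir_path: str, pattern: str) -> bool:
    """Decide whether descending into `dir_path` could still yield a match (illustrative only)."""
    dir_parts = [part for part in dir_path.split("/") if part]
    segments = pattern.split("/")
    # The last segment normally matches the file name; keep it only for a trailing `**`.
    pattern_dirs = segments if segments[-1] == "**" else segments[:-1]
    for index, part in enumerate(dir_parts):
        if index >= len(pattern_dirs):
            # Directory is deeper than a fixed-depth pattern allows.
            return False
        if pattern_dirs[index] == "**":
            # Everything below a recursive wildcard may match.
            return True
        if not fnmatchcase(part, pattern_dirs[index]):
            return False
    return True
```

For example, with `atomics/*.md` the walker would enter `atomics/` but skip `atomics/T1003.003/`, whereas with `atomics/**/*.md` every directory under `atomics/` remains a candidate.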

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `src/gitingest/utils/ingestion_utils.py` | Enhanced `_should_include` to support `**` recursion and strict depth checks; added error raising and updated docstring |
| `CONTRIBUTING.md` | Added `--reload` flag to local server startup command |
| `.pre-commit-config.yaml` | Upgraded pyupgrade, markdownlint, and pylint hook versions |

Comments suppressed due to low confidence (2)

src/gitingest/utils/ingestion_utils.py:113

  • The TODO comment in `_should_exclude` indicates incomplete functionality. Either implement the intended behavior for `**` in exclude patterns or open an issue to track this work and remove the placeholder.
`TODO: Check if we need to handle exclude patterns with **, and if so, how.`

src/gitingest/utils/ingestion_utils.py:69

  • Consider adding unit tests to cover edge cases of both recursive (`**`) and non-recursive glob patterns, including directory-depth mismatches and patterns ending in `/*`, to ensure this new logic behaves as expected.
`for pattern in include_patterns - {""}:  # ignore empty pattern`
