Fix: --include-pattern incorrectly filters directories leading to empty output #278
Problem Description:
When users pass the `--include-pattern` (or `-i`) argument to specify file types for inclusion, for example `gitingest . -i "*.py, *.js"`, the resulting text digest is empty even if the target directory contains matching `.py` or `.js` files.

Root Cause Analysis:
The root cause is in the `_process_node` function within `src/gitingest/ingestion.py`. The filtering logic for `query.include_patterns` was erroneously applied to all `sub_path` types, including directories.

The `_should_include` function (located in `src/gitingest/utils/ingestion_utils.py`) is designed to match on file extensions or file name patterns (e.g., `*.py`). Directories themselves (e.g., `src/`, `my_project/`) will not normally match these file-specific patterns.

Consequently, when `_process_node` traverses the file system, it skips any subdirectory that does not match an `include_pattern` and never descends into it. Every file inside a skipped directory is omitted from the digest, even files (such as `.py` or `.js` files) that should have been included.

Proposed Solution:
To address this issue, `_process_node` in `src/gitingest/ingestion.py` has been modified so that the `_should_include` logic is applied only to files (`sub_path.is_file()`).

Directories (`sub_path.is_dir()`) continue to be traversed recursively as long as they are not explicitly excluded by `query.ignore_patterns` (which already incorporates the default ignore rules, user-defined `-e` patterns, and the removal of overlaps with `-i`). This ensures that a directory that does not itself match a file pattern is still searched for files that match the `include_patterns` criteria.

Summary of Code Changes:
The call to `_should_include` in `_process_node` has been changed from:

to:
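For illustration only, and not the actual diff from this PR, the behavior change can be sketched with a small self-contained script. It assumes a simplified, hypothetical `should_include` helper built on `fnmatch` and a hypothetical `collect_files` traversal; the real `_should_include` and `_process_node` in gitingest may differ, and ignore patterns are not modeled here.

```python
import tempfile
from fnmatch import fnmatch
from pathlib import Path


def should_include(path: Path, include_patterns: set[str]) -> bool:
    """Hypothetical stand-in for `_should_include`: glob-match the entry name."""
    return any(fnmatch(path.name, pattern) for pattern in include_patterns)


def collect_files(root: Path, include_patterns: set[str], files_only: bool) -> list[str]:
    """Walk `root` recursively and return matching file paths relative to `root`.

    files_only=False mimics the old behavior (patterns applied to every entry).
    files_only=True mimics the fix (patterns applied to files only).
    """
    found: list[str] = []

    def walk(directory: Path) -> None:
        for sub_path in sorted(directory.iterdir()):
            # Decide whether the include filter applies to this entry.
            apply_filter = sub_path.is_file() if files_only else True
            if include_patterns and apply_filter and not should_include(sub_path, include_patterns):
                continue  # entry filtered out; a skipped directory is never entered
            if sub_path.is_dir():
                walk(sub_path)
            elif sub_path.is_file():
                found.append(sub_path.relative_to(root).as_posix())

    walk(root)
    return found


def demo() -> tuple[list[str], list[str]]:
    """Build a throwaway tree whose matching files all live in subdirectories."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "src").mkdir()
        (root / "web").mkdir()
        (root / "src" / "main.py").write_text("print('hi')\n")
        (root / "src" / "notes.txt").write_text("notes\n")
        (root / "web" / "app.js").write_text("console.log('hi');\n")
        patterns = {"*.py", "*.js"}
        before = collect_files(root, patterns, files_only=False)
        after = collect_files(root, patterns, files_only=True)
    return before, after
```

In this sketch, `demo()` returns an empty list for the old behavior, because `src` and `web` do not match `*.py` or `*.js` and are skipped, and `['src/main.py', 'web/app.js']` once the filter is restricted to files.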
Expected Outcome:
This fix ensures that the `--include-pattern` argument functions as intended, effectively filtering file content without impeding directory traversal. Users can now correctly specify desired file types for inclusion and receive accurate output.

Thank you for your review and support!