From f7819341e79ae0922a52da888a5fffe25aa71244 Mon Sep 17 00:00:00 2001
From: Junaid Syed
Date: Fri, 2 May 2025 22:59:34 -0700
Subject: [PATCH 1/4] Docs: Add examples for multiple patterns and API usage to README

---
 README.md | 71 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)

diff --git a/README.md b/README.md
index b4d28ebf..80a4ec34 100644
--- a/README.md
+++ b/README.md
@@ -89,6 +89,23 @@ gitingest --help
 
 This will write the digest in a text file (default `digest.txt`) in your current working directory.
 
+### Using Multiple Patterns (CLI)
+
+You can specify multiple patterns to include or exclude files and directories by repeating the respective flags (-e for exclude, -i for include). Patterns use standard shell wildcards.
+
+```bash
+# Example 1: Exclude all log files, temporary files, and the entire 'dist' directory
+gitingest /path/to/your/project -e "*.log" -e "*.tmp" -e "dist/"
+
+# Example 2: Include only Python files and Markdown files from the repository
+gitingest https://github.com/user/repo -i "*.py" -i "*.md"
+
+# Example 3: Exclude test directories and specific config files from the current directory (.)
+gitingest . -e "tests/" -e "**/config.dev.json" -e "node_modules/"
+```
+
+Remember that exclusion patterns take precedence. If specific include patterns are provided, only files matching those and not matching any exclude pattern will be processed.
+
 ## 🐍 Python package usage
 
 ```python
@@ -111,6 +128,54 @@ import asyncio
 result = asyncio.run(ingest_async("path/to/directory"))
 ```
 
+### Advanced Pattern Usage (Python API)
+
+The Python API allows for fine-grained control by combining include_patterns and exclude_patterns. When both are provided:
+
+Files/directories are first checked against all exclusion patterns (default built-in patterns + user-provided exclude_patterns). If a match occurs, the item is skipped.
+
+If include_patterns are specified, any remaining item must also match at least one of the include patterns to be kept. If include_patterns is None or empty, this step is skipped.
+
+```python
+# Advanced Python API Example
+from gitingest import ingest, ingest_async
+import asyncio
+
+# Scenario: Ingest only files within the 'src/' directory of a specific branch,
+# but explicitly exclude any '.log' files found within 'src/'.
+
+repo_source = "https://github.com/some/repo"
+target_branch = "develop"
+include_only_src = {"src/*"}  # Glob pattern to match items directly inside src/
+exclude_src_logs = {"src/*.log"}  # Pattern to exclude log files specifically within src/
+
+# --- Synchronous Usage ---
+print("Running synchronous ingest with patterns...")
+summary_sync, tree_sync, content_sync = ingest(
+    repo_source,
+    branch=target_branch,
+    include_patterns=include_only_src,
+    exclude_patterns=exclude_src_logs
+    # Note: Default ignores like .git/, .venv/ are still active automatically
+)
+
+print("\n--- Sync Summary ---")
+print(summary_sync)
+# The resulting tree and content will only contain non-log files
+# directly within the 'src/' directory of the 'develop' branch.
+
+# --- Asynchronous Usage ---
+async def run_async_ingest():
+    print("\nRunning asynchronous ingest with patterns...")
+    summary_async, tree_async, content_async = await ingest_async(
+        repo_source,
+        branch=target_branch,
+        include_patterns=include_only_src,
+        exclude_patterns=exclude_src_logs
+    )
+    print("\n--- Async Summary ---")
+    print(summary_async)
+
+# Run the async example
+# asyncio.run(run_async_ingest())
+```
+
+This setup ensures that even if src/ contains log files, they will be explicitly removed from the final digest, while only other files directly within src/ are included.
+
 ### Jupyter notebook usage
 
 ```python

From 555cb75e9d90e46bd1d667118ce1293465f1186e Mon Sep 17 00:00:00 2001
From: Junaid Syed
Date: Fri, 2 May 2025 23:03:43 -0700
Subject: [PATCH 2/4] Docs: Add examples for multiple patterns and API usage to README

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 80a4ec34..6df53b9a 100644
--- a/README.md
+++ b/README.md
@@ -144,7 +144,7 @@ import asyncio
 # Scenario: Ingest only files within the 'src/' directory of a specific branch,
 # but explicitly exclude any '.log' files found within 'src/'.
 
-repo_source = "https://github.com/some/repo"
+repo_source = "https://github.com/some/repo"  # or path/to/directory
 target_branch = "develop"
 include_only_src = {"src/*"}  # Glob pattern to match items directly inside src/
 exclude_src_logs = {"src/*.log"}  # Pattern to exclude log files specifically within src/

From 242a6873c94a64481aefb35d00b1b4d255a9fd1a Mon Sep 17 00:00:00 2001
From: Filip Christiansen <22807962+filipchristiansen@users.noreply.github.com>
Date: Thu, 26 Jun 2025 08:33:22 +0200
Subject: [PATCH 3/4] Apply suggestions from code review

minor
---
 README.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 54d7c14d..4b459f4d 100644
--- a/README.md
+++ b/README.md
@@ -125,7 +125,7 @@ gitingest --help
 
 ### Using Multiple Patterns (CLI)
 
-You can specify multiple patterns to include or exclude files and directories by repeating the respective flags (-e for exclude, -i for include). Patterns use standard shell wildcards.
+You can specify multiple patterns to include or exclude files and directories by repeating the respective flags (`-e` for exclude, `-i` for include). Patterns use standard shell wildcards.
 
 ```bash
 # Example 1: Exclude all log files, temporary files, and the entire 'dist' directory
@@ -135,7 +135,7 @@ gitingest /path/to/your/project -e "*.log" -e "*.tmp" -e "dist/"
 gitingest https://github.com/user/repo -i "*.py" -i "*.md"
 
 # Example 3: Exclude test directories and specific config files from the current directory (.)
-gitingest . -e "tests/" -e "**/config.dev.json" -e "node_modules/"
+gitingest -e "tests/" -e "**/config.dev.json" -e "node_modules/"
 ```
 
 Remember that exclusion patterns take precedence. If specific include patterns are provided, only files matching those and not matching any exclude pattern will be processed.
@@ -179,11 +179,11 @@ result = asyncio.run(ingest_async("path/to/directory"))
 
 ### Advanced Pattern Usage (Python API)
 
-The Python API allows for fine-grained control by combining include_patterns and exclude_patterns. When both are provided:
+The Python API allows for fine-grained control by combining `include_patterns` and `exclude_patterns`. When both are provided:
 
-Files/directories are first checked against all exclusion patterns (default built-in patterns + user-provided exclude_patterns). If a match occurs, the item is skipped.
+Files/directories are first checked against all exclusion patterns (default built-in patterns + user-provided `exclude_patterns`). If a match occurs, the item is skipped.
 
-If include_patterns are specified, any remaining item must also match at least one of the include patterns to be kept. If include_patterns is None or empty, this step is skipped.
+If `include_patterns` are specified, any remaining item must also match at least one of the include patterns to be kept. If `include_patterns` is None or empty, this step is skipped.
 
 ```python
 # Advanced Python API Example
@@ -229,7 +229,7 @@ async def run_async_ingest():
 # asyncio.run(run_async_ingest())
 ```
 
-This setup ensures that even if src/ contains log files, they will be explicitly removed from the final digest, while only other files directly within src/ are included.
+This setup ensures that even if `src/` contains log files, they will be explicitly removed from the final digest, while only other files directly within `src/` are included.
 
 ### Jupyter notebook usage
 

From db47eb18ca21bf7670d21a002c74ff581b33301f Mon Sep 17 00:00:00 2001
From: Filip Christiansen <22807962+filipchristiansen@users.noreply.github.com>
Date: Thu, 26 Jun 2025 08:34:54 +0200
Subject: [PATCH 4/4] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 4b459f4d..0ad88661 100644
--- a/README.md
+++ b/README.md
@@ -131,7 +131,7 @@ You can specify multiple patterns to include or exclude files and directories by
 # Example 1: Exclude all log files, temporary files, and the entire 'dist' directory
 gitingest /path/to/your/project -e "*.log" -e "*.tmp" -e "dist/"
 
-# Example 2: Include only Python files and Markdown files from the repository
+# Example 2: Include only Python and Markdown files from the repository
 gitingest https://github.com/user/repo -i "*.py" -i "*.md"
 
 # Example 3: Exclude test directories and specific config files from the current directory (.)
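
The exclude-first, include-second precedence that these README additions describe can be modeled in a few lines of Python. This is a minimal sketch for reasoning about the rules, not gitingest's implementation: `is_included` and `DEFAULT_EXCLUDES` are hypothetical names, and the sketch assumes the standard-library `fnmatch` matcher, which may differ from how gitingest actually compares paths.

```python
from fnmatch import fnmatch
from typing import Iterable, Optional

# Hypothetical stand-in for gitingest's built-in ignore list (illustration only).
DEFAULT_EXCLUDES = {".git/*", ".venv/*"}


def is_included(
    path: str,
    include_patterns: Optional[Iterable[str]] = None,
    exclude_patterns: Optional[Iterable[str]] = None,
) -> bool:
    """Return True if `path` survives an exclude-then-include filter."""
    excludes = DEFAULT_EXCLUDES | set(exclude_patterns or ())
    # Step 1: any exclusion match drops the path, whatever the include patterns say.
    if any(fnmatch(path, pattern) for pattern in excludes):
        return False
    # Step 2: with no include patterns, every non-excluded path is kept.
    includes = set(include_patterns or ())
    if not includes:
        return True
    # Otherwise the path must match at least one include pattern.
    return any(fnmatch(path, pattern) for pattern in includes)


paths = ["src/app.py", "src/debug.log", "src/utils/helpers.py", "README.md", ".git/config"]
kept = [p for p in paths if is_included(p, {"src/*"}, {"src/*.log"})]
print(kept)  # ['src/app.py', 'src/utils/helpers.py']
```

Under `fnmatch`, `*` also matches `/`, so `src/*` keeps the nested `src/utils/helpers.py` as well as `src/app.py`. If gitingest's matcher behaves the same way, the example's comment describing `{"src/*"}` as matching items "directly inside src/" may overstate the restriction; that is worth confirming against the library.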