feat: enhance parser domain-agnostic support #117
Merged
- Standardized capitalization of 'Git', 'GitHub', and 'URL'
- Removed trailing slashes in links and added missing sentence periods in `README.md`
- Adjusted docstrings to adhere to PEP 257 by using imperative tense
- Standardized docstrings in `exceptions.py`
- Replaced 'GitHub' with 'Git' when referring to broader context
- Renamed templates: `github.jinja` → `git.jinja`, `github_form.jinja` → `git_form.jinja`
- Renamed variables: `github_url` → `repo_url`

- Made `parse_query` in `query_processor.py` asynchronous
- Made `main` in `cli.py` asynchronous
- Made `ingest` in `repository_ingest.py` asynchronous
- Updated test functions in `test_query_parser.py` to support async

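As a rough illustration of what this conversion implies for callers, the sketch below chains three coroutines and drives them from a synchronous entry point with `asyncio.run`. The function bodies, the returned dictionary, and the `run_cli` wrapper are assumptions made for this example, not the project's code.

```python
import asyncio


async def parse_query(source: str) -> dict:
    # Placeholder body: the real parser may await network checks here.
    await asyncio.sleep(0)
    return {"source": source}


async def ingest(source: str) -> dict:
    # Once parse_query is a coroutine, every caller has to await it.
    return await parse_query(source)


async def main(source: str) -> dict:
    return await ingest(source)


def run_cli(source: str) -> dict:
    # Hypothetical synchronous wrapper: starts an event loop for the coroutine.
    return asyncio.run(main(source))
```

Tests follow the same pattern: either the test function is itself a coroutine run by an async-aware test runner, or it calls `asyncio.run` on the coroutine under test.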
- Renamed `_parse_url` to `_parse_repo_source` in `query_parser.py`
- Adjusted docstrings to adhere to PEP 257 by using imperative tense

…update tests
- Implemented function `_get_status_code` in `repository_clone.py` to extract the status code from an HTTP response
- Adjusted `_check_repo_exists` in `repository_clone.py` to utilize the new `_get_status_code` function
- Modified `_check_repo_exists` to return True for status codes 200 and 301, and False for 404 and 302
- Updated `test_check_repo_exists_with_redirect` in `test_repository_clone.py` to verify that `_check_repo_exists` returns False for status code 302
- Implemented test `test_check_repo_exists_with_permanent_redirect` in `test_repository_clone.py` to verify that `_check_repo_exists` returns True for status code 301

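The status-code rules in this commit can be sketched roughly as follows. This is a minimal illustration that assumes the response is available as raw header text whose first line is an HTTP status line. The function names are hypothetical, and raising on any other status code is an assumption, since the commit message does not say how those are handled.

```python
def get_status_code(response: str) -> int:
    """Extract the numeric status code from raw HTTP response text."""
    # The status line looks like "HTTP/1.1 200 OK" or "HTTP/2 301".
    status_line = response.splitlines()[0].strip()
    return int(status_line.split(" ", 2)[1])


def repo_exists_from_status(status_code: int) -> bool:
    """Map a status code to an existence result, per the commit message."""
    if status_code in (200, 301):
        return True
    if status_code in (302, 404):
        return False
    # Assumption: treat anything else as an error rather than guessing.
    raise RuntimeError(f"Unexpected status code: {status_code}")
```

For example, `repo_exists_from_status(get_status_code("HTTP/1.1 302 Found\r\n"))` evaluates to `False`.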
- added list of known domains/Git hosts in `query_parser.py`
- fixed bug from [#115](#115): corrected case handling for URL components. Scheme, domain, username, and repository are case-insensitive, but later path components (e.g., file names, branches) are case-sensitive
- implemented `try_domains_for_user_and_repo` in `query_parser.py` to iteratively guess the correct domain until success or supported hosts are exhausted
- added helper functions `_get_user_and_repo_from_path`, `_validate_host`, and `_validate_scheme` in `query_parser.py`
- extended `_parse_repo_source` in `query_parser.py` to be Git host agnostic by using `try_domains_for_user_and_repo`
- added tests `test_parse_url_unsupported_host` and `test_parse_query_with_branch` in `test_query_parser.py`
- created new file `test_git_host_agnostic.py` to verify domain/Git host agnostic behavior
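The case-handling rule from the bug fix can be illustrated with a small normalizer. This is only a sketch built on `urllib.parse`. The function name is hypothetical, and it assumes the path has the shape `/<user>/<repo>/...`, which may not match every host.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_repo_url(url: str) -> str:
    """Lowercase scheme, host, user, and repo; leave the rest of the path alone."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # segments[0] is "" (leading slash); segments[1] and [2] are user and repo.
    for index in (1, 2):
        if index < len(segments):
            segments[index] = segments[index].lower()
    return urlunsplit(
        (
            parts.scheme.lower(),
            parts.netloc.lower(),
            "/".join(segments),  # branch and file names keep their case
            parts.query,
            parts.fragment,
        )
    )
```

With this sketch, `normalize_repo_url("HTTPS://GitHub.com/User/Repo/tree/Main/README.md")` returns `"https://github.com/user/repo/tree/Main/README.md"`, so the branch name `Main` and the file name keep their original case.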
48c1695 to cd1b14e
This was referenced Jan 10, 2025
cyclotruc approved these changes Jan 13, 2025
FOLKS-Tech pushed a commit to FOLKS-Tech/gitingest that referenced this pull request Sep 5, 2025
* feat: make parser domain-agnostic to support multiple Git hosts
  - added list of known domains/Git hosts in `query_parser.py`
  - fixed bug from [coderamp-labs#115](coderamp-labs#115): corrected case handling for URL components. Scheme, domain, username, and repository are case-insensitive, but later path components (e.g., file names, branches) are case-sensitive
  - implemented `try_domains_for_user_and_repo` in `query_parser.py` to iteratively guess the correct domain until success or supported hosts are exhausted
  - added helper functions `_get_user_and_repo_from_path`, `_validate_host`, and `_validate_scheme` in `query_parser.py`
  - extended `_parse_repo_source` in `query_parser.py` to be Git host agnostic by using `try_domains_for_user_and_repo`
  - added tests `test_parse_url_unsupported_host` and `test_parse_query_with_branch` in `test_query_parser.py`
  - created new file `test_git_host_agnostic.py` to verify domain/Git host agnostic behavior
This PR introduces improvements and refactoring across multiple modules. The key changes include making the URL parser domain-agnostic, refactoring HTTP response handling, renaming functions for clarity, converting core functions to asynchronous operations, and standardizing terminology and documentation.
Highlights:

Domain-Agnostic Parsing:
- Updated `query_parser.py` to support multiple Git hosts by maintaining a list of known domains.
- Implemented `try_domains_for_user_and_repo` to iteratively guess the correct domain for a given user/repo.
- Added helper functions (`_get_user_and_repo_from_path`, `_validate_host`, `_validate_scheme`) to facilitate robust parsing.
- Extended `_parse_repo_source` to leverage the new domain-agnostic logic.
- Added tests in `test_query_parser.py` and a new test file `test_git_host_agnostic.py` to verify these changes.

Enhanced Repository Existence Check:
- Implemented `_get_status_code` in `repository_clone.py` to extract HTTP response codes cleanly.
- Adjusted `_check_repo_exists` to utilize `_get_status_code`, refining its logic:
  - Returns `True` for status codes 200 and 301.
  - Returns `False` for status codes 302 and 404.
- Updated tests in `test_repository_clone.py` to cover redirect scenarios and ensure correctness.

Function Renaming and Documentation:
- Renamed `_parse_url` to `_parse_repo_source` in `query_parser.py` for clarity.

Asynchronous Conversions:
- Converted core functions (`parse_query` in `query_processor.py`, `main` in `cli.py`, and `ingest` in `repository_ingest.py`) to asynchronous to support domain-agnostic parsing.
- Updated tests in `test_query_parser.py` to support async execution.

Terminology and Documentation Standardization:
- Standardized capitalization of 'Git', 'GitHub', and 'URL' in `README.md`, fixed trailing slashes in links, and ensured punctuation consistency.
- Renamed templates (`github.jinja` → `git.jinja`, `github_form.jinja` → `git_form.jinja`) and variables (`github_url` → `repo_url`) accordingly.

Test Organization:
- Moved `test_query_parser.py` to a more structured location under `tests/query_parser/` for better organization.

These changes collectively improve the flexibility of the parser for multiple Git hosts and enhance code clarity and consistency.