Fix insecure regex in URL validation and normalize repository data #2529
This PR addresses two issues:
Security/Bug Fix: The regex used to parse GitHub URLs in gfi/populate.py was flawed. It used a character class [...] instead of a group (...) for the protocol, and lacked a start/end anchor. This allowed invalid URLs (e.g., notgithub.com) to pass validation.
Fix: Updated GH_URL_PATTERN to r"^(?:https?://)?github.com/(?P[\w.-]+)/(?P[\w.-]+)/?$"
Data Consistency: The data/repositories.toml file had an inconsistent entry with a protocol prefix.
Fix: Removed https:// from the entry for pyupio/safety to match the project convention.
Verification:
Validated the new regex against various test cases (valid and invalid URLs).
Verified that gfi/test_data.py passes with the changes.
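For reviewers who want to reproduce the regex check, here is a minimal sketch of the kind of validation described above. It is not the test code from this PR. The group names `owner` and `repo`, the helper `parse_github_url`, and the sample URLs are assumptions made for illustration, since the group names do not appear in the description as shown. `CHAR_CLASS` is likewise an illustrative pattern, not the original one from gfi/populate.py; it only shows why a character class cannot stand in for a group.

```python
import re

# Anchored pattern from the PR description. The group names "owner" and
# "repo" are hypothetical placeholders, not taken from the PR.
GH_URL_PATTERN = re.compile(
    r"^(?:https?://)?github.com/(?P<owner>[\w.-]+)/(?P<repo>[\w.-]+)/?$"
)

# Illustrative only: a character class matches exactly ONE character from
# the set {h, t, p, s, ?, :, /}, never the literal prefix "https://".
CHAR_CLASS = re.compile(r"[https?://]")


def parse_github_url(url):
    """Return (owner, repo) if url matches the pattern, else None."""
    match = GH_URL_PATTERN.match(url)
    if match is None:
        return None
    return match.group("owner"), match.group("repo")


# Sample inputs chosen for this sketch.
valid_urls = [
    "github.com/pyupio/safety",
    "https://github.com/pyupio/safety/",
    "http://github.com/pyupio/safety",
]
invalid_urls = [
    "notgithub.com/pyupio/safety",      # rejected by the start anchor
    "github.com/pyupio",                # missing repository segment
    "github.com/pyupio/safety/issues",  # rejected by the end anchor
]

for url in valid_urls + invalid_urls:
    print(url, "->", parse_github_url(url))
```

Without the `^` anchor, a search-style match could find `github.com/...` inside `notgithub.com/...`, which is the failure mode the description mentions.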