@shbhmexe
Contributor

Summary

This PR fixes a crash when mapping unknown affiliations to email domains (-m) for emails whose domain has no dot (e.g. user@localhost). It also hardens date parsing in logparser.py to accept CommitDate: lines.

Background / Reasoning

  • database.MapToEmployer() iterates domain suffixes to find domain mappings, but for single-label domains the suffix loop never runs, so addr is never assigned. When unknown == 1, the later reference to addr raises an exception instead of producing a domain-based fallback.
  • logparser.getDate() only skipped the Date: token; if the input contains CommitDate: (possible with git log --pretty=fuller), date parsing can fail.
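
To illustrate the first point, here is a minimal sketch of a suffix lookup with a safe default. It is not the code from src/database.py: the function name `lookup_domain`, the table `DOMAIN_TO_EMPLOYER`, and the loop order are assumptions made for illustration, and the input is assumed to contain an `@`.

```python
# Hypothetical stand-in for the project's domain-to-employer table.
DOMAIN_TO_EMPLOYER = {"example.com": "Example Corp"}

def lookup_domain(email, table=DOMAIN_TO_EMPLOYER):
    """Return (employer_or_None, last_suffix_tried) for an email address."""
    domain = email.lower().split("@", 1)[1]
    labels = domain.split(".")
    # Safe default: for a single-label domain the range below is empty,
    # so without this assignment `addr` would be unbound after the loop.
    addr = domain
    # Assumed order: shortest two-label suffix first, then longer ones.
    for start in range(len(labels) - 2, -1, -1):
        addr = ".".join(labels[start:])
        if addr in table:
            return table[addr], addr
    return None, addr
```

With `user@localhost`, `labels` has one element, `range(-1, -1, -1)` is empty, and the function falls through with `addr` still set to `localhost` rather than failing.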

Changes

  • src/database.py
    • Ensure addr is always defined before domain suffix iteration.
    • Make GetHackerDomain() treat single-label domains as a real label (e.g. Localhost *) instead of producing " *".
  • src/logparser.py
    • Ignore the CommitDate: token when extracting the date string.
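
The GetHackerDomain() change can be pictured with the sketch below. It is a guess at the shape of the logic, not the project's implementation: the helper name `domain_label` is hypothetical, and the rule "drop the final label, then capitalize" is an assumption chosen because it reproduces the " *" result described above for a single-label domain.

```python
def domain_label(domain):
    """Build a fallback employer label such as 'Localhost *' from a domain."""
    labels = domain.split(".")
    # Assumed rule: use everything before the final label.
    name = ".".join(labels[:-1])
    # A single-label domain leaves `name` empty, which would yield " *";
    # fall back to the domain itself so the label stays meaningful.
    if not name:
        name = domain
    return name.capitalize() + " *"
```

Under these assumptions, dot-separated domains take the same path as before, and only the empty-name case is handled differently.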

Impact

  • No behavior change for normal dot-separated domains (e.g. gmail.com, example.co.uk).
  • Prevents a hard crash in -m mode for single-label domains.
  • Improves robustness when CommitDate: lines are present in the git log input.
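
The CommitDate: handling can be sketched as follows. This is an illustrative stand-in for logparser.getDate(), not its actual code: the name `parse_log_date`, the `DATE_TOKENS` tuple, and the assumption that dates use git's default layout are all introduced here.

```python
from datetime import datetime

# Header tokens to skip; only these two are named in this PR.
DATE_TOKENS = ("Date:", "CommitDate:")

def parse_log_date(line):
    """Parse a git log date line, ignoring a leading Date:/CommitDate: token."""
    fields = line.split()
    if fields and fields[0] in DATE_TOKENS:
        fields = fields[1:]
    # Assumes git's default date layout, e.g. "Mon Jan 15 10:30:00 2024 +0100".
    # strptime raises ValueError if a token is left in front of the date.
    return datetime.strptime(" ".join(fields), "%a %b %d %H:%M:%S %Y %z")
```

If only `Date:` were skipped, a `CommitDate:` line would reach `strptime` with the token still attached and raise ValueError, which is the failure mode this PR describes.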

Compatibility / Non-overlap

  • Based on current master (includes recent merges up through e5595aa1).
  • Changes are isolated and do not revert or duplicate previously merged fixes.

Verification

  • Change is intentionally minimal and guarded:
    • Existing behavior is unchanged for common inputs.
    • New behavior only applies to previously-crashing edge cases.
  • No new features added.

…sing

The `-m` (map unknowns to email domain) mode could crash when encountering
single-label domains (for example: user@localhost). The crash happens because
MapToEmployer() references `addr` even when the domain split loop never runs.

Fix this by:
- Keeping a safe default `addr` when the domain has no dots.
- Making GetHackerDomain() generate a meaningful label for single-label domains
  (so it won’t collapse to just " *").

Also make logparser's date parsing ignore the `CommitDate:` prefix, which can
appear in git logs produced with formats like `--pretty=fuller`, preventing
ValueError crashes during date range filtering.

Signed-off-by: shbhmexe <shubhushukla586@gmail.com>