Skip to content

Fix for uncaught RuntimeError when retrieving some PMID entries without PMCID#38

Open
caufieldjh wants to merge 5 commits intomainfrom
37-runtimeerror-couldnt-resolve-exlinksrv2-the-address-table-is-empty
Open

Fix for uncaught RuntimeError when retrieving some PMID entries without PMCID#38
caufieldjh wants to merge 5 commits intomainfrom
37-runtimeerror-couldnt-resolve-exlinksrv2-the-address-table-is-empty

Conversation

@caufieldjh
Copy link
Contributor

No description provided.

@claude
Copy link
Contributor

claude bot commented Jan 31, 2026

Claude finished @caufieldjh's task —— View job


PR Review Complete ✅

PR #38: Fix for uncaught RuntimeError when retrieving some PMID entries without PMCID

Review Checklist:

  • Read and understand the changed files
  • Analyze code quality and best practices
  • Check for potential bugs or issues
  • Evaluate performance considerations
  • Review security concerns
  • Assess test coverage
  • Provide constructive feedback

Summary

This PR addresses RuntimeError exceptions when fetching PMID entries that don't have associated PMCIDs. The changes add robust error handling and type safety to prevent crashes from malformed API responses.

Code Quality & Best Practices ✅

Excellent improvements:

  • Type safety: Added proper type annotations (dict[str, Any]) and runtime type checking
  • Defensive programming: Added isinstance() checks before accessing dictionary keys
  • Error handling: Wrapped risky operations with try/catch and proper logging
  • Consistent patterns: Applied the same defensive approach across both fetch() and _get_pmcid() methods

Lines of particular note:

  • src/linkml_reference_validator/etl/sources/pmid.py:121-127: Excellent addition of record type validation
  • src/linkml_reference_validator/etl/sources/pmid.py:300-307: Much more robust PMCID extraction logic

Bug Fixes ✅

Critical issues resolved:

  1. RuntimeError on missing PMCID: The original code would crash when result[0]["LinkSetDb"][0] didn't exist
  2. Type assumptions: Code now validates that record is actually a dict before accessing keys
  3. AttributeError prevention: Fixed CURIE converter to use .get() instead of attribute access

Performance Considerations ✅

No performance concerns:

  • Added type checks are minimal overhead
  • Early returns prevent unnecessary processing when data is malformed
  • No new API calls or blocking operations introduced

Security Concerns ✅

No security issues identified:

  • All external API interactions remain properly contained
  • No user input directly passed to external services
  • Error logging doesn't expose sensitive information

Test Coverage ✅

Well-tested changes:

  • Added specific test case test_get_pmcid_handles_entrez_error() in tests/test_sources.py:287-305
  • Test validates the exact error scenario (RuntimeError from Entrez.read)
  • Existing tests continue to pass with the defensive changes
  • Minor test fixes for integration tests show attention to detail

Specific Feedback

🟢 Excellent Additions

  1. Type narrowing in pmid.py:121-127:

    if not isinstance(record, dict):
        logger.warning(
            "Unexpected record format for PMID:%s: %s", pmid, type(record))
        return None

    This prevents crashes when the API returns unexpected data structures.

  2. Robust PMCID parsing in pmid.py:300-307:

    if isinstance(result, list) and result and isinstance(result[0], dict):
        link_set_db = result[0].get("LinkSetDb", [])
        if isinstance(link_set_db, list) and link_set_db:
            # ... more defensive checks

    Each level of nesting is validated before access.

  3. Error handling with cleanup in pmid.py:283-298:
    The finally block ensures resources are cleaned up even when exceptions occur.

🟡 Minor Suggestions

  1. Consider adding more specific exception types: Instead of catching all Exception, you could catch more specific types like urllib.error.URLError or xml.etree.ElementTree.ParseError if those are the expected failure modes.

  2. Test edge case coverage: Consider adding a test for the case where record is not a dict in the main fetch() method.

🟢 Code Follows Best Practices

  • Adheres to repository's "avoid try/except unless interfacing with external systems" guideline
  • Uses proper logging levels (WARNING for recoverable issues)
  • Maintains existing docstring format and examples
  • Follows type annotation conventions

Overall Assessment

This is a high-quality defensive programming improvement that addresses a real runtime issue. The changes are:

  • Focused and surgical - only touches problematic areas
  • Well-tested - includes regression test for the specific error
  • Type-safe - adds proper runtime validation
  • Maintainable - follows existing code patterns

Recommendation: Approve and merge. This PR significantly improves robustness without introducing any risks.


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RuntimeError: Couldn't resolve #exLinkSrv2, the address table is empty

1 participant