@NullPointer-cell

Fix #4609: Handle file-type license references in NuGet packages

Detect <license type='file'> in .nuspec files and store the file path in the license_file_references field. Keep extracted_license_statement as the raw path value so that it integrates with the existing license resolution in the process_codebase function.

This follows the two-phase architecture pattern:

  • Phase 1: Extract and store file path (this change)
  • Phase 2: Existing process_codebase resolves file references

Minimal changes (37 lines) following maintainer feedback from PR #4689.

Fixes #4609


Changes Made

Problem

NuGet packages with <license type="file">path/LICENSE.txt</license> were returning LicenseRef-scancode-unknown because the file-type license references were not being extracted.

Solution

  • Added license_file_references field to NugetNuspecHandler
  • Extract license file paths when @type="file" in nuspec XML
  • Store raw path in both extracted_license_statement and license_file_references
  • Follows existing two-phase architecture pattern used by other package handlers
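As a rough illustration of the extraction step, the sketch below parses a small nuspec document with the standard-library xml.etree.ElementTree instead of the parser used in nuget.py, so it is not the code from this PR. The sample XML, the NUSPEC_NS constant and the find_license_file_references helper are assumptions made only for this example.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for the sample; real packages may declare a different one.
NUSPEC_NS = "http://schemas.microsoft.com/packaging/2013/05/nuspec.xsd"

# Assumed sample document, reduced to the fields that matter here.
SAMPLE_NUSPEC = f"""<package xmlns="{NUSPEC_NS}">
  <metadata>
    <id>Example.Package</id>
    <version>1.0.0</version>
    <license type="file">path/LICENSE.txt</license>
  </metadata>
</package>
"""


def find_license_file_references(nuspec_text):
    """Return the raw paths of all <license type="file"> elements.

    Hypothetical helper: it only collects paths and does not read any file.
    """
    root = ET.fromstring(nuspec_text)
    references = []
    for element in root.iter():
        # Drop the "{namespace}" prefix so the check works with any namespace.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "license" and element.get("type") == "file":
            path = (element.text or "").strip()
            if path:
                references.append(path)
    return references


if __name__ == "__main__":
    print(find_license_file_references(SAMPLE_NUSPEC))
```

A nuspec that declares <license type="expression"> instead yields an empty list here, which matches the intent that only file-type licenses populate license_file_references.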

Files Changed

  • src/packagedcode/nuget.py - Core implementation (21 lines added)
  • tests/packagedcode/test_nuget.py - New test case (8 lines added)
  • tests/packagedcode/data/nuget/license_file.nuspec - Test fixture (new file)
  • 7 .expected test files - Updated with license_file_references field (1 line each)

Testing

✅ All existing tests pass (11 passed in 7.91s)
✅ New test test_nuget_parse_license_file_reference validates license file extraction


Tasks

  • Reviewed contribution guidelines
  • PR is descriptively titled 📑 and links the original issue above 🔗
  • Tests pass -- look for a green checkbox ✔️ a few minutes after opening your PR
  • Commits are in a uniquely named feature branch (fix-4609-nuget-license-file-v4) and have no merge conflicts 📁
  • Updated documentation pages (not applicable - internal implementation detail)
  • Updated CHANGELOG.rst (not applicable - will be handled by maintainers)

Implementation Details

Code Changes in src/packagedcode/nuget.py:

# Added license_file_references field
license_file_references = []

# Parse license data structure
if isinstance(license_data, dict):
    license_type = license_data.get('@type', '')
    license_text = license_data.get('#text', '')
    
    if license_type == 'file':
        extracted_license_statement = license_text
        license_file_references = [license_text]  # Store file path for resolution
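The excerpt above is a fragment of the handler. A self-contained version of the same branch might look like the following sketch. The split_license_data name, its return shape, and the handling of non-file and plain-string values are assumptions for illustration; only the file branch mirrors the excerpt.

```python
def split_license_data(license_data):
    """Return (extracted_license_statement, license_file_references).

    Hypothetical helper. `license_data` is assumed to be either a plain
    string or a dict with '@type' and '#text' keys, as in the excerpt.
    """
    extracted_license_statement = None
    license_file_references = []

    if isinstance(license_data, dict):
        license_type = license_data.get('@type', '')
        license_text = license_data.get('#text', '')

        if license_type == 'file':
            # Same as the excerpt: keep the raw path in both fields.
            extracted_license_statement = license_text
            license_file_references = [license_text]
        else:
            # Assumption: other types keep their text as the statement.
            extracted_license_statement = license_text or None
    elif isinstance(license_data, str):
        # Assumption: a bare string is used as the statement unchanged.
        extracted_license_statement = license_data

    return extracted_license_statement, license_file_references


if __name__ == "__main__":
    print(split_license_data({'@type': 'file', '#text': 'path/LICENSE.txt'}))
    print(split_license_data({'@type': 'expression', '#text': 'MIT'}))
```

Returning a fresh list on every call avoids sharing one mutable list between packages, which is worth checking wherever a `license_file_references = []` default is declared at class level.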

Why This Works

  1. Follows existing pattern: Other package handlers (like npm, cargo) use the same two-phase approach
  2. Minimal impact: Only adds the field and extraction logic - doesn't change existing behavior
  3. Integration ready: The existing process_codebase function will handle file resolution using the stored paths
  4. Backwards compatible: Packages without file-type licenses work exactly as before
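To make the second phase concrete, here is a minimal sketch of what resolving a stored reference could involve. It is not the existing process_codebase logic, which this PR does not show. The resolve_license_file_reference helper and the assumption that references are relative to the directory containing the .nuspec file are illustrative only.

```python
from pathlib import PurePosixPath


def resolve_license_file_reference(nuspec_path, reference):
    """Return the reference as a path relative to the scan root.

    Hypothetical helper: assumes `reference` is relative to the directory
    that contains the .nuspec file and that both use POSIX-style paths
    once backslashes are normalized.
    """
    # nuspec files written on Windows may use backslash separators.
    normalized = reference.replace("\\", "/")
    package_dir = PurePosixPath(nuspec_path).parent
    return str(package_dir / normalized)


if __name__ == "__main__":
    print(resolve_license_file_reference("pkg/Example.nuspec", "path/LICENSE.txt"))
```

Keeping this step out of the handler is what lets phase 1 store the raw path unchanged, since the handler alone does not know which other files are present in the codebase.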

Signed-off-by: Jayant <jayantmcom@gmail.com>

@NullPointer-cell NullPointer-cell force-pushed the fix-4609-nuget-license-file-v4 branch from 914707f to e213a40 Compare January 23, 2026 13:42
@NullPointer-cell NullPointer-cell force-pushed the fix-4609-nuget-license-file-v4 branch from e213a40 to 24c47b2 Compare January 23, 2026 13:55

NullPointer-cell commented Jan 23, 2026

Hi @pombredanne and @AyanSinhaMahapatra!

This fixes #4609 following the feedback from #4689. I've corrected the issues:

What changed from #4689:

  • No extra text added - stores raw path only in extracted_license_statement
  • Uses the license_file_references field as a separate data field
  • Follows two-phase architecture - file resolution deferred to license plugin's process_codebase

Implementation:

  • Extracts <license type="file"> path and stores it in license_file_references
  • Keeps raw path in extracted_license_statement (no prefixes or modifications)
  • Lets existing license plugin handle file resolution

All tests pass. Let me know if any changes are required.

Development

Successfully merging this pull request may close these issues.

NuGet scans return LicenseRef-scancode-unknown
