SARIF reimport with unique_id_from_tool_or_hash_code closes valid findings even though unique_id_from_tool matches #14205

@cpriyakant

Description of the bug

When reimporting CodeQL SARIF into the same test with DD_DEDUPLICATION_ALGORITHM_PER_PARSER='{"SARIF":"unique_id_from_tool_or_hash_code"}', findings are closed even though the new SARIF contains the same vulnerability with an identical unique_id_from_tool. Only hash_code changes between scans, which is expected for a line/location change.

Given the metadata below, I expect DefectDojo to keep the existing finding open and reuse it (same vuln, new location), but instead some findings are being marked closed and no new findings are created.

Environment

  • DefectDojo edition: Community

  • DefectDojo version: 2.53.5

  • Deployment: docker-compose

  • Relevant env var:

    DD_DEDUPLICATION_ALGORITHM_PER_PARSER: '{"SARIF": "unique_id_from_tool_or_hash_code"}'
  • Test type: SARIF

  • Endpoint used: /api/v2/reimport-scan/

  • close_old_findings on reimport: true

  • Same Product → same Engagement → same Test for both scans

What I’m doing

  1. Initial CodeQL SARIF upload to a SARIF test via reimport-scan (first time behaves like import).
  2. Later CodeQL SARIF reimport into the same test using /api/v2/reimport-scan/ with close_old_findings=true.
  3. Code has changed so the line/location changed, but the logical vuln is the same.
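
For reference, a minimal sketch of the reimport call from step 2. The form field names (`scan_type`, `test`, `file`, `close_old_findings`) and the `Token` authorization header are how I understand the API v2 endpoint and should be checked against the instance's API docs; the base URL, token, test ID, and file path are hypothetical placeholders.

```python
from urllib.parse import urljoin


def build_reimport_request(base_url, api_token, test_id):
    """Assemble URL, headers, and form fields for a reimport (nothing is sent)."""
    url = urljoin(base_url.rstrip("/") + "/", "api/v2/reimport-scan/")
    headers = {"Authorization": f"Token {api_token}"}
    data = {
        "scan_type": "SARIF",          # test type used in this report
        "test": str(test_id),          # same Test for both scans
        "close_old_findings": "true",  # setting under discussion
    }
    return url, headers, data


def send_reimport(base_url, api_token, test_id, sarif_path):
    """Upload the SARIF file; needs the third-party `requests` package."""
    import requests

    url, headers, data = build_reimport_request(base_url, api_token, test_id)
    with open(sarif_path, "rb") as fh:
        resp = requests.post(
            url, headers=headers, data=data, files={"file": fh}, timeout=120
        )
    resp.raise_for_status()
    return resp.json()


# Hypothetical usage (placeholder values, not from this report):
# send_reimport("https://defectdojo.example.com", "<api-token>", 123, "codeql.sarif")
```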

Observed behavior

  • Some findings are being closed (mitigated/inactive) after reimport.
  • No new findings are created for those vulns.
  • In SARIF Explorer and in DefectDojo API, I can see that the vuln is still present in the new SARIF and the unique_id_from_tool value is identical between fresh and reimport scans.
  • Only hash_code changes between scans.

Expected behavior

With DD_DEDUPLICATION_ALGORITHM_PER_PARSER='{"SARIF":"unique_id_from_tool_or_hash_code"}' I expect:

  • On reimport into the same test:
    • New SARIF result with same unique_id_from_tool should be matched to the existing finding and keep it open.
    • No new finding should be created (since it’s the same vuln).
    • close_old_findings=true should only close findings that have no matching unique_id_from_tool or hash_code in the new SARIF.

In other words: same test + same unique_id_from_tool + different location/hash_code should result in one open finding, not a closed finding and no new one.
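
The expectation above can be written as a toy model. This is only a sketch of the semantics I expect, not DefectDojo's actual reimporter code; the `Finding` class and `reimport` function are hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    unique_id_from_tool: str
    hash_code: str
    active: bool = True


def _find_match(existing, used, new):
    """Return the index of an unused existing finding matching `new`, else None."""
    # First preference: same non-empty unique_id_from_tool.
    for i, old in enumerate(existing):
        if i not in used and new.unique_id_from_tool \
                and old.unique_id_from_tool == new.unique_id_from_tool:
            return i
    # Fallback: same hash_code.
    for i, old in enumerate(existing):
        if i not in used and old.hash_code == new.hash_code:
            return i
    return None


def reimport(existing, incoming, close_old_findings=True):
    """Expected behavior: matched findings stay open, unmatched old ones close."""
    used = set()
    created = []
    for new in incoming:
        idx = _find_match(existing, used, new)
        if idx is None:
            created.append(new)   # genuinely new finding
        else:
            used.add(idx)         # existing finding was "seen" in this scan
    if close_old_findings:
        for i, old in enumerate(existing):
            if i not in used:
                old.active = False
    return created


existing = [Finding("uid-a", "hash-1"), Finding("uid-b", "hash-2")]
incoming = [Finding("uid-a", "hash-9")]  # same vuln, new location
created = reimport(existing, incoming)
```

Under this model, the finding with `uid-a` stays open even though its hash_code changed, only `uid-b` is closed, and nothing new is created.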

Example metadata

Below are 3 findings, with metadata from the initial (fresh) scan and the reimported scan.

Fresh scan metadata:

  • Finding 1
    unique_id_from_tool: primaryLocationLineHash:2abebf2b9f7e8f07:1|primaryLocationStartColumnFingerprint:14
    hash_code: fd88a20b1bd5ce0674cfa22284f08e7cdb3d3369e58d7969afdd03b01b041fa5
  • Finding 2
    unique_id_from_tool: primaryLocationLineHash:7c1ccbae89e35318:1|primaryLocationStartColumnFingerprint:13
    hash_code: 60caa862d2020e301b64b4f52ae88cfbbce991196ffddd417e50af9274d05980
  • Finding 3
    unique_id_from_tool: primaryLocationLineHash:db9e4b3bba297e41:1|primaryLocationStartColumnFingerprint:12
    hash_code: f1c6f6c4cccb48980e93b8758e42e48ae2cf7188f75b402fbfc7ce97b146e18c

Reimport scan metadata (same test, same vulns with changed location):

  • Finding 1
    unique_id_from_tool: primaryLocationLineHash:2abebf2b9f7e8f07:1|primaryLocationStartColumnFingerprint:14
    hash_code: 1a49e4cdb4a19cc434a3da0a8a92ac8546487b3043a11f70205167cde9f3a908
  • Finding 2
    unique_id_from_tool: primaryLocationLineHash:7c1ccbae89e35318:1|primaryLocationStartColumnFingerprint:13
    hash_code: fa51a69946d6090670522afa22b81e56c6b189cdbaf48dda6f1f95caab230a85
  • Finding 3
    unique_id_from_tool: primaryLocationLineHash:db9e4b3bba297e41:1|primaryLocationStartColumnFingerprint:12
    hash_code: e5c242ce20f8cdf6878235c46e01dc4011853a9979c2cde4af7ecc223b04acbf

For all 3:

  • unique_id_from_tool is identical between fresh and reimport.
  • hash_code is different between fresh and reimport (expected for location change).

Despite this, some of these findings are being closed after reimport and no new finding is created.
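
The comparison can be reproduced with a short script over the values listed above. This is a sketch for checking the metadata only; the `compare` helper is a hypothetical name, and the dictionaries map each unique_id_from_tool to its hash_code as copied from the two scans.

```python
# unique_id_from_tool -> hash_code, from the fresh scan
FRESH = {
    "primaryLocationLineHash:2abebf2b9f7e8f07:1|primaryLocationStartColumnFingerprint:14":
        "fd88a20b1bd5ce0674cfa22284f08e7cdb3d3369e58d7969afdd03b01b041fa5",
    "primaryLocationLineHash:7c1ccbae89e35318:1|primaryLocationStartColumnFingerprint:13":
        "60caa862d2020e301b64b4f52ae88cfbbce991196ffddd417e50af9274d05980",
    "primaryLocationLineHash:db9e4b3bba297e41:1|primaryLocationStartColumnFingerprint:12":
        "f1c6f6c4cccb48980e93b8758e42e48ae2cf7188f75b402fbfc7ce97b146e18c",
}

# unique_id_from_tool -> hash_code, from the reimport scan
REIMPORT = {
    "primaryLocationLineHash:2abebf2b9f7e8f07:1|primaryLocationStartColumnFingerprint:14":
        "1a49e4cdb4a19cc434a3da0a8a92ac8546487b3043a11f70205167cde9f3a908",
    "primaryLocationLineHash:7c1ccbae89e35318:1|primaryLocationStartColumnFingerprint:13":
        "fa51a69946d6090670522afa22b81e56c6b189cdbaf48dda6f1f95caab230a85",
    "primaryLocationLineHash:db9e4b3bba297e41:1|primaryLocationStartColumnFingerprint:12":
        "e5c242ce20f8cdf6878235c46e01dc4011853a9979c2cde4af7ecc223b04acbf",
}


def compare(fresh, reimport):
    """Split IDs into: present in both, present in both with a new hash, missing."""
    in_both = sorted(set(fresh) & set(reimport))
    hash_changed = [uid for uid in in_both if fresh[uid] != reimport[uid]]
    missing = sorted(set(fresh) - set(reimport))
    return in_both, hash_changed, missing


in_both, hash_changed, missing = compare(FRESH, REIMPORT)
print(f"IDs in both scans: {len(in_both)}")
print(f"IDs whose hash_code changed: {len(hash_changed)}")
print(f"IDs missing from reimport: {len(missing)}")
```

All three IDs appear in both scans with a changed hash_code and none are missing from the reimport, so none of these findings should qualify as "old" under unique_id_from_tool matching.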

Why this seems wrong

According to the deduplication docs and UNIQUE_ID_FROM_TOOL_OR_HASH_CODE semantics, if an incoming finding has the same unique_id_from_tool as an existing one, they should be considered the same logical finding. docs.defectdojo
In a reimport-scan for the same test, that should keep the existing finding open and mark it as “seen”; close_old_findings=true should only close findings that had no matching unique_id_from_tool or hash_code in the new scan. docs.defectdojo

Here, reimport behaves as if the finding was not seen, even though there is a matching unique_id_from_tool in the new SARIF. This looks similar to other reports where valid findings get mitigated on reimport. github

Request

  • Can you please confirm the intended behavior of reimport-scan with UNIQUE_ID_FROM_TOOL_OR_HASH_CODE for SARIF?
  • If my understanding is correct, can this be treated as a bug in the close_old_findings logic for SARIF reimport?
