New SIG proposal: E2E Model Lifecycle Provenance #40

@marcelamelara

@sandlbn and I would like to propose a new SIG based on our work on ML model lifecycle provenance and transparency. We welcome feedback on the proposal and invite interested participants to join!

Creation of a new Special Interest Group (SIG) at Sandbox stage

End-to-End (E2E) Model Lifecycle Provenance

Proposed focus, intent, goals, and/or deliverables

Focus / Mission

Model signing uses digital signatures as an effective way to detect tampering with ML models after they are published to model hubs, but it does not track the transformations a model undergoes at each lifecycle stage, before or after publication. To capture ML model provenance across the entire lifecycle, we must address several challenges:

  • Heterogeneous pipelines: Each model lifecycle stage runs an ML pipeline using different software dependencies, ML frameworks, execution environments and target platforms. This requires a way to capture heterogeneous information about pipeline configuration, inputs and outputs.
  • Cross-pipeline tracking: Since each lifecycle stage involves different stakeholders, end-to-end provenance must link each individual stage across pipelines in a way that allows for pipeline order and operational metadata to be cryptographically validated.
  • Pipeline integration: Collecting information about pipeline operations and systems requires tight integration with each pipeline, which raises the cost of adoption.
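
To make the cross-pipeline tracking challenge concrete, here is a minimal sketch of hash-linked stage records. It is not the Atlas format: the record fields (`stage`, `inputs`, `outputs`, `prev`) and the helper names are assumptions made for illustration, and a real attestation would also be signed by the stakeholder running the stage.

```python
import hashlib
import json


def digest(obj) -> str:
    """SHA-256 over a canonical JSON encoding of obj."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(data).hexdigest()


def make_record(stage, inputs, outputs, prev_record=None):
    """Build a (hypothetical) provenance record for one lifecycle stage.

    inputs/outputs map artifact names to artifact digests; prev holds the
    digest of the previous stage's record, or None for the first stage.
    """
    return {
        "stage": stage,
        "inputs": inputs,
        "outputs": outputs,
        "prev": digest(prev_record) if prev_record else None,
    }


def verify_chain(records) -> bool:
    """Check that every record links to its predecessor and consumes
    at least one artifact that the predecessor produced."""
    for prev, cur in zip(records, records[1:]):
        if cur["prev"] != digest(prev):
            return False  # link broken: predecessor was altered or replaced
        if not set(cur["inputs"].values()) & set(prev["outputs"].values()):
            return False  # stage does not consume anything the predecessor made
    return bool(records) and records[0]["prev"] is None


train = make_record("training", {"dataset": "sha256:aaa"}, {"model": "sha256:bbb"})
tune = make_record("fine-tuning", {"model": "sha256:bbb"}, {"model": "sha256:ccc"}, train)
print(verify_chain([train, tune]))  # True

train["outputs"]["model"] = "sha256:tampered"
print(verify_chain([train, tune]))  # False
```

Because each record commits to the digest of the one before it, changing or dropping an earlier stage invalidates every later link, which is the property the cross-pipeline tracking bullet asks for.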

This SIG addresses these challenges by developing a pipeline-agnostic framework for attesting and validating end-to-end model lifecycle provenance. The work in this SIG intends to build upon the Atlas framework for ML lifecycle provenance and transparency and its implementation in the Atlas CLI, which currently supports OMS-compliant C2PA metadata and standard Intel TDX hardware attestation.

Goals

  • Enable pipeline-agnostic attestation of any model transformation that occurs throughout a model’s lifecycle.
  • Facilitate model producer and consumer validation of a model’s attested lineage in order to detect unintended/malicious changes to the expected stages of the lifecycle (e.g., pipelines operating out of order, or being omitted), from initial data processing through model deployment and further refinement.
  • Explore customizable pipeline metadata collection that incrementally addresses various security use cases (e.g., SLSA Build Provenance, additional pipeline run-time logging, fine-grained pipeline software stack attestation, and/or pipeline compute environment attestation).
  • Evaluate different trusted execution environments (TEE) for hardware-based hardening and attestation of model pipelines at run-time.
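
As an illustration of the lineage validation goal, the following sketch compares the stages found in a model's attestations against an expected sequence and reports omitted or out-of-order stages. The stage names, the `EXPECTED_STAGES` policy, and `check_lineage` are hypothetical; the sketch also assumes a fixed linear policy, whereas a real policy would need to allow repeated refinement stages.

```python
# Hypothetical policy: the lifecycle stages a consumer expects, in order.
EXPECTED_STAGES = ["data-processing", "training", "evaluation", "deployment"]


def check_lineage(attested, expected=EXPECTED_STAGES):
    """Return a list of problems; an empty list means the lineage matches."""
    problems = []

    # Stages required by the policy but absent from the attestations.
    for stage in expected:
        if stage not in attested:
            problems.append(f"omitted stage: {stage}")

    # Stages that were attested but that the policy does not know about.
    for stage in attested:
        if stage not in expected:
            problems.append(f"unexpected stage: {stage}")

    # The expected stages that are present must appear in policy order.
    present = [s for s in attested if s in expected]
    wanted = [s for s in expected if s in attested]
    if present != wanted:
        problems.append(f"stages out of order or repeated: {present}")

    return problems


print(check_lineage(["data-processing", "training", "evaluation", "deployment"]))
print(check_lineage(["data-processing", "training", "deployment"]))
print(check_lineage(["data-processing", "evaluation", "training", "deployment"]))
```

The first call returns an empty list, the second reports the omitted evaluation stage, and the third reports the ordering problem. In practice the stage list would be taken from verified attestations rather than from plain strings.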

Deliverables

  • A pipeline-agnostic data format specification for minimum required metadata for model lifecycle provenance.
  • An OMS-compliant API for collecting customizable pipeline metadata in a standardized attestation format, including optional vendor-agnostic TEE hardware enablement.
  • A set of in-toto compliant templates for common end-to-end model lifecycle integrity validation policies.
  • An OMS-compliant API for consuming and validating end-to-end lifecycle attestations based on a given policy.
  • Talk at 2026 OSS conference (e.g., Open Source Summit, Open Source SecurityCon)
  • Stretch: Prototype implementations of end-to-end ML lifecycle provenance APIs for common pipeline frameworks (e.g., KubeFlow)
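
For readers unfamiliar with in-toto, the sketch below shows how per-stage metadata could be wrapped in the in-toto Statement v1 layout (`_type`, `subject`, `predicateType`, `predicate`). Whether the SIG's format specification adopts this envelope is an assumption; the predicate type URI, the predicate fields, and the `stage_statement` helper are hypothetical placeholders, not part of any published specification.

```python
import hashlib
import json

# Type URI defined by the in-toto attestation framework for v1 Statements.
STATEMENT_TYPE = "https://in-toto.io/Statement/v1"
# Hypothetical predicate type, used here for illustration only.
PREDICATE_TYPE = "https://example.com/model-lifecycle-stage/v0.1"


def stage_statement(artifact_name: str, artifact_bytes: bytes,
                    stage: str, pipeline: str) -> dict:
    """Describe one lifecycle stage's output artifact as an in-toto Statement."""
    return {
        "_type": STATEMENT_TYPE,
        "subject": [
            {
                "name": artifact_name,
                "digest": {"sha256": hashlib.sha256(artifact_bytes).hexdigest()},
            }
        ],
        "predicateType": PREDICATE_TYPE,
        # Hypothetical minimal predicate: which stage and pipeline produced it.
        "predicate": {"stage": stage, "pipeline": pipeline},
    }


statement = stage_statement("model.safetensors", b"example weights",
                            "training", "example-pipeline")
print(json.dumps(statement, indent=2))
```

A statement like this would normally be signed and stored (for example, in a transparency log) before a validation policy consumes it.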

Success Metrics

  • Library and tools enable full support of SLSA Build track v1.0 and support for at least two additional levels of incremental pipeline provenance collection
  • Library and tools support GPU-based model artifact hashing
  • Stretch goal: Libraries provide vendor-agnostic support for at least 2 TEE hardware configurations
  • Stretch goal: Prototypes available for 2 common pipeline frameworks

2026 Roadmap

Milestones by quarter:

Q1 2026
  • Release v0.1 API specification of E2E model lifecycle attestation and validation, including provenance metadata format specification and key use cases

Q2 2026
  • Release v1.0 library and tools for E2E lifecycle provenance, expanding upon Atlas CLI v0.2, with support for:
      • v0.1 API spec
      • All levels of the SLSA Build track v1.0
      • Optional pipeline run-time logging
      • Attestation storage in Rekor
      • Provenance validation based on select in-toto compliant policies
  • Release of v0.1 prototype implementation for KubeFlow pipeline integration

Q3 2026
  • Deliver talk at industry conference
  • Release v1.0 API specification of E2E model lifecycle attestation and validation for configurable provenance

Q4 2026
  • Release v1.1 library and tools for E2E lifecycle provenance with support for:
      • v1.0 API spec
      • Optional finer-grained pipeline software stack attestation
      • Integration with in-toto compliant policy engine ingesting custom policies
      • Configurable, vendor-agnostic TEE hardware attestation validation
  • Release of v0.2 prototype implementation for second pipeline framework (TBD)

Future Directions

  • Provenance attestation beyond pipeline operations (e.g., via SLSA Source track)
  • Model lifecycle metadata collection of higher-order model attributes (e.g., dataset provenance, model cards)

List SIG Lead(s)

The SIG must have a minimum of 1 Lead

  • Marcela Melara, Intel, marcelamelara
  • Marcin Spoczynski, Intel, sandlbn

List of interested individuals

The SIG must have a minimum of 3 members with 2 different organizational affiliations.

  • Mihai Maruseac, Google, mihaimaruseac
  • Abdullah Garcia, JP Morgan, abdullahgarcia
  • TBD

Governing Body

SIGs may report to an existing OpenSSF Working Group or directly to the TAC as their governing body. The SIG commits to providing the governing body quarterly updates on progress.

  • AI/ML Security WG

SIG References

Reference: URL
Repo: https://github.com/IntelLabs/atlas-cli
Security.md: https://github.com/IntelLabs/atlas-cli/blob/main/SECURITY.md
code-of-conduct.md: https://github.com/ossf/ai-ml-security/blob/main/code-of-conduct.md
Demos: Planned for Q2/Q3 of 2026
Paper: https://arxiv.org/pdf/2502.19567
Open Source Summit NA '25 Talk: https://www.youtube.com/watch?v=FNpkbOOghe4
