-
Notifications
You must be signed in to change notification settings - Fork 8
Record subset of meta in provenance #342
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a DerivationProvenance type, wires derivation meta JSON into provenance chains during derivation construction, registers/serializes the new provenance type, updates provenance-show rendering to display derivation metadata and licenses, exposes headers/sources in libexpr build, and updates tests/configs to expect meta-wrapped provenance. Changes
Sequence DiagramsequenceDiagram
participant EvalState as EvalState
participant PrimOps as PrimOps
participant JSON as JSONSerializer
participant Provenance as Provenance
EvalState->>PrimOps: derivationStrict(state, args, state.provenance)
PrimOps->>PrimOps: detect meta in args / langVersion check
PrimOps->>JSON: printValueAsJSON(meta)
JSON-->>PrimOps: json_meta
PrimOps->>PrimOps: prune identifiers/licenses from json_meta
PrimOps->>Provenance: create DerivationProvenance(next=state.provenance, meta=json_meta)
Provenance-->>PrimOps: wrapped provenance
PrimOps->>PrimOps: call writeDerivation(..., provenance=wrapped)
PrimOps->>Provenance: provenance->to_json() for storage/display
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
No actionable comments were generated in the recent review. 🎉 🧹 Recent nitpick comments
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/functional/flakes/provenance.sh (1)
107-137:⚠️ Potential issue | 🔴 CriticalCopied provenance is missing the
metalayer — test will likely fail.The first test (lines 19–61) asserts that build provenance now follows the chain
build → meta → flake → subpath → tree. However, this "copied" provenance test still expects the old chaincopied → build → flake → subpath → treewithout themetanode. Sincenix copywraps the original provenance with a"copied"envelope, the inner structure should still contain themetalayer.Suggested fix: insert the meta layer into the copied provenance expectation
{ "from": "file://$binaryCache", "next": { "drv": "$(basename "$drvPath")", "next": { - "flakeOutput": "packages.$system.default", + "meta": { + "license": [ + { + "deprecated": true, + "free": true, + "fullName": "GNU Lesser General Public License v2.1", + "redistributable": true, + "shortName": "lgpl21", + "spdxId": "LGPL-2.1", + "url": "https://spdx.org/licenses/LGPL-2.1.html" + } + ] + }, "next": { - "next": { - "attrs": { - "lastModified": $lastModified, - "ref": "refs/heads/master", - "rev": "$rev", - "revCount": 1, - "type": "git", - "url": "file://$flake1Dir" + "flakeOutput": "packages.$system.default", + "next": { + "next": { + "attrs": { + "lastModified": $lastModified, + "ref": "refs/heads/master", + "rev": "$rev", + "revCount": 1, + "type": "git", + "url": "file://$flake1Dir" + }, + "type": "tree" }, - "type": "tree" + "subpath": "/flake.nix", + "type": "subpath" }, - "subpath": "/flake.nix", - "type": "subpath" + "type": "flake" }, - "type": "flake" + "type": "meta" }, "output": "out", "type": "build" }, "type": "copied" }
🤖 Fix all issues with AI agents
In `@src/libexpr/primops.cc`:
- Around line 1577-1594: The new EvalState::s.meta.getId() branch
unconditionally requires experimentalFeatureSettings.require(Xp::Provenance) and
assumes printValueAsJSON returns an object, breaking existing derivations and
causing uncaught exceptions for non-object meta values; change the logic so you
only enforce Provenance and construct MetaProvenance when the feature is
enabled, otherwise fall through to the default handling (so old behavior
remains), and before iterating call printValueAsJSON and verify the returned
nlohmann::json is an object (e.g., is_object()); if it's not an object, either
forward to the default case or raise a clear Nix evaluation error instead of
calling it.key(), then only create MetaProvenance and assign provenance when the
JSON check passes (using symbols: EvalState::s.meta.getId(),
experimentalFeatureSettings.require, printValueAsJSON, MetaProvenance,
provenance).
🧹 Nitpick comments (1)
src/libexpr/primops.cc (1)
5443-5450: Bumping__langVersionbased on an experimental feature conflates language version with feature flags.
__langVersiontraditionally reflects language syntax/semantics evolution. Tying it to whetherXp::Provenanceis enabled means the same Nix binary reports different language versions depending on runtime config, which may confuse downstream consumers that uselangVersionfor feature detection of non-provenance language features. Consider whether a separatebuiltins.provenanceorbuiltins ? provenanceMetacheck would be a cleaner detection mechanism for Nixpkgs.
src/libexpr/primops.cc
Outdated
| case EvalState::s.meta.getId(): | ||
| experimentalFeatureSettings.require(Xp::Provenance); | ||
|
|
||
| { | ||
| auto meta = printValueAsJSON(state, true, *i->value, pos, context); | ||
|
|
||
| for (auto it = meta.begin(); it != meta.end();) { | ||
| if (it.key() == "identifiers" || it.key() == "license" || it.key() == "licenses") { | ||
| it++; | ||
| continue; | ||
| } | ||
|
|
||
| it = meta.erase(it); | ||
| } | ||
|
|
||
| provenance = std::make_shared<const MetaProvenance>(provenance, make_ref<nlohmann::json>(meta)); | ||
| } | ||
| break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check for EvalState::s.meta symbol definitions and usage patterns
echo "=== Searching for Symbol meta definitions in EvalState ==="
rg -n 'Symbol.*meta' --type cpp | head -20
echo ""
echo "=== Searching for s.meta usage patterns ==="
rg -n '\bs\.meta\b' --type cpp -C2 | head -50
echo ""
echo "=== Check derivation function for meta attribute handling ==="
rg -n 'meta.*getId\(\)|s\.meta' --type cpp -B3 -A3 | head -40
echo ""
echo "=== Look for pre-existing meta case in switch statements ==="
rg -n 'case.*meta\b|s\.meta.*case' --type cpp -C1 | head -30
echo ""
echo "=== Search for EvalState symbol initialization ==="
rg -n 'EvalState::s\.' src/libexpr/ --type cpp | grep -E '(Symbol|meta)' | head -20Repository: DeterminateSystems/nix-src
Length of output: 6844
🏁 Script executed:
#!/bin/bash
# Check the derivation function context around line 1577
echo "=== Context around line 1577 in primops.cc ==="
sed -n '1500,1600p' src/libexpr/primops.cc | head -120
echo ""
echo "=== Find the function that contains this switch statement ==="
rg -n 'static.*void.*derivation|derivationInternal\|derivation_impl' src/libexpr/primops.cc -A5 | head -20
echo ""
echo "=== Check what happens with meta before line 1577 (before this new case) ==="
sed -n '1540,1577p' src/libexpr/primops.ccRepository: DeterminateSystems/nix-src
Length of output: 7275
🏁 Script executed:
#!/bin/bash
# Get the complete switch statement and default case
echo "=== Complete switch statement from line 1558 onwards ==="
sed -n '1558,1750p' src/libexpr/primops.cc
echo ""
echo "=== Check the default case handling ==="
sed -n '1700,1750p' src/libexpr/primops.ccRepository: DeterminateSystems/nix-src
Length of output: 11637
🏁 Script executed:
#!/bin/bash
# Check test configuration and meta usage in tests
echo "=== Check config.nix.in for meta handling ==="
if [ -f tests/lang/config.nix.in ]; then
cat tests/lang/config.nix.in | head -50
fi
echo ""
echo "=== Search for test files with meta attributes ==="
rg -l 'meta\s*=' tests/lang/ --type nix | head -10
echo ""
echo "=== Check simple.nix mentioned in review ==="
if [ -f tests/lang/simple.nix ]; then
head -40 tests/lang/simple.nix
fi
echo ""
echo "=== Check for Provenance feature guard conditions ==="
rg -n 'Xp::Provenance|require.*Provenance' src/libexpr/ --type cpp -B2 -A2 | head -30Repository: DeterminateSystems/nix-src
Length of output: 1070
🏁 Script executed:
#!/bin/bash
# Check printValueAsJSON return type and JSON handling
echo "=== Search for printValueAsJSON function signature ==="
rg -n 'nlohmann::json.*printValueAsJSON' src/libexpr/ --type cpp -B2 -A5 | head -40
echo ""
echo "=== Check how printValueAsJSON handles different value types ==="
rg -n 'printValueAsJSON' src/libexpr/primops.cc -B2 -A10 | grep -A10 'case.*meta'
echo ""
echo "=== Verify if JSON is guaranteed to be object for meta ==="
rg -n 'printValueAsJSON.*meta' src/libexpr/ --type cpp -C5
echo ""
echo "=== Check if there's type checking before it.key() ==="
sed -n '1580,1595p' src/libexpr/primops.ccRepository: DeterminateSystems/nix-src
Length of output: 1605
🏁 Script executed:
#!/bin/bash
# Check printValueAsJSON implementation for different value types
echo "=== Check printValueAsJSON implementation ==="
rg -n 'nlohmann::json printValueAsJSON' src/libexpr/value-to-json.cc -A30 | head -50
echo ""
echo "=== Check if JSON type is validated before iteration ==="
rg -n 'is_object\|is_array\|type\(\)' src/libexpr/value-to-json.cc | head -20
echo ""
echo "=== Look for any tests of meta as non-attrset ==="
rg -n 'meta.*string\|meta.*int\|meta.*list' . --type nix | head -10
echo ""
echo "=== Check nlohmann::json begin() behavior on non-objects ==="
python3 << 'EOF'
# Check nlohmann::json documentation/behavior
print("nlohmann::json iterator behavior:")
print("- For objects: iterator over key-value pairs")
print("- For non-objects (arrays, strings, numbers): begin() returns begin iterator")
print("- Calling .key() on non-object iterator is undefined behavior/exception")
print("")
print("The code at lines 1582-1590 does:")
print(" for (auto it = meta.begin(); it != meta.end();)")
print(" if (it.key() == ...)")
print("")
print("This assumes meta is always an object (JSON object type).")
print("If meta = 'foo' (string), printValueAsJSON returns JSON string,")
print("and calling it.key() would cause issues.")
EOFRepository: DeterminateSystems/nix-src
Length of output: 790
Intercepting meta unconditionally breaks existing derivations and lacks type safety.
This new case EvalState::s.meta.getId(): in the derivation builder (line 1577) intercepts all derivation calls containing a meta attribute. Before this change, meta was silently passed through to the derivation environment (via the default case). Now it unconditionally requires experimentalFeatureSettings.require(Xp::Provenance), which means:
-
Any derivation with a
metaattribute will fail if the Provenance feature is not enabled. This is a breaking change for existing code, including any tests or external derivations that includemeta. -
The code assumes
metais always a JSON object and iterates withit.key()without type checking. If a user passes a non-attrsetmeta(e.g.,meta = "foo";ormeta = [1 2];),printValueAsJSONwill return a non-object JSON, causingit.key()to throw an unhelpful exception rather than a clear Nix evaluation error.
🤖 Prompt for AI Agents
In `@src/libexpr/primops.cc` around lines 1577 - 1594, The new
EvalState::s.meta.getId() branch unconditionally requires
experimentalFeatureSettings.require(Xp::Provenance) and assumes printValueAsJSON
returns an object, breaking existing derivations and causing uncaught exceptions
for non-object meta values; change the logic so you only enforce Provenance and
construct MetaProvenance when the feature is enabled, otherwise fall through to
the default handling (so old behavior remains), and before iterating call
printValueAsJSON and verify the returned nlohmann::json is an object (e.g.,
is_object()); if it's not an object, either forward to the default case or raise
a clear Nix evaluation error instead of calling it.key(), then only create
MetaProvenance and assign provenance when the JSON check passes (using symbols:
EvalState::s.meta.getId(), experimentalFeatureSettings.require,
printValueAsJSON, MetaProvenance, provenance).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
metashould be able to be deserialized into a JSON object 99% of the time, if it doesn't then there's a problemmetahas never been passed into a derivation due to structured attrs not being enabled by default- The cool thing with this PR is it only affects how
metais passed into the derivation serialization, nix expressions still see the fullmeta.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RossComputerGuy Thank you for the clarifications! Those are important points that address my concerns:
-
Not a breaking change: Your point about
metanever being passed into derivations before (due to the structured attrs and config.nix removal logic) resolves my main concern about breaking existing code. If it wasn't serialized into derivations previously, then requiring the Provenance feature here is indeed a net-new behavior, not a breaking change. -
JSON object assumption: I understand that
metashould be an attrset in practice. Given that this is new functionality gated behind an experimental feature, the assumption is reasonable for the target use case. -
Serialization-only impact: The fact that Nix expressions still see the full
metaand only the internal derivation serialization is affected makes the scope much clearer.
Given this context, my initial characterization was too strong. That said, as a best practice for robustness, you might consider adding a type check before the iteration (e.g., if (!meta.is_object()) throw EvalError(...)) to provide a clearer error message in that 1% edge case where someone might pass a non-attrset meta. But this is a "nice-to-have" rather than critical, especially since it's behind an experimental feature flag.
Thank you for taking the time to explain the background!
🧠 Learnings used
Learnt from: edolstra
Repo: DeterminateSystems/nix-src PR: 321
File: src/libstore/build/derivation-building-goal.cc:443-447
Timestamp: 2026-01-23T12:00:50.308Z
Learning: In the Nix build system, for the `AlreadyValid` case in derivation building (when outputs already exist and are valid), provenance of the current derivation should NOT be returned because the existing outputs may have been built with different provenance than the current derivation, making it misleading to associate them.
e2a917c to
ee14c7c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/libexpr/primops.cc`:
- Around line 1577-1594: The meta branch calls printValueAsJSON which appends
string contexts into the shared context but then breaks, so the meta JSON never
gets added to the derivation (drv) and its string contexts become phantom
inputs; after creating provenance (MetaProvenance) in the EvalState::s.meta
case, explicitly add the filtered meta JSON into the derivation the same way the
default handler does (e.g., push the meta into drv.structuredAttrs and/or set
the appropriate drv.env entries) or invoke the common handling path for
structured values so that meta itself is recorded in drv and its string contexts
are correctly tied to actual drv inputs; reference symbols: EvalState::s.meta,
printValueAsJSON, MetaProvenance, provenance, drv.structuredAttrs, drv.env, and
context when making this change.
In `@tests/functional/flakes/provenance.sh`:
- Around line 167-189: Update the stale comment that currently states "--impure
does not add provenance" to reflect the new behavior: --impure no longer adds
full build/flake provenance but still records a meta provenance node (look for
the assertion expecting a JSON with "type": "meta" and "next": null). Also
normalize the heredoc to match the rest of the file by changing the occurrence
of "<< EOF" to "<<EOF" so the test style is consistent. Ensure the updated
comment and the heredoc token are edited in the test block surrounding the
--impure check and the JSON assertion.
🧹 Nitpick comments (2)
src/nix/provenance.cc (1)
95-97: Consider displaying a summary of the metadata content.The current output
"← with metadata"is opaque — users have no indication of what metadata is attached. Other provenance types display key details (e.g., the derivation path forBuildProvenance, the URL forFlakeProvenance). You could show which metadata fields are present (e.g.,"← with metadata (license, identifiers)") to makenix provenance showmore informative.That said, if the intent is to keep it minimal for now and iterate later, this is fine.
src/libexpr/primops.cc (1)
5443-5450: Coupling__langVersionto an experimental feature flag is unconventional — consider the stabilization path.
__langVersionis typically a monotonically increasing constant that reflects language-level changes available to all users. Tying it toXp::Provenancemeans that the same Nix binary reports different language versions depending on configuration, which could confuse tools or caches that key on__langVersion.When Provenance is stabilized (or if it's removed), this branching needs to be reconciled — either version 7 becomes the permanent value, or a different signaling mechanism is used. The current approach works as a short-term experiment, but consider documenting this caveat or using a different mechanism (e.g., a
builtins.provenanceEnabledflag) if this persists beyond the experimental phase.
| * Provenance indicating that this store path was instantiated by the `derivation` builtin function. Its main purpose is | ||
| * to record `meta` fields. | ||
| */ | ||
| struct MetaProvenance : Provenance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's call this DerivationProvenance. MetaProvenance is pretty vague.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/libexpr/primops.cc
Outdated
| experimentalFeatureSettings.require(Xp::ImpureDerivations); | ||
| } | ||
| break; | ||
| case EvalState::s.meta.getId(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's rename this to __meta, similar to __structuredAttrs, to denote that it's a special attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that seems to break things. __meta isn't being removed from the serialization so it ends up wanting __structuredAttrs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, the provenance is just not showing up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to allocate the symbol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/libexpr/primops.cc
Outdated
| } | ||
| break; | ||
| case EvalState::s.meta.getId(): | ||
| experimentalFeatureSettings.require(Xp::Provenance); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this should just ignore the meta attribute if provenance is disabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/libexpr/primops.cc
Outdated
| experimentalFeatureSettings.require(Xp::Provenance); | ||
|
|
||
| { | ||
| auto meta = printValueAsJSON(state, true, *i->value, pos, context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit inefficient to serialize attributes that we're not interested in, so for performance it's better to iterate over the meta attrset and only serialize the attrs that we care about.
We should measure what the performance impact is on something like nix flake show nixpkgs --legacy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure how the evaluator works from the C++ API. I know from the C API though how to use it. Could use a bit of guidance on how to iterate through the attr set.
src/nix/provenance.cc
Outdated
| logger->cout("← from file " ANSI_BOLD "%s" ANSI_NORMAL, subpath->subpath.abs()); | ||
| provenance = subpath->next; | ||
| } else if (auto meta = std::dynamic_pointer_cast<const MetaProvenance>(provenance)) { | ||
| logger->cout("← with metadata"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should print the meta attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume pretty print it and not just shove the JSON to the logger.
Example:
/nix/store/j1zlwj1j8q3awm8jl9friy3xih7njzwf-aaa.drv
← with derivation metadata
license: lgpl3Only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/nix/store/nvmynjjgaln144r8vy2y0m5km2fj8m3i-bash-interactive-5.3p9.drv
← with derivation metadata
CPE: cpe:2.3:a:gnu:bash:5.3:9:*:*:*:*:*:*
License: gpl3Plus
/nix/store/lfqpgrr022iyxrs952i7376i2nw4nnw1-llvm-21.1.8.drv
← with derivation metadata
CPE: cpe:2.3:a:llvm:*:*:*:*:*:*:*:*:*
Licenses:
- ncsa
- asl20
- llvm-exception
Done
02837c0 to
bccd6a4
Compare
Motivation
Adds a subset of
metainto the provenance when the provenance experimental feature is used. So far, just recordidentifiers,license, andlicenses. We may want to expand this further but for now, this will allow us to judge how it affects the narinfo size.Context
Summary by CodeRabbit
New Features
Behavior
Tests
Documentation