Improve random() to use name-based salting for order-independent reproducibility #434

Closed
baogorek wants to merge 4 commits into master from name-based-random-salting

baogorek (Collaborator) commented Feb 4, 2026

I've done a complete 180 on the random function. I now believe it should be the bedrock of randomness. The current version is pretty close but fragile in some ways that are easily fixed; hence this PR.

A separate PR in -us is coming that shows the new pattern for using randomness, with the "seed" variables renamed to "draw".

With the current random(), adding a new variable that calls random() shifts the random values of every subsequent variable. This PR instead seeds each draw from the entity id plus a per-variable "salt", which makes random values stable per id and variable: adding, removing, or reordering variables doesn't affect other variables' random draws.
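To make the ripple effect concrete, here is a minimal sketch of counter-based seeding, using the old seed formula quoted under Breaking Change. It is an illustration under assumptions, not the code in core: the standard library's `random.Random` stands in for whatever generator random() actually uses, the counter is assumed to advance by one per variable, and the variable names are hypothetical.

```python
import random


def old_style_draws(entity_ids, variables):
    """Mimic a global call counter: a variable's seed depends on its position."""
    draws = {}
    count_random_calls = 0  # assumed to advance once per variable that draws
    for name in variables:
        count_random_calls += 1
        draws[name] = [
            # Old formula: abs(entity_id * 100 + count_random_calls)
            random.Random(abs(entity_id * 100 + count_random_calls)).random()
            for entity_id in entity_ids
        ]
    return draws


ids = [1, 2, 3]
before = old_style_draws(ids, ["takes_up_snap"])
after = old_style_draws(ids, ["new_variable", "takes_up_snap"])

# Inserting new_variable first pushes takes_up_snap to a different counter value,
# so its draws change even though nothing about takes_up_snap itself changed.
ripple = before["takes_up_snap"] != after["takes_up_snap"]

# The new variable simply inherits the seeds the old first variable used to get.
inherited = before["takes_up_snap"] == after["new_variable"]
```

In this sketch both `ripple` and `inherited` come out true: the draw belongs to the position in the call order, not to the variable.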

Summary

  • Replace global execution counter with variable-name-based salting
  • Add _stable_string_hash() for deterministic hashing across Python processes
  • Add optional salt parameter for use outside formula context
  • Prevents the "ripple effect" where adding/removing/reordering variables changes random values for all subsequent variables

Fixes #433, #363, #412
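The reason a dedicated string hash is needed is that Python's built-in `hash()` on strings is randomized per process unless `PYTHONHASHSEED` is fixed, so it cannot anchor reproducible seeds. The following is one possible sketch of a process-independent hash; the function name `stable_string_hash_sketch` and the choice of SHA-256 truncated to 64 bits are assumptions for illustration, not necessarily how `_stable_string_hash()` is implemented in this PR.

```python
import hashlib


def stable_string_hash_sketch(text: str) -> int:
    """Return a 64-bit integer that is the same in every Python process."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Keep the first 8 bytes so the result fits in an unsigned 64-bit integer.
    return int.from_bytes(digest[:8], "big")


# Hypothetical key in the "variable_name:call_count:salt" shape used by the PR.
first = stable_string_hash_sketch("takes_up_snap:0:")
second = stable_string_hash_sketch("takes_up_snap:0:")
different = stable_string_hash_sketch("takes_up_eitc:0:")
```

Because the digest depends only on the bytes of the input, `first` and `second` are equal across runs and machines, while a different variable name yields a different value.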

Breaking Change

This will change random values for all existing simulations using random(). The old seed formula was:

seeds = np.abs(entity_ids * 100 + count_random_calls)

The new formula is:

base_seed = stable_hash(f"{variable_name}:{call_count}:{salt}")
seeds = entity_ids ^ base_seed
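As a rough sketch of how the new formula behaves, the example below derives a base seed from the variable name, call count, and salt, then XORs it with each entity id. Several pieces are assumptions: SHA-256 stands in for the PR's stable hash, `random.Random` stands in for the actual generator, an empty string is used when no salt is given, and the variable names are hypothetical.

```python
import hashlib
import random


def base_seed(variable_name, call_count, salt=""):
    """Hash the "name:count:salt" key into a 64-bit integer (stand-in hash)."""
    key = f"{variable_name}:{call_count}:{salt}"
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


def draws(entity_ids, variable_name, call_count=0, salt=""):
    """One uniform draw per entity, seeded by entity_id XOR base seed."""
    seed0 = base_seed(variable_name, call_count, salt)
    return [random.Random(entity_id ^ seed0).random() for entity_id in entity_ids]


ids = [1, 2, 3]
snap = draws(ids, "takes_up_snap")
snap_again = draws(ids, "takes_up_snap")               # same variable, same ids
snap_second_call = draws(ids, "takes_up_snap", call_count=1)
eitc = draws(ids, "takes_up_eitc")                     # a different variable
```

Nothing here depends on how many other variables have drawn before, so `snap` equals `snap_again` regardless of evaluation order, while a second call within the same formula or a different variable name produces a different stream.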

Test plan

  • All existing tests pass (476 passed)
  • New tests verify order independence
  • New tests verify reproducibility (same variable + entity = same value)
  • New tests verify multiple calls within same formula produce different values
  • New tests verify salt parameter works outside formula context

🤖 Generated with Claude Code

baogorek and others added 4 commits February 4, 2026 14:45
…oducibility

Replace the global execution counter with variable-name-based salting to
prevent the "ripple effect" where adding, removing, or reordering variables
changes random values for ALL subsequent variables.

Changes:
- Add _stable_string_hash() for deterministic hashing across Python processes
- Modify random() to use tracer stack variable name instead of global counter
- Add optional salt parameter for use outside formula context
- Update tests to verify order independence and reproducibility

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The local black version (25.12.0) formats differently than CI's black.
Reverting this unrelated file to avoid CI lint failure.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
baogorek (Collaborator, Author) commented Feb 5, 2026

If random() is going to stay in core, I'd recommend going forward with this, but the current mandate is to put the randomness closer to the data. All is not lost, however. These ideas are going into -us-data (PolicyEngine/policyengine-us-data#451), and they'll probably be abstracted away into a -data package in the future.

@baogorek baogorek closed this Feb 5, 2026