-
Notifications
You must be signed in to change notification settings - Fork 45
Store PRegSet in MachineEnv
#254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Store PRegSet in MachineEnv
#254
Conversation
Helps to make the `MachineEnv`s in Wasmtime `const`-allocatable.
cfallin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a bunch for working on this!
src/ion/reg_traversal.rs
Outdated
| .into_iter() | ||
| .take(offset_hint) | ||
| .collect::<PRegSet>() | ||
| .into_iter(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This iterator chain appears to generate some pretty gnarly code that iterates bit-by-bit (Playground link).
Could you instead implement the scheme I mentioned in #252 with some constant-time bit operations?
To implement this, we would need to update the RegTraversalIter's cursors over these lists to hold a bitset, and use a "find first bit" operation to find the index of the first entry. We can implement the shuffling behavior by holding two bitsets (two copies of the index space) and choosing some prefix, removing it from the first copy and putting it in the second copy.
So basically you want to (i) find the bit index in the Bits that denotes the rotation boundary; (ii) build two masks, one with all bits set up to that point and one with bits set after that point; (iii) mask out the two halves, with the "after this point" half coming first. Probably you can do this pretty nicely with some primitives like mask_above_bit, mask_below_bit, and first_bit_set in PRegSet. PRegSet is 256 bits (four limbs of 64 bits) so with some luck LLVM might even unroll the for i in 0..4 loops and turn this into SSE4.2 magic...
(FWIW: I normally wouldn't care so much, but building this cursor happens for every liverange allocation and so is right in the middle of the hottest loop of the hottest part of Cranelift)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the exact semantics of offset_hint matter much? Building a mask of "all bits beneath this over there and above this over here" is much easier than "find the Nth bit set and then build a mask from there". I tried to preserve the preexisting semantics of the latter which I think is at least where some of the bloat is coming from, but if the former is a suitable fit for this use case it'll make building these two sets easier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah, that's a great point -- it's not first_bit_set but nth_bit_set which is more awkward. The purpose of all of this is to load-balance the probing (so we don't get a heavier BTree on the "first" register and see compile-time slowdowns), so rotating through the cut-position one bit per prog point seems like it should achieve a similar purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok new commit has this disassembly. Alas no vectorization, but everything looks unrolled
Use `offset_hint` as a mask start rather than looking for the Nth bit set. Should in theory have the same overall desired balancing semantics but enables more bit-tricks.
cfallin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Test tweak but otherwise LGTM.
| let p0 = PReg::new(0, Int); | ||
| let p1 = PReg::new(1, Int); | ||
| let p2 = PReg::new(2, Int); | ||
| let p3 = PReg::new(3, Int); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we tweak these test values to exist in the various limbs?
64 regs per class, and class is put in the upper bits, and limbs are 64 bits each, so maybe something like (0, Int), (5, Float), (23, Vector), (63, Vector) or something like that?
Helps to make the
MachineEnvs in Wasmtimeconst-allocatable.Closes #252