JIT: Optimize async OSR resumptions #123757
Draft
+201
−162
Previously, resumption inside OSR methods looked like a normal OSR transition going through the patchpoint helper. However, the patchpoint helper is not cheap, and going through it adds an overhead of around 10-20x.
This PR optimizes resumption inside OSR functions by executing a direct non-local jump from the tier0 code into the OSR code. This completely bypasses the patchpoint helper. To do so:
- `GT_FTN_ENTRY`, which represents the entry point of the current function being compiled. Switch the OSR IL offset stored by OSR methods in the continuation to be this address instead.
- `GT_NONLOCAL_JMP`, a unary node which represents a jump to a specified address. Change the async transformation to generate this node in tier0 codegen, so that when an OSR continuation is passed, we just jump to the OSR address.
- […] `JIT_Patchpoint` to the OSR function itself by simply pushing any value on the stack on entry.
Currently this only works for x64, but the same approach should work on other platforms once #123645 is merged.
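The difference between the two resumption paths can be pictured with a toy model. This is only a sketch, not JIT or runtime code: `Continuation`, `SlowPatchpointHelper`, `Tier0ResumeOld`, `Tier0ResumeNew`, and the IL offset `42` are all hypothetical names and values invented for illustration, and an ordinary indirect call stands in for the non-local jump (a real jump would not return to the tier0 frame).

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Toy model only: every name here is hypothetical, not a real JIT/runtime type.
struct Continuation;
using ResumeFn = int (*)(Continuation&);

struct Continuation {
    ResumeFn osrEntry = nullptr; // new scheme: address of the OSR method (cf. GT_FTN_ENTRY)
    uint32_t ilOffset = 0;       // old scheme: IL offset of the patchpoint
    int      state    = 0;       // stand-in for the saved locals
};

static int g_helperCalls = 0;    // counts trips through the slow helper

// Stand-in for the OSR-compiled body of the method.
int OsrBody(Continuation& c) { return c.state + 1; }

// Stand-in for the patchpoint helper: look up the OSR method by IL offset,
// then transition into it. The lookup models the helper's extra cost.
int SlowPatchpointHelper(Continuation& c) {
    static const std::unordered_map<uint32_t, ResumeFn> osrMethods = {{42u, &OsrBody}};
    ++g_helperCalls;
    auto it = osrMethods.find(c.ilOffset);
    assert(it != osrMethods.end());
    return it->second(c);
}

// Old tier0 resumption: always goes through the helper.
int Tier0ResumeOld(Continuation& c) { return SlowPatchpointHelper(c); }

// New tier0 resumption: if the continuation carries an OSR address,
// transfer control to it directly (models GT_NONLOCAL_JMP).
int Tier0ResumeNew(Continuation& c) {
    if (c.osrEntry != nullptr) {
        return c.osrEntry(c);
    }
    return c.state; // no OSR continuation: tier0 would resume normally (elided)
}
```

In this model, resuming a continuation with `ilOffset = 42` via `Tier0ResumeOld` bumps `g_helperCalls`, while resuming one with `osrEntry = &OsrBody` via `Tier0ResumeNew` reaches the same body without touching the helper.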
The same approach can also be used to remove the transitioning responsibility from `JIT_Patchpoint`. The idea would be that `JIT_Patchpoint` returns a function address and the tier0 codegen just executes a nonlocal jump to it.
Example:
Before: Took 19462.2 ms
After: Took 799.1 ms
Tier0 codegen diff
OSR method diff
Fix #120865