[clr-interp] Add Swift calling convention support to CoreCLR interpreter on arm64 #123142
Conversation
Tagging subscribers to this area: @BrzVlad, @janvorli, @kg
```asm
and w11, w12, #0xFFFF // Extract offset (lower 16 bits)
ldr \reg, [x9, x11]   // Load from [x9 + offset]
lsr x12, x12, #16     // Shift to get struct_size
cbz x12, 1f           // If struct_size == 0, skip advance
```
This branch can be removed - if x12 is zero, we can still add it and the effect would be the same as with the branch.
Btw, what is the case when the struct size is zero?
The branch controls whether the interpreter stack pointer should advance. For each lowered primitive from the struct, the generator emits, after the load routine, a new slot containing offset (lower 16 bits) | struct size (upper 16 bits). The struct size is non-zero only for the last lowered primitive and makes the stub advance past the struct on the stack. For all non-last primitives, the struct size is zero, indicating that the interpreter stack pointer should not advance yet because the struct is still being processed.
Let me know if you find this confusing; we can add a new routine for advancing the interpreter stack pointer after the lowered primitives.
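A minimal sketch of the slot encoding described above, with illustrative names rather than the PR's actual helpers:

```cpp
#include <cassert>
#include <cstdint>

// Packs one descriptor slot for a lowered element: the load offset in the
// lower 16 bits, the struct size in the upper 16 bits. The size is passed
// as non-zero only for the last lowered element, which is what makes the
// stub advance the interpreter argument pointer past the whole struct.
uint32_t PackSwiftLoweringSlot(uint32_t offset, uint32_t structSizeIfLast)
{
    assert(offset < (1u << 16));
    assert(structSizeIfLast < (1u << 16));
    return offset | (structSizeIfLast << 16);
}
```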
I guess I was not clear in what I meant. My point was the following:

```asm
cbz x12, 1f
add x9, x9, x12
1:
```

is equivalent to

```asm
add x9, x9, x12
```

because if x12 is zero, x9 ends up the same after the add, so the conditional branch is not needed.
Pull request overview
This PR adds support for the Swift calling convention in the CoreCLR interpreter on arm64. Swift uses dedicated registers (x20 for self, x21 for error, x8 for indirect results) and lowers value types into multiple primitive registers. The implementation extends the call stub generator to handle Swift-specific argument and return value marshalling.
Changes:
- Added Swift-specific register handling (SwiftSelf, SwiftError, SwiftIndirectResult)
- Implemented value type lowering for arguments and return values
- Extended CallStubHeader to support post-call routines for lowered returns
- Added assembly routines for loading/storing Swift arguments and returns
- Disabled UnmanagedCallersOnly tests for interpreter (out of scope)
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/coreclr/vm/callstubgenerator.h | Added TargetSlotIndex to CallStubHeader, Swift routine types, and return lowering state |
| src/coreclr/vm/callstubgenerator.cpp | Implemented Swift calling convention detection, type checking, argument/return lowering |
| src/coreclr/vm/arm64/asmhelpers.asm | Added Swift-specific load/store routines, error handling, and lowered return stub |
| src/coreclr/vm/arm64/asmhelpers.S | Unix assembly equivalents of Windows assembly changes |
| src/tests/Interop/Swift/*.cs | Added ActiveIssue attributes to skip UnmanagedCallersOnly tests |
Comments suppressed due to low confidence (1)
src/coreclr/vm/callstubgenerator.cpp:1
- The magic number '10' in the calculation lacks explanation. The comment mentions '(up to 4 elements * 2 slots + terminator = 9 slots)' which equals 9, not 10. Either the comment or the calculation is incorrect. If the +1 is for safety margin or another purpose, it should be documented.
// Licensed to the .NET Foundation under one or more agreements.
src/coreclr/vm/arm64/asmhelpers.asm
Outdated
```asm
blr x11
ldr x11, [fp, #40]
cbz x11, CallJittedMethodRetVoid_NoSwiftError
str x10, [x11]
```
Copilot AI, Jan 16, 2026
After the Swift function call, x21 should contain the error value (not x10). The code saves x21 to x10 after the call (line 750), but then stores x10 to the error location. However, the error slot address is in x11 (loaded from [fp, #40]), and we should be storing x21 (not x10) to that location. The assembly at line 2781 shows 'str x10, [x11]' but it should store the error value from x21, not x10 which is a scratch register.
```cpp
    bool isFloat;   // True if this element goes in FP register
    bool isLowered; // True if this is part of a lowered struct (not a regular arg)
};
SwiftLoweringElement swiftLoweringInfo[128];
```
Copilot AI, Jan 16, 2026
The fixed-size array of 128 elements lacks bounds checking before use. While unlikely to overflow in practice, accessing swiftLoweringInfo[swiftLoweringCount] at lines 3113, 3170, 3184 without verifying swiftLoweringCount < 128 could cause buffer overflow. Consider adding an assertion or check.
Suggested change (replacing `SwiftLoweringElement swiftLoweringInfo[128];` with a bounds-checked wrapper):

```cpp
struct SwiftLoweringBuffer
{
    SwiftLoweringElement elements[128];
    SwiftLoweringElement& operator[](int index)
    {
        _ASSERTE(index >= 0);
        _ASSERTE(index < static_cast<int>(ARRAY_SIZE(elements)));
        return elements[index];
    }
    const SwiftLoweringElement& operator[](int index) const
    {
        _ASSERTE(index >= 0);
        _ASSERTE(index < static_cast<int>(ARRAY_SIZE(elements)));
        return elements[index];
    }
};
SwiftLoweringBuffer swiftLoweringInfo;
```
What does the 128 limit represent? How is it enforced? If it is a natural limit based on ECMA or some other spec, we should make that comment. As such it isn't obvious what stops this from hitting the issue Copilot is calling out.
```asm
// IR bytecode address
mov x19, METHODDESC_REGISTER

#ifdef TARGET_APPLE
```
Copilot AI, Jan 16, 2026
The change from x21 to x22 for saving x0 on Apple platforms is necessary because x21 is now reserved for SwiftError. However, this change should be documented in a comment to explain why x22 is used instead of x21, especially since x21 was the original choice. This would help future maintainers understand the Swift calling convention constraints.
Suggested change:

```asm
#ifdef TARGET_APPLE
// On Apple, x21 is reserved for SwiftError in the Swift calling convention, so use x22 to save x0.
```
```cpp
{
    if (!m_interpreterToNative)
    {
        COMPlusThrow(kNotImplementedException);
```
Copilot AI, Jan 16, 2026
The error message for unsupported Swift native-to-interpreter calls is not helpful. It should explicitly state that Swift calling convention is only supported for interpreter-to-native calls, not the reverse direction. This would help developers understand the limitation.
Suggested change (replacing `COMPlusThrow(kNotImplementedException);`):

```cpp
COMPlusThrowNonLocalized(
    kNotImplementedException,
    W("Swift calling convention is only supported for interpreter-to-native calls; native-to-interpreter Swift calls are not supported."));
```
```asm
PROLOG_SAVE_REG_PAIR fp, lr, #-224!
SAVE_ARGUMENT_REGISTERS sp, 16
SAVE_FLOAT_ARGUMENT_REGISTERS sp, 96
str x8, [sp, #208] ; Save x8 (indirect result register for Swift)
```
Copilot AI, Jan 16, 2026
The x8 register is saved and restored in PInvokeImportThunk, but there's no corresponding change in InterpreterStub to preserve x8 across GetInterpThreadContextWithPossiblyMissingThreadOrCallStub. If Swift indirect results are passed through InterpreterStub, x8 could be clobbered. Verify that x8 preservation is complete for all Swift call paths.
The SwiftError argument is a pointer (ELEMENT_TYPE_PTR), so it needs to be handled differently. This also fixes the register the asm helpers store as the Swift error: the correct register is x21, while the code was using x10.
```cpp
PCODE CallStubGenerator::GetSwiftSelfRoutine()
{
#if LOG_COMPUTE_CALL_STUB
    printf("GetSwiftSelfRoutine\n");
```
We should use one of our existing logging mechanisms. See the LOG() and LOGGING defines.
We can also remove this if it no longer has value. Unclear if this is difficult to determine at this point.
I think we should convert it to the LOG mechanism. When I wrote these logging statements, I thought they would be very temporary, but I've used them 4-5 times since then, so they have some ongoing value. However, I'd like to see that work done in a separate PR. We should convert them to LOG statements, probably with a new LF enum value: add a new log facility enum in log.h, call it LF2_INTERPRETER, and convert these log statements to use the LOG2 macro at info level LL_INFO10000 instead of wrapping them in #if LOG_COMPUTE_CALL_STUB blocks.
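A minimal sketch of the proposed conversion, assuming LF2_INTERPRETER is added to log.h as described (it does not exist yet):

```cpp
// Before: ad-hoc logging behind a compile-time switch.
#if LOG_COMPUTE_CALL_STUB
    printf("GetSwiftSelfRoutine\n");
#endif

// After: the standard logging mechanism, per the comment above.
// LF2_INTERPRETER is the proposed new facility, not an existing one.
LOG2((LF2_INTERPRETER, LL_INFO10000, "GetSwiftSelfRoutine\n"));
```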
I've opened #123272 to ask copilot to do this, let's see what it comes up with.
```cpp
    (PCODE)Load_X0_AtOffset, (PCODE)Load_X1_AtOffset, (PCODE)Load_X2_AtOffset, (PCODE)Load_X3_AtOffset,
    (PCODE)Load_X4_AtOffset, (PCODE)Load_X5_AtOffset, (PCODE)Load_X6_AtOffset, (PCODE)Load_X7_AtOffset
};
_ASSERTE(regIndex >= 0 && regIndex < 8);
```
Suggested change:

```cpp
_ASSERTE(regIndex >= 0 && regIndex < ARRAY_SIZE(routines));
```
```cpp
    (PCODE)Load_D0_AtOffset, (PCODE)Load_D1_AtOffset, (PCODE)Load_D2_AtOffset, (PCODE)Load_D3_AtOffset,
    (PCODE)Load_D4_AtOffset, (PCODE)Load_D5_AtOffset, (PCODE)Load_D6_AtOffset, (PCODE)Load_D7_AtOffset
};
_ASSERTE(regIndex >= 0 && regIndex < 8);
```
Suggested change:

```cpp
_ASSERTE(regIndex >= 0 && regIndex < ARRAY_SIZE(routines));
```
```cpp
static PCODE routines[] = {
    (PCODE)Store_X0_AtOffset, (PCODE)Store_X1_AtOffset, (PCODE)Store_X2_AtOffset, (PCODE)Store_X3_AtOffset
};
_ASSERTE(regIndex >= 0 && regIndex < 4);
```
Suggested change:

```cpp
_ASSERTE(regIndex >= 0 && regIndex < ARRAY_SIZE(routines));
```
```cpp
static PCODE routines[] = {
    (PCODE)Store_D0_AtOffset, (PCODE)Store_D1_AtOffset, (PCODE)Store_D2_AtOffset, (PCODE)Store_D3_AtOffset
};
_ASSERTE(regIndex >= 0 && regIndex < 4);
```
Suggested change:

```cpp
_ASSERTE(regIndex >= 0 && regIndex < ARRAY_SIZE(routines));
```
```cpp
// true if the given type is SwiftSelf,
// false otherwise.
//
bool isSwiftSelfType(MethodTable* pMT)
```
If these methods are considered hot, then we might consider using CoreLibBinder::GetClass() and making the types known instead of doing the metadata string look-up and compares.
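A minimal sketch of the suggested pattern, assuming a hypothetical CLASS__SWIFT_SELF binder ID (the real BinderClassID would first have to be registered for the SwiftSelf type):

```cpp
// Compares against a binder-resolved MethodTable instead of doing a
// metadata name lookup and string compare per call. CLASS__SWIFT_SELF is
// hypothetical; it would need to be added alongside the SwiftSelf definition.
bool isSwiftSelfType(MethodTable* pMT)
{
    return pMT == CoreLibBinder::GetClass(CLASS__SWIFT_SELF);
}
```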
```diff
  // The size of the routines array is three times the number of arguments plus one slot for the target method pointer.
- return sizeof(CallStubHeader) + ((numArgs + 1) * 3 + 1) * sizeof(PCODE);
+ // Add extra space for Swift return lowering (up to 4 elements * 2 slots + terminator = 9 slots).
+ return sizeof(CallStubHeader) + ((numArgs + 1) * 3 + 10) * sizeof(PCODE);
```
Suggested change:

```cpp
return sizeof(CallStubHeader) + ((numArgs + 1) * 3 + 1 + 9) * sizeof(PCODE);
```

There is a comment that calls out 1 and a comment that calls out 9; I typically like to see those numbers reflected in the code.
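A slightly more explicit variant in the same spirit, with named constants standing in for the 1 and the 9 (the function name and stand-in types are illustrative, not from the PR):

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative stand-ins for the real CoreCLR types (not the actual definitions).
using PCODE = uintptr_t;
struct CallStubHeader { PCODE placeholder; };

size_t ComputeCallStubAllocSize(size_t numArgs)
{
    const size_t slotsPerArg      = 3;          // routine slots per argument
    const size_t targetSlot       = 1;          // the target method pointer
    const size_t swiftReturnSlots = 4 * 2 + 1;  // up to 4 elements * 2 slots + terminator
    return sizeof(CallStubHeader)
         + ((numArgs + 1) * slotsPerArg + targetSlot + swiftReturnSlots) * sizeof(PCODE);
}
```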
```asm
PROLOG_SAVE_REG_PAIR fp, lr, #-224!
SAVE_ARGUMENT_REGISTERS sp, 16
SAVE_FLOAT_ARGUMENT_REGISTERS sp, 96
str x8, [sp, #208] ; Save x8 (indirect result register for Swift)
```
This file is Windows only, we should not have any swift changes here.
```cpp
}

#if defined(TARGET_APPLE) && defined(TARGET_ARM64)
// Swift lowering info for expanded struct elements
```
It would be good to put this new stuff into a separate method instead of growing this already large method further.
Fix extra paren
Description
Swift uses a calling convention that differs from the standard arm64 ABI. This PR extends the CoreCLR interpreter so that it can correctly call Swift functions on arm64. The changes are limited to call stub generation and include Swift-specific register usage, argument lowering, and return value lowering.
SwiftSelf and SwiftSelf<T>
Swift instance methods pass `self` in a dedicated register (`x20` on arm64) rather than as a regular argument. In `ComputeCallStubWorker`, when `SwiftSelf` is encountered, a routine is emitted that loads the argument into `x20` before the call.

When `SwiftSelf<T>` is encountered and the struct is not lowered according to the Swift value type lowering rules, a routine is emitted that loads the address of the struct into `x20` before the call. If the struct `T` can be lowered, it is processed as a normal lowered argument, as described below.

SwiftError
The Swift error is returned via a dedicated register (`x21` on arm64) when the call returns. It is passed as a byref argument, and the stub is expected to store the error register into that byref argument when the call returns.

SwiftIndirectResult
Swift returns non-lowered structs indirectly via a dedicated register (`x8` on arm64). In `ComputeCallStubWorker`, when `SwiftIndirectResult` is encountered, a routine is emitted that loads the argument into `x8` before the call. The callee then writes the return value to the memory provided in the argument.

Lowering of value types in arguments
Swift lowers structs into individual primitive arguments when they fit within four machine words. Consider the following struct:
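A C-level layout consistent with the lowered signature and the field offsets described below (the exact field types are inferred from that signature, so treat this as illustrative):

```cpp
#include <cstdint>

// F3_S4 occupies 16 bytes and fits within four machine words, so Swift
// lowers it into three primitives: (nuint, float, ushort).
struct F3_S4
{
    uintptr_t F0;  // offset 0,  lowered to a nuint
    float     F1;  // offset 8,  lowered to a float
    uint16_t  F2;  // offset 12, lowered to a ushort
};
```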
A Swift function `swiftFunc(int a, F3_S4 b, float c)` is lowered to `swiftFunc(int a, nuint b_0, float b_1, ushort b_2, float c)`.

The signature is first rewritten into this lowered form. This enables correct register and stack allocation, but it introduces a problem: when loading lowered arguments into registers or onto the stack, the standard routines assume 8-byte alignment on the interpreter stack, which is not always true for fields within lowered structs.

For example, the standard loader routine reads `F0` into a register and advances the argument pointer by 8 bytes. It then loads `F1` and again advances by 8 bytes (including 4 bytes of padding). This is incorrect because `F1` is at offset 8 and `F2` is at offset 12. To fix this, new load routines are introduced for Swift-lowered arguments:
- Each routine loads its element from `[x9 + offset]` without advancing the interpreter argument pointer; the descriptor slot of the last lowered element carries the struct size, and only that routine advances the pointer past the whole struct. A C-level model is sketched below.
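A C-level model of what these load routines do, under the slot encoding described earlier (the names are illustrative; the real routines are the asm helpers):

```cpp
#include <cstdint>
#include <cstring>

// Consumes one descriptor slot and loads one lowered element. argPtr models
// x9 (the interpreter argument pointer); it advances only when the slot's
// upper 16 bits carry the struct size, i.e. for the last lowered element.
uint64_t LoadLoweredElement(const uint8_t*& argPtr, const uint32_t*& slots)
{
    uint32_t slot = *slots++;
    uint32_t offset = slot & 0xFFFF;   // where the element lives in the struct
    uint32_t structSize = slot >> 16;  // non-zero only for the last element

    uint64_t value = 0;
    memcpy(&value, argPtr + offset, sizeof(value)); // full-register load, like ldr
    argPtr += structSize;  // branch-free advance: adding zero is a no-op
    return value;
}
```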
Lowering of return value types

Swift lowers struct return values into multiple registers (both GP and FP). This is handled by collecting the return registers after the call and reconstructing the original struct layout. When a lowered struct return is detected in `ComputeCallStubWorker`, additional post-call routines are emitted to load the return value into the managed return buffer. To implement this, a new return stub `InterpreterStubRetSwiftLowered` is introduced. This stub stores the return registers (GP0–GP4 and FP0–FP4) into a contiguous temporary buffer immediately after the call returns.
A new call stub `CallJittedMethodRetSwiftLowered` is introduced to allow routines to execute after the target call. Based on the computed lowering, a sequence of post-call routines is emitted. These routines copy each lowered field from the temporary buffer into the return struct at the correct offsets, as modeled in the sketch below. A `SwiftLoweredReturnTerminator` routine is added at the end to branch back to the epilog.
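A C-level model of the post-call copy these routines perform (the descriptor fields here are illustrative, not the PR's actual layout):

```cpp
#include <cstdint>
#include <cstring>

// Illustrative descriptor for one lowered return element.
struct LoweredReturnElement
{
    uint32_t bufferOffset;  // where InterpreterStubRetSwiftLowered spilled the register
    uint32_t fieldOffset;   // where the field lives in the original struct layout
    uint32_t size;          // element size in bytes
};

// Copies each lowered element from the temporary register-spill buffer into
// the managed return buffer, rebuilding the original struct layout.
void CopyLoweredReturn(const uint8_t* spillBuffer, uint8_t* returnBuffer,
                       const LoweredReturnElement* elements, int count)
{
    for (int i = 0; i < count; i++)
    {
        memcpy(returnBuffer + elements[i].fieldOffset,
               spillBuffer + elements[i].bufferOffset,
               elements[i].size);
    }
}
```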
Out of scope

`x64` and `UnmanagedCallersOnly` are currently not supported and will be added in a follow-up.

Contributes to #120049