
Conversation

@kotlarmilos kotlarmilos commented Jan 13, 2026

Description

Swift uses a calling convention that differs from the standard arm64 ABI. This PR extends the CoreCLR interpreter so that it can correctly call Swift functions on arm64. The changes are limited to call stub generation and include Swift-specific register usage, argument lowering, and return value lowering.

SwiftSelf and SwiftSelf<T>

Swift instance methods pass self in a dedicated register (x20 on arm64) rather than as a regular argument. In ComputeCallStubWorker, when SwiftSelf is encountered, a routine is emitted that loads the argument into x20 before the call.

When SwiftSelf<T> is encountered and T cannot be lowered according to the Swift value type lowering rules, a routine is emitted that loads the address of the struct into x20 before the call. If T can be lowered, it is processed as a normal lowered argument as described below.

SwiftError

A Swift error is returned via a dedicated register (x21 on arm64) when a call returns. On the managed side it is passed as a byref argument, and the error register is expected to be stored into that byref argument when the call returns. This is done as follows:

  • The byref argument address is saved at fp + offset
  • Before the call, both x21 and the byref slot are initialized to null
  • After the call, x21 is copied into scratch register x10
  • If x10 is non-null, it points to a Swift error object
  • The interpreter checks x10 and loads the error into the argument

SwiftIndirectResult

Swift returns non-lowered structs indirectly via a dedicated register (x8 on arm64). In ComputeCallStubWorker, when SwiftIndirectResult is encountered, a routine is emitted that loads the argument into x8 before the call. The callee then writes the return value to the memory provided in the argument.

Lowering of value types in arguments

Swift lowers structs into individual primitive arguments when they fit within four machine words. Consider the following struct:

[StructLayout(LayoutKind.Sequential, Size = 14)]
struct F3_S4
{
    public nuint F0;  // size = 8, offset = 0
    public float F1;  // size = 4, offset = 8
    public ushort F2; // size = 2, offset = 12
}

A Swift function swiftFunc(int a, F3_S4 b, float c) is lowered to swiftFunc(int a, nuint b_0, float b_1, ushort b_2, float c)

The signature is first rewritten into lowered form. This enables correct register and stack allocation, but introduces a problem: when loading lowered arguments into registers or onto the stack, the standard routines assume 8-byte alignment on the interpreter stack, which is not always true for fields within lowered structs.

For example, the loader routine reads F0 into a register and advances the argument pointer by 8 bytes. It then loads F1 and again advances by 8 bytes (including 4 bytes of padding). This is incorrect: F2 is at offset 12, so the fixed 8-byte stride would read it from offset 16.

To fix that, new load routines are introduced for Swift-lowered arguments:

  • Routines read the field offset from the routine array
  • They load from [x9 + offset] without advancing the interpreter argument pointer
  • The argument pointer is advanced by the struct size only when the last element of the lowered struct is loaded

Lowering of return value types

Swift lowers struct return values into multiple registers (both GP and FP). This is done by collecting return registers after the call and reconstructing the original struct layout. When a lowered struct return is detected in ComputeCallStubWorker, additional post-call routines are emitted to load the return value into the managed return buffer. To implement this, a new return stub InterpreterStubRetSwiftLowered is introduced. This stub stores the return registers (GP0–GP4 and FP0–FP4) into a contiguous temporary buffer immediately after the call returns.

A new call stub CallJittedMethodRetSwiftLowered is introduced to allow for execution of routines after the target call. Based on the computed lowering, a sequence of post-call routines is emitted. These routines copy each lowered field from the temporary buffer into the return struct at the correct offsets. A SwiftLoweredReturnTerminator routine is added at the end to branch back to the epilog.

Out of scope

x64 and UnmanagedCallersOnly are currently not supported and will be added in a follow-up.

Contributes to #120049

@dotnet-policy-service
Contributor

Tagging subscribers to this area: @BrzVlad, @janvorli, @kg
See info in area-owners.md if you want to be subscribed.

and w11, w12, #0xFFFF // Extract offset (lower 16 bits)
ldr \reg, [x9, x11] // Load from [x9 + offset]
lsr x12, x12, #16 // Shift to get struct_size
cbz x12, 1f // If struct_size == 0, skip advance
Member

This branch can be removed - if x12 is zero, we can still add it and the effect would be the same as with the branch.

Btw, what is the case when the struct size is zero?

Member Author

The branch controls whether the interpreter stack pointer should advance. For each lowered primitive from the struct, a new slot is emitted after the routine containing offset (lower 16 bits) | struct size (upper 16 bits). The struct size is non-zero only for the last lowered primitive and makes the stub advance past the struct on the stack. For all non-last primitives, the struct size is zero, indicating that the interpreter stack pointer should not advance yet because the struct is still being processed.

Let me know if you find this confusing, we can add a new routine for advancing the interpreter stack pointer after the lowered primitives.

Member

I guess I was not clear in what I meant. My point was the following

cbz x12, 1f
add x9, x9, x12
1:

is equivalent to

add x9, x9, x12

Because if x12 is zero, x9 ends up being the same after the add, so the conditional branch is not needed.

@kotlarmilos kotlarmilos changed the title [clr-interp] Add Swift calling convention support to CoreCLR interpreter [clr-interp] Add Swift calling convention support to CoreCLR interpreter on arm64 Jan 16, 2026
@kotlarmilos kotlarmilos marked this pull request as ready for review January 16, 2026 10:39
Copilot AI review requested due to automatic review settings January 16, 2026 10:39
Contributor

Copilot AI left a comment

Pull request overview

This PR adds support for Swift calling convention in the CoreCLR interpreter on arm64. Swift uses special registers (x20 for self, x21 for error, x8 for indirect results) and lowers value types into multiple primitive registers. The implementation extends the call stub generator to handle Swift-specific argument and return value marshalling.

Changes:

  • Added Swift-specific register handling (SwiftSelf, SwiftError, SwiftIndirectResult)
  • Implemented value type lowering for arguments and return values
  • Extended CallStubHeader to support post-call routines for lowered returns
  • Added assembly routines for loading/storing Swift arguments and returns
  • Disabled UnmanagedCallersOnly tests for interpreter (out of scope)

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/coreclr/vm/callstubgenerator.h Added TargetSlotIndex to CallStubHeader, Swift routine types, and return lowering state
src/coreclr/vm/callstubgenerator.cpp Implemented Swift calling convention detection, type checking, argument/return lowering
src/coreclr/vm/arm64/asmhelpers.asm Added Swift-specific load/store routines, error handling, and lowered return stub
src/coreclr/vm/arm64/asmhelpers.S Unix assembly equivalents of Windows assembly changes
src/tests/Interop/Swift/*.cs Added ActiveIssue attributes to skip UnmanagedCallersOnly tests
Comments suppressed due to low confidence (1)

src/coreclr/vm/callstubgenerator.cpp:1

  • The magic number '10' in the calculation lacks explanation. The comment mentions '(up to 4 elements * 2 slots + terminator = 9 slots)' which equals 9, not 10. Either the comment or the calculation is incorrect. If the +1 is for safety margin or another purpose, it should be documented.
// Licensed to the .NET Foundation under one or more agreements.

blr x11
ldr x11, [fp, #40]
cbz x11, CallJittedMethodRetVoid_NoSwiftError
str x10, [x11]
Copilot AI Jan 16, 2026

After the Swift function call, x21 should contain the error value (not x10). The code saves x21 to x10 after the call (line 750), but then stores x10 to the error location. However, the error slot address is in x11 (loaded from [fp, #40]), and we should be storing x21 (not x10) to that location. The assembly at line 2781 shows 'str x10, [x11]' but it should store the error value from x21, not x10 which is a scratch register.

bool isFloat; // True if this element goes in FP register
bool isLowered; // True if this is part of a lowered struct (not a regular arg)
};
SwiftLoweringElement swiftLoweringInfo[128];
Copilot AI Jan 16, 2026

The fixed-size array of 128 elements lacks bounds checking before use. While unlikely to overflow in practice, accessing swiftLoweringInfo[swiftLoweringCount] at lines 3113, 3170, 3184 without verifying swiftLoweringCount < 128 could cause buffer overflow. Consider adding an assertion or check.

Suggested change
SwiftLoweringElement swiftLoweringInfo[128];
struct SwiftLoweringBuffer
{
SwiftLoweringElement elements[128];
SwiftLoweringElement& operator[](int index)
{
_ASSERTE(index >= 0);
_ASSERTE(index < static_cast<int>(ARRAY_SIZE(elements)));
return elements[index];
}
const SwiftLoweringElement& operator[](int index) const
{
_ASSERTE(index >= 0);
_ASSERTE(index < static_cast<int>(ARRAY_SIZE(elements)));
return elements[index];
}
};
SwiftLoweringBuffer swiftLoweringInfo;

Member

What does the 128 limit represent? How is it enforced? If it is a natural limit based on ECMA or some other spec, we should make that comment. As such it isn't obvious what stops this from hitting the issue Copilot is calling out.

// IR bytecode address
mov x19, METHODDESC_REGISTER

#ifdef TARGET_APPLE
Copilot AI Jan 16, 2026

The change from x21 to x22 for saving x0 on Apple platforms is necessary because x21 is now reserved for SwiftError. However, this change should be documented in a comment to explain why x22 is used instead of x21, especially since x21 was the original choice. This would help future maintainers understand the Swift calling convention constraints.

Suggested change
#ifdef TARGET_APPLE
#ifdef TARGET_APPLE
// On Apple, x21 is reserved for SwiftError in the Swift calling convention, so use x22 to save x0.

{
if (!m_interpreterToNative)
{
COMPlusThrow(kNotImplementedException);
Copilot AI Jan 16, 2026

The error message for unsupported Swift native-to-interpreter calls is not helpful. It should explicitly state that Swift calling convention is only supported for interpreter-to-native calls, not the reverse direction. This would help developers understand the limitation.

Suggested change
COMPlusThrow(kNotImplementedException);
COMPlusThrowNonLocalized(
kNotImplementedException,
W("Swift calling convention is only supported for interpreter-to-native calls; native-to-interpreter Swift calls are not supported."));

PROLOG_SAVE_REG_PAIR fp, lr, #-224!
SAVE_ARGUMENT_REGISTERS sp, 16
SAVE_FLOAT_ARGUMENT_REGISTERS sp, 96
str x8, [sp, #208] ; Save x8 (indirect result register for Swift)
Copilot AI Jan 16, 2026

The x8 register is saved and restored in PInvokeImportThunk, but there's no corresponding change in InterpreterStub to preserve x8 across GetInterpThreadContextWithPossiblyMissingThreadOrCallStub. If Swift indirect results are passed through InterpreterStub, x8 could be clobbered. Verify that x8 preservation is complete for all Swift call paths.

The SwiftError argument is a pointer (ELEMENT_TYPE_PTR), so it needs
to be handled differently.

Also fixes the register the asm helpers store as the swift error. The
correct register is x21 while the code was using x10.
PCODE CallStubGenerator::GetSwiftSelfRoutine()
{
#if LOG_COMPUTE_CALL_STUB
printf("GetSwiftSelfRoutine\n");
Member

We should use one of our existing logging mechanisms. See the LOG() and LOGGING defines.

Member

We can also remove this if it no longer has value. Unclear if this is difficult to determine at this point.

Member

I think we should convert it to the LOG mechanism. When I wrote these logging statements, I had thought they would likely be very temporary, but I've used them 4-5 times since then, so they have some ongoing value. However, I'd like to see that work done in a separate PR. We should convert them to using LOG statements probably with a new LF enum value. Probably what should be done is add a new log facility enum in log.h, call it LF2_INTERPRETER. And convert these log statements to use the LOG2 macro, with info level LL_INFO10000 instead of wrapping them in #if LOG_COMPUTE_CALL_STUB blocks.

Member

I've opened #123272 to ask copilot to do this, let's see what it comes up with.

(PCODE)Load_X0_AtOffset, (PCODE)Load_X1_AtOffset, (PCODE)Load_X2_AtOffset, (PCODE)Load_X3_AtOffset,
(PCODE)Load_X4_AtOffset, (PCODE)Load_X5_AtOffset, (PCODE)Load_X6_AtOffset, (PCODE)Load_X7_AtOffset
};
_ASSERTE(regIndex >= 0 && regIndex < 8);
Member

Suggested change
_ASSERTE(regIndex >= 0 && regIndex < 8);
_ASSERTE(regIndex >= 0 && regIndex < ARRAY_SIZE(routines));

(PCODE)Load_D0_AtOffset, (PCODE)Load_D1_AtOffset, (PCODE)Load_D2_AtOffset, (PCODE)Load_D3_AtOffset,
(PCODE)Load_D4_AtOffset, (PCODE)Load_D5_AtOffset, (PCODE)Load_D6_AtOffset, (PCODE)Load_D7_AtOffset
};
_ASSERTE(regIndex >= 0 && regIndex < 8);
Member

Suggested change
_ASSERTE(regIndex >= 0 && regIndex < 8);
_ASSERTE(regIndex >= 0 && regIndex < ARRAY_SIZE(routines));

static PCODE routines[] = {
(PCODE)Store_X0_AtOffset, (PCODE)Store_X1_AtOffset, (PCODE)Store_X2_AtOffset, (PCODE)Store_X3_AtOffset
};
_ASSERTE(regIndex >= 0 && regIndex < 4);
Member

Suggested change
_ASSERTE(regIndex >= 0 && regIndex < 4);
_ASSERTE(regIndex >= 0 && regIndex < ARRAY_SIZE(routines));

static PCODE routines[] = {
(PCODE)Store_D0_AtOffset, (PCODE)Store_D1_AtOffset, (PCODE)Store_D2_AtOffset, (PCODE)Store_D3_AtOffset
};
_ASSERTE(regIndex >= 0 && regIndex < 4);
Member

Suggested change
_ASSERTE(regIndex >= 0 && regIndex < 4);
_ASSERTE(regIndex >= 0 && regIndex < ARRAY_SIZE(routines));

// true if the given type is SwiftSelf,
// false otherwise.
//
bool isSwiftSelfType(MethodTable* pMT)
Member

If these are method are considered hot, then we might consider using CoreLibBinder::GetClass() and making the types known instead of doing the metadata string look-up and compares.

// The size of the routines array is three times the number of arguments plus one slot for the target method pointer.
return sizeof(CallStubHeader) + ((numArgs + 1) * 3 + 1) * sizeof(PCODE);
// Add extra space for Swift return lowering (up to 4 elements * 2 slots + terminator = 9 slots).
return sizeof(CallStubHeader) + ((numArgs + 1) * 3 + 10) * sizeof(PCODE);
Member

Suggested change
return sizeof(CallStubHeader) + ((numArgs + 1) * 3 + 10) * sizeof(PCODE);
return sizeof(CallStubHeader) + ((numArgs + 1) * 3 + 1 + 9) * sizeof(PCODE);

There is a comment that calls out 1 and a comment that calls our 9, I typically like to see those numbers reflected in the code.

PROLOG_SAVE_REG_PAIR fp, lr, #-224!
SAVE_ARGUMENT_REGISTERS sp, 16
SAVE_FLOAT_ARGUMENT_REGISTERS sp, 96
str x8, [sp, #208] ; Save x8 (indirect result register for Swift)
Member

This file is Windows only, we should not have any swift changes here.

}

#if defined(TARGET_APPLE) && defined(TARGET_ARM64)
// Swift lowering info for expanded struct elements
Member

It would be good to put this new stuff into a separate method instead of growing this already large method further.
