Contributor

Copilot AI commented Jan 29, 2026

Description

R2R virtual dispatch on iOS-like platforms was falling back to ExternalMethodFixupWorker on every call instead of patching callsites, causing a 200x slowdown in interface-heavy code. The profile showed 18% of the time spent in signature parsing alone.

Changes

  • Changed --enable-cached-interface-dispatch-support from bool to bool? to support platform-conditional defaults
  • Defaults to true when TargetAllowsRuntimeCodeGeneration is false (iOS, iOSSimulator, tvOS, tvOSSimulator, MacCatalyst, WASM)
  • Defaults to false on JIT-capable platforms (existing behavior)
  • Updated help text to document conditional default

Implementation

Crossgen2RootCommand.cs:

public Option<bool?> EnableCachedInterfaceDispatchSupport { get; } =
    new("--enable-cached-interface-dispatch-support", "--CID") { ... };

Program.cs:

nodeFactoryFlags.EnableCachedInterfaceDispatchSupport = 
    Get(_command.EnableCachedInterfaceDispatchSupport) ?? 
    !typeSystemContext.TargetAllowsRuntimeCodeGeneration;

User-specified values override the default on all platforms.
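
The resolution rule can be looked at in isolation. The following is a minimal sketch, assuming a hypothetical helper named `ResolveCachedInterfaceDispatch` that does not exist in crossgen2; it only mirrors the `??` expression shown above, with the user-supplied value modeled as a `bool?` that is `null` when the option was not passed.

```csharp
using System;

public static class DefaultResolutionSketch
{
    // Hypothetical helper (not part of crossgen2): mirrors the `??` expression above.
    // userValue is null when the option was not given on the command line.
    public static bool ResolveCachedInterfaceDispatch(bool? userValue, bool targetAllowsRuntimeCodeGeneration)
        => userValue ?? !targetAllowsRuntimeCodeGeneration;

    public static void Main()
    {
        // Unspecified on a target without runtime code generation: enabled by default.
        Console.WriteLine(ResolveCachedInterfaceDispatch(null, targetAllowsRuntimeCodeGeneration: false)); // True

        // Unspecified on a JIT-capable target: the existing default is kept.
        Console.WriteLine(ResolveCachedInterfaceDispatch(null, targetAllowsRuntimeCodeGeneration: true)); // False

        // An explicit value wins regardless of the target.
        Console.WriteLine(ResolveCachedInterfaceDispatch(false, targetAllowsRuntimeCodeGeneration: false)); // False
        Console.WriteLine(ResolveCachedInterfaceDispatch(true, targetAllowsRuntimeCodeGeneration: true)); // True
    }
}
```

Keeping the option nullable is what lets "not specified" be told apart from an explicit `false`, which a plain `bool` option could not express.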

Original prompt

This section details the original issue you should resolve

<issue_title>ValueTupleCompareCached microbenchmark slow in ios-like configuration</issue_title>
<issue_description>This benchmark is around 200x slower when running in ios-like configuration (composite r2r + interp fallback) on desktop, compared to default (jit).

Interestingly enough, after profiling, it turns out that most of the time spent is not in interpreting methods but rather in R2R machinery. In order to reproduce, the application was compiled with composite-r2r. The minimal runtime configuration diff that reproduces this issue is BrzVlad@3a4d118. At runtime, tiering has to be disabled as well (DOTNET_TieredCompilation=0).

Most of the time spent originates in ExternalMethodFixupWorker, which is called for every benchmark iteration, suggesting that a callsite from r2r is not properly patched. The runtime seems to attempt to resolve the call for each microbenchmark iteration, with excessive time spent related to signature decoding. Attached a profile showing the amount of time spent in the hottest methods.

[Image: profile showing the time spent in the hottest methods]

Even when running with composite-r2r with jit fallback, the benchmark was still 50x slower than normal jit, with the above runtime configuration. Microbenchmark extracted from https://github.com/dotnet/performance/blob/main/src/benchmarks/micro/runtime/Devirtualization/DefaultEqualityComparerPerf.cs#L79

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Collections.Generic;
using System.Globalization;

public class EqualityComparerFixture<T> where T : IEquatable<T>
{
	IEqualityComparer<T> comparer;

	public EqualityComparerFixture()
	{
		comparer = EqualityComparer<T>.Default;
	}

	[MethodImpl(MethodImplOptions.NoInlining)]
	public bool CompareCached(ref T a, ref T b)
	{
		return comparer.Equals(a, b);
	}
}

public class Program {

	public enum E {
		RED = 1,
		BLUE = 2
	}

	private EqualityComparerFixture<ValueTuple<byte, E, int>> valueTupleFixture = new EqualityComparerFixture<ValueTuple<byte, E, int>>();
	private ValueTuple<byte, E, int> v0 = new ValueTuple<byte, E, int>(3, E.RED, 11);

	public bool ValueTupleCompareCached() => valueTupleFixture.CompareCached(ref v0, ref v0);

	public void BenchCompareTuple()
	{
		for (int j = 0; j < 100000000; j++)
			ValueTupleCompareCached();
		Console.WriteLine ("Benchmarked compare tuple");
	}

	public static void Main(string[] args)
	{
		Program p = new Program();
		p.BenchCompareTuple();
	}
}
```
</issue_description>

## Comments on the Issue (you are @copilot in this section)

<comments>
<comment_new><author>@BrzVlad</author><body>
cc @jkotas @AndyAyersMS @davidwrighton  for any ideas. Not sure who would be the best person to look further into this.</body></comment_new>
<comment_new><author>@davidwrighton</author><body>
This is probably mine to investigate</body></comment_new>
<comment_new><author>@davidwrighton</author><body>
@BrzVlad  Ah, this is an issue that I had forgotten about, but already have a fix for. R2R dispatch to virtual methods using CachedInterfaceDispatch technically works without a special switch to the compiler, but it has to use a very expensive fallback. R2R compilation that is targeted at running in an environment which uses cached interface dispatch really needs to pass the "--enable-cached-interface-dispatch-support" command line argument. I believe we already pass a good --TargetOS switch to the compiler, so we could enable the cached interface dispatch support unconditionally for ios, iossimulator, tvos, tvossimulator, maccatalyst, and wasm.</body></comment_new>
<comment_new><author>@davidwrighton</author><body>
@copilot when implementing this, please use the TargetAllowsRuntimeCodeGeneration property to control the default value of the EnableCachedInterfaceDispatchSupport property set on the nodeFactoryFlags. This change should be in src/coreclr/tools/aot/crossgen2/Program.cs. In addition, please update the help text for the EnableCachedInterfaceDispatchSupport flag in src/coreclr/tools/aot/crossgen2/Properties/Resources.resx to describe that the default is true on platforms which do not support runtime code generation.</body></comment_new>
</comments>


Co-authored-by: davidwrighton <10779849+davidwrighton@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix ValueTupleCompareCached benchmark performance in iOS-like config" to "Enable cached interface dispatch by default on non-JIT platforms" Jan 29, 2026
Copilot AI requested a review from davidwrighton January 29, 2026 22:17
@davidwrighton davidwrighton requested a review from BrzVlad January 30, 2026 00:33
@davidwrighton davidwrighton marked this pull request as ready for review January 30, 2026 00:33
Copilot AI review requested due to automatic review settings January 30, 2026 00:33
@davidwrighton
Member

@BrzVlad, this will require your scripts that call crossgen2 to use the right targetos switch. I believe @jkoritzinsky has made the behavior in the SDK work, but I don't know exactly what you're testing on now.

Contributor

Copilot AI left a comment

Pull request overview

This PR addresses a significant performance issue (200x slowdown) on iOS-like platforms where R2R virtual dispatch was falling back to expensive signature parsing on every call. The fix enables cached interface dispatch by default on platforms that don't support runtime code generation (JIT).
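
As a rough illustration of why a per-call fallback is so much more expensive than a cached call site, here is a toy model. It is not how the CoreCLR runtime implements dispatch; `CallSiteSketch`, `InvokeUncached`, and `InvokeCached` are hypothetical names, and the "slow resolution" is only a stand-in for the signature decoding done by the fallback path.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: not the CoreCLR implementation of virtual or interface dispatch.
public sealed class CallSiteSketch
{
    private readonly Func<Type, Func<int, int, bool>> _slowResolve;
    private readonly Dictionary<Type, Func<int, int, bool>> _cache =
        new Dictionary<Type, Func<int, int, bool>>();

    // Counts how often the expensive resolution path ran.
    public int SlowResolutions { get; private set; }

    public CallSiteSketch(Func<Type, Func<int, int, bool>> slowResolve)
    {
        _slowResolve = slowResolve;
    }

    // Models an unpatched call site: the target is resolved again on every call.
    public bool InvokeUncached(object receiver, int a, int b)
    {
        SlowResolutions++;
        return _slowResolve(receiver.GetType())(a, b);
    }

    // Models a cached call site: resolve once per receiver type, then reuse the target.
    public bool InvokeCached(object receiver, int a, int b)
    {
        Type type = receiver.GetType();
        if (!_cache.TryGetValue(type, out Func<int, int, bool> target))
        {
            SlowResolutions++;
            target = _slowResolve(type);
            _cache[type] = target;
        }
        return target(a, b);
    }
}

public static class CallSiteDemo
{
    public static void Main()
    {
        // Stand-in for the expensive resolution step.
        Func<Type, Func<int, int, bool>> resolve = _ => (x, y) => x == y;

        var uncached = new CallSiteSketch(resolve);
        var cached = new CallSiteSketch(resolve);
        object receiver = EqualityComparer<int>.Default;

        for (int i = 0; i < 1000; i++)
        {
            uncached.InvokeUncached(receiver, 3, 3);
            cached.InvokeCached(receiver, 3, 3);
        }

        Console.WriteLine(uncached.SlowResolutions); // 1000
        Console.WriteLine(cached.SlowResolutions);   // 1
    }
}
```

In the toy model the uncached site pays the resolution cost once per call, while the cached site pays it once per receiver type, which is the general shape of the difference the profile in the issue points at.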

Changes:

  • Changed the --enable-cached-interface-dispatch-support option from bool to bool? to support platform-conditional defaults
  • Implemented default logic that enables cached interface dispatch on non-JIT platforms (iOS, iOSSimulator, tvOS, tvOSSimulator, MacCatalyst, WASM/WASI) while maintaining the existing false default on JIT-capable platforms
  • Updated help text to document the conditional default behavior

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| Crossgen2RootCommand.cs | Changed EnableCachedInterfaceDispatchSupport option type from bool to bool? to allow tristate values (true/false/unspecified) |
| Program.cs | Added conditional default logic using null-coalescing operator to enable cached interface dispatch when not explicitly set and platform doesn't support JIT |
| Resources.resx | Updated help text to document that the default is true on platforms which do not support runtime code generation |
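
The tristate idea (true, false, or unspecified) can be sketched without any command-line library. The parser below is a simplified, hypothetical stand-in: crossgen2 itself relies on System.CommandLine, and the `name:value` spelling accepted here is an assumption made for the example, not a statement about crossgen2's actual syntax.

```csharp
using System;

public static class TriStateOptionSketch
{
    // Hypothetical, simplified parser (crossgen2 uses System.CommandLine instead).
    // Returns null when the switch is absent, true for a bare switch,
    // and the parsed value for the assumed "name:true" / "name:false" form.
    public static bool? ParseTriState(string[] args, string name)
    {
        foreach (string arg in args)
        {
            if (arg == name)
                return true;

            if (arg.StartsWith(name + ":", StringComparison.Ordinal))
                return bool.Parse(arg.Substring(name.Length + 1));
        }

        return null;
    }

    private static string Describe(bool? value)
        => value.HasValue ? value.Value.ToString() : "unspecified";

    public static void Main()
    {
        const string Name = "--enable-cached-interface-dispatch-support";

        Console.WriteLine(Describe(ParseTriState(new string[0], Name)));               // unspecified
        Console.WriteLine(Describe(ParseTriState(new[] { Name }, Name)));              // True
        Console.WriteLine(Describe(ParseTriState(new[] { Name + ":false" }, Name)));   // False
    }
}
```

Only the "unspecified" result leaves room for a platform-dependent default; the other two outcomes are taken as given.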

@jkoritzinsky
Member

> @BrzVlad, this will require your scripts that call crossgen2 to use the right targetos switch. I believe @jkoritzinsky has made the behavior in the SDK work, but I don't know exactly what you're testing on now.

Yes the SDK should be hooked up to pass the right target OS and arch.

Development

Successfully merging this pull request may close these issues.

ValueTupleCompareCached microbenchmark slow in ios-like configuration

3 participants