Vectorize BitIncrement and BitDecrement for float/double in TensorPrimitives #123610
Conversation
@copilot, other methods that vectorize float handle Half as well in the entrypoint using TryAggregateInvokeHalfAsInt16. Please do so for BitIncrement / BitDecrement as well if it makes sense.
Applied in commit 40e6702. Added Half type support using …
Tagging subscribers to this area: @dotnet/area-system-numerics-tensors
@copilot benchmark this change please
I don't have the capability to run performance benchmarks directly, as that requires hardware access and benchmarking tools. However, the PR is ready for benchmarking.

Current State:

Recent Optimizations Since Half Was Re-Added:

These optimizations should significantly improve Half performance compared to the initial implementation, which showed a 38% regression. The Half path now benefits from all the same optimizations that made float/double 3.7-3.8x faster.
@copilot use the performance benchmarking skill to benchmark this change
I'll create a benchmark to measure the performance. Here's the EgorBot request:

@EgorBot -x64 -arm

```csharp
using System;
using System.Numerics.Tensors;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

[MemoryDiagnoser]
public class Bench
{
    private float[] _floatData = default!;
    private double[] _doubleData = default!;
    private Half[] _halfData = default!;

    [Params(512)]
    public int Count { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        var random = new Random(42);
        _floatData = new float[Count];
        _doubleData = new double[Count];
        _halfData = new Half[Count];
        for (int i = 0; i < Count; i++)
        {
            _floatData[i] = (float)(random.NextDouble() * 1000 - 500);
            _doubleData[i] = random.NextDouble() * 1000 - 500;
            _halfData[i] = (Half)_floatData[i];
        }
    }

    [Benchmark]
    public void BitIncrement_Single()
    {
        TensorPrimitives.BitIncrement(_floatData, _floatData);
    }

    [Benchmark]
    public void BitIncrement_Double()
    {
        TensorPrimitives.BitIncrement(_doubleData, _doubleData);
    }

    [Benchmark]
    public void BitIncrement_Half()
    {
        TensorPrimitives.BitIncrement(_halfData, _halfData);
    }

    [Benchmark]
    public void BitDecrement_Single()
    {
        TensorPrimitives.BitDecrement(_floatData, _floatData);
    }

    [Benchmark]
    public void BitDecrement_Double()
    {
        TensorPrimitives.BitDecrement(_doubleData, _doubleData);
    }

    [Benchmark]
    public void BitDecrement_Half()
    {
        TensorPrimitives.BitDecrement(_halfData, _halfData);
    }
}
```

This will benchmark all three types (float, double, Half) for both BitIncrement and BitDecrement operations on x64 and ARM architectures, comparing the PR changes against the base branch.
@copilot please revert the Half changes again
/ba-g timeouts
Description
Implements SIMD vectorization for `BitIncrement` and `BitDecrement` operations on `float` and `double` types in TensorPrimitives. These operations previously had `Vectorizable = false` and threw `NotSupportedException` in vector paths.

Changes
Core Implementation:
- `Vector128/256/512<T>` paths with only 2 `ConditionalSelect` operations per vector method (optimized from the initial 4; see the sketch below)
- Uses `IsNegative()`, `IsNaN()`, `IsPositiveInfinity()`, `IsNegativeInfinity()`, and `IsZero()`
- `if` statements with blank line separation for improved readability

Special Case Handling:
Optimizations Applied:
Code Quality:
Note on Half Type:
Half type support was explored but showed consistent performance regressions (~38%) even after extensive optimizations. Half operations will use the scalar fallback path.
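How the select count drops from four to two is not spelled out above; the following is one possible shape, a minimal sketch for `Vector128<double>` rather than the PR's exact code, assuming the .NET 9 `Vector128.IsNegative`/`IsNaN`/`IsZero`/`IsPositiveInfinity` helpers (the class name is hypothetical, for illustration only):

```csharp
using System.Runtime.Intrinsics;

internal static class BitIncrementSketch // hypothetical name, for illustration only
{
    public static Vector128<double> Invoke(Vector128<double> x)
    {
        // Derive the +/-1 step without a select: the IsNegative mask is all-ones (-1)
        // for negative inputs, so OR-ing with 1 yields -1 for negatives and +1 otherwise.
        Vector128<long> increment = Vector128.IsNegative(x).AsInt64() | Vector128.Create(1L);

        Vector128<double> result = (x.AsInt64() + increment).AsDouble();

        // Select 1: +/-0.0 returns double.Epsilon (the smallest positive subnormal).
        result = Vector128.ConditionalSelect(Vector128.IsZero(x), Vector128.Create(double.Epsilon), result);

        // Select 2: NaN and +Infinity both return the input unchanged.
        // -Infinity needs no special case: its bit pattern minus one is double.MinValue.
        Vector128<double> keepInput = Vector128.IsNaN(x) | Vector128.IsPositiveInfinity(x);
        return Vector128.ConditionalSelect(keepInput, x, result);
    }
}
```

The same structure extends to `Vector256`/`Vector512` and to `float` with 32-bit constants.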
Performance
Benchmark results show significant improvements; as noted in the discussion above, the vectorized float/double paths measured roughly 3.7-3.8x faster.
Testing
Checklist
Original prompt
Summary
Vectorize the `BitIncrement` and `BitDecrement` operations in `TensorPrimitives` for `float` and `double` types using SIMD operations.

Current State
Currently, `BitIncrementOperator<T>` and `BitDecrementOperator<T>` in `TensorPrimitives.BitIncrement.cs` and `TensorPrimitives.BitDecrement.cs` have `Vectorizable => false` and throw `NotSupportedException` in the vector `Invoke` methods.

Reference Implementation
The scalar implementations in `Math.cs` (for `double`) and `MathF.cs` (for `float`) show the algorithm.

BitIncrement (from Math.cs for double):
BitDecrement (from Math.cs for double):
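Again as a paraphrased sketch rather than the verbatim source:

```csharp
public static double BitDecrement(double x)
{
    long bits = BitConverter.DoubleToInt64Bits(x);

    if ((bits & 0x7FF0_0000_0000_0000) >= 0x7FF0_0000_0000_0000)
    {
        // NaN returns NaN; +Infinity returns MaxValue; -Infinity returns -Infinity.
        return (bits == 0x7FF0_0000_0000_0000) ? double.MaxValue : x;
    }

    if (bits == 0)
    {
        // +0.0 returns -double.Epsilon.
        return -double.Epsilon;
    }

    // Negative values need to be incremented; positive values need to be decremented.
    bits += (bits < 0) ? +1 : -1;
    return BitConverter.Int64BitsToDouble(bits);
}
```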
The same pattern applies for `float` in `MathF.cs`.

Required Changes
Files to Modify:
- `src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorPrimitives.BitIncrement.cs`
- `src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorPrimitives.BitDecrement.cs`
Enable vectorization only for
floatanddouble:Vectorizableto returntrueonly whentypeof(T) == typeof(float) || typeof(T) == typeof(double)Implement branch-free SIMD versions of the algorithms using
Vector128<T>,Vector256<T>, andVector512<T>operations.The vectorized implementation must match the scalar algorithm semantics:
- For BitIncrement: NaN returns NaN; -Infinity returns MinValue; +Infinity returns +Infinity; ±0.0 returns Epsilon; every other value steps to the next larger representable value.
- For BitDecrement: NaN returns NaN; +Infinity returns MaxValue; -Infinity returns -Infinity; ±0.0 returns -Epsilon; every other value steps to the next smaller representable value.
Branch-free implementation approach:
- Use `Vector.IsNaN()` to create masks for NaN values
- Use `Vector.ConditionalSelect()` to blend results

Pattern to follow: look at other vectorized operators in TensorPrimitives, such as those that handle special floating-point values with conditional selects.
Example vectorized approach for BitIncrement (pseudocode):
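The original prompt's pseudocode block is not reproduced here; the following is a minimal sketch of the mask-and-select approach it describes, shown for `Vector128<double>` (the same shape applies to `Vector256`/`Vector512` and to `float`), assuming the .NET 9 `Vector128.IsNegative`/`IsNaN`/`IsZero`/`IsPositiveInfinity` helpers. It is illustrative, not the final PR code, and the class name is hypothetical:

```csharp
using System.Runtime.Intrinsics;

internal static class BitIncrementVectorExample // hypothetical name, for illustration only
{
    public static Vector128<double> Invoke(Vector128<double> x)
    {
        // Negative values step toward zero (raw bits - 1); all others step away (raw bits + 1).
        Vector128<long> increment = Vector128.ConditionalSelect(
            Vector128.IsNegative(x).AsInt64(),
            Vector128.Create(-1L),
            Vector128.Create(1L));

        Vector128<double> result = (x.AsInt64() + increment).AsDouble();

        // +/-0.0 returns double.Epsilon.
        result = Vector128.ConditionalSelect(Vector128.IsZero(x), Vector128.Create(double.Epsilon), result);

        // +Infinity returns +Infinity.
        result = Vector128.ConditionalSelect(Vector128.IsPositiveInfinity(x), x, result);

        // NaN returns NaN (the original input). -Infinity needs no special case:
        // decrementing its bit pattern yields double.MinValue, which is the correct answer.
        return Vector128.ConditionalSelect(Vector128.IsNaN(x), x, result);
    }
}
```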