Contributor

Copilot AI commented Jan 29, 2026

Description

Per feedback on PR #123610, this PR adds vectorized Half support to TensorPrimitives.BitIncrement and TensorPrimitives.BitDecrement. Half values are processed as ushorts for the whole computation, which avoids the overhead of widening to float and narrowing back.

Changes

TensorPrimitives.BitIncrement.cs:

  • Added check for Half type at method entry using TryUnaryBitwiseInvokeHalfAsInt16<T, HalfBitIncrementOperator>
  • Created HalfBitIncrementOperator struct implementing IUnaryOperator<short, short> with:
    • Direct ushort bit manipulation (no float conversion)
    • SIMD implementations for Vector128/256/512
    • Proper handling of special cases: NaN (both positive and negative preserved), +Infinity (preserved), -Infinity → MinValue, -0 → Epsilon
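The special cases listed above can be illustrated with a scalar sketch that works only on the raw 16-bit pattern. This is not the PR's HalfBitIncrementOperator: the helper name `BitIncrementBits` and its constants are hypothetical, and the logic simply follows the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 mantissa bits).

```cs
using System;

// Hypothetical helper (an assumption, not code from the PR):
// returns the bits of the smallest Half that compares greater than the input.
static ushort BitIncrementBits(ushort bits)
{
    const ushort ExponentMask = 0x7C00;     // all exponent bits set => NaN or infinity
    const ushort SignMask = 0x8000;         // also the exact bit pattern of -0
    const ushort NegativeInfinity = 0xFC00;
    const ushort MinValue = 0xFBFF;         // -65504, the most negative finite Half
    const ushort Epsilon = 0x0001;          // smallest positive subnormal

    if ((bits & ExponentMask) == ExponentMask)
    {
        // NaN (either sign) and +Infinity are preserved; -Infinity steps to MinValue.
        return bits == NegativeInfinity ? MinValue : bits;
    }

    if (bits == SignMask)
    {
        return Epsilon;                     // -0 -> Epsilon
    }

    // Negative values move toward zero (magnitude shrinks); positive values move away.
    return (bits & SignMask) != 0 ? (ushort)(bits - 1) : (ushort)(bits + 1);
}

Half one = (Half)1.0f;
Half next = BitConverter.UInt16BitsToHalf(BitIncrementBits(BitConverter.HalfToUInt16Bits(one)));
Console.WriteLine(next > one);
```

Because the finite cases reduce to adding or subtracting 1 from the integer pattern, the same logic maps naturally onto integer SIMD lanes.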

TensorPrimitives.BitDecrement.cs:

  • Added check for Half type at method entry using TryUnaryBitwiseInvokeHalfAsInt16<T, HalfBitDecrementOperator>
  • Created HalfBitDecrementOperator struct implementing IUnaryOperator<short, short> with:
    • Direct ushort bit manipulation (no float conversion)
    • SIMD implementations for Vector128/256/512
    • Proper handling of special cases: NaN (both positive and negative preserved), -Infinity (preserved), +Infinity → MaxValue, +0 → -Epsilon
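As a rough illustration of how the decrement rules above might look in SIMD form, here is a Vector128<ushort> sketch built from the public System.Runtime.Intrinsics API. The function name `BitDecrementBits` is hypothetical and this is not the PR's HalfBitDecrementOperator; it only shows the branch-free select pattern under the assumption that each lane holds the raw bits of one Half.

```cs
using System;
using System.Runtime.Intrinsics;

// Hypothetical sketch (an assumption, not code from the PR):
// each ushort lane holds the raw bits of one Half.
static Vector128<ushort> BitDecrementBits(Vector128<ushort> bits)
{
    Vector128<ushort> signMask = Vector128.Create((ushort)0x8000);
    Vector128<ushort> exponentMask = Vector128.Create((ushort)0x7C00); // also +Infinity
    Vector128<ushort> one = Vector128.Create((ushort)1);

    // Finite values: negative lanes grow in magnitude (+1), positive lanes shrink (-1).
    Vector128<ushort> isNegative = Vector128.Equals(bits & signMask, signMask);
    Vector128<ushort> result = Vector128.ConditionalSelect(isNegative, bits + one, bits - one);

    // +0 -> -Epsilon (0x8001).
    Vector128<ushort> isPositiveZero = Vector128.Equals(bits, Vector128<ushort>.Zero);
    result = Vector128.ConditionalSelect(isPositiveZero, Vector128.Create((ushort)0x8001), result);

    // NaN and -Infinity are preserved; +Infinity -> MaxValue (0x7BFF).
    Vector128<ushort> isNonFinite = Vector128.Equals(bits & exponentMask, exponentMask);
    Vector128<ushort> nonFinite = Vector128.ConditionalSelect(
        Vector128.Equals(bits, exponentMask), Vector128.Create((ushort)0x7BFF), bits);

    return Vector128.ConditionalSelect(isNonFinite, nonFinite, result);
}

// Lanes: +0, +Infinity, -Infinity, NaN, 1.0, -0, MinValue, Epsilon
Vector128<ushort> input = Vector128.Create(
    (ushort)0x0000, 0x7C00, 0xFC00, 0x7E00, 0x3C00, 0x8000, 0xFBFF, 0x0001);
Console.WriteLine(BitDecrementBits(input));
```

Every special case is expressed as a mask plus a select, so all lanes follow the same instruction stream regardless of their values. The same shape would presumably extend to Vector256 and Vector512.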

Key Optimization

This approach uses TryUnaryBitwiseInvokeHalfAsInt16, which operates directly on the short/ushort representation of Half values without converting to or from float. This avoids the overhead of:

  • Widening Half to float
  • Performing the operation on float
  • Narrowing float back to Half
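To make the contrast concrete, the sketch below compares a per-element widen/narrow round trip with a reinterpretation of the same memory as ushort. It uses the public MemoryMarshal.Cast API purely for illustration; how the internal TryUnaryBitwiseInvokeHalfAsInt16 helper reinterprets its spans is not shown in this conversation, so treat the mechanism here as an assumption.

```cs
using System;
using System.Runtime.InteropServices;

Half[] source = { (Half)1.0f, (Half)(-2.5f), Half.PositiveInfinity };

// Widening path: every element is converted Half -> float and back to Half.
float widened = (float)source[0];
Half narrowed = (Half)widened;
Console.WriteLine(narrowed == source[0]);

// Bitwise path: view the same memory as 16-bit integers. No copy, no conversion.
ReadOnlySpan<ushort> bits = MemoryMarshal.Cast<Half, ushort>(source.AsSpan());
Console.WriteLine($"0x{bits[0]:X4} 0x{bits[1]:X4} 0x{bits[2]:X4}"); // prints 0x3C00 0xC100 0x7C00
```

Since Half is a 16-bit struct, the cast span has the same length as the source, and integer operations can run directly on it.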

Performance Benchmark

To trigger EgorBot benchmark comparison:

@EgorBot -x64 -arm

```cs
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Numerics.Tensors;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

public class Bench
{
    private Half[] _source = default!;
    private Half[] _destination = default!;

    [Params(1000)]
    public int Count { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        _source = Enumerable.Range(0, Count)
            .Select(i => (Half)(i % 2 == 0 ? i * 0.1f : -i * 0.1f))
            .ToArray();
        _destination = new Half[Count];
    }

    [Benchmark]
    public void BitIncrement()
    {
        TensorPrimitives.BitIncrement<Half>(_source, _destination);
    }

    [Benchmark]
    public void BitDecrement()
    {
        TensorPrimitives.BitDecrement<Half>(_source, _destination);
    }
}
```

Original prompt

See the comment at #123610 (comment). Implement such an optimization for Half in BitIncrement and BitDecrement, where the Halfs are processed as ushorts for the whole computation rather than trying to convert to floats. Then use the performance benchmarking skill to evaluate the benefits of the change.



@dotnet-policy-service
Contributor

Tagging subscribers to this area: @dotnet/area-system-numerics-tensors
See info in area-owners.md if you want to be subscribed.

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Copilot AI changed the title [WIP] Optimize Half processing in BitIncrement and BitDecrement Optimize Half.BitIncrement and BitDecrement to use ushort operations Jan 29, 2026
Copilot AI requested a review from stephentoub January 29, 2026 15:21
@stephentoub
Member

@copilot, no, this wasn't about Half.BitIncrement/Decrement. It was about vectorizing TensorPrimitives.BitIncrement/BitDecrement when the T is Half. Please start over.

…ment using ushort operations

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Contributor Author

Copilot AI commented Jan 29, 2026

> @copilot, no, this wasn't about Half.BitIncrement/Decrement. It was about vectorizing TensorPrimitives.BitIncrement/BitDecrement when the T is Half. Please start over.

You're right, I misunderstood. I've corrected this in commit 2ac7925. The changes now add vectorized Half support to TensorPrimitives.BitIncrement/BitDecrement by:

  1. Adding a check for the Half type using TryUnaryBitwiseInvokeHalfAsInt16 at the method entry point
  2. Creating HalfBitIncrementOperator and HalfBitDecrementOperator, which implement IUnaryOperator<short, short> with direct ushort bit manipulation (no float conversion)
  3. Including SIMD implementations for Vector128/256/512 with proper special case handling

The Half.cs file has been reverted to its original state.

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Copilot AI changed the title Optimize Half.BitIncrement and BitDecrement to use ushort operations Vectorize TensorPrimitives.BitIncrement/BitDecrement for Half using ushort operations Jan 29, 2026
@stephentoub
Member

@EgorBot -x64 -arm

```cs
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Numerics.Tensors;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

public class Bench
{
    private Half[] _source = default!;
    private Half[] _destination = default!;

    [Params(1000)]
    public int Count { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        _source = Enumerable.Range(0, Count)
            .Select(i => (Half)(i % 2 == 0 ? i * 0.1f : -i * 0.1f))
            .ToArray();
        _destination = new Half[Count];
    }

    [Benchmark]
    public void BitIncrement()
    {
        TensorPrimitives.BitIncrement<Half>(_source, _destination);
    }

    [Benchmark]
    public void BitDecrement()
    {
        TensorPrimitives.BitDecrement<Half>(_source, _destination);
    }
}
```

@stephentoub stephentoub marked this pull request as ready for review January 29, 2026 18:57
Copilot AI review requested due to automatic review settings January 29, 2026 18:57
Contributor

Copilot AI left a comment

Pull request overview

This PR optimizes TensorPrimitives.BitIncrement and TensorPrimitives.BitDecrement for Half types by implementing vectorized operations that work directly on the ushort bit representation, avoiding costly float conversions.

Changes:

  • Added vectorized Half support for BitIncrement and BitDecrement using direct ushort bit manipulation
  • Implemented SIMD operations for Vector128, Vector256, and Vector512 that properly handle all IEEE 754 special cases
  • Used TryUnaryBitwiseInvokeHalfAsInt16 to process Half values as short/ushort throughout the computation

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| TensorPrimitives.BitIncrement.cs | Added Half-specific vectorized operator that manipulates ushort bits directly, with proper handling of NaN, infinities, and zero edge cases |
| TensorPrimitives.BitDecrement.cs | Added Half-specific vectorized operator that manipulates ushort bits directly, with proper handling of NaN, infinities, and zero edge cases |

2 participants