Conversation
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Greptile Overview

Greptile Summary

This PR extracts Python GroupedTensor infrastructure from #2600, enabling PyTorch to store multiple weight tensors with a contiguous memory layout.

Key Changes

Architecture

The GroupedTensor acts as unified storage that exposes multiple QuantizedTensor views of one contiguous buffer.

Minor Issue

Type hint inconsistency on line 344: torch.tensor should be torch.Tensor.

Confidence Score: 4.5/5

Important Files Changed
Sequence Diagram

sequenceDiagram
participant User
participant GroupedLinear
participant GroupedTensor
participant Quantizer
participant QuantizedTensor
Note over User,QuantizedTensor: Initialization Phase
User->>GroupedLinear: __init__(num_gemms, in_features, out_features)
GroupedLinear->>GroupedLinear: Create individual weight parameters
GroupedLinear->>GroupedLinear: reset_parameters()
GroupedLinear->>GroupedLinear: make_grouped_weights()
Note over GroupedLinear,GroupedTensor: GroupedTensor Creation
GroupedLinear->>GroupedTensor: make_grouped_tensor_with_shapes(num_tensors, shape, quantizer)
GroupedTensor->>GroupedTensor: Calculate logical_shape and offsets
GroupedTensor->>GroupedTensor: Allocate contiguous buffers (data, scale_inv, etc.)
GroupedTensor->>GroupedTensor: split_into_quantized_tensors()
loop For each tensor
GroupedTensor->>QuantizedTensor: Create view of contiguous storage
QuantizedTensor-->>GroupedTensor: Return quantized tensor view
end
GroupedTensor-->>GroupedLinear: Return GroupedTensor with quantized_tensors
loop For each weight
GroupedLinear->>QuantizedTensor: copy_(original_weight)
GroupedLinear->>GroupedLinear: register_parameter(weight_i, quantized_tensor)
end
Note over User,QuantizedTensor: Forward Pass
User->>GroupedLinear: forward(inp, m_splits)
GroupedLinear->>GroupedLinear: _get_weight_tensors()
GroupedLinear->>GroupedLinear: Prepare input quantization
GroupedLinear->>GroupedLinear: general_grouped_gemm(weights, inputs, outputs)
Note over GroupedLinear: All weights share contiguous storage
GroupedLinear-->>User: Return output
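The central point of the diagram is the note "All weights share contiguous storage": every per-GEMM weight is a view into one allocation. A minimal, plain-PyTorch sketch of that storage pattern (not the TE GroupedTensor API; the sizes and dtype are made up for illustration):

```python
import torch

# Hypothetical sizes for illustration only.
num_gemms, out_features, in_features = 4, 128, 64

# One contiguous buffer backs all "weights"; each weight is a view into it.
storage = torch.empty(num_gemms * out_features * in_features, dtype=torch.bfloat16)
weights = [
    storage[i * out_features * in_features : (i + 1) * out_features * in_features]
    .view(out_features, in_features)
    for i in range(num_gemms)
]

# The views alias the same memory: weight 1 starts exactly where weight 0 ends.
elem = storage.element_size()
assert weights[1].data_ptr() == storage.data_ptr() + out_features * in_features * elem
```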
def make_grouped_tensor(
    num_tensors: int,
    first_dims: Optional[torch.Tensor],
    last_dims: Optional[torch.tensor],
Inconsistent type hint: torch.tensor should be torch.Tensor (uppercase T)
Suggested change:
-    last_dims: Optional[torch.tensor],
+    last_dims: Optional[torch.Tensor],
from .nvfp4_tensor_storage import NVFP4TensorStorage

class GroupedTensor:
I don't think it's a good idea to put everything within a single class. We should have an abstract base class (GroupedTensorBase) and concrete classes like GroupedTensor (or UnquantizedGroupTensor?), MXFP8GroupedTensor, NVFP4GroupedTensor. The giant-pile-of-attrs design results in ugly implementations (like the if-else blocks in make_grouped_tensor) and it generalizes poorly (columnwise_data is treated very differently between FP8 and MXFP8, enough that giving them the same name is questionable). We do use this design in the C++ grouped tensor class, but that should be viewed as a short-term expedient and not a long-term design (#2388 (comment)).
This ultimately depends on what we want to optimize for. If we believe that the majority of what we are going to write here is "grouped" functionality that does not really care about the underlying format (or functionality where we could delegate that decision to C++, which has full knowledge of the quantizer type and could implement things without huge if/else blocks), then it makes sense to have a single class here. If we believe that the majority of the functionality will depend on the quantization format, then I agree that we should split this into multiple classes.
@ksivaman Can you comment on that?
I think GroupedTensor in Python should be a faithful copy of the C++ grouped tensor, so I do think it's okay to have a single class.
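For reference, a rough sketch of the split proposed in this thread; every name and method body here is hypothetical and not part of this PR:

```python
from abc import ABC, abstractmethod
from typing import List
import torch

class GroupedTensorBase(ABC):
    """Format-agnostic 'grouped' bookkeeping: tensor count and storage offsets."""

    def __init__(self, num_tensors: int, offsets: torch.Tensor):
        self.num_tensors = num_tensors
        self.offsets = offsets  # prefix sums into the contiguous buffers

    @abstractmethod
    def split_into_quantized_tensors(self) -> List[torch.Tensor]:
        """Return per-tensor views of the contiguous storage."""

class MXFP8GroupedTensor(GroupedTensorBase):
    """Owns only the buffers that make sense for MXFP8 (block scales, etc.)."""

    def __init__(self, num_tensors, offsets, data, scale_inv,
                 columnwise_data, columnwise_scale_inv):
        super().__init__(num_tensors, offsets)
        self.data = data
        self.scale_inv = scale_inv
        self.columnwise_data = columnwise_data
        self.columnwise_scale_inv = columnwise_scale_inv

    def split_into_quantized_tensors(self) -> List[torch.Tensor]:
        # Would build MXFP8 views using the MXFP8 scale layout.
        raise NotImplementedError
```

An unquantized and an NVFP4 subclass would follow the same pattern, which is what removes the per-recipe if/else blocks from make_grouped_tensor.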
    columnwise_scale_inv = torch.empty(
        total_columnwise_scale_elements, dtype=torch.uint8, device=device
    )
elif quantizer._get_compatible_recipe().delayed():
Since we have gone with a single quantizer, we should remove the delayed scaling recipe and per-tensor current scaling for now, since their quantizers are not stateless.
    columnwise_scale_inv = torch.empty(
        total_columnwise_scale_elements, dtype=torch.float32, device=device
    )
elif quantizer._get_compatible_recipe().float8_current_scaling():
float8_current_scaling can work with GroupedTensor once we refactor its implementation to move the amax tensor out of its quantizer. Then it will be safe to put a single quantizer into the grouped tensor.
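A toy illustration of why a stateful quantizer cannot simply be shared across the group (this is not the TE quantizer API, just a sketch of the problem):

```python
import torch

class ToyCurrentScalingQuantizer:
    """Keeps per-tensor state (amax) on the quantizer itself."""

    def __init__(self):
        self.amax = torch.zeros(1)  # state that a single shared instance would clobber

    def quantize(self, x: torch.Tensor) -> torch.Tensor:
        self.amax.copy_(x.abs().max())            # state update per call
        scale = 448.0 / self.amax.clamp(min=1e-12)
        return (x * scale).clamp(-448.0, 448.0)   # stand-in for an FP8 cast

shared = ToyCurrentScalingQuantizer()
a = shared.quantize(torch.randn(4))
b = shared.quantize(torch.randn(4) * 100)
# shared.amax now reflects only the second tensor, so any later use of the
# quantizer state for the first tensor would be wrong.
```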
        result.append(tensor)

# Delayed scaling or current scaling (both use Float8TensorStorage)
elif recipe.delayed() or recipe.float8_current_scaling():
let's assert an error for this case?
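One possible form of the error suggested here (a sketch only; the recipe-checking helpers are taken from the surrounding diff, and the message is made up — an assert would work equally well):

```python
recipe = quantizer._get_compatible_recipe()
if recipe.delayed() or recipe.float8_current_scaling():
    # These recipes keep state (amax, per-tensor scales) inside the quantizer,
    # so a single shared quantizer cannot be used for the whole group yet.
    raise NotImplementedError(
        "GroupedTensor does not support delayed scaling or per-tensor "
        "current scaling until their quantizers are made stateless."
    )
```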
    dtype: Optional[torch.dtype] = None,
) -> GroupedTensor:
    """
    Create a GroupedTensor for storing multiple weight tensors of the same shape.
Minor comment: the intent of this API is to create a grouped tensor with variable first_dims/last_dims, so we can state that in the docstring, since this is not going to be used to create weights.
Also, the API could be named make_grouped_tensor_graph_safe, so people know it is safe to use within the forward/backward of a module, which we need to be CUDA-graphable.
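A hypothetical call with variable per-tensor shapes, to make the intent concrete (argument names follow the diff above; the quantizer argument and the device/dtype choices are assumptions):

```python
import torch

# Three tensors with different leading dims but a shared trailing dim.
first_dims = torch.tensor([128, 256, 64], device="cuda")
last_dims = torch.tensor([512, 512, 512], device="cuda")

grouped = make_grouped_tensor(
    num_tensors=3,
    first_dims=first_dims,
    last_dims=last_dims,
    quantizer=None,        # unquantized storage in this sketch
    dtype=torch.bfloat16,
)
```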
        torch.zeros(1, device=first_dims.device, dtype=first_dims.dtype),
        torch.cumsum(first_dims * logical_last_dim, dim=0),
    ]
)
I see the above comment about doing this with a single kernel and am not sure what your plan is to implement that. But with torch ops you can avoid one memory op by using:

    tensor_offsets = torch.empty(num_tensors + 1, device=first_dims.device, dtype=first_dims.dtype)
    torch.cumsum(first_dims * logical_last_dim, dim=0, out=tensor_offsets[1:])
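For concreteness, the two variants side by side (plain PyTorch; logical_last_dim is a scalar here, and the out= variant still needs its leading element zeroed, which the snippet above omits):

```python
import torch

first_dims = torch.tensor([2, 3, 5])
logical_last_dim = 4
num_tensors = first_dims.numel()

# Variant in the diff: concatenate a zero with the cumulative sums.
offsets_cat = torch.cat(
    [
        torch.zeros(1, device=first_dims.device, dtype=first_dims.dtype),
        torch.cumsum(first_dims * logical_last_dim, dim=0),
    ]
)

# Suggested variant: write the cumsum into a preallocated buffer, skipping
# the extra concatenation; the first offset must still be set to zero.
offsets_out = torch.empty(num_tensors + 1, device=first_dims.device, dtype=first_dims.dtype)
offsets_out[0] = 0
torch.cumsum(first_dims * logical_last_dim, dim=0, out=offsets_out[1:])

assert torch.equal(offsets_cat, offsets_out)  # both give [0, 8, 20, 40]
```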
Description
Extracts the Python pieces of GroupedTensor infrastructure from #2600. Since this is mainly focused on the creation of weights as a single GroupedTensor and exposing them as multiple QuantizedTensors for PyTorch, this portion does not need to be graph capturable.

Type of change

Changes

- Added the GroupedTensor class.
- Integrated GroupedTensor into GroupedLinear such that the parameters are contiguous.

Checklist: