

@dependabot dependabot bot commented on behalf of github Dec 19, 2025

Bumps torchao from 0.11 to 0.15.0.

Release notes

Sourced from torchao's releases.

v0.14.1

Highlights

We are excited to announce the 0.14.1 release of torchao! This release adds support for MoE training on Blackwell GPUs and NVFP4 QAT!

(Prototype) MoE training on Blackwell GPUs

We’ve added a quantized building block for speeding up MoE training on Blackwell GPUs: torchao’s `_scaled_grouped_mm`! It is a differentiable drop-in replacement for `torch._grouped_mm` that dynamically quantizes inputs using the given recipe, performs a scaled grouped GEMM, then returns the results in original precision. This results in significant speedups (see benchmarks below)!
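The quantize, scaled-GEMM, rescale pipeline described above can be illustrated in plain Python. This is a minimal sketch under explicit assumptions: it uses a single (non-grouped) GEMM, and it swaps in simple per-row absmax scaling onto an int8-style integer grid in place of the MXFP8 recipe, which instead uses FP8 elements with shared per-block scales. `quantize_rowwise` and `scaled_mm` are hypothetical helpers written for illustration, not torchao APIs.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_rowwise(rows: Matrix, qmax: int = 127) -> Tuple[List[List[int]], List[float]]:
    """Dynamic symmetric absmax quantization with one scale per row (illustrative).

    The scale is computed from the data at call time ("dynamic"), so no
    calibration pass is needed. Values are rounded onto the integer grid
    [-qmax, qmax]; this toy grid stands in for a real FP8 format.
    """
    q_rows: List[List[int]] = []
    scales: List[float] = []
    for row in rows:
        amax = max((abs(x) for x in row), default=0.0)
        scale = amax / qmax if amax > 0 else 1.0  # avoid dividing by zero
        q_rows.append([max(-qmax, min(qmax, round(x / scale))) for x in row])
        scales.append(scale)
    return q_rows, scales


def scaled_mm(a: Matrix, b_nk: Matrix) -> Matrix:
    """Hypothetical quantize -> low-precision GEMM -> rescale-to-float pipeline.

    a:    (M x K) activations in high precision
    b_nk: (N x K) weights, laid out like the (N, K) expert weights in the example
    Both operands are quantized along K, so every dot product is rescaled by
    exactly one activation scale and one weight scale.
    """
    qa, sa = quantize_rowwise(a)
    qb, sb = quantize_rowwise(b_nk)
    return [
        [sum(x * y for x, y in zip(ra, rb)) * sa[i] * sb[j] for j, rb in enumerate(qb)]
        for i, ra in enumerate(qa)
    ]


a = [[0.5, -1.0, 2.0]]
b_nk = [[1.0, 1.0, 1.0], [0.25, 0.0, -0.5]]
approx = scaled_mm(a, b_nk)  # close to the exact product [[1.5, -0.875]]
```

Quantizing both operands along the shared K dimension is what allows the low-precision inner products to be mapped back to the original precision with a single per-output rescale; a real kernel performs those inner products in hardware FP8 rather than in Python integers.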

```python
import torch
from torch.nn import functional as F
from torchao.prototype.moe_training import (
    _scaled_grouped_mm as torchao_scaled_grouped_mm
)
from torchao.prototype.moe_training.conversion_utils import MoEScalingType
from torchao.prototype.moe_training.utils import generate_jagged_offs

num_groups, total_M, N, K = 8, 131072, 8192, 5120

# A = input activations, B = expert weights
A = torch.randn(total_M, K, dtype=torch.bfloat16, device="cuda", requires_grad=True)
B = torch.randn(num_groups, N, K, dtype=torch.bfloat16, device="cuda", requires_grad=True)

# Token group offsets; in an actual MoE layer these are computed by the router
offs = generate_jagged_offs(num_groups, total_M, device="cuda")

# Forward and backward example (prototype API, targets Blackwell GPUs)
out = torchao_scaled_grouped_mm(
    A,
    B.transpose(-2, -1),
    offs=offs,
    scaling_type=MoEScalingType.MXFP8,
)
labels = torch.ones_like(out)
loss = F.mse_loss(out, labels)
loss.backward()
```
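For readers unfamiliar with the grouped GEMM that `_scaled_grouped_mm` replaces, the following plain-Python sketch shows what a jagged grouped matmul computes, without any quantization. It assumes that `offs` holds the cumulative end row of each expert's token group (so group `g` covers rows `offs[g-1]:offs[g]`, with an implicit leading 0) and that each expert weight is already laid out as K x N, as `B.transpose(-2, -1)` produces in the example above. `matmul` and `grouped_mm_reference` are hypothetical helpers, not PyTorch or torchao APIs.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (rows x K) @ (K x cols) product."""
    k, cols = len(b), len(b[0])
    return [[sum(row[i] * b[i][j] for i in range(k)) for j in range(cols)] for row in a]


def grouped_mm_reference(a: Matrix, b_groups: Sequence[Matrix], offs: Sequence[int]) -> Matrix:
    """Hypothetical reference semantics of a jagged grouped GEMM (assumed layout).

    a:        (total_M x K) token activations, rows sorted by expert group
    b_groups: one (K x N) weight matrix per expert
    offs:     cumulative end row of each group, e.g. [1, 3] means
              row 0 -> group 0 and rows 1..2 -> group 1
    In this sketch, rows after offs[-1] are not produced.
    """
    if len(offs) != len(b_groups):
        raise ValueError("need exactly one offset per group")
    out: Matrix = []
    start = 0
    for end, b in zip(offs, b_groups):
        if not start <= end <= len(a):
            raise ValueError("offsets must be non-decreasing and within total_M")
        out.extend(matmul(a[start:end], b))  # each group uses only its own expert
        start = end
    return out


A_small = [[1, 0], [0, 1], [1, 1]]   # total_M = 3, K = 2
B0 = [[1, 2], [3, 4]]                # expert 0, K x N
B1 = [[10, 0], [0, 10]]              # expert 1, K x N
result = grouped_mm_reference(A_small, [B0, B1], [1, 3])
# result == [[1, 2], [0, 10], [10, 10]]
```

Because every token row is multiplied by exactly one expert, the work is one GEMM per group over a variable number of rows; a fused grouped kernel executes all groups in a single launch instead of looping as this sketch does.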

Microbenchmarks (see the README for the commands to reproduce them):

  • Forward + backward pass vs torch._grouped_mm:
    • ~1.4-1.8x faster for Llama4 17bx16e shapes
    • ~1.2-1.4x faster for DeepSeekV3 671b shapes
  • Full MoE layer forward + backward pass:
    • ~1.4x faster (Llama4 17bx16e shapes, batch_size=8, seq_len=16384)
    • ~1.2x faster (DeepSeekV3 671b shapes, batch_size=8, seq_len=16384)

It’s also already integrated into TorchTitan for end-to-end (E2E) training with DeepSeekV3 and Llama4! Just use the command-line flag `--model.converters="quantize.grouped_mm.mx"`, which will convert all `torch._grouped_mm` ops to torchao `_scaled_grouped_mm` ops under the hood:

... (truncated)

Commits
  • 9338966 use python version agnostic binding for mxfp8 cuda kernels (#3471)
  • acc9103 Fix NVFP4 QAT backward typo (#3478)
  • 286c2d8 Fix NVFP4 QAT convert path (#3450)
  • 924d6c0 update version compatibility table (#3455)
  • aa21b80 skip certain mxfp8 tests for cuda < 12.8 (#3443)
  • 69ce0fd [Intel GPU] Enable optim SR test (#3055)
  • 70e903b [xpu][test] Port 2 test/quantization/pt2e/test_{quantize_pt2e, quantize_pt2e_...
  • 1272f3c [xpu][test] Port 2 test/dtypes_{floatx, bitpacking} UT files to intel XPU (#3...
  • c4273fe Int8Tensor migration cleanup (#3407)
  • 7e0d439 [CPU] Reland qconv fp8 fusion passes (#3433)
  • Additional commits viewable in compare view


Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [torchao](https://github.com/pytorch/ao) from 0.11 to 0.15.0.
- [Release notes](https://github.com/pytorch/ao/releases)
- [Commits](pytorch/ao@v0.11.0...v0.15.0)

---
updated-dependencies:
- dependency-name: torchao
  dependency-version: 0.15.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot bot added the dependencies and python labels Dec 19, 2025