chore(deps-dev): Bump torchao from 0.11 to 0.15.0 by dependabot[bot] · Pull Request #193 · foundation-model-stack/fms-model-optimizer

dependabot · 2025-12-19T21:10:20Z

Bumps torchao from 0.11 to 0.15.0.

Release notes

v0.14.1

Highlights

We are excited to announce the 0.14.1 release of torchao! This release adds support for MoE training on Blackwell GPUs and NVFP4 QAT!

(Prototype) MoE training on Blackwell GPUs

We’ve added a quantized building block for speeding up MoE training on Blackwell GPUs: torchao’s `_scaled_grouped_mm`! It is a differentiable drop-in replacement for `torch._grouped_mm` that dynamically quantizes inputs using the given recipe, performs a scaled grouped GEMM, then returns the results in original precision. This results in significant speedups (see benchmarks below)!
import torch
from torch.nn import functional as F
from torchao.prototype.moe_training import (
    _scaled_grouped_mm as torchao_scaled_grouped_mm
)
from torchao.prototype.moe_training.conversion_utils import MoEScalingType
from torchao.prototype.moe_training.utils import generate_jagged_offs

num_groups, total_M, N, K = 8, 131072, 8192, 5120

# A = input activations, B = expert weights
A = torch.randn(total_M, K, dtype=torch.bfloat16, device="cuda", requires_grad=True)
B = torch.randn(num_groups, N, K, dtype=torch.bfloat16, device="cuda", requires_grad=True)

# Token group offsets computed by router in actual MoE layer
offs = generate_jagged_offs(num_groups, total_M, device="cuda")

# Forward and backward example
out = torchao_scaled_grouped_mm(
    A,
    B.transpose(-2, -1),
    offs=offs,
    scaling_type=MoEScalingType.MXFP8,
)
labels = torch.ones_like(out)
loss = F.mse_loss(out, labels)
loss.backward()

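To make the "quantize, scaled grouped GEMM, return in original precision" flow concrete, here is a minimal pure-Python sketch of the idea. It is not torchao's implementation: it swaps MXFP8 block scaling for a toy symmetric per-row integer quantization, runs on nested lists instead of GPU tensors, and assumes `offs` holds the cumulative end row of each token group. The helper names (`matmul`, `quantize_rows`, `grouped_mm`, `scaled_grouped_mm`) are hypothetical and exist only for this illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (M x K) @ (K x N) product on nested lists."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def quantize_rows(m: Matrix, levels: int = 127) -> Tuple[List[List[int]], List[float]]:
    """Toy symmetric quantization: integer codes plus one scale per row.

    Assumption: stands in for the real recipe (e.g. MXFP8), which differs.
    """
    codes, scales = [], []
    for row in m:
        amax = max(abs(v) for v in row) or 1.0  # avoid a zero scale
        scale = amax / levels
        codes.append([round(v / scale) for v in row])
        scales.append(scale)
    return codes, scales


def grouped_mm(a: Matrix, b_groups: List[Matrix], offs: List[int]) -> Matrix:
    """Reference grouped GEMM: rows a[start:end] multiply that group's weights."""
    out, start = [], 0
    for b, end in zip(b_groups, offs):
        out.extend(matmul(a[start:end], b))
        start = end
    return out


def scaled_grouped_mm(a: Matrix, b_groups: List[Matrix], offs: List[int]) -> Matrix:
    """Quantize both operands per group, multiply the codes, then rescale."""
    out, start = [], 0
    for b, end in zip(b_groups, offs):
        rows = a[start:end]
        if rows:
            qa, sa = quantize_rows(rows)                # one scale per token row
            b_t = [list(col) for col in zip(*b)]        # quantize along K per output column
            qb_t, sb = quantize_rows(b_t)
            qb = [list(col) for col in zip(*qb_t)]
            acc = matmul(qa, qb)                        # low-precision GEMM stand-in
            out.extend([[acc[i][j] * sa[i] * sb[j] for j in range(len(sb))]
                        for i in range(len(sa))])       # back to original precision
        start = end
    return out


# Three tokens routed to two experts: rows 0-1 go to expert 0, row 2 to expert 1.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
experts = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]]]
offsets = [2, 3]
exact = grouped_mm(tokens, experts, offsets)
approx = scaled_grouped_mm(tokens, experts, offsets)
print(exact)
print(approx)
```

The quantized result tracks the exact one up to rounding error, which is the trade-off the real kernel makes in exchange for faster low-precision matrix multiplies.
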
Microbenchmarks (see README for commands to reproduce benchmarks):

Forward + backward pass vs torch._grouped_mm:

~1.4-1.8x faster for Llama4 17bx16e shapes

~1.2-1.4x faster for DeepSeekV3 671b shapes

Full MoE layer forward + backward pass:

~1.4x faster (Llama4 17bx16e shapes, batch_size=8, seq_len=16384)

~1.2x faster (DeepSeekV3 671b shapes, batch_size=8, seq_len=16384).

It’s also already integrated into TorchTitan for E2E training with DeepSeekV3 and Llama4! Just use the command line flag `--model.converters="quantize.grouped_mm.mx"`, which will convert all `torch._grouped_mm` ops to torchao `_scaled_grouped_mm` ops under the hood:

... (truncated)

Commits

9338966 use python version agnostic binding for mxfp8 cuda kernels (#3471)
acc9103 Fix NVFP4 QAT backward typo (#3478)
286c2d8 Fix NVFP4 QAT convert path (#3450)
924d6c0 update version compatibility table (#3455)
aa21b80 skip certain mxfp8 tests for cuda < 12.8 (#3443)
69ce0fd [Intel GPU] Enable optim SR test (#3055)
70e903b [xpu][test] Port 2 test/quantization/pt2e/test_{quantize_pt2e, quantize_pt2e_...
1272f3c [xpu][test] Port 2 test/dtypes_{floatx, bitpacking} UT files to intel XPU (#3...
c4273fe Int8Tensor migration cleanup (#3407)
7e0d439 [CPU] Reland qconv fp8 fusion passes (#3433)
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [torchao](https://github.com/pytorch/ao) from 0.11 to 0.15.0.
- [Release notes](https://github.com/pytorch/ao/releases)
- [Commits](pytorch/ao@v0.11.0...v0.15.0)

---
updated-dependencies:
- dependency-name: torchao
  dependency-version: 0.15.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
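The commit metadata labels this bump `version-update:semver-minor`. As a rough sketch of how such a label could be derived from the two version strings (this is not Dependabot's actual logic; `parse_version`, `update_type`, and the `no-change` fallback are hypothetical names for illustration), one can pad missing components with zeros and compare them left to right:

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse 'MAJOR.MINOR[.PATCH]' into a 3-tuple, padding missing parts with 0."""
    parts = [int(p) for p in text.lstrip("v").split(".")]
    parts += [0] * (3 - len(parts))
    return parts[0], parts[1], parts[2]


def update_type(old: str, new: str) -> str:
    """Name the first version component that differs between old and new."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v[0] != old_v[0]:
        return "version-update:semver-major"
    if new_v[1] != old_v[1]:
        return "version-update:semver-minor"
    if new_v[2] != old_v[2]:
        return "version-update:semver-patch"
    return "no-change"  # hypothetical label, not taken from the PR


print(update_type("0.11", "0.15.0"))
```

Under these assumptions, `0.11` is read as `0.11.0`, so moving to `0.15.0` changes only the minor component, matching the label in the commit message.
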

dependabot · 2026-02-11T21:54:16Z

Superseded by #197.

dependabot bot added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update python code) labels Dec 19, 2025

dependabot bot requested review from BrandonGroth, andrea-fasoli, chichun-charlie-liu and kcirred as code owners December 19, 2025 21:10

dependabot bot requested review from nwang-ibm and tharapalanivel as code owners December 19, 2025 21:10

dependabot bot mentioned this pull request Dec 19, 2025

chore(deps): Bump torchao from 0.11 to 0.14.1 #189

Closed

dependabot bot closed this Feb 11, 2026

dependabot bot deleted the dependabot/pip/torchao-0.15.0 branch February 11, 2026 21:54

dependabot[bot] wants to merge 1 commit into main from dependabot/pip/torchao-0.15.0
