Commit eb8e792
[PyTorch][NVFP4][MOE] NVFP4 Grouped Quantize with Hadamard Transform (#2411)
* rowwise colwise RHT group quant v1
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* remove local array RW
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* change wait_barrier
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* fast math options
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* use mult to replace div
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* format
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* bulk move random states
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* greptile
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* lint
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* revert to use divides
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* avoid fp32 bf16 round-trip in RHT cast fusion
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* trigger fast math by toggling NVTE_RHT_CAST_FUSION_USE_FAST_MATH
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* integrate row col rht fusion, functional
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* numerics aligned
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* style
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* remove device sync
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* 128 padding
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* revert colwise rng state creation because of row-col fused kernel
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* fix CI, linter
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* refactor RS for generating two random values
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* Avoid invalid configs with templated kernel
Signed-off-by: Tim Moon <tmoon@nvidia.com>
* fix acc pipeline init with 0 arrival count
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* restore rowwise-only mode
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* switch to dynamic atomic scheduler
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* Avoid instantiating group RHT+cast kernel without row-wise or col-wise output
Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Include fast math option in quantization config
Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Fix linter warnings and review nits
Signed-off-by: Tim Moon <tmoon@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Use TE license
Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Fix bug where kernel is always launched on stream
Signed-off-by: Tim Moon <tmoon@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Restore BF16 intermediate downcast in fused RHT-cast kernels
Signed-off-by: Tim Moon <tmoon@nvidia.com>
* fix numerical test of grouped kernel
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
* Make sure row-wise and col-wise quantization use different RNG seeds
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
* Restore autoformatter
Signed-off-by: Tim Moon <tmoon@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
1 parent 47902e9 commit eb8e792
File tree
19 files changed (+4205, -232 lines changed)
- benchmarks/linear
- tests/pytorch/nvfp4
- transformer_engine
  - common
    - cast
      - dispatch
      - nvfp4
    - hadamard_transform
    - include/transformer_engine
  - pytorch
    - csrc
      - extensions