Commit 01de02e
authored
[gguf][torch.compile time] Convert to plain tensor earlier in dequantize_gguf_tensor (#13166)
[gguf] Convert to plain tensor earlier in dequantize_gguf_tensor
Once dequantize_gguf_tensor fetches the quant_type attributed from the
GGUFParamter tensor subclass, there is no further need of running the
actual dequantize operations on the Tensor subclass, we can just convert
to plain tensor right away.
This not only makes PyTorch eager faster, but reduces torch.compile
tracer compile time from 36 seconds to 10 seconds, because there is lot
less code to trace now.1 parent db2d7e7 commit 01de02e
1 file changed
+4
-1
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
516 | 516 | | |
517 | 517 | | |
518 | 518 | | |
| 519 | + | |
| 520 | + | |
| 521 | + | |
519 | 522 | | |
520 | 523 | | |
521 | 524 | | |
| |||
525 | 528 | | |
526 | 529 | | |
527 | 530 | | |
528 | | - | |
| 531 | + | |
529 | 532 | | |
530 | 533 | | |
531 | 534 | | |
| |||
0 commit comments