Ingest FP8 attn scales and use them in ROCm FlashAttention (ROCm#338)
* Ingest FP8 attn scales and use them in Triton FA, if present
* Disable calc_kv_scales when the checkpoint already provides the scales; enable FP8 attention for dynamic quantization
* Expose q_range as an environment variable
* format
* Dedupe FA/PA attn toggles, set FA off by default
* Lint again, to fixed point
* Don't calculate KV scales dynamically if Q scale is included
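The last rule above can be sketched as a small decision helper. This is an illustrative sketch only, not the actual vLLM/ROCm code: the class, function, and environment-variable names (`AttnScales`, `should_calc_kv_scales_dynamically`, `Q_RANGE`) are hypothetical stand-ins for whatever the fork actually uses.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttnScales:
    """FP8 attention scales as they might be ingested from a checkpoint.

    A field is None when the checkpoint did not ship that scale.
    (Hypothetical container; names are illustrative.)
    """
    q_scale: Optional[float] = None
    k_scale: Optional[float] = None
    v_scale: Optional[float] = None


def should_calc_kv_scales_dynamically(scales: AttnScales) -> bool:
    # Mirrors the rule in the change list: if the checkpoint includes a
    # Q scale, the scales were produced at quantization time, so dynamic
    # KV-scale calculation is skipped.
    return scales.q_scale is None


def q_range(default: float = 1.0) -> float:
    # The change list says q_range is read from an environment variable;
    # the variable name and default here are assumptions.
    return float(os.environ.get("Q_RANGE", default))
```

Checkpoints without any FP8 scales would fall back to dynamic KV-scale calculation, while quantized checkpoints carrying a Q scale would use the ingested values directly.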
---------
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>