[attention backends] use dedicated wrappers from fa3 for cp. #13165
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
vasqu left a comment
LGTM, trusting you that this works since it seems we don't have tests?
Comments are mostly for my interest / nits.
repo_id="kernels-community/flash-attn3",
function_attr="flash_attn_func",
revision="fake-ops-return-probs",
Ah, didn't notice it before. We don't have it merged into main yet?
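For context, a Hub kernel pinned to a branch like this is typically loaded through the `kernels` library. A minimal sketch, assuming `get_kernel` accepts a `revision` argument as in recent `kernels` releases:

```python
# Sketch: load the FA3 kernel from the Hub at the pinned revision and pull
# out the function named by `function_attr`. Names mirror the config above.
from kernels import get_kernel

flash_attn3 = get_kernel(
    "kernels-community/flash-attn3",
    revision="fake-ops-return-probs",
)
flash_attn_func = getattr(flash_attn3, "flash_attn_func")
```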
out,
softmax_lse,
None,
None,  # cu_seqlens_q, cu_seqlens_k
Just fmi, there is no FA varlen CP yet? I think Meta had a version at some point.
We handle varlen separately + varlens aren't that common in diffusion use cases.
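For anyone following along: the reason the wrapper surfaces `softmax_lse` is that context parallelism computes attention over sequence shards and then merges the per-shard partial outputs, which requires each shard's log-sum-exp. A minimal sketch of that merge with a hypothetical helper; the layout is an assumption (`out_*` as `[batch, seq, heads, dim]`, `lse_*` as `[batch, seq, heads]`, whereas FA kernels typically return the LSE as `[batch, heads, seq]`):

```python
import torch

def merge_partials(out_a, lse_a, out_b, lse_b):
    # Numerically stable merge of two partial attention results.
    # Assumed layout: out_* is [batch, seq, heads, dim],
    # lse_* is [batch, seq, heads] (FA itself uses [batch, heads, seq]).
    lse = torch.logaddexp(lse_a, lse_b)          # combined normalizer
    w_a = torch.exp(lse_a - lse).unsqueeze(-1)   # rescale weight for shard a
    w_b = torch.exp(lse_b - lse).unsqueeze(-1)   # rescale weight for shard b
    return w_a * out_a + w_b * out_b, lse
```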
Yes, I manually verified that this works, as we're still growing the feature.
What does this PR do?
Great pointer here: #12812 (comment) from @vasqu.
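Putting the pieces together, a dedicated wrapper in the spirit of this PR might look like the sketch below. Everything here is an assumption for illustration: the function name and signature are hypothetical, and it presumes the pinned kernel revision returns `(out, softmax_lse)` (as the `fake-ops-return-probs` branch name suggests), reusing `flash_attn_func` from the loading sketch above.

```python
def _fa3_forward_for_cp(query, key, value, scale=None, causal=False):
    # Hypothetical wrapper: call the Hub FA3 kernel and surface both the
    # attention output and the log-sum-exp, which the context-parallel
    # machinery needs in order to merge per-rank partial results.
    out, softmax_lse = flash_attn_func(
        query, key, value, softmax_scale=scale, causal=causal
    )
    # Match the tuple layout quoted in the review thread; the varlen
    # fields are unused in this non-varlen path.
    return (
        out,
        softmax_lse,
        None,
        None,  # cu_seqlens_q, cu_seqlens_k
    )
```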