Why Does QNN Choose SHA Over MHA in Its Pass? #16894
Answered by cccclai
chenghuaWang asked this question in Q&A
I noticed that the QNN pass translates MHA to SHA. What is the purpose of this? Is it because the linear layers in QNN overflow when the number of heads is large? Or is it because the masked-attention example in the QNN documentation is for a single head?
Answered by cccclai on Jan 29, 2026
Replies: 1 comment
It's because SHA is more efficient than MHA in the QNN backend.
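
For readers landing here: "translating MHA to SHA" means rewriting the single fused attention over all heads into `num_heads` independent single-head attentions whose outputs are concatenated; the two forms are mathematically equivalent. The sketch below is a minimal, hypothetical illustration in plain PyTorch (it is not the actual QNN pass, and the function names `mha_fused` / `mha_as_sha` are made up) showing what that decomposition looks like, assuming standard scaled-dot-product attention.

```python
# Illustrative sketch only, not the QNN pass: decomposing a fused
# multi-head attention (MHA) into per-head single-head attentions (SHA).
import math
import torch

def mha_fused(q, k, v, num_heads):
    """Fused MHA: attention over all heads in one batched computation.
    q, k, v: [batch, seq, num_heads * head_dim]"""
    b, s, d = q.shape
    head_dim = d // num_heads
    # Split the hidden dim into heads: [batch, num_heads, seq, head_dim]
    q = q.view(b, s, num_heads, head_dim).transpose(1, 2)
    k = k.view(b, s, num_heads, head_dim).transpose(1, 2)
    v = v.view(b, s, num_heads, head_dim).transpose(1, 2)
    attn = torch.softmax(q @ k.transpose(-1, -2) / math.sqrt(head_dim), dim=-1)
    out = attn @ v                               # [b, h, s, head_dim]
    return out.transpose(1, 2).reshape(b, s, d)  # merge heads back to [b, s, d]

def mha_as_sha(q, k, v, num_heads):
    """Same computation expressed as num_heads independent SHA blocks,
    each operating on one head's slice of the hidden dimension."""
    b, s, d = q.shape
    head_dim = d // num_heads
    outs = []
    for h in range(num_heads):
        sl = slice(h * head_dim, (h + 1) * head_dim)
        qh, kh, vh = q[..., sl], k[..., sl], v[..., sl]   # [b, s, head_dim]
        attn = torch.softmax(qh @ kh.transpose(-1, -2) / math.sqrt(head_dim), dim=-1)
        outs.append(attn @ vh)
    return torch.cat(outs, dim=-1)               # concat head outputs -> [b, s, d]

# The two formulations are numerically equivalent:
q = torch.randn(1, 8, 64); k = torch.randn(1, 8, 64); v = torch.randn(1, 8, 64)
assert torch.allclose(mha_fused(q, k, v, 4), mha_as_sha(q, k, v, 4), atol=1e-5)
```
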
Answer selected by chenghuaWang