Why Does QNN Choose SHA Over MHA in Its Pass? #16894
Answered by cccclai
chenghuaWang asked this question in Q&A
I noticed that the QNN pass translates MHA to SHA. What is the purpose of this? Is it because the linear layers in QNN overflow when the number of heads is large? Or is it because the masked-attention example in the QNN documentation is for a single head?
Answered by cccclai on Jan 29, 2026
Replies: 1 comment
It's because SHA is more efficient than MHA in the QNN backend.
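
For readers landing here: "translating MHA to SHA" means rewriting the single fused attention over all heads into `num_heads` independent single-head attentions whose outputs are concatenated; the two forms are mathematically equivalent. The sketch below is a minimal, hypothetical illustration in plain PyTorch (it is not the actual QNN pass, and the function names `mha_fused` / `mha_as_sha` are made up) showing what that decomposition looks like, assuming standard scaled-dot-product attention.

```python
# Illustrative sketch only, not the QNN pass: decomposing a fused
# multi-head attention (MHA) into per-head single-head attentions (SHA).
import math
import torch

def mha_fused(q, k, v, num_heads):
    """Fused MHA: attention over all heads in one batched computation.
    q, k, v: [batch, seq, num_heads * head_dim]"""
    b, s, d = q.shape
    head_dim = d // num_heads
    # Split the hidden dim into heads: [batch, num_heads, seq, head_dim]
    q = q.view(b, s, num_heads, head_dim).transpose(1, 2)
    k = k.view(b, s, num_heads, head_dim).transpose(1, 2)
    v = v.view(b, s, num_heads, head_dim).transpose(1, 2)
    attn = torch.softmax(q @ k.transpose(-1, -2) / math.sqrt(head_dim), dim=-1)
    out = attn @ v                               # [b, h, s, head_dim]
    return out.transpose(1, 2).reshape(b, s, d)  # merge heads back to [b, s, d]

def mha_as_sha(q, k, v, num_heads):
    """Same computation expressed as num_heads independent SHA blocks,
    each operating on one head's slice of the hidden dimension."""
    b, s, d = q.shape
    head_dim = d // num_heads
    outs = []
    for h in range(num_heads):
        sl = slice(h * head_dim, (h + 1) * head_dim)
        qh, kh, vh = q[..., sl], k[..., sl], v[..., sl]   # [b, s, head_dim]
        attn = torch.softmax(qh @ kh.transpose(-1, -2) / math.sqrt(head_dim), dim=-1)
        outs.append(attn @ vh)
    return torch.cat(outs, dim=-1)               # concat head outputs -> [b, s, d]

# The two formulations are numerically equivalent:
q = torch.randn(1, 8, 64); k = torch.randn(1, 8, 64); v = torch.randn(1, 8, 64)
assert torch.allclose(mha_fused(q, k, v, 4), mha_as_sha(q, k, v, 4), atol=1e-5)
```
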
Answer selected by chenghuaWang