Workaround GCC AVX512 movemask truncation issue #574
a740ca9 to 6e85acc
svenwoop
left a comment
Thinking more about this, I think we need a more general fix. The issue is likely that the mask in the GPR has its higher bits corrupted; the movemask is then just a NOP, which exposes the broken higher bits. Could you please check whether you can fix the AVX512 movemask implementation in vboolf16_avx512.h by adding the & 0xFFFF there?
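The masking idea in the comment above can be sketched in plain C++. This is only an illustration under stated assumptions: the names Mask16, movemask_raw, and movemask_masked are hypothetical and are not taken from the project's source, no AVX512 intrinsics are used, and the sketch does not reproduce the compiler behavior discussed in this thread.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 16-lane boolean mask that lives in a wider
// general-purpose register. Only the low 16 bits carry lane information.
struct Mask16 {
    std::uint64_t bits;
};

// Pass-through variant: returns the register contents unchanged, so any
// stale upper bits are visible to the caller.
inline std::size_t movemask_raw(const Mask16& m) {
    return static_cast<std::size_t>(m.bits);
}

// Masked variant: keeps only the 16 lane bits, as suggested with "& 0xFFFF".
inline std::size_t movemask_masked(const Mask16& m) {
    return static_cast<std::size_t>(m.bits & 0xFFFFu);
}
```

With a value such as 0xDEAD00F5, the pass-through variant returns the full value, while the masked variant returns 0xF5.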
I had the same idea yesterday, but unfortunately it didn’t work. I also changed the return type from size_t to unsigned short (in addition to adding the mask), but the result was still the same. This fix is really a strange workaround; it only works when implemented exactly the way Stefan did.
OK, then let’s merge this workaround. At least we tried a more general fix.
I’d like to implement and merge the CI job that covers this test case, so that we can clearly see that the fix actually works. ETA: today. Edit: Done (see the additional commit in this PR).
kraszkow
left a comment
LGTM
No description provided.