Skip to content

Conversation

@stefanatwork
Copy link
Collaborator

No description provided.

@stefanatwork stefanatwork added this to the v4.4.1 milestone Dec 2, 2025
@stefanatwork stefanatwork self-assigned this Dec 2, 2025
@kraszkow kraszkow force-pushed the fix-avx512-movemask-truncation branch from a740ca9 to 6e85acc Compare December 2, 2025 12:52
Copy link
Collaborator

@svenwoop svenwoop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking more about this, I think we need a more general fix. The issue is likely visible as mask in GPR has higher bits corrupted, and the movemask is then just a NOP, which reveals the broken higher bits. Could you please try if you can fix the AVX512 movemask implementation in vbollf16_avx512.h by adding the & 0xFFFF there?

@kraszkow
Copy link
Contributor

kraszkow commented Dec 3, 2025

Thinking more about this, I think we need a more general fix. The issue is likely visible as mask in GPR has higher bits corrupted, and the movemask is then just a NOP, which reveals the broken higher bits. Could you please try if you can fix the AVX512 movemask implementation in vbollf16_avx512.h by adding the & 0xFFFF there?

I had the same idea yesterday, but unfortunately it didn’t work. I also changed the return type from size_t to unsigned short (in addition to adding the mask), but the result was still the same. This fix is really a strange workaround — it only works when implemented exactly the way Stefan did.

@svenwoop
Copy link
Collaborator

svenwoop commented Dec 3, 2025

Ok, then lets merge this workaround. At least we tried a more general fix.

@kraszkow
Copy link
Contributor

kraszkow commented Dec 3, 2025

Ok, then lets merge this workaround. At least we tried a more general fix.

I’d like to implement and merge the CI job that covers this test case. Then we’ll be able to clearly see that the fix actually works. ETA: today.

Edit: Done (see additional commit to this PR)

Copy link
Contributor

@kraszkow kraszkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@stefanatwork stefanatwork requested a review from svenwoop December 4, 2025 13:21
@stefanatwork stefanatwork merged commit 9cc2f3e into master Dec 4, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants