-
Notifications
You must be signed in to change notification settings - Fork 291
Improving avx/avx2 swizzles #1141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improving avx/avx2 swizzles #1141
Conversation
serge-sans-paille
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you could reduce code duplication by syndicating patterns as suggested, that would be great!
|
If this passes the tests it can be merged. I hope I did not break any build this time. Should I add myself to the copyright? |
|
You can add yourself, yes. And rebase on master branch ;-) |
8239b5e to
cfb5838
Compare
|
I rebased. I did not squash as I have some commits that I would like to keep for reference in the branch. Feel free to squash and merge in master. |
|
Hey @DiamonDinoia : I'll have a look at this next week, don't worry if you don't have any feedback since then 🙇 |
ec97ae8 to
e99877d
Compare
e99877d to
c6b58b6
Compare
|
I am not sure what the problem is with PowerPC. It should not touch any code I wrote and from the print the results seems correct. |
|
How strange: the failure shows batches that... look the same /o\ |
|
I could not find an explanation for it. |
|
If you don't mind, I'll split your commit history in pieces and validate them step-by-step in order to understand what's happening - keeping your authorship information, of course. |
|
Sure, no problem, go ahead! |
c6b58b6 to
b285dbd
Compare
|
perfect, can you just squash your commits into a nice history? Thanks o/ |
Fixes: some swizzled did not allow duplicates in the output
0a906e5 to
8b1c9b9
Compare
|
Thanks for being patient and for the great patch set \o/ |
|
See ##1155 for the sse2 part |
My pleasure! |
| // The intrinsic does NOT allow to copy the same element of the source vector to more than one element of the destination vector. | ||
| // one-shot 8-lane permute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this tested concretely? According to the intel ISA reference for VPERMPS,
this instruction permits a doubleword in the source operand to be copied to more than one location in the destination operand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be surprising if it was different from _mm256_permutevar8x32_epi32, in any case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a webpage once but intel changed their website and I cannot find it anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the same for _mm256_permutevar8x32_epi32 If I use that with duplicates I introduced a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem with duplicates originally, that's why I investigated. But I think it was showing up with sse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(@AntoinePrv you could be interested in this discussion)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to revert the changes since ISA says that it is allowed. Would you like to open a PR or issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could try to, but I'm not a xsimd developer (I just encountered this by chance when grepping for xsimd's support for shuffling).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do it once I have time otherwise.

Hi,
This PR optimzes swizzles where possible
Fixed a bug in the float avx2 swizzle which did not allow duplicates
Adds more tests