Skip to content

Conversation

@DiamonDinoia
Copy link
Contributor

Hi,

This PR optimzes swizzles where possible
Fixed a bug in the float avx2 swizzle which did not allow duplicates
Adds more tests

Copy link
Contributor

@serge-sans-paille serge-sans-paille left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you could reduce code duplication by syndicating patterns as suggested, that would be great!

@DiamonDinoia
Copy link
Contributor Author

DiamonDinoia commented Jul 11, 2025

If this passes the tests it can be merged. I hope I did not break any build this time.

Should I add myself to the copyright?

@serge-sans-paille
Copy link
Contributor

You can add yourself, yes. And rebase on master branch ;-)

@DiamonDinoia
Copy link
Contributor Author

I rebased. I did not squash as I have some commits that I would like to keep for reference in the branch. Feel free to squash and merge in master.

@serge-sans-paille
Copy link
Contributor

Hey @DiamonDinoia : I'll have a look at this next week, don't worry if you don't have any feedback since then 🙇

@DiamonDinoia DiamonDinoia force-pushed the improving-swizzle branch 2 times, most recently from ec97ae8 to e99877d Compare July 25, 2025 19:18
@DiamonDinoia
Copy link
Contributor Author

I am not sure what the problem is with PowerPC. It should not touch any code I wrote and from the print the results seems correct.

@serge-sans-paille
Copy link
Contributor

How strange: the failure shows batches that... look the same /o\

@DiamonDinoia
Copy link
Contributor Author

I could not find an explanation for it.

@serge-sans-paille
Copy link
Contributor

If you don't mind, I'll split your commit history in pieces and validate them step-by-step in order to understand what's happening - keeping your authorship information, of course.

@DiamonDinoia
Copy link
Contributor Author

Sure, no problem, go ahead!

@serge-sans-paille
Copy link
Contributor

perfect, can you just squash your commits into a nice history? Thanks o/

Fixes: some swizzled did not allow duplicates in the output
@serge-sans-paille serge-sans-paille merged commit 825d298 into xtensor-stack:master Aug 8, 2025
65 checks passed
@serge-sans-paille
Copy link
Contributor

Thanks for being patient and for the great patch set \o/

@serge-sans-paille
Copy link
Contributor

See ##1155 for the sse2 part

@DiamonDinoia
Copy link
Contributor Author

Thanks for being patient and for the great patch set \o/

My pleasure!

Comment on lines +945 to +946
// The intrinsic does NOT allow to copy the same element of the source vector to more than one element of the destination vector.
// one-shot 8-lane permute
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this tested concretely? According to the intel ISA reference for VPERMPS,

this instruction permits a doubleword in the source operand to be copied to more than one location in the destination operand

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be surprising if it was different from _mm256_permutevar8x32_epi32, in any case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image from: https://www.intel.com/content/www/us/en/content-details/671569/19-0-c-compiler-developer-guide-and-reference.html?wapkw=_mm256_permutevar8x32_ps

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a webpage once but intel changed their website and I cannot find it anymore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the same for _mm256_permutevar8x32_epi32 If I use that with duplicates I introduced a bug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a problem with duplicates originally, that's why I investigated. But I think it was showing up with sse

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(@AntoinePrv you could be interested in this discussion)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to revert the changes since ISA says that it is allowed. Would you like to open a PR or issue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could try to, but I'm not a xsimd developer (I just encountered this by chance when grepping for xsimd's support for shuffling).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do it once I have time otherwise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants