Do not use hadd_pd to implement reduce_add #1108

serge-sans-paille · 2025-04-08T18:43:13Z

It's generally slower than the sse2 version due to a latency of 5 (!) for hadd_pd.

Related to #1107
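For readers who want to see the shape of the alternative, here is a minimal sketch of a two-lane double reduction that avoids `_mm_hadd_pd` and uses only SSE2 intrinsics. The names `reduce_add_sse2` and `reduce_add2` are hypothetical and chosen for illustration; this is not necessarily the exact sequence xsimd uses.

```cpp
#if defined(__SSE2__)
#include <emmintrin.h>

// Hypothetical helper, not xsimd's actual kernel: sum the two lanes of a
// __m128d using only SSE2 instructions (no SSE3 _mm_hadd_pd).
inline double reduce_add_sse2(__m128d v)
{
    __m128d hi = _mm_unpackhi_pd(v, v);       // [v1, v1]
    return _mm_cvtsd_f64(_mm_add_sd(v, hi));  // low lane holds v0 + v1
}
#endif

// Portable wrapper (also hypothetical) so the sketch builds on targets
// where SSE2 is not enabled.
inline double reduce_add2(const double* p)
{
#if defined(__SSE2__)
    return reduce_add_sse2(_mm_loadu_pd(p));
#else
    return p[0] + p[1];
#endif
}
```

The sketch needs one shuffle and one scalar add, which is the kind of sequence the description compares against the multi-uop `hadd_pd`.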

serge-sans-paille · 2025-04-18T19:09:18Z

@DiamonDinoia: why did you guard the code here: 174c475#diff-c4f5d7f47f45c737cc1723af31069217834daf7da679a0b4ff255a4a6ae73c83R1413? This "intrinsic" seems to be standard, and it passes our validation without the guard.

DiamonDinoia · 2025-04-18T19:28:49Z

I see. I thought it was an icc-only intrinsic from the way it was used in Agner's vcl, although I have been using it without problems on every compiler.

serge-sans-paille · 2025-04-18T19:55:41Z

Great. I'm going to merge this patchset then. Would you mind doing the same investigation for single precision float?

DiamonDinoia · 2025-04-18T20:30:25Z

Thanks for merging this!

Sure, I will have a look next week when I have a moment.

Would you mind considering the API I was suggesting for reducing interleaved complex?

Do not use hadd_pd to implement reduce_add

634b18f

It's generally slower than the sse2 version due to a latency of 5 (!) for hadd_pd. Related to #1107

serge-sans-paille force-pushed the feature/faster-reduce_add branch from 6d0c663 to 634b18f Compare April 17, 2025 12:25

Do not use _mm256_hadd_pd to implement reduce_add on avx

b0a4665

Forwarding to sse is actually faster. Related to #1107
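As a rough illustration of what "forwarding to sse" can mean for a 256-bit register of doubles, the sketch below adds the upper and lower 128-bit halves and then finishes with an SSE2-style two-lane reduction. The names `reduce_add_avx` and `reduce_add4` are hypothetical, and the code is an assumption about the general approach rather than a copy of the merged change.

```cpp
#if defined(__AVX__)
#include <immintrin.h>

// Hypothetical helper, not xsimd's actual kernel: reduce a __m256d by
// folding it to 128 bits first, without using _mm256_hadd_pd.
inline double reduce_add_avx(__m256d v)
{
    __m128d lo = _mm256_castpd256_pd128(v);    // lanes 0 and 1
    __m128d hi = _mm256_extractf128_pd(v, 1);  // lanes 2 and 3
    __m128d s = _mm_add_pd(lo, hi);            // [v0 + v2, v1 + v3]
    __m128d swapped = _mm_unpackhi_pd(s, s);   // [v1 + v3, v1 + v3]
    return _mm_cvtsd_f64(_mm_add_sd(s, swapped));
}
#endif

// Portable wrapper (also hypothetical); the fallback uses the same
// summation order as the AVX path.
inline double reduce_add4(const double* p)
{
#if defined(__AVX__)
    return reduce_add_avx(_mm256_loadu_pd(p));
#else
    return (p[0] + p[2]) + (p[1] + p[3]);
#endif
}
```

The AVX path is only compiled when the translation unit is built with AVX enabled (for example `-mavx` on GCC or Clang); otherwise the scalar fallback is used.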

serge-sans-paille force-pushed the feature/faster-reduce_add branch from a208210 to b0a4665 Compare April 18, 2025 10:43

Use _mm512_reduce_add_ps and _mm512_reduce_add_pd instead of custom sequences

b64ee62

serge-sans-paille merged commit bb5dd63 into master Apr 18, 2025
120 checks passed

serge-sans-paille mentioned this pull request Apr 18, 2025

reduce seems slow #1107

Closed
