x86_64/AArch64: Add AVX2/Neon polyw1_pack to x86_64 native backend #973
mkannwischer wants to merge 2 commits into main
Integrate polyw1_pack AVX2 implementations for both GAMMA2 variants into the native backend. Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Force-pushed from 0b22412 to 1bb503b
Mac Mini (M1, 2020) benchmarks (opt)
| Benchmark suite | Current: 885bca1 | Previous: 0b1c536 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 45691 cycles | 45683 cycles | 1.00 |
| ML-DSA-44 sign | 128992 cycles | 131163 cycles | 0.98 |
| ML-DSA-44 verify | 47012 cycles | 47529 cycles | 0.99 |
| ML-DSA-65 keypair | 80465 cycles | 80463 cycles | 1.00 |
| ML-DSA-65 sign | 214956 cycles | 215738 cycles | 1.00 |
| ML-DSA-65 verify | 79587 cycles | 79737 cycles | 1.00 |
| ML-DSA-87 keypair | 131152 cycles | 131178 cycles | 1.00 |
| ML-DSA-87 sign | 276231 cycles | 277066 cycles | 1.00 |
| ML-DSA-87 verify | 129895 cycles | 129990 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Mac Mini (M1, 2020) benchmarks (no-opt)
| Benchmark suite | Current: 885bca1 | Previous: 0b1c536 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 112049 cycles | 111974 cycles | 1.00 |
| ML-DSA-44 sign | 403823 cycles | 403601 cycles | 1.00 |
| ML-DSA-44 verify | 119939 cycles | 119892 cycles | 1.00 |
| ML-DSA-65 keypair | 192134 cycles | 192181 cycles | 1.00 |
| ML-DSA-65 sign | 657108 cycles | 657104 cycles | 1.00 |
| ML-DSA-65 verify | 193869 cycles | 193901 cycles | 1.00 |
| ML-DSA-87 keypair | 318020 cycles | 318040 cycles | 1.00 |
| ML-DSA-87 sign | 837065 cycles | 837047 cycles | 1.00 |
| ML-DSA-87 verify | 323003 cycles | 323045 cycles | 1.00 |
Intel Xeon 4th gen (c7i)
| Benchmark suite | Current: 885bca1 | Previous: 0b1c536 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 34648 cycles | 34818 cycles | 1.00 |
| ML-DSA-44 sign | 115881 cycles | 119743 cycles | 0.97 |
| ML-DSA-44 verify | 37118 cycles | 38134 cycles | 0.97 |
| ML-DSA-65 keypair | 60681 cycles | 60836 cycles | 1.00 |
| ML-DSA-65 sign | 198566 cycles | 200613 cycles | 0.99 |
| ML-DSA-65 verify | 62513 cycles | 62640 cycles | 1.00 |
| ML-DSA-87 keypair | 93487 cycles | 93373 cycles | 1.00 |
| ML-DSA-87 sign | 236201 cycles | 232798 cycles | 1.01 |
| ML-DSA-87 verify | 95896 cycles | 95570 cycles | 1.00 |
Intel Xeon 4th gen (c7i) (no-opt)
| Benchmark suite | Current: 885bca1 | Previous: 0b1c536 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 93841 cycles | 93709 cycles | 1.00 |
| ML-DSA-44 sign | 333807 cycles | 332609 cycles | 1.00 |
| ML-DSA-44 verify | 99869 cycles | 99635 cycles | 1.00 |
| ML-DSA-65 keypair | 159923 cycles | 160109 cycles | 1.00 |
| ML-DSA-65 sign | 543699 cycles | 544366 cycles | 1.00 |
| ML-DSA-65 verify | 160683 cycles | 160833 cycles | 1.00 |
| ML-DSA-87 keypair | 266467 cycles | 267045 cycles | 1.00 |
| ML-DSA-87 sign | 705143 cycles | 706279 cycles | 1.00 |
| ML-DSA-87 verify | 270387 cycles | 270100 cycles | 1.00 |
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)
| Benchmark suite | Current: 885bca1 | Previous: 0b1c536 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 113214 cycles | 113201 cycles | 1.00 |
| ML-DSA-44 sign | 350397 cycles | 355543 cycles | 0.99 |
| ML-DSA-44 verify | 116730 cycles | 117896 cycles | 0.99 |
| ML-DSA-65 keypair | 196335 cycles | 196439 cycles | 1.00 |
| ML-DSA-65 sign | 588222 cycles | 588538 cycles | 1.00 |
| ML-DSA-65 verify | 194453 cycles | 194475 cycles | 1.00 |
| ML-DSA-87 keypair | 322439 cycles | 321909 cycles | 1.00 |
| ML-DSA-87 sign | 751213 cycles | 752725 cycles | 1.00 |
| ML-DSA-87 verify | 319951 cycles | 320145 cycles | 1.00 |
AMD EPYC 3rd gen (c6a)
| Benchmark suite | Current: 885bca1 | Previous: 0b1c536 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 68774 cycles | 69001 cycles | 1.00 |
| ML-DSA-44 sign | 179744 cycles | 187631 cycles | 0.96 |
| ML-DSA-44 verify | 67303 cycles | 69172 cycles | 0.97 |
| ML-DSA-65 keypair | 122437 cycles | 119360 cycles | 1.03 |
| ML-DSA-65 sign | 301955 cycles | 299878 cycles | 1.01 |
| ML-DSA-65 verify | 118088 cycles | 115464 cycles | 1.02 |
| ML-DSA-87 keypair | 203390 cycles | 203890 cycles | 1.00 |
| ML-DSA-87 sign | 390522 cycles | 394779 cycles | 0.99 |
| ML-DSA-87 verify | 195089 cycles | 195702 cycles | 1.00 |
Intel Xeon 3rd gen (c6i)
| Benchmark suite | Current: 885bca1 | Previous: 0b1c536 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 57073 cycles | 56454 cycles | 1.01 |
| ML-DSA-44 sign | 175719 cycles | 181513 cycles | 0.97 |
| ML-DSA-44 verify | 60021 cycles | 61053 cycles | 0.98 |
| ML-DSA-65 keypair | 98321 cycles | 98631 cycles | 1.00 |
| ML-DSA-65 sign | 297249 cycles | 298535 cycles | 1.00 |
| ML-DSA-65 verify | 100096 cycles | 100069 cycles | 1.00 |
| ML-DSA-87 keypair | 152050 cycles | 152650 cycles | 1.00 |
| ML-DSA-87 sign | 353023 cycles | 355109 cycles | 0.99 |
| ML-DSA-87 verify | 152239 cycles | 152994 cycles | 1.00 |
Graviton4
| Benchmark suite | Current: 885bca1 | Previous: 0b1c536 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 68230 cycles | 68148 cycles | 1.00 |
| ML-DSA-44 sign | 196078 cycles | 201830 cycles | 0.97 |
| ML-DSA-44 verify | 69368 cycles | 70787 cycles | 0.98 |
| ML-DSA-65 keypair | 121376 cycles | 121099 cycles | 1.00 |
| ML-DSA-65 sign | 330550 cycles | 331249 cycles | 1.00 |
| ML-DSA-65 verify | 117750 cycles | 117837 cycles | 1.00 |
| ML-DSA-87 keypair | 198142 cycles | 197912 cycles | 1.00 |
| ML-DSA-87 sign | 426496 cycles | 426817 cycles | 1.00 |
| ML-DSA-87 verify | 194325 cycles | 194367 cycles | 1.00 |
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)
| Benchmark suite | Current: 885bca1 | Previous: 0b1c536 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 212357 cycles | 212585 cycles | 1.00 |
| ML-DSA-44 sign | 759494 cycles | 759285 cycles | 1.00 |
| ML-DSA-44 verify | 228690 cycles | 228959 cycles | 1.00 |
| ML-DSA-65 keypair | 379923 cycles | 380251 cycles | 1.00 |
| ML-DSA-65 sign | 1252035 cycles | 1251223 cycles | 1.00 |
| ML-DSA-65 verify | 371571 cycles | 372021 cycles | 1.00 |
| ML-DSA-87 keypair | 604671 cycles | 605353 cycles | 1.00 |
| ML-DSA-87 sign | 1593513 cycles | 1591234 cycles | 1.00 |
| ML-DSA-87 verify | 618457 cycles | 617441 cycles | 1.00 |
AMD EPYC 3rd gen (c6a) (no-opt)
| Benchmark suite | Current: 885bca1 | Previous: 0b1c536 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 134786 cycles | 134688 cycles | 1.00 |
| ML-DSA-44 sign | 523577 cycles | 524187 cycles | 1.00 |
| ML-DSA-44 verify | 147461 cycles | 147201 cycles | 1.00 |
| ML-DSA-65 keypair | 226346 cycles | 226675 cycles | 1.00 |
| ML-DSA-65 sign | 860567 cycles | 859973 cycles | 1.00 |
| ML-DSA-65 verify | 234837 cycles | 234911 cycles | 1.00 |
| ML-DSA-87 keypair | 372003 cycles | 370452 cycles | 1.00 |
| ML-DSA-87 sign | 1083875 cycles | 1078410 cycles | 1.01 |
| ML-DSA-87 verify | 384062 cycles | 382956 cycles | 1.00 |
AMD EPYC 4th gen (c7a)
| Benchmark suite | Current: 885bca1 | Previous: 0b1c536 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 42148 cycles | 41562 cycles | 1.01 |
| ML-DSA-44 sign | 129288 cycles | 133651 cycles | 0.97 |
| ML-DSA-44 verify | 43759 cycles | 44169 cycles | 0.99 |
| ML-DSA-65 keypair | 71975 cycles | 72989 cycles | 0.99 |
| ML-DSA-65 sign | 214109 cycles | 220760 cycles | 0.97 |
| ML-DSA-65 verify | 73284 cycles | 74207 cycles | 0.99 |
| ML-DSA-87 keypair | 107702 cycles | 108105 cycles | 1.00 |
| ML-DSA-87 sign | 248261 cycles | 250082 cycles | 0.99 |
| ML-DSA-87 verify | 109090 cycles | 108427 cycles | 1.01 |
Intel Xeon 3rd gen (c6i) (no-opt)
| Benchmark suite | Current: 885bca1 | Previous: 0b1c536 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 157594 cycles | 157373 cycles | 1.00 |
| ML-DSA-44 sign | 549971 cycles | 549788 cycles | 1.00 |
| ML-DSA-44 verify | 169054 cycles | 169220 cycles | 1.00 |
| ML-DSA-65 keypair | 267930 cycles | 267878 cycles | 1.00 |
| ML-DSA-65 sign | 903155 cycles | 903152 cycles | 1.00 |
| ML-DSA-65 verify | 274249 cycles | 274318 cycles | 1.00 |
| ML-DSA-87 keypair | 447966 cycles | 447643 cycles | 1.00 |
| ML-DSA-87 sign | 1159788 cycles | 1157310 cycles | 1.00 |
| ML-DSA-87 verify | 457774 cycles | 457942 cycles | 1.00 |
Graviton4 (no-opt)
| Benchmark suite | Current: 885bca1 | Previous: 0b1c536 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 128364 cycles | 128172 cycles | 1.00 |
| ML-DSA-44 sign | 447652 cycles | 447244 cycles | 1.00 |
| ML-DSA-44 verify | 138210 cycles | 142135 cycles | 0.97 |
| ML-DSA-65 keypair | 220785 cycles | 220615 cycles | 1.00 |
| ML-DSA-65 sign | 727254 cycles | 726560 cycles | 1.00 |
| ML-DSA-65 verify | 222808 cycles | 223116 cycles | 1.00 |
| ML-DSA-87 keypair | 364610 cycles | 365048 cycles | 1.00 |
| ML-DSA-87 sign | 926038 cycles | 926588 cycles | 1.00 |
| ML-DSA-87 verify | 372875 cycles | 372428 cycles | 1.00 |
AMD EPYC 4th gen (c7a) (no-opt)
| Benchmark suite | Current: 885bca1 | Previous: 0b1c536 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 120810 cycles | 120331 cycles | 1.00 |
| ML-DSA-44 sign | 447005 cycles | 447521 cycles | 1.00 |
| ML-DSA-44 verify | 130168 cycles | 132880 cycles | 0.98 |
| ML-DSA-65 keypair | 204763 cycles | 205729 cycles | 1.00 |
| ML-DSA-65 sign | 728023 cycles | 728528 cycles | 1.00 |
| ML-DSA-65 verify | 210330 cycles | 211143 cycles | 1.00 |
| ML-DSA-87 keypair | 337390 cycles | 338699 cycles | 1.00 |
| ML-DSA-87 sign | 922663 cycles | 923705 cycles | 1.00 |
| ML-DSA-87 verify | 348278 cycles | 346629 cycles | 1.00 |
Graviton3
| Benchmark suite | Current: 885bca1 | Previous: 0b1c536 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 72314 cycles | 72302 cycles | 1.00 |
| ML-DSA-44 sign | 205786 cycles | 211874 cycles | 0.97 |
| ML-DSA-44 verify | 73981 cycles | 75647 cycles | 0.98 |
| ML-DSA-65 keypair | 127445 cycles | 127575 cycles | 1.00 |
| ML-DSA-65 sign | 349694 cycles | 350353 cycles | 1.00 |
| ML-DSA-65 verify | 125303 cycles | 125483 cycles | 1.00 |
| ML-DSA-87 keypair | 205809 cycles | 208020 cycles | 0.99 |
| ML-DSA-87 sign | 448791 cycles | 449002 cycles | 1.00 |
| ML-DSA-87 verify | 205264 cycles | 205683 cycles | 1.00 |
Graviton3 (no-opt)
| Benchmark suite | Current: 885bca1 | Previous: 0b1c536 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 138660 cycles | 138464 cycles | 1.00 |
| ML-DSA-44 sign | 484010 cycles | 484091 cycles | 1.00 |
| ML-DSA-44 verify | 148504 cycles | 156396 cycles | 0.95 |
| ML-DSA-65 keypair | 241327 cycles | 241147 cycles | 1.00 |
| ML-DSA-65 sign | 792592 cycles | 792223 cycles | 1.00 |
| ML-DSA-65 verify | 240723 cycles | 241092 cycles | 1.00 |
| ML-DSA-87 keypair | 395470 cycles | 396403 cycles | 1.00 |
| ML-DSA-87 sign | 1013125 cycles | 1012979 cycles | 1.00 |
| ML-DSA-87 verify | 402895 cycles | 402335 cycles | 1.00 |
Graviton2
| Benchmark suite | Current: 885bca1 | Previous: 0b1c536 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 113609 cycles | 113289 cycles | 1.00 |
| ML-DSA-44 sign | 350629 cycles | 355883 cycles | 0.99 |
| ML-DSA-44 verify | 117092 cycles | 117973 cycles | 0.99 |
| ML-DSA-65 keypair | 196545 cycles | 196446 cycles | 1.00 |
| ML-DSA-65 sign | 587254 cycles | 589191 cycles | 1.00 |
| ML-DSA-65 verify | 194237 cycles | 194679 cycles | 1.00 |
| ML-DSA-87 keypair | 322301 cycles | 322682 cycles | 1.00 |
| ML-DSA-87 sign | 752294 cycles | 752805 cycles | 1.00 |
| ML-DSA-87 verify | 320094 cycles | 320327 cycles | 1.00 |
Graviton2 (no-opt)
| Benchmark suite | Current: 885bca1 | Previous: 0b1c536 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 212986 cycles | 213066 cycles | 1.00 |
| ML-DSA-44 sign | 760407 cycles | 760558 cycles | 1.00 |
| ML-DSA-44 verify | 241293 cycles | 233103 cycles | 1.04 |
| ML-DSA-65 keypair | 380806 cycles | 381034 cycles | 1.00 |
| ML-DSA-65 sign | 1252121 cycles | 1252511 cycles | 1.00 |
| ML-DSA-65 verify | 372320 cycles | 372570 cycles | 1.00 |
| ML-DSA-87 keypair | 606317 cycles | 606046 cycles | 1.00 |
| ML-DSA-87 sign | 1593381 cycles | 1593756 cycles | 1.00 |
| ML-DSA-87 verify | 618121 cycles | 617945 cycles | 1.00 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Graviton2 (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 885bca1 | Previous: 0b1c536 | Ratio |
|---|---|---|---|
| ML-DSA-44 verify | 241293 cycles | 233103 cycles | 1.04 |
SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)
| Benchmark suite | Current: 885bca1 | Previous: 0b1c536 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 827435 cycles | 828343 cycles | 1.00 |
| ML-DSA-44 sign | 3238214 cycles | 3235012 cycles | 1.00 |
| ML-DSA-44 verify | 921978 cycles | 920749 cycles | 1.00 |
| ML-DSA-65 keypair | 1412999 cycles | 1413905 cycles | 1.00 |
| ML-DSA-65 sign | 5347624 cycles | 5341776 cycles | 1.00 |
| ML-DSA-65 verify | 1477830 cycles | 1478062 cycles | 1.00 |
| ML-DSA-87 keypair | 2312761 cycles | 2313582 cycles | 1.00 |
| ML-DSA-87 sign | 6663968 cycles | 6664057 cycles | 1.00 |
| ML-DSA-87 verify | 2410302 cycles | 2412445 cycles | 1.00 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)
| Benchmark suite | Current: 885bca1 | Previous: 0b1c536 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 278362 cycles | 277272 cycles | 1.00 |
| ML-DSA-44 sign | 798797 cycles | 822468 cycles | 0.97 |
| ML-DSA-44 verify | 277666 cycles | 277832 cycles | 1.00 |
| ML-DSA-65 keypair | 480516 cycles | 475993 cycles | 1.01 |
| ML-DSA-65 sign | 1349323 cycles | 1333415 cycles | 1.01 |
| ML-DSA-65 verify | 456855 cycles | 458979 cycles | 1.00 |
| ML-DSA-87 keypair | 817862 cycles | 817627 cycles | 1.00 |
| ML-DSA-87 sign | 1841380 cycles | 1833605 cycles | 1.00 |
| ML-DSA-87 verify | 788029 cycles | 798022 cycles | 0.99 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)
| Benchmark suite | Current: 885bca1 | Previous: 0b1c536 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 464640 cycles | 465283 cycles | 1.00 |
| ML-DSA-44 sign | 2150220 cycles | 2151007 cycles | 1.00 |
| ML-DSA-44 verify | 550751 cycles | 550792 cycles | 1.00 |
| ML-DSA-65 keypair | 779027 cycles | 780624 cycles | 1.00 |
| ML-DSA-65 sign | 3514123 cycles | 3517857 cycles | 1.00 |
| ML-DSA-65 verify | 856201 cycles | 854537 cycles | 1.00 |
| ML-DSA-87 keypair | 1261706 cycles | 1268967 cycles | 0.99 |
| ML-DSA-87 sign | 4350624 cycles | 4402745 cycles | 0.99 |
| ML-DSA-87 verify | 1373405 cycles | 1380067 cycles | 1.00 |
CBMC Results (ML-DSA-65): Full Results (175 proofs)
CBMC Results (ML-DSA-44): Full Results (175 proofs)
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)
| Benchmark suite | Current: 885bca1 | Previous: 0b1c536 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 231675 cycles | 223693 cycles | 1.04 |
| ML-DSA-44 sign | 637708 cycles | 608242 cycles | 1.05 |
| ML-DSA-44 verify | 228579 cycles | 221112 cycles | 1.03 |
| ML-DSA-65 keypair | 412694 cycles | 394259 cycles | 1.05 |
| ML-DSA-65 sign | 1064124 cycles | 1015180 cycles | 1.05 |
| ML-DSA-65 verify | 390621 cycles | 372405 cycles | 1.05 |
| ML-DSA-87 keypair | 682821 cycles | 653922 cycles | 1.04 |
| ML-DSA-87 sign | 1412962 cycles | 1363561 cycles | 1.04 |
| ML-DSA-87 verify | 668826 cycles | 637673 cycles | 1.05 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 885bca1 | Previous: 0b1c536 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 231675 cycles | 223693 cycles | 1.04 |
| ML-DSA-44 sign | 637708 cycles | 608242 cycles | 1.05 |
| ML-DSA-44 verify | 228579 cycles | 221112 cycles | 1.03 |
| ML-DSA-65 keypair | 412694 cycles | 394259 cycles | 1.05 |
| ML-DSA-65 sign | 1064124 cycles | 1015180 cycles | 1.05 |
| ML-DSA-65 verify | 390621 cycles | 372405 cycles | 1.05 |
| ML-DSA-87 keypair | 682821 cycles | 653922 cycles | 1.04 |
| ML-DSA-87 sign | 1412962 cycles | 1363561 cycles | 1.04 |
| ML-DSA-87 verify | 668826 cycles | 637673 cycles | 1.05 |
CBMC Results (ML-DSA-87): Full Results (175 proofs)
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)
| Benchmark suite | Current: 885bca1 | Previous: 0b1c536 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 313397 cycles | 321433 cycles | 0.97 |
| ML-DSA-44 sign | 1215690 cycles | 1202861 cycles | 1.01 |
| ML-DSA-44 verify | 343963 cycles | 340204 cycles | 1.01 |
| ML-DSA-65 keypair | 572366 cycles | 569351 cycles | 1.01 |
| ML-DSA-65 sign | 2038364 cycles | 1955934 cycles | 1.04 |
| ML-DSA-65 verify | 548523 cycles | 548845 cycles | 1.00 |
| ML-DSA-87 keypair | 908267 cycles | 885828 cycles | 1.03 |
| ML-DSA-87 sign | 2517161 cycles | 2512147 cycles | 1.00 |
| ML-DSA-87 verify | 925719 cycles | 902578 cycles | 1.03 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 885bca1 | Previous: 0b1c536 | Ratio |
|---|---|---|---|
| ML-DSA-65 sign | 2038364 cycles | 1955934 cycles | 1.04 |
Force-pushed from 77618aa to 6e091cb
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 4b4c152 | Previous: 0b1c536 | Ratio |
|---|---|---|---|
| ML-DSA-87 sign | 1900701 cycles | 1833605 cycles | 1.04 |
Force-pushed from 6e091cb to 4b4c152
Add AArch64 assembly implementations of polyw1_pack for both GAMMA2 variants using TBL-based byte extraction from 32-bit coefficient lanes. Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Force-pushed from 4b4c152 to 885bca1
The scheme benchmarks suggest we should implement the 88 variant (ML-DSA-44), but not the 32 variant (ML-DSA-65/ML-DSA-87). It makes sense that the 88 one is harder to auto-vectorize, but it feels a little inconsistent to only implement one. WDYT @hanno-becker?
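For context on why the two variants differ, here is a minimal scalar sketch of the w1 packing, following the bit layout of FIPS 204 and the Dilithium reference code. It is not the AVX2 or Neon code from this PR; the names `polyw1_pack_88_ref`, `polyw1_pack_32_ref`, and `POLY_N` are made up for illustration. With GAMMA2 = (Q-1)/88 each coefficient takes 6 bits, so four coefficients straddle three bytes, whereas with GAMMA2 = (Q-1)/32 two 4-bit coefficients fill exactly one byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POLY_N 256 /* coefficients per polynomial (hypothetical macro name) */

/* GAMMA2 = (Q-1)/88 (ML-DSA-44): coefficients lie in [0, 43], i.e. 6 bits each.
 * Four coefficients are packed into three bytes, 192 bytes in total. */
void polyw1_pack_88_ref(uint8_t r[192], const int32_t a[POLY_N])
{
  for (size_t i = 0; i < POLY_N / 4; i++) {
    const uint32_t c0 = (uint32_t)a[4 * i + 0];
    const uint32_t c1 = (uint32_t)a[4 * i + 1];
    const uint32_t c2 = (uint32_t)a[4 * i + 2];
    const uint32_t c3 = (uint32_t)a[4 * i + 3];
    /* Debug-only precondition check on the input range. */
    assert(c0 < 44 && c1 < 44 && c2 < 44 && c3 < 44);
    /* Each output byte mixes bits of two neighbouring coefficients. */
    r[3 * i + 0] = (uint8_t)(c0 | (c1 << 6));
    r[3 * i + 1] = (uint8_t)((c1 >> 2) | (c2 << 4));
    r[3 * i + 2] = (uint8_t)((c2 >> 4) | (c3 << 2));
  }
}

/* GAMMA2 = (Q-1)/32 (ML-DSA-65/87): coefficients lie in [0, 15], i.e. 4 bits each.
 * Two coefficients are packed into one byte, 128 bytes in total. */
void polyw1_pack_32_ref(uint8_t r[128], const int32_t a[POLY_N])
{
  for (size_t i = 0; i < POLY_N / 2; i++) {
    const uint32_t c0 = (uint32_t)a[2 * i + 0];
    const uint32_t c1 = (uint32_t)a[2 * i + 1];
    /* Debug-only precondition check on the input range. */
    assert(c0 < 16 && c1 < 16);
    r[i] = (uint8_t)(c0 | (c1 << 4));
  }
}
```

The 32 variant is a plain nibble merge with a fixed stride, while the 88 variant needs unaligned 6-bit fields and a 4-to-3 output ratio, which is consistent with it being the harder loop for a compiler to vectorize.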
Integrate polyw1_pack AVX2 implementations for both GAMMA2 variants into the native backend.
polyw1_pack component benchmarks
Intel Xeon 8375C (c6i.metal, no Turbo Boost, no SMT)
Apple M1
TODO: