README.md: 1 addition & 0 deletions
@@ -371,6 +371,7 @@ cmpq %rdi, %r13
jne .LBB8_12
```
This is **40-50x** faster than a naive C implementation of nested loops on my machine, and it should be within a factor of 2 of the peak possible performance.
+A [similar example](examples/linear_algebra/matrix.cpp#L265-L271) that is only a little more complicated achieves around 90% of the peak possible performance.
(\*) Unfortunately, this doesn't generate performant code currently and requires a few tweaks to work around an [issue](https://bugs.llvm.org/show_bug.cgi?id=45863) in LLVM.
See the [matrix example](examples/linear_algebra/matrix.cpp) for the code that produces the above assembly.
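For context on the "naive C implementation of nested loops" baseline, here is a minimal sketch of what such a baseline might look like. It assumes the computation is a matrix multiplication, as the name of the matrix example suggests. The function name `naive_matmul` and the flat row-major storage are hypothetical choices for illustration, not code taken from the repository.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical baseline (not from the repository): C = A * B, where
// A is m x k, B is k x n, and C is m x n, all stored row-major in flat vectors.
std::vector<float> naive_matmul(const std::vector<float>& a,
                                const std::vector<float>& b,
                                std::size_t m, std::size_t k, std::size_t n) {
  std::vector<float> c(m * n, 0.0f);
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      // Accumulate the dot product of row i of A and column j of B.
      float sum = 0.0f;
      for (std::size_t p = 0; p < k; ++p) {
        sum += a[i * k + p] * b[p * n + j];
      }
      c[i * n + j] = sum;
    }
  }
  return c;
}
```

In this loop order the innermost loop walks down a column of `b` with stride `n`, which tends to use the cache poorly for large matrices. That is one plausible reason a tiled and vectorized version can be much faster than a baseline like this.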