Commit 911575f

Add reference to BLAS competitive matrix multiply
1 parent 809ebb6 commit 911575f

1 file changed: +1 -0 lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -371,6 +371,7 @@ cmpq %rdi, %r13
jne .LBB8_12
```
This is **40-50x** faster than a naive C implementation of nested loops on my machine, and it should be within a factor of 2 of the peak possible performance.
+A [similar example](examples/linear_algebra/matrix.cpp#L265-L271) that is only a little more complicated achieves around 90% of the peak possible performance.

(\*) Unfortunately, this doesn't generate performant code currently and requires a few tweaks to work around an [issue](https://bugs.llvm.org/show_bug.cgi?id=45863) in LLVM.
See the [matrix example](examples/linear_algebra/matrix.cpp) for the code that produces the above assembly.
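
For context, the "naive C implementation of nested loops" used as the baseline above might look roughly like the sketch below. This is not the code the author benchmarked. The names `multiply_naive` and `gflops` are hypothetical, and row-major storage is an assumption. The `gflops` helper uses the standard count of 2·M·N·K floating point operations for a matrix multiply, which is how a measured time is usually compared against a machine's peak.

```cpp
#include <cassert>

// Hypothetical baseline: C (M x N) = A (M x K) * B (K x N).
// All matrices are assumed to be dense and row-major.
void multiply_naive(const float* a, const float* b, float* c,
                    int M, int K, int N) {
  for (int i = 0; i < M; ++i) {
    for (int j = 0; j < N; ++j) {
      float sum = 0.0f;
      // Inner product of row i of A and column j of B.
      for (int k = 0; k < K; ++k) {
        sum += a[i * K + k] * b[k * N + j];
      }
      c[i * N + j] = sum;
    }
  }
}

// Hypothetical helper: achieved GFLOP/s for one multiply that took
// `seconds`, counting one multiply and one add per inner iteration.
double gflops(int M, int K, int N, double seconds) {
  assert(seconds > 0.0);
  return 2.0 * M * N * K / seconds / 1e9;
}
```

Dividing the `gflops` result by the machine's theoretical peak (cores × clock × FLOPs per cycle per core) gives the fraction of peak that figures such as "within a factor of 2" or "around 90%" refer to.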
