Commit af82129 (parent cfba263)
Save SLM chunks into registers prior to summing over delta_m, delta_n
This makes no difference in performance, because the compiler already optimizes this code effectively once the loops are unrolled, but it makes the compiler's job easier and the intent clearer.
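The pattern the message describes can be sketched in plain C++ (no SYCL). The sketch assumes a work-item that accumulates a `wi_delta_m` x `wi_delta_n` output block from tiles of A and B staged in shared local memory (SLM). The tile sizes, array layouts, and function names below are hypothetical illustrations, not code from the dpctl source, and `std::array` objects stand in for both the SLM tiles and the private (register) storage.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical per-work-item tile sizes (illustrative values, not dpctl's).
constexpr std::size_t wi_delta_m = 4; // rows of the output block owned by one work-item
constexpr std::size_t wi_delta_n = 2; // columns of that output block
constexpr std::size_t wi_delta_k = 8; // length of the reduction chunk staged in SLM

// Plain arrays standing in for the SLM tiles of A and B, stored k-major.
using LocalA = std::array<float, wi_delta_k * wi_delta_m>;
using LocalB = std::array<float, wi_delta_k * wi_delta_n>;
// Per-work-item accumulator block, row-major over (delta_m, delta_n).
using Acc = std::array<float, wi_delta_m * wi_delta_n>;

// Before: every term of the delta_m x delta_n sum reads both operands from SLM,
// so each A value is read wi_delta_n times and each B value wi_delta_m times.
Acc accumulate_from_slm(const LocalA &a_slm, const LocalB &b_slm)
{
    Acc acc{};
    for (std::size_t k = 0; k < wi_delta_k; ++k) {
        for (std::size_t dm = 0; dm < wi_delta_m; ++dm) {
            for (std::size_t dn = 0; dn < wi_delta_n; ++dn) {
                acc[dm * wi_delta_n + dn] +=
                    a_slm[k * wi_delta_m + dm] * b_slm[k * wi_delta_n + dn];
            }
        }
    }
    return acc;
}

// After: copy the k-th chunk of each tile into private arrays ("registers")
// once, then sum over delta_m, delta_n using only the private copies.
Acc accumulate_from_registers(const LocalA &a_slm, const LocalB &b_slm)
{
    Acc acc{};
    for (std::size_t k = 0; k < wi_delta_k; ++k) {
        std::array<float, wi_delta_m> a_regs;
        std::array<float, wi_delta_n> b_regs;
        for (std::size_t dm = 0; dm < wi_delta_m; ++dm) {
            a_regs[dm] = a_slm[k * wi_delta_m + dm];
        }
        for (std::size_t dn = 0; dn < wi_delta_n; ++dn) {
            b_regs[dn] = b_slm[k * wi_delta_n + dn];
        }
        for (std::size_t dm = 0; dm < wi_delta_m; ++dm) {
            for (std::size_t dn = 0; dn < wi_delta_n; ++dn) {
                acc[dm * wi_delta_n + dn] += a_regs[dm] * b_regs[dn];
            }
        }
    }
    return acc;
}
```

Both variants add the products into each accumulator in the same order, so they return identical results, e.g. `assert(accumulate_from_slm(a, b) == accumulate_from_registers(a, b));`. The register version only makes the single load per element explicit. Whether a given compiler already eliminates the repeated SLM reads in the first form once the loops are unrolled, as the message reports, depends on the compiler and target.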
1 file changed: 18 additions, 4 deletions in dpctl/tensor/libtensor/include/kernels/linalg_functions
Diff (changed code lines are not preserved in this copy; only hunk positions survive):
@@ -970,6 +970,7 @@ 1 line added at new line 973
@@ -1057,16 +1058,29 @@ 16 lines added at new lines 1061-1076; 4 lines removed at old lines 1066-1069; 1 line added at new line 1083