Commit 2e73a6c
Added special case for `_full_usm_ndarray`
When the fill value is bitwise zero, or the element type is 1 byte wide, the array is now filled with `memset` instead of `fill`.
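The dispatch rule above can be sketched in Python. This is a minimal sketch that assumes little-endian storage and a hand-written dtype-to-`struct` table. `memset_byte` and `_STRUCT_FMT` are hypothetical names used only for illustration and are not dpctl's implementation. The helper returns the byte that a `memset` would write, or `None` when a typed `fill` is still required.

```python
import struct

# Hypothetical table: dtype string -> little-endian struct format code.
_STRUCT_FMT = {
    "?": "<?", "i1": "<b", "u1": "<B",
    "i2": "<h", "u2": "<H", "i4": "<i", "u4": "<I",
    "i8": "<q", "u8": "<Q", "f4": "<f", "f8": "<d",
}


def memset_byte(value, dtype):
    """Return the byte to pass to memset, or None if a typed fill is needed.

    Hypothetical helper sketching the rule from the commit message;
    not dpctl's actual implementation.
    """
    # Encode the scalar exactly as one element of `dtype` would be stored.
    raw = struct.pack(_STRUCT_FMT[dtype], value)
    if len(raw) == 1:
        # 1-byte-wide type: the element itself is the memset byte.
        return raw[0]
    if not any(raw):
        # Bitwise zero: every byte is 0x00, so memset(ptr, 0, nbytes) suffices.
        return 0
    # Multi-byte pattern with a nonzero byte: fall back to a typed fill.
    return None
```

The check is on the bytes, not on the numeric value. `-0.0` compares equal to zero but has its sign bit set, so under this rule it still takes the `fill` path.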
```
In [1]: import dpctl.tensor as dpt, dpctl.tensor._tensor_impl as ti
In [2]: res = dpt.empty(10**6, dtype="i8")
In [3]: %timeit -n 2000 -r 11 ti._full_usm_ndarray(0, dst=res, sycl_queue=res.sycl_queue)[0].wait()
243 µs ± 22.6 µs per loop (mean ± std. dev. of 11 runs, 2,000 loops each)
In [4]: %timeit -n 2000 -r 11 ti._full_usm_ndarray(0, dst=res, sycl_queue=res.sycl_queue)[0].wait()
229 µs ± 14 µs per loop (mean ± std. dev. of 11 runs, 2,000 loops each)
In [5]: %timeit -n 2000 -r 11 ti._zeros_usm_ndarray(dst=res, sycl_queue=res.sycl_queue)[0].wait()
227 µs ± 23 µs per loop (mean ± std. dev. of 11 runs, 2,000 loops each)
In [6]: %timeit -n 2000 -r 11 ti._zeros_usm_ndarray(dst=res, sycl_queue=res.sycl_queue)[0].wait()
233 µs ± 25.9 µs per loop (mean ± std. dev. of 11 runs, 2,000 loops each)
In [7]: %timeit -n 2000 -r 11 ti._zeros_usm_ndarray(dst=res, sycl_queue=res.sycl_queue)[0].wait()
301 µs ± 54.1 µs per loop (mean ± std. dev. of 11 runs, 2,000 loops each)
In [8]: %timeit -n 2000 -r 11 ti._zeros_usm_ndarray(dst=res, sycl_queue=res.sycl_queue)[0].wait()
236 µs ± 17.2 µs per loop (mean ± std. dev. of 11 runs, 2,000 loops each)
In [9]: %timeit -n 2000 -r 11 ti._full_usm_ndarray(0, dst=res, sycl_queue=res.sycl_queue)[0].wait()
240 µs ± 35.2 µs per loop (mean ± std. dev. of 11 runs, 2,000 loops each)
In [10]: %timeit -n 2000 -r 11 ti._full_usm_ndarray(1, dst=res, sycl_queue=res.sycl_queue)[0].wait()
243 µs ± 17.6 µs per loop (mean ± std. dev. of 11 runs, 2,000 loops each)
In [11]: %timeit -n 2000 -r 11 ti._full_usm_ndarray(1, dst=res, sycl_queue=res.sycl_queue)[0].wait()
263 µs ± 39.9 µs per loop (mean ± std. dev. of 11 runs, 2,000 loops each)
In [12]: %timeit -n 2000 -r 11 ti._full_usm_ndarray(0, dst=res, sycl_queue=res.sycl_queue)[0].wait()
239 µs ± 26.4 µs per loop (mean ± std. dev. of 11 runs, 2,000 loops each)
In [13]: %timeit -n 2000 -r 11 ti._zeros_usm_ndarray(dst=res, sycl_queue=res.sycl_queue)[0].wait()
224 µs ± 18.1 µs per loop (mean ± std. dev. of 11 runs, 2,000 loops each)
```
Parent: bec95f9

1 file changed: 58 lines added, 4 removed (original line 83 and lines 85-86 were replaced by new lines 83 and 85-141; original line 129 was removed).