Commit c93ef8f
committed
Auto merge of #375 - QuarticCat:feedback, r=Amanieu
Add `#[inline(always)]` to `find_inner`
## Benchmark
Boost at most 6%.
```text
test clone_from_large ... bench: 5,562 -> 5,523 ns/iter (+/- 96)
test clone_from_small ... bench: 52 -> 52 ns/iter (+/- 0)
test clone_large ... bench: 5,764 -> 5,760 ns/iter (+/- 36)
test clone_small ... bench: 71 -> 72 ns/iter (+/- 12)
test grow_insert_ahash_highbits ... bench: 24,918 -> 25,229 ns/iter (+/- 710)
test grow_insert_ahash_random ... bench: 24,590 -> 24,824 ns/iter (+/- 279)
test grow_insert_ahash_serial ... bench: 24,896 -> 24,892 ns/iter (+/- 3,497)
test grow_insert_std_highbits ... bench: 43,482 -> 43,342 ns/iter (+/- 706)
test grow_insert_std_random ... bench: 43,074 -> 43,674 ns/iter (+/- 592)
test grow_insert_std_serial ... bench: 43,305 -> 43,335 ns/iter (+/- 1,059)
test insert_ahash_highbits ... bench: 25,025 -> 25,110 ns/iter (+/- 96)
test insert_ahash_random ... bench: 25,017 -> 25,329 ns/iter (+/- 7,387)
test insert_ahash_serial ... bench: 24,895 -> 25,041 ns/iter (+/- 692)
test insert_erase_ahash_highbits ... bench: 25,049 -> 25,284 ns/iter (+/- 1,191)
test insert_erase_ahash_random ... bench: 24,836 -> 24,742 ns/iter (+/- 380)
test insert_erase_ahash_serial ... bench: 23,832 -> 23,563 ns/iter (+/- 611)
test insert_erase_std_highbits ... bench: 43,699 -> 43,265 ns/iter (+/- 684)
test insert_erase_std_random ... bench: 44,499 -> 44,193 ns/iter (+/- 670)
test insert_erase_std_serial ... bench: 45,250 -> 43,969 ns/iter (+/- 544)
test insert_std_highbits ... bench: 35,552 -> 34,444 ns/iter (+/- 739)
test insert_std_random ... bench: 34,484 -> 34,596 ns/iter (+/- 508)
test insert_std_serial ... bench: 34,778 -> 34,658 ns/iter (+/- 1,003)
test iter_ahash_highbits ... bench: 1,292 -> 1,292 ns/iter (+/- 89)
test iter_ahash_random ... bench: 1,299 -> 1,295 ns/iter (+/- 72)
test iter_ahash_serial ... bench: 1,306 -> 1,295 ns/iter (+/- 18)
test iter_std_highbits ... bench: 1,277 -> 1,306 ns/iter (+/- 31)
test iter_std_random ... bench: 1,278 -> 1,295 ns/iter (+/- 30)
test iter_std_serial ... bench: 1,303 -> 1,310 ns/iter (+/- 54)
test lookup_ahash_highbits ... bench: 4,766 -> 4,767 ns/iter (+/- 44)
test lookup_ahash_random ... bench: 4,735 -> 4,761 ns/iter (+/- 45)
test lookup_ahash_serial ... bench: 4,670 -> 4,696 ns/iter (+/- 98)
test lookup_fail_ahash_highbits ... bench: 4,438 -> 4,232 ns/iter (+/- 55)
test lookup_fail_ahash_random ... bench: 4,233 -> 4,243 ns/iter (+/- 85)
test lookup_fail_ahash_serial ... bench: 3,928 -> 3,970 ns/iter (+/- 307)
test lookup_fail_std_highbits ... bench: 14,177 -> 14,927 ns/iter (+/- 165)
test lookup_fail_std_random ... bench: 14,217 -> 15,081 ns/iter (+/- 188)
test lookup_fail_std_serial ... bench: 14,253 -> 14,949 ns/iter (+/- 1,439)
test lookup_std_highbits ... bench: 14,700 -> 14,797 ns/iter (+/- 940)
test lookup_std_random ... bench: 14,759 -> 14,996 ns/iter (+/- 609)
test lookup_std_serial ... bench: 14,784 -> 14,928 ns/iter (+/- 293)
test rehash_in_place ... bench: 213,678 -> 218,933 ns/iter (+/- 3,012)
```
## ASM Difference
Here's the major difference of generated assembly of `HashMap::<u64, u64>::get`:

When using `#[inline]`, the last `self.bucket(index)` of `find` doesn't get optimized away. According to `@nbdd0121,` `#[inline(always)]` happens in MIR inlining, while `#[inline]` (likely) happens in LLVM inlining.1 file changed
+1
-1
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1189 | 1189 | | |
1190 | 1190 | | |
1191 | 1191 | | |
1192 | | - | |
| 1192 | + | |
1193 | 1193 | | |
1194 | 1194 | | |
1195 | 1195 | | |
| |||
0 commit comments