PR Draft: ConcurrentLruCache2
What / Why
`ConcurrentLruCache2Benchmark` (Throughput, Threads=8, capacity=100, missRate=0.1): `ConcurrentLruCache` 136,826 ops/s → `ConcurrentLruCache2` 1,237,818 ops/s (≈ 9.05× throughput improvement).
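A sketch of the benchmark shape implied by those parameters; the PR's actual `ConcurrentLruCache2Benchmark` may differ in structure and key distribution:

```java
import java.util.concurrent.ThreadLocalRandom;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;

import org.springframework.util.ConcurrentLruCache2;

// Sketch of the benchmark shape implied by the reported parameters
// (Throughput, Threads=8, capacity=100, missRate=0.1); illustrative only.
@BenchmarkMode(Mode.Throughput)
@Threads(8)
@State(Scope.Benchmark)
public class LruThroughputSketch {

	static final int CAPACITY = 100;
	static final double MISS_RATE = 0.1;

	ConcurrentLruCache2<Integer, Integer> cache;

	@Setup
	public void setUp() {
		this.cache = new ConcurrentLruCache2<>(CAPACITY);
		for (int i = 0; i < CAPACITY; i++) {
			this.cache.put(i, i);
		}
	}

	@Benchmark
	public Integer getWithOccasionalMiss() {
		ThreadLocalRandom random = ThreadLocalRandom.current();
		// ~10% of lookups target keys outside the cached range (misses).
		int key = random.nextDouble() < MISS_RATE
				? CAPACITY + random.nextInt(CAPACITY)
				: random.nextInt(CAPACITY);
		Integer value = this.cache.get(key);
		if (value == null) {
			this.cache.put(key, key);  // manual population on a miss
		}
		return value;
	}
}
```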
`ConcurrentLruCache2`: a performance-oriented LRU alternative that reduces read/write contention via wider striping (next power-of-two of `availableProcessors`), padded per-stripe counters, and a pending-based drain strategy.

Key changes

- `ReadOperations`: buffer count set to the next power-of-two of `availableProcessors` (removing the previous `max=4` cap).
- Replace `AtomicLongArray`-based counters with per-stripe counter objects, and apply padding (`PaddedAtomicLong`/`PaddedLong`) to mitigate false sharing on hot counter-update paths.
- `WriteOperations`: pending-based drain that defers drain attempts while few write tasks are pending.
- `get` returns `null` on a miss (no automatic loader); callers populate via `put`. A usage sketch follows this list.
- `setEvictionListener` receiving `Entry(key, value)` on eviction (default: no-op).
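A minimal usage sketch of the new model. This assumes a `ConcurrentLruCache2(int capacity)` constructor, the `get`/`put`/`setEvictionListener` methods described above, and a record-style `Entry` with `key()`/`value()` accessors; the PR's exact signatures may differ.

```java
import org.springframework.util.ConcurrentLruCache2;

class CacheUsageSketch {

	public static void main(String[] args) {
		ConcurrentLruCache2<String, String> cache = new ConcurrentLruCache2<>(100);

		// The default eviction listener is a no-op; here we log evicted entries.
		cache.setEvictionListener(entry ->
				System.out.println("evicted: " + entry.key() + " -> " + entry.value()));

		String value = cache.get("config");  // null on a miss: no automatic loader
		if (value == null) {
			value = loadExternally("config");  // stand-in for an external loader
			cache.put("config", value);        // the caller controls population
		}
		System.out.println("config = " + value);
	}

	private static String loadExternally(String key) {
		return "value-for-" + key;
	}
}
```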
Performance bottlenecks (existing) and improvements (this PR)

Bottleneck: `AtomicLongArray`-based counter updates

`ConcurrentLruCache` uses an `AtomicLongArray` for `recordedCount`/`processedCount`. Because it is backed by a contiguous primitive array, hot per-stripe counter updates may pay extra overhead and can be more sensitive to cache-line interactions.
Improvement: `AtomicLongArray` → per-stripe counter objects

`ConcurrentLruCache2` uses per-stripe counter objects (`AtomicLong[]`) instead of an `AtomicLongArray`, reducing dependence on a contiguous primitive array layout and aiming to lower overhead on contended updates. A simplified sketch of the layout change follows.
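Class and field names below are illustrative, not the PR's actual code; the sketch only contrasts the two layouts.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

// Illustrative contrast between the two counter layouts.
class CounterLayouts {

	// Before: one contiguous primitive array shared by all stripes.
	// Stripe i updates slot i, but slots sit back-to-back in memory.
	static final class ArrayCounters {
		final AtomicLongArray recordedCount;

		ArrayCounters(int stripes) {
			this.recordedCount = new AtomicLongArray(stripes);
		}

		void record(int stripe) {
			this.recordedCount.incrementAndGet(stripe);
		}
	}

	// After: one counter object per stripe. Each AtomicLong is a separate
	// heap object, so stripes no longer share one contiguous layout.
	static final class PerStripeCounters {
		final AtomicLong[] recordedCount;

		PerStripeCounters(int stripes) {
			this.recordedCount = new AtomicLong[stripes];
			for (int i = 0; i < stripes; i++) {
				this.recordedCount[i] = new AtomicLong();
			}
		}

		void record(int stripe) {
			this.recordedCount[stripe].incrementAndGet();
		}
	}
}
```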
Bottleneck: false sharing from contiguous counter layout

In `ConcurrentLruCache`, `ReadOperations` tracks per-stripe progress via `recordedCount`/`processedCount`/`readCount`. When these are stored in contiguous primitive arrays, adjacent stripes can share cache lines, causing cache-line bouncing (false sharing), commonly reflected as higher backend stalls and CPI.
Validation: macOS CPU Performance Counters
Metrics (short notes)
- CPI (cycles per instruction): average cycles per retired instruction; tends to increase with waiting/coordination overhead.
- ARM_STALL_BACKEND: cycles where the pipeline backend is stalled; can increase with coherence/ownership waits.
- ARM_STALL_BACKEND / Cycles: fraction of total cycles spent stalled in the backend.
- ARM_L1D_CACHE_REFILL: number of L1D cache refills; can increase with invalidation-driven refill churn.
Observation: after padding, CPI, ARM_STALL_BACKEND/Cycles, and ARM_L1D_CACHE_REFILL/Instructions decreased, which is
consistent with reduced cache-line interference on the hot path.
Improvement: padded counters to mitigate false sharing

Switch `AtomicLongArray`/`long[]` usage to per-stripe padded objects (`PaddedAtomicLong`, `PaddedLong`, sketched below) to reduce cache-line collisions between frequently updated counters, targeting lower stalls on the `recordRead` and drain-check paths.
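One common implementation of such padding is the Disruptor-style trick of appending unused `long` fields so the hot field occupies its own (typically 64-byte) cache line; `jdk.internal.vm.annotation.Contended` achieves the same but is JDK-internal. The sketch below is illustrative, not the PR's exact classes.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative padding sketch. The goal is to give each hot counter its own
// cache line so updates from one stripe do not invalidate a line holding a
// neighboring stripe's counter.

// Appending pad fields after AtomicLong's value field pushes neighboring
// objects off the counter's cache line (assuming 64-byte lines). Real
// implementations often also guard against field reordering/elimination.
class PaddedAtomicLong extends AtomicLong {
	volatile long p1, p2, p3, p4, p5, p6, p7;
}

// Plain padded counter for slots written by a single thread.
class PaddedLong {
	volatile long value;
	volatile long p1, p2, p3, p4, p5, p6, p7;
}
```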
Bottleneck: limited striping in `ReadOperations`

`ReadOperations` uses `min(4, nextPowerOfTwo(availableProcessors))` buffers (i.e., at most 4 stripes), increasing the chance of multiple threads sharing the same buffers/counters under higher thread counts.
Improvement: expand `ReadOperations` striping

`ConcurrentLruCache2` sets the number of buffers to the next power-of-two of `availableProcessors` (removing the `max=4` cap), spreading threads across more stripes and reducing contention on the record/drain-check paths. A sizing/selection sketch follows.
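The sketch below mirrors the description above rather than the PR's exact code; the mixing function and index selection are illustrative.

```java
// Illustrative stripe sizing and selection.
final class StripingSketch {

	// Next power-of-two of availableProcessors, with no max=4 cap.
	static final int BUFFER_COUNT =
			nextPowerOfTwo(Runtime.getRuntime().availableProcessors());

	// Smallest power of two >= x (for x >= 1).
	static int nextPowerOfTwo(int x) {
		return 1 << (32 - Integer.numberOfLeadingZeros(x - 1));
	}

	// Map the current thread onto a stripe; a multiplicative mix spreads
	// consecutive thread ids across the power-of-two buffer count.
	static int stripeIndex() {
		long id = Thread.currentThread().threadId();  // JDK 19+
		int hash = (int) (id ^ (id >>> 32)) * 0x9E3779B9;
		return (hash >>> 16) & (BUFFER_COUNT - 1);
	}
}
```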
Bottleneck: drains attempted on every write

`ConcurrentLruCache` sets `drainStatus = REQUIRED` and attempts a drain on each write (e.g. `put`), which can lead to frequent drain attempts and lock contention during write bursts.
Improvement: pending-based drain (`WriteOperations`)

`ConcurrentLruCache2` tracks pending write tasks; while the pending count stays below a threshold, drains are deferred to avoid unnecessary drain attempts. A minimal sketch of the idea follows.
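The threshold value and names below are illustrative, not the PR's actual constants.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Minimal sketch of pending-based drain deferral.
final class WriteDrainSketch {

	private static final int DRAIN_THRESHOLD = 16;  // assumed, for illustration

	private final Queue<Runnable> pendingWrites = new ConcurrentLinkedQueue<>();

	private final AtomicInteger pendingCount = new AtomicInteger();

	private final ReentrantLock drainLock = new ReentrantLock();

	void afterWrite(Runnable task) {
		this.pendingWrites.add(task);
		// Defer the drain while the backlog is small, instead of attempting
		// a (potentially contended) drain on every single write.
		if (this.pendingCount.incrementAndGet() >= DRAIN_THRESHOLD) {
			tryDrain();
		}
	}

	private void tryDrain() {
		// tryLock: writers never block on a drain already in progress.
		if (this.drainLock.tryLock()) {
			try {
				Runnable task;
				while ((task = this.pendingWrites.poll()) != null) {
					this.pendingCount.decrementAndGet();
					task.run();
				}
			}
			finally {
				this.drainLock.unlock();
			}
		}
	}
}
```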
Compatibility and migration
- `ConcurrentLruCache2` is an additional implementation with a different operational model; it does not replace `ConcurrentLruCache`.
- `ConcurrentLruCache`: generator-based; a miss automatically generates and populates the value.
- `ConcurrentLruCache2`: manual population; a `get` miss returns `null`, and the caller decides whether/when to `put`.
- `ConcurrentLruCache2` exposes `put` as `public` so callers can control population. Callers that need automatic loading can keep using `ConcurrentLruCache`, or call an external loader and then `put` (see the adapter sketch below).
- With `capacity = 0`, `get` always returns `null` and entries inserted via `put` are immediately evicted (effectively disabling caching).
- Callers must handle `null` from `get`; stored values must be non-null.
- The eviction listener receives `Entry(key, value)` on eviction/removal/clear.
- Rule of thumb: `ConcurrentLruCache` for auto-loader needs, `ConcurrentLruCache2` for manual population + eviction hook + lower contention.
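For callers that still want auto-loading, a thin wrapper can emulate it on top of the manual model. This is a sketch assuming the `get`/`put` behavior described above; note that, unlike `ConcurrentLruCache`, concurrent misses for the same key may each invoke the loader here.

```java
import java.util.function.Function;

import org.springframework.util.ConcurrentLruCache2;

// Migration sketch: emulate ConcurrentLruCache's auto-loading behavior
// using ConcurrentLruCache2's manual population model.
final class LoadingCacheAdapter<K, V> {

	private final ConcurrentLruCache2<K, V> cache;

	private final Function<K, V> loader;

	LoadingCacheAdapter(int capacity, Function<K, V> loader) {
		this.cache = new ConcurrentLruCache2<>(capacity);
		this.loader = loader;
	}

	V get(K key) {
		V value = this.cache.get(key);  // null on a miss
		if (value == null) {
			value = this.loader.apply(key);  // loaded values must be non-null
			this.cache.put(key, value);
		}
		return value;
	}
}
```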
Tests

- `./gradlew :spring-core:test` (JDK 25)
- `JAVA_HOME=/path/to/jdk25 ./gradlew :spring-core:jmhJar`
- `$JAVA_HOME/bin/java -jar spring-core/build/libs/*-jmh.jar "org.springframework.util.ConcurrentLruCache2Benchmark.*"`