-
Notifications
You must be signed in to change notification settings - Fork 439
Testing zero-copy bugs fixes (not for merging) #1156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…pcClientProtocolService. (apache#1026)
…he same peer as the current valid leader (apache#1024)
… released properly (apache#1023)
| private final Condition notEmpty = lock.newCondition(); | ||
|
|
||
| private boolean closed = false; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use atomic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already a lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. :)
| private final Condition notEmpty = lock.newCondition(); | ||
|
|
||
| private boolean closed = false; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. :)
|
Finally, it is able to pass all the tests (with a few retries). Note that there are probably some other zero copy bugs. Will fix them separately. |
|
This can pass all the tests (with a few retries). Since this change is quite big (56kB) and non-trivial, I will split this to a few JIRAs:
I will see if (2) and (3) needed to be further split. BTW, we should move |
This fix will be split into multiple JIRAs: RATIS-2164, RATIS-2151, RATIS-2173
The following are the bugs found so far:
LeakDetector: assertedallLeaksis non-empty but printed "allLeaks.size = 0"retain. Without callingretainat all, it is not a leak.SimpleTracingandAdvancedTracing: the methods should be synchronized.AdvancedTracingshould have a single track list instead ofretainsTracesandreleaseTraces.GrpcClientProtocolService.UnorderedRequestStreamObserver.processClientRequest(..)should use try-finally.GrpcLogAppender.appendLog(..)callsrelease()incorrectly for exception.LogAppenderDefault.sendAppendEntriesWithRetries(..)callsrelease()incorrectly for exception.LogSegmentcache can release an entry multiple times.LogSegment.loadCache(..)should callretain()for cache hit.SegmentedRaftLog.retainLog(..): between getting the entry and callingretain(), the entry can be released. The "fail to retain" exception, if there is any, can be ignored since It is the same as a cache miss. See RATIS-2159. TestRaftWithSimulatedRpc could "fail to retain". #1153SegmentedRaftLog.retainEntryWithData(..)should release for exception.SimpleStateMachine4Testingcan be released.LogSegment: New entries can be added after EntryCache is closed.MemoryRaftLoghas similar problems as inSegmentedRaftLog.SegmentedRaftLogWorkershould clean up unfinished tasks in the queue after stopped running.