-
Notifications
You must be signed in to change notification settings - Fork 504
[client]Fix memory leak if oom when decompress data. #2647
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
dd1a57b to
6daa427
Compare
|
@platinumhamburg @wuchong , CC |
| // vectorSchemaRootMap, | ||
| // for example temporary buffers left behind when an OOM occurs during decompression | ||
| // (see https://github.com/apache/fluss/issues/2646). | ||
| bufferAllocator.releaseBytes(bufferAllocator.getAllocatedMemory()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix! The approach is consistent with RecordAccumulator.close().
One concern: releaseBytes() only adjusts the allocator's accounting but doesn't actually free the underlying direct memory - those buffers will
remain until GC. This fix prevents the IllegalStateException on close, which is good for avoiding cascading failures, but the actual memory may still
leak temporarily.
For a more robust solution, consider adding try-finally in ZstdArrowCompressionCodec.doDecompress() to ensure temp buffers are released on failure.
But the current fix is acceptable as a defensive measure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@platinumhamburg , I modified this PR and add a patched version of Arrow's {@code VectorLoader} to fix this problem. Detail see VectorLoaderTest
0b07ab6 to
0e740b6
Compare
platinumhamburg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work identifying and fixing this Arrow buffer leak! I verified that the upstream Arrow-Java main branch
still has this bug in VectorLoader.loadBuffers() — the decompression loop sits outside the try block, and
buf.close() only runs on the success path. So this patch is definitely necessary.
The fix logic is correct: moving the decompression loop inside the try block and releasing buffers in the finally
clause properly handles both the success path (loadFieldBuffers retains +1, finally close -1) and the error path
(finally close frees all already-decompressed buffers).
Below are only one suggestion to improve maintainability.
| * | ||
| * @see <a href="https://github.com/apache/fluss/issues/2646">FLUSS-2646</a> | ||
| */ | ||
| public class VectorLoader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest changing the approach from class-shadowing to an explicitly named class under a
Fluss-owned package. Specifically: rename this to FlussVectorLoader and move it to
org.apache.fluss.record, then update the call site in ArrowUtils.createArrowReader().
I've verified locally that all Arrow APIs used in loadBuffers() — TypeLayout, FieldVector,
Collections2, Preconditions, etc. — are public in the shaded JAR, so there's no package-visibility
blocker. Compilation and all 3 tests pass with this change.
Why this is better than class-shadowing:
-
Explicit over implicit. The current approach relies on classpath ordering to silently
override the shaded Arrow JAR's VectorLoader. WithFlussVectorLoader, the call site in
ArrowUtils makes it obvious that a patched loader is being used — no hidden magic. -
Compile-time safety on Arrow upgrades. If Arrow's VectorLoader API changes in a future
version (e.g. the upstreammainbranch already added avariadicBufferCountsparameter),
the shadowed class would silently diverge. With an explicit class, any API incompatibility
surfaces as a compile error immediately. -
No risk of classpath conflicts. Class-shadowing behavior depends on JAR ordering, which
can vary across build tools, IDEs, and deployment environments. An explicitly named class
eliminates this uncertainty entirely.
The change is straightforward:
- Rename
VectorLoader→FlussVectorLoaderinorg.apache.fluss.record - Update import + usage in
ArrowUtils.createArrowReader() - Update import + usage in
VectorLoaderTest
Please also add a TODO in the class Javadoc for future cleanup:
// TODO(FLUSS-2646): Remove this patched VectorLoader once Apache Arrow fixes
// the buffer leak in loadBuffers(). Current Arrow version: 15.0.0
Additionally, I'd recommend filing an issue in https://github.com/apache/arrow-java/issues to
push for an upstream fix — that way we have a clear signal for when this patch can be retired.

Purpose
Linked issue: close #2646
Brief change log
Tests
API and Format
Documentation