-
Notifications
You must be signed in to change notification settings - Fork 2.5k
perf: Avoid re-fetching file status from FS for HFile readers #17709
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
perf: Avoid re-fetching file status from FS for HFile readers #17709
Conversation
| byte[] buffer; | ||
| try (SeekableDataInputStream stream = storage.openSeekable(path, false)) { | ||
| buffer = new byte[(int) storage.getPathInfo(path).getLength()]; | ||
| buffer = new byte[(int) fileSize]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For an inline path (inlinefs:), this represents the log block size instead of the full log file size. Will that be a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a path that executes this that I can run locally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yihua let me know if there is a particular path I can step through to validate this
e541da6 to
63382d5
Compare
63382d5 to
14751d0
Compare
Describe the issue this Pull Request addresses
The HFile readers are looking up the file length when the code supplies a path to read. We often know the length for the reader upfront so we should avoid looking this up again.
Summary and Changelog
Impact
Improves performance when dealing with remote filesystems like S3/GCS
Risk Level
Low, functionality is covered by existing tests
Documentation Update
Contributor's checklist