perf(python): optimize int64 list/tuple serialization with C++ batch processing #3216
…processing Signed-off-by: lipan02 <pandali.kk@qq.com>
Signed-off-by: lipan02 <pandali.kk@qq.com>
// Write varint64 with ZigZag encoding inline
// Returns number of bytes written
static inline uint32_t WriteVarint64ZigZag(uint8_t *arr, int64_t value) {
I'm refactoring the fory Cython Buffer to forward numeric reads/writes to the C++ Buffer. You can use the C++ Buffer.write_varint64 directly once I've finished the refactor.
WriteVarint64ZigZag should be removed now; Python uses the same C++ Buffer for writing/reading ints.
let me take a look.
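For readers unfamiliar with the helper under discussion, here is a minimal Python sketch of what a ZigZag varint64 writer typically does. It assumes a generic LEB128-style layout (7 payload bits per byte, high bit as continuation flag); Fory's actual wire format and the C++ `Buffer` implementation may differ. The names `zigzag_encode` and `write_varint64_zigzag` are hypothetical and chosen only to mirror the signature quoted above.

```python
MASK64 = (1 << 64) - 1  # keep results within 64 unsigned bits


def zigzag_encode(value: int) -> int:
    """Map a signed 64-bit int to unsigned so small magnitudes stay small.

    0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    """
    return ((value << 1) ^ (value >> 63)) & MASK64


def write_varint64_zigzag(buf: bytearray, offset: int, value: int) -> int:
    """Write `value` into buf starting at `offset`.

    Returns the number of bytes written (hypothetical helper, generic
    LEB128-style encoding assumed, up to 10 bytes for a 64-bit value).
    """
    u = zigzag_encode(value)
    n = 0
    while u >= 0x80:
        buf[offset + n] = (u & 0x7F) | 0x80  # low 7 bits + continuation flag
        u >>= 7
        n += 1
    buf[offset + n] = u  # final byte has the high bit clear
    return n + 1
```

ZigZag matters here because a plain varint of a negative two's-complement value would always take the maximum number of bytes, while the ZigZag mapping keeps small negative numbers as short as small positive ones.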
## Why?
- Python Cython buffer implementation duplicated C++ buffer logic and error handling.
- Centralizing buffer reads/writes in the C++ buffer reduces duplication and keeps behavior aligned.

## What does this PR do?
- Refactors `python/pyfory/buffer.pyx` to delegate reads/writes, varint/tagged encoding, and bounds checks to `fory::Buffer`.
- Adds C++ `Buffer` int24 helpers and exposes missing buffer APIs/errors to Cython via `libutil.pxd`.
- Introduces Python error mapping helpers and updates row/collection code to pass C++ buffer pointers correctly.

## Related issues
Closes #3218 #3216 #1017

## Does this PR introduce any user-facing change?
- [ ] Does this PR introduce any public API change?
- [ ] Does this PR introduce any binary protocol compatibility change?

## Benchmark
Summary
Add C++ batch processing functions for int64 sequence serialization/deserialization to improve performance.
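As a rough illustration of the batch idea in the summary, the sketch below encodes a whole int64 sequence in one pass into a buffer reserved once up front, instead of dispatching a separate write call per element. It is pure Python and only models the control flow; the real work in this PR happens in C++, whose signatures are not shown here. The function names `write_int64_sequence` and `read_int64_sequence`, the 10-byte worst case, and the LEB128-style ZigZag layout are all assumptions made for the example.

```python
from typing import List, Sequence

MASK64 = (1 << 64) - 1
MAX_VARINT64_BYTES = 10  # assumed worst case for a LEB128-style 64-bit varint


def write_int64_sequence(values: Sequence[int]) -> bytes:
    """Encode all values in one loop into a single preallocated buffer."""
    buf = bytearray(len(values) * MAX_VARINT64_BYTES)  # reserve once
    pos = 0
    for v in values:
        u = ((v << 1) ^ (v >> 63)) & MASK64  # ZigZag: signed -> unsigned
        while u >= 0x80:
            buf[pos] = (u & 0x7F) | 0x80
            u >>= 7
            pos += 1
        buf[pos] = u
        pos += 1
    return bytes(buf[:pos])  # trim the unused tail


def read_int64_sequence(data: bytes, count: int) -> List[int]:
    """Decode `count` ZigZag varints from `data` in one loop."""
    out = [0] * count  # result list allocated once
    pos = 0
    for i in range(count):
        u = 0
        shift = 0
        while True:
            b = data[pos]
            pos += 1
            u |= (b & 0x7F) << shift
            if b < 0x80:  # high bit clear marks the last byte
                break
            shift += 7
        out[i] = (u >> 1) ^ -(u & 1)  # undo ZigZag
    return out
```

The point of the design is that bounds checks and buffer growth happen once per sequence rather than once per element, which is where a per-item Python-level loop tends to spend its time.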
Changes:
- `Fory_PyInt64SequenceWriteToBuffer` and `Fory_PyInt64SequenceReadFromBuffer` in C++ layer
- `_write_int` and `_read_int` in `CollectionSerializer` for list/tuple types

Performance
Benchmark results on 1M integers:
Run benchmark:
Test plan