@pandalee99
Contributor

Summary

Add C++ batch processing functions for int64 sequence serialization/deserialization to improve performance.

Changes:

  • Add Fory_PyInt64SequenceWriteToBuffer and Fory_PyInt64SequenceReadFromBuffer in C++ layer
  • Optimize _write_int and _read_int in CollectionSerializer for list/tuple types
  • Add unit tests and benchmark script

Performance

Benchmark results on 1M integers:

| Operation | Before (loop) | After (batch) | Speedup |
|-----------|---------------|---------------|---------|
| Serialize | 22.80 ms      | 6.44 ms       | 3.5x    |

============================================================
Int64 Batch Serialization Benchmark
============================================================
Size         Original        Optimized       Speedup
------------------------------------------------------------
10,000       0.41            0.10            3.97x
100,000      2.28            0.68            3.38x
1,000,000    22.80           6.44            3.54x

Run benchmark:

import time

import pyfory
from pyfory.buffer import Buffer


def benchmark_original_write(data, iterations=20):
    """Benchmark original loop-based write_varint64."""
    buf = Buffer.allocate(len(data) * 9 + 1024)

    # Warmup
    for _ in range(3):
        buf.writer_index = 0
        for v in data:
            buf.write_varint64(v)

    # Benchmark
    start = time.perf_counter()
    for _ in range(iterations):
        buf.writer_index = 0
        for v in data:
            buf.write_varint64(v)
    end = time.perf_counter()

    return (end - start) / iterations * 1000


def benchmark_optimized_serialize(data, iterations=20):
    """Benchmark optimized batch serialization via Fory."""
    fory = pyfory.Fory(ref=True)

    # Warmup
    for _ in range(3):
        fory.serialize(data)

    # Benchmark
    start = time.perf_counter()
    for _ in range(iterations):
        fory.serialize(data)
    end = time.perf_counter()

    return (end - start) / iterations * 1000


def benchmark_deserialize(data, iterations=20):
    """Benchmark deserialization."""
    fory = pyfory.Fory(ref=True)
    serialized = fory.serialize(data)

    # Warmup
    for _ in range(3):
        fory.deserialize(serialized)

    # Benchmark
    start = time.perf_counter()
    for _ in range(iterations):
        fory.deserialize(serialized)
    end = time.perf_counter()

    return (end - start) / iterations * 1000, len(serialized)


def main():
    print("=" * 60)
    print("Int64 Batch Serialization Benchmark")
    print("=" * 60)
    print(f"Cython enabled: {pyfory.ENABLE_FORY_CYTHON_SERIALIZATION}")
    print()

    sizes = [10_000, 100_000, 1_000_000]

    print(f"{'Size':<12} {'Original':<15} {'Optimized':<15} {'Speedup':<10} {'Deser':<12}")
    print("-" * 60)

    for size in sizes:
        data = list(range(size))

        original_time = benchmark_original_write(data)
        optimized_time = benchmark_optimized_serialize(data)
        deser_time, serialized_size = benchmark_deserialize(data)

        speedup = original_time / optimized_time

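        # Format the speedup together with its "x" suffix so the column
        # padding comes after the suffix, not between the number and "x".
        speedup_text = f"{speedup:.2f}x"
        print(
            f"{size:<12,} {original_time:<15.2f} {optimized_time:<15.2f} "
            f"{speedup_text:<10} {deser_time:<12.2f}"
        )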

    print()
    print("Notes:")
    print("- Original: loop with individual write_varint64 calls (write only)")
    print("- Optimized: full serialize including header overhead")
    print("- Times are in milliseconds")


if __name__ == "__main__":
    main()
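
The three benchmark functions above repeat the same pattern: a few warmup calls, then a timed loop averaged over the iteration count. As a sketch only, that pattern could be factored into one helper. The name `time_ms` is hypothetical and is not part of the PR or of pyfory; it uses only the standard library.

```python
import time


def time_ms(fn, warmup=3, iterations=20):
    """Return the mean wall-clock time of fn() in milliseconds.

    Hypothetical helper: mirrors the warmup-then-average pattern of the
    benchmark script, but is not part of the PR itself.
    """
    # Warmup calls are not timed; they let caches and allocators settle.
    for _ in range(warmup):
        fn()

    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    end = time.perf_counter()

    # Average per call, converted from seconds to milliseconds.
    return (end - start) / iterations * 1000
```

With such a helper, a measurement reduces to passing a zero-argument callable, for example a lambda that wraps the serialize call on a prepared list. The script's own note still applies: the loop baseline times only the buffer writes, while the optimized path times a full serialize including the header.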

Test plan

  • All existing tests pass (356 passed)
  • New unit tests for edge cases (large integers, negative values, empty list)
  • Encoding format compatible with original varint64+zigzag
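
For readers unfamiliar with the encoding named in the last item, here is a minimal pure-Python sketch of ZigZag plus variable-length integer encoding for a sequence of int64 values. It is a generic LEB128-style illustration, not the PR's C++ code, and Fory's exact byte layout may differ (this generic form can use up to 10 bytes per value). The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def zigzag_encode(value):
    """Map a signed int64 to an unsigned int so small magnitudes stay small."""
    if not -(1 << 63) <= value < (1 << 63):
        raise OverflowError("value does not fit in int64")
    # 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    return ((value << 1) ^ (value >> 63)) & MASK64


def zigzag_decode(encoded):
    """Inverse of zigzag_encode."""
    return (encoded >> 1) ^ -(encoded & 1)


def write_int64_sequence(values):
    """Encode every value into one contiguous byte string (generic varint)."""
    out = bytearray()
    for value in values:
        u = zigzag_encode(value)
        # Emit 7 bits per byte; the high bit marks "more bytes follow".
        while u >= 0x80:
            out.append((u & 0x7F) | 0x80)
            u >>= 7
        out.append(u)
    return bytes(out)


def read_int64_sequence(data, count):
    """Decode `count` values written by write_int64_sequence."""
    result = []
    pos = 0
    for _ in range(count):
        u = 0
        shift = 0
        while True:
            byte = data[pos]
            pos += 1
            u |= (byte & 0x7F) << shift
            if byte < 0x80:
                break
            shift += 7
        result.append(zigzag_decode(u))
    return result
```

The edge cases listed in the test plan (large integers, negative values, empty list) are the natural round-trip checks for a sketch like this: the int64 bounds exercise the longest encodings, and negative values exercise the ZigZag mapping.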

…processing

Signed-off-by: lipan02 <pandali.kk@qq.com>

// Write varint64 with ZigZag encoding inline
// Returns number of bytes written
static inline uint32_t WriteVarint64ZigZag(uint8_t *arr, int64_t value) {
Collaborator

I'm refactoring the Fory Cython Buffer to forward numeric reads/writes to the C++ Buffer. You can use the C++ Buffer.write_varint64 directly once I finish the refactor.

Collaborator

WriteVarint64ZigZag should be removed now; Python uses the same C++ Buffer to read and write ints.

Contributor Author

Let me take a look.

chaokunyang added a commit that referenced this pull request Jan 27, 2026
## Why?

- Python Cython buffer implementation duplicated C++ buffer logic and
error handling.
- Centralizing buffer reads/writes in the C++ buffer reduces duplication
and keeps behavior aligned.

## What does this PR do?

- Refactors `python/pyfory/buffer.pyx` to delegate reads/writes,
varint/tagged encoding, and bounds checks to `fory::Buffer`.
- Adds C++ `Buffer` int24 helpers and exposes missing buffer APIs/errors
to Cython via `libutil.pxd`.
- Introduces Python error mapping helpers and updates row/collection
code to pass C++ buffer pointers correctly.

## Related issues

Closes #3218

#3216
#1017 

## Does this PR introduce any user-facing change?



- [ ] Does this PR introduce any public API change?
- [ ] Does this PR introduce any binary protocol compatibility change?

## Benchmark
pandalee99 and others added 3 commits January 27, 2026 15:25
Signed-off-by: lipan02 <pandali.kk@qq.com>