Performance concern: fillHoles() method and read buffer expansion efficiency. #599

@sherman

Hi everyone,

My use case: I’m benchmarking read throughput for data with a large number of non-dictionary string columns (300 columns). The profiler output (see the attached picture) shows that a significant amount of time is spent in the fillHoles() method, which is part of the read buffer expansion process.

My question is: why is the buffer filled one element at a time instead of using a bulk operation? Wouldn’t a batch approach be more efficient?
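
To make the question concrete, here is a minimal sketch of the two approaches on a plain `int[]` offsets array. It is not the library's actual implementation: the class `FillHolesSketch`, both method names, and the on-heap array are assumptions made for illustration. The sketch assumes element `i` spans `offsets[i]..offsets[i + 1]` and that `lastSet` is the last index that was written.

```java
import java.util.Arrays;

/** Hypothetical sketch of hole filling; not the library's real code. */
final class FillHolesSketch {

    /** Per-element variant: marks each unset slot as empty, one at a time. */
    static void fillHolesPerElement(int[] offsets, int lastSet, int index) {
        int end = offsets[lastSet + 1]; // end of the last written value
        for (int i = lastSet + 1; i < index; i++) {
            offsets[i + 1] = end; // empty value: start offset == end offset
        }
    }

    /** Bulk variant: writes the same range with a single fill call. */
    static void fillHolesBulk(int[] offsets, int lastSet, int index) {
        if (index <= lastSet + 1) {
            return; // no holes between lastSet and index
        }
        Arrays.fill(offsets, lastSet + 2, index + 1, offsets[lastSet + 1]);
    }

    public static void main(String[] args) {
        // One 3-byte value at index 0; indices 1..3 are holes before index 4.
        int[] a = {0, 3, 0, 0, 0, 0};
        int[] b = a.clone();
        fillHolesPerElement(a, 0, 4);
        fillHolesBulk(b, 0, 4);
        System.out.println(Arrays.toString(a)); // [0, 3, 3, 3, 3, 0]
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```

Both variants produce the same offsets. Whether the bulk form is actually faster for the real buffer is an assumption on my side, since the real buffer may not be a plain on-heap array and its access cost may differ.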

Looking forward to your insights. Thanks!

Image
