fix(parquet): bss encoding and tests on big endian systems#663
Conversation
- Add platform-specific decodeByteStreamSplitBatchWidth{4,8}InByteOrder
for little-endian and s390x big-endian architectures.
- Update ByteStreamSplitDecoder to use new endianness-aware decoding
functions for correct behavior on all platforms.
…systems Fix TestPageIndexRoundTripSuite and TestEncoding tests on big-endian systems
zeroshade
left a comment
There was a problem hiding this comment.
just one nitpick, otherwise this looks good!
|
@zeroshade I was thinking of maybe adding a CI run on a BE architecture. I'm using docker with QEMU emulation of a s390x machine to run this locally, so I'm sure it's possible in a GitHub action as well. The problem is that there are still couple of dozens tests failing, I've only fixed those in the parquet/encoding. So either we make it run just in the parquet/encoding package for now, or maybe add a job that doesn't run automatically, but can be run manually to see the status. |
| // See the License for the specific language governing permissions and | ||
| // limitations under the License. | ||
|
|
||
| //go:build armbe || arm64be || m68k || mips || mips64 || mips64p32 || ppc || ppc64 || s390 || s390x || shbe || sparc || sparc64 |
There was a problem hiding this comment.
I've copied these from https://golang.bg/src/encoding/binary/native_endian_big.go
zeroshade
left a comment
There was a problem hiding this comment.
Let's add the CI with qemu as a separate PR. This is good to go as is! thanks much!
Rationale for this change
To ensure the Arrow and Parquet Go libraries work correctly on big-endian architectures.
What changes are included in this PR?
Added endianness-aware BYTE_STREAM_SPLIT decoding in the parquet/encoding package.
Fixed tests in the parquet package to handle byte order correctly on big-endian systems.
Are these changes tested?
Yes, all affected unit tests now pass on both little-endian and big-endian machines. The changes specifically address some of the previously failing tests on big-endian systems.
Are there any user-facing changes?
No user-facing API changes. The changes are internal and ensure correct behavior on supported architectures.