Fix CI ValueError on NumPy 2.x by avoiding VLENUTF8 string dtype in tests#853
Fix CI ValueError on NumPy 2.x by avoiding VLENUTF8 string dtype in tests#853adilraza99 wants to merge 2 commits intomalariagen:masterfrom
Conversation
|
Hi @jonbrenas @ahernank, I've opened a PR to address the CI failure in #847 by avoiding the VLENUTF8 string path in test fixtures. The fix is test-only, NumPy-version agnostic (works for 1.26.x and 2.x), and keeps production code untouched. Happy to make any adjustments if needed. Thanks! |
|
Thank you, @adilraza99. It looks like the tests still fail for the CI but the error seems to be different, which I count as progress. |
|
Hi @jonbrenas, Thanks for reviewing this! I looked into the remaining CI failures in more detail. The current failures fall into two separate categories:
To keep the change safe and reviewable, this PR is intentionally scoped to:
If it helps with validation, it might be worth temporarily allowing or bypassing the coverage job to confirm that the core test suite passes cleanly with this fix in place. I avoided bundling the Pandas/StringArray issue into this PR to keep the original regression fix minimal and isolated. Happy to open a follow-up PR focused specifically on the coverage/Pandas compatibility issue if you'd like to handle that separately. Appreciate your feedback. |
New errors that we need to investigate.
|
Thank you, @adilraza99. My bad, I forgot to check the Pandas error and I didn't recognize that it was the one showing up. I tried to rerun the tests with numpy==1.26.4, though, and it looks like a different error (due to a (0, n, m) slice, apparently) is causing tests to fail. |
|
Marking this PR as draft for now. I’m currently focusing on stabilizing CI and baseline environment in PR #855. No changes are being abandoned - just pausing to avoid conflicting CI noise. Thanks! |
This PR fixes the CI failures triggered by recent NumPy 2.x changes that were causing test initialization to crash with a ValueError.
While investigating #847, I found that several test fixtures were creating Zarr datasets using
dtype=str. With NumPy ≥ 2.0 this path goes through the VLENUTF8 codec in numcodecs, which performs boolean checks that are no longer allowed and result in the following error:ValueError: The truth value of an array with more than one element is ambiguous
What changed
In the affected test fixtures,
dtype=strhas been replaced withdtype="U"(NumPy fixed-length Unicode dtype).This keeps the stored data as strings while avoiding the VLENUTF8 code path entirely.
Why this approach
Impact
Verification
dtype=strusage exists outside testsThis should unblock the current CI failures and allow dependent PRs to proceed without introducing version constraints or behavioral changes.
Fixes #847