Commit 5af2972

BUG: Fix multi-index on columns with bool level values does not roundtrip through parquet
1 parent 8a286fa commit 5af2972

3 files changed: +29 -0 lines changed
doc/source/whatsnew/v3.0.0.rst

Lines changed: 1 addition & 0 deletions
@@ -709,6 +709,7 @@ I/O
 - Bug in :meth:`read_stata` where the missing code for double was not recognised for format versions 105 and prior (:issue:`58149`)
 - Bug in :meth:`set_option` where setting the pandas option ``display.html.use_mathjax`` to ``False`` has no effect (:issue:`59884`)
 - Bug in :meth:`to_excel` where :class:`MultiIndex` columns would be merged to a single row when ``merge_cells=False`` is passed (:issue:`60274`)
+- Bug in :meth:`read_parquet` raising ``ValueError`` if the multi-index contains a level with bools and if that multi-index is on the columns, then while the parquet can be written with the ``pyarrow`` engine, it cannot be read back in using ``pyarrow``. (:issue:`60508`)
 
 Period
 ^^^^^^

pandas/core/dtypes/astype.py

Lines changed: 7 additions & 0 deletions
@@ -125,6 +125,13 @@ def _astype_nansafe(
         )
         raise ValueError(msg)
 
+    if arr.dtype == object and dtype == bool:
+        # If the dtype is bool and the array is object, we need to replace the False and True of the object type in the ndarray with the bool type
+        # to ensure that the type conversion is correct
+        arr[arr == "False"] = np.False_
+        arr[arr == "True"] = np.True_
+        return arr.astype(dtype, copy=copy)
+
     if copy or arr.dtype == object or dtype == object:
         # Explicit copy, or required since NumPy can't view from / to object.
         return arr.astype(dtype, copy=True)
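
The reason the hunk above replaces the strings before casting can be illustrated with NumPy alone. Casting an object array to `bool` uses Python truthiness, so the non-empty string `"False"` becomes `True`. The following is a minimal sketch, not the pandas implementation: `object_strings_to_bool` is a hypothetical helper name, and unlike the committed code it works on a copy rather than modifying the input array in place.

```python
import numpy as np


def object_strings_to_bool(arr: np.ndarray) -> np.ndarray:
    """Hypothetical helper: map "True"/"False" strings in an object array to bools."""
    out = arr.copy()  # copy so the caller's array is left untouched (an assumption of this sketch)
    out[out == "False"] = False
    out[out == "True"] = True
    return out.astype(bool)


labels = np.array(["False", "True"], dtype=object)

# Plain casting relies on truthiness, so every non-empty string becomes True.
naive = labels.astype(bool)

# Replacing the strings first preserves the intended values.
fixed = object_strings_to_bool(labels)

print(naive.tolist())
print(fixed.tolist())
```

Whether to mutate the input or copy it is a design choice. The sketch copies so that the original labels remain available for comparison.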

pandas/tests/io/test_parquet.py

Lines changed: 21 additions & 0 deletions
@@ -1468,3 +1468,24 @@ def test_invalid_dtype_backend(self, engine):
             df.to_parquet(path)
             with pytest.raises(ValueError, match=msg):
                 read_parquet(path, dtype_backend="numpy")
+
+    def test_bool_multiIndex(self, tmp_path, pa):
+        # GH 60508
+        df = pd.DataFrame(
+            [
+                [1, 2],
+                [4, 5],
+            ],
+            columns=pd.MultiIndex.from_tuples(
+                [
+                    (True, 'B'),
+                    (False, 'C'),
+                ]
+            )
+        )
+        df.to_parquet(
+            path=tmp_path,
+            engine=pa,
+        )
+        result = pd.read_parquet(tmp_path, engine=pa)
+        tm.assert_frame_equal(result, df)
