Commit c1a804e
Fix ML fit ordering issue with partial mode and eval data.
Modified `bigframes.ml.utils.combine_training_and_evaluation_data` to:
1. Join training `X` and `y` into a single DataFrame (and similarly for eval data) before concatenation. This ensures row identity/alignment is preserved through the concat operation, resolving issues where separate concats could drift apart in `ordering_mode="partial"`.
2. Operate on copies of input DataFrames to avoid side-effects (mutating user's input).
3. Safely handle column name collisions between `X` and `y` by temporarily renaming `y` columns during the join/merge process.
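The three points above can be sketched with plain pandas as a stand-in for the BigQuery DataFrames API. This is a minimal illustration, not the actual body of `combine_training_and_evaluation_data`. The function name `combine_train_and_eval`, the `_is_eval` marker column, and the `_y_tmp_` prefix are hypothetical names chosen for the example.

```python
import pandas as pd

EVAL_COL = "_is_eval"   # hypothetical marker column, not a bigframes name
Y_PREFIX = "_y_tmp_"    # hypothetical temporary prefix for y columns


def combine_train_and_eval(X, y, X_eval, y_eval):
    """Return (X_all, y_all); X_all carries a boolean train/eval marker."""

    def join_xy(X_part, y_part, is_eval):
        # Work on copies so the caller's frames are never mutated.
        X_part = X_part.copy()
        # Temporarily rename y columns so they cannot collide with X columns.
        y_part = y_part.copy().rename(columns=lambda c: f"{Y_PREFIX}{c}")
        # Index-aligned join: each row's features and label now travel together.
        joined = X_part.join(y_part)
        joined[EVAL_COL] = is_eval
        return joined

    # A single concat over already-joined rows, instead of separate concats
    # for X and y that would have to stay aligned with each other.
    combined = pd.concat(
        [join_xy(X, y, False), join_xy(X_eval, y_eval, True)]
    )

    # Split back into features and labels, restoring the original y names.
    y_cols = [c for c in combined.columns if str(c).startswith(Y_PREFIX)]
    X_all = combined.drop(columns=y_cols)
    y_all = combined[y_cols].rename(columns=lambda c: c[len(Y_PREFIX):])
    return X_all, y_all


# Usage: X and y deliberately share the column name "label".
X = pd.DataFrame({"a": [1, 2], "label": [10, 20]}, index=[5, 3])
y = pd.DataFrame({"label": [0, 1]}, index=[5, 3])
X_eval = pd.DataFrame({"a": [3], "label": [30]}, index=[7])
y_eval = pd.DataFrame({"label": [1]}, index=[7])

X_all, y_all = combine_train_and_eval(X, y, X_eval, y_eval)
```

Because features and labels are joined before the concat, the result never depends on two independent concats producing rows in the same order, which is the guarantee that partial ordering mode does not provide.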
Updated `tests/system/large/ml/test_linear_model.py`:
- Parameterized `test_linear_regression_configure_fit_with_eval_score` to run with both `penguins_df_default_index` and `penguins_df_null_index` fixtures.
- This ensures the fix is robust against different index configurations (default sequential vs potential null/arbitrary indices).
This change fixes a bug where providing validation data to `fit()` could fail or produce incorrect results when using partial ordering mode.
1 parent fa446ae commit c1a804e
1 file changed, +12 -3 lines changed