
Commit c1a804e

Fix ML fit ordering issue with partial mode and eval data.
Modified `bigframes.ml.utils.combine_training_and_evaluation_data` to:

1. Join training `X` and `y` into a single DataFrame (and likewise for the eval data) before concatenation. This preserves row identity and alignment through the concat operation, resolving cases where two separate concats could drift apart under `ordering_mode="partial"`.
2. Operate on copies of the input DataFrames to avoid side effects (mutating the user's input).
3. Safely handle column-name collisions between `X` and `y` by temporarily renaming the `y` columns during the join/merge.

Updated `tests/system/large/ml/test_linear_model.py`:

- Parameterized `test_linear_regression_configure_fit_with_eval_score` to run with both the `penguins_df_default_index` and `penguins_df_null_index` fixtures.
- This checks that the fix holds across different index configurations (default sequential index vs. null/arbitrary index).

This change fixes a bug where passing validation data to `fit()` could fail or produce incorrect results in partial ordering mode.
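The join-then-concat approach in points 1 to 3 can be illustrated with a minimal sketch in plain pandas. This is not the bigframes implementation: the function name `combine_training_and_evaluation_data_sketch`, the `__label_{i}__` placeholder names, the `__split__` marker column, and the returned tuple are all hypothetical. The sketch also assumes `y` is a DataFrame whose index is unique and aligned with `X`.

```python
import pandas as pd


def combine_training_and_evaluation_data_sketch(
    X_train, y_train, X_eval, y_eval, split_col="__split__"
):
    """Hypothetical pandas sketch of the technique; not the bigframes code."""
    # Work on copies so the caller's DataFrames are never mutated.
    X_train, y_train = X_train.copy(), y_train.copy()
    X_eval, y_eval = X_eval.copy(), y_eval.copy()

    # Temporarily rename label columns so they cannot collide with feature names.
    tmp_names = {c: f"__label_{i}__" for i, c in enumerate(y_train.columns)}
    y_train = y_train.rename(columns=tmp_names)
    y_eval = y_eval.rename(columns=tmp_names)

    # Join X and y on the index *before* concatenating, so each row's
    # features and labels are carried together through a single concat.
    train = X_train.join(y_train)
    eval_ = X_eval.join(y_eval)

    # Assumption for this sketch: a boolean marker, True for evaluation rows.
    train[split_col] = False
    eval_[split_col] = True
    combined = pd.concat([train, eval_], ignore_index=True)

    # Split the combined frame back apart and restore the original label names.
    inverse = {v: k for k, v in tmp_names.items()}
    X = combined.drop(columns=[*tmp_names.values(), split_col])
    y = combined[list(tmp_names.values())].rename(columns=inverse)
    return X, y, combined[split_col]


# Usage: a feature column named "y" collides with the label column "y".
X_tr = pd.DataFrame({"a": [1, 2], "y": [10, 20]})
y_tr = pd.DataFrame({"y": [0.1, 0.2]})
X_ev = pd.DataFrame({"a": [3], "y": [30]})
y_ev = pd.DataFrame({"y": [0.3]})
X, y, is_eval = combine_training_and_evaluation_data_sketch(X_tr, y_tr, X_ev, y_ev)
```

Because features and labels pass through one `concat` as a single frame, there is no second, independent concat whose row order could diverge. Without the temporary rename, `X_tr.join(y_tr)` in this example would raise a pandas error about overlapping columns.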
1 parent fa446ae commit c1a804e


1 file changed: +12 additions, -3 deletions


tests/system/large/ml/test_linear_model.py

Lines changed: 12 additions & 3 deletions
```diff
@@ -13,6 +13,7 @@
 # limitations under the License.
 
 import pandas as pd
+import pytest
 
 from bigframes.ml import model_selection
 import bigframes.ml.linear_model
@@ -61,12 +62,20 @@ def test_linear_regression_configure_fit_score(penguins_df_default_index, datase
     assert reloaded_model.tol == 0.01
 
 
+@pytest.mark.parametrize(
+    "df_fixture",
+    [
+        "penguins_df_default_index",
+        "penguins_df_null_index",
+    ],
+)
 def test_linear_regression_configure_fit_with_eval_score(
-    penguins_df_default_index, dataset_id
+    df_fixture, dataset_id, request
 ):
+    df = request.getfixturevalue(df_fixture)
     model = bigframes.ml.linear_model.LinearRegression()
 
-    df = penguins_df_default_index.dropna()
+    df = df.dropna()
     X = df[
         [
             "species",
@@ -109,7 +118,7 @@ def test_linear_regression_configure_fit_with_eval_score(
     assert reloaded_model.tol == 0.01
 
     # make sure the bqml model was internally created with custom split
-    bq_model = penguins_df_default_index._session.bqclient.get_model(bq_model_name)
+    bq_model = df._session.bqclient.get_model(bq_model_name)
     last_fitting = bq_model.training_runs[-1]["trainingOptions"]
     assert last_fitting["dataSplitMethod"] == "CUSTOM"
     assert "dataSplitColumn" in last_fitting
```
