Commit c13737e

Merge branch 'main' into migrate-integer-label-to-datetime-op
2 parents a537ad1 + 2763b41 commit c13737e

221 files changed: +52293 -2838 lines changed

.github/ISSUE_TEMPLATE/bug_report.md

Lines changed: 0 additions & 2 deletions
@@ -29,14 +29,12 @@ import bigframes
 import google.cloud.bigquery
 import pandas
 import pyarrow
-import sqlglot

 print(f"Python: {sys.version}")
 print(f"bigframes=={bigframes.__version__}")
 print(f"google-cloud-bigquery=={google.cloud.bigquery.__version__}")
 print(f"pandas=={pandas.__version__}")
 print(f"pyarrow=={pyarrow.__version__}")
-print(f"sqlglot=={sqlglot.__version__}")
 ```

 #### Steps to reproduce

.librarian/state.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 image: us-central1-docker.pkg.dev/cloud-sdk-librarian-prod/images-prod/python-librarian-generator@sha256:c8612d3fffb3f6a32353b2d1abd16b61e87811866f7ec9d65b59b02eb452a620
 libraries:
   - id: bigframes
-    version: 2.30.0
+    version: 2.32.0
     last_generated_commit: ""
     apis: []
     source_roots:

.pre-commit-config.yaml

Lines changed: 10 additions & 0 deletions
@@ -14,6 +14,10 @@
 #
 # See https://pre-commit.com for more information
 # See https://pre-commit.com/hooks.html for more hooks
+default_install_hook_types:
+  - pre-commit
+  - commit-msg
+
 repos:
   - repo: https://github.com/pre-commit/pre-commit-hooks
     rev: v4.0.1
@@ -47,3 +51,9 @@ repos:
     hooks:
       - id: biome-check
         files: '\.(js|css)$'
+  - repo: https://github.com/compilerla/conventional-pre-commit
+    rev: fdde5f0251edbfc554795afdd6df71826d6602f3
+    hooks:
+      - id: conventional-pre-commit
+        stages: [commit-msg]
+        args: []
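
The `conventional-pre-commit` hook added in this hunk runs at the `commit-msg` stage, so it inspects the commit message rather than the staged files. As a rough illustration of the kind of header check such a hook performs, here is a minimal sketch. It is not the hook's actual implementation: the regular expression, the `ALLOWED_TYPES` list, and the function name `is_conventional` are assumptions made for this example, and the real hook may exempt some messages (for example merge commits) that this sketch rejects.

```python
import re

# Hypothetical, simplified type list; the real hook's defaults may differ.
ALLOWED_TYPES = (
    "build", "chore", "ci", "docs", "feat", "fix",
    "perf", "refactor", "revert", "style", "test",
)

# Header shape: type, optional "(scope)", optional "!", then ": subject".
_HEADER = re.compile(
    r"^(?P<type>[a-z]+)"
    r"(\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<subject>\S.*)$"
)


def is_conventional(message: str) -> bool:
    """Return True if the first line looks like a Conventional Commits header."""
    lines = message.splitlines()
    if not lines:
        return False
    match = _HEADER.match(lines[0])
    return match is not None and match.group("type") in ALLOWED_TYPES


if __name__ == "__main__":
    for msg in (
        "feat: add fit_predict method to ml unsupervised models",
        "fix(ml)!: avoid join on null index",
        "Merge branch 'main' into migrate-integer-label-to-datetime-op",
    ):
        print(f"{is_conventional(msg)!s:5}  {msg}")
```

Only the first line of the message is checked here; body and footer rules are left out to keep the sketch short.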

CHANGELOG.md

Lines changed: 41 additions & 0 deletions
@@ -4,6 +4,47 @@

 [1]: https://pypi.org/project/bigframes/#history

+## [2.32.0](https://github.com/googleapis/google-cloud-python/compare/bigframes-v2.31.0...bigframes-v2.32.0) (2026-01-05)
+
+
+### Documentation
+
+* generate sitemap.xml for better search indexing (#2351) ([7d2990f1c48c6d74e2af6bee3af87f90189a3d9b](https://github.com/googleapis/google-cloud-python/commit/7d2990f1c48c6d74e2af6bee3af87f90189a3d9b))
+* update supported pandas APIs documentation links (#2330) ([ea71936ce240b2becf21b552d4e41e8ef4418e2d](https://github.com/googleapis/google-cloud-python/commit/ea71936ce240b2becf21b552d4e41e8ef4418e2d))
+* Add time series analysis notebook (#2328) ([369f1c0aff29d197b577ec79e401b107985fe969](https://github.com/googleapis/google-cloud-python/commit/369f1c0aff29d197b577ec79e401b107985fe969))
+
+
+### Features
+
+* Enable multi-column sorting in anywidget mode (#2360) ([1feb956e4762e30276e5b380c0633e6ed7881357](https://github.com/googleapis/google-cloud-python/commit/1feb956e4762e30276e5b380c0633e6ed7881357))
+* display series in anywidget mode (#2346) ([7395d418550058c516ad878e13567256f4300a37](https://github.com/googleapis/google-cloud-python/commit/7395d418550058c516ad878e13567256f4300a37))
+* Refactor TableWidget and to_pandas_batches (#2250) ([b8f09015a7c8e6987dc124e6df925d4f6951b1da](https://github.com/googleapis/google-cloud-python/commit/b8f09015a7c8e6987dc124e6df925d4f6951b1da))
+* Auto-plan complex reduction expressions (#2298) ([4d5de14ccdd05b1ac8f50c3fe71c35ab9e5150c1](https://github.com/googleapis/google-cloud-python/commit/4d5de14ccdd05b1ac8f50c3fe71c35ab9e5150c1))
+* Display custom single index column in anywidget mode (#2311) ([f27196260743883ed8131d5fd33a335e311177e4](https://github.com/googleapis/google-cloud-python/commit/f27196260743883ed8131d5fd33a335e311177e4))
+* add fit_predict method to ml unsupervised models (#2320) ([59df7f70a12ef702224ad61e597bd775208dac45](https://github.com/googleapis/google-cloud-python/commit/59df7f70a12ef702224ad61e597bd775208dac45))
+
+
+### Bug Fixes
+
+* vendor sqlglot bigquery dialect and remove package dependency (#2354) ([b321d72d5eb005b6e9295541a002540f05f72209](https://github.com/googleapis/google-cloud-python/commit/b321d72d5eb005b6e9295541a002540f05f72209))
+* bigframes.ml fit with eval data in partial mode avoids join on null index (#2355) ([7171d21b8c8d5a2d61081f41fa1109b5c9c4bc5f](https://github.com/googleapis/google-cloud-python/commit/7171d21b8c8d5a2d61081f41fa1109b5c9c4bc5f))
+* Improve strictness of nan vs None usage (#2326) ([481d938fb0b840e17047bc4b57e61af15b976e54](https://github.com/googleapis/google-cloud-python/commit/481d938fb0b840e17047bc4b57e61af15b976e54))
+* Correct DataFrame widget rendering in Colab (#2319) ([7f1d3df3839ec58f52e48df088057fc0df967da9](https://github.com/googleapis/google-cloud-python/commit/7f1d3df3839ec58f52e48df088057fc0df967da9))
+* Fix pd.timedelta handling in polars comipler with polars 1.36 (#2325) ([252644826289d9db7a8548884de880b3a4fccafd](https://github.com/googleapis/google-cloud-python/commit/252644826289d9db7a8548884de880b3a4fccafd))
+
+## [2.31.0](https://github.com/googleapis/google-cloud-python/compare/bigframes-v2.30.0...bigframes-v2.31.0) (2025-12-10)
+
+
+### Features
+
+* add `bigframes.bigquery.ml` methods (#2300) ([719b278c844ca80c1bec741873b30a9ee4fd6c56](https://github.com/googleapis/google-cloud-python/commit/719b278c844ca80c1bec741873b30a9ee4fd6c56))
+* add 'weekday' property to DatatimeMethod (#2304) ([fafd7c732d434eca3f8b5d849a87149f106e3d5d](https://github.com/googleapis/google-cloud-python/commit/fafd7c732d434eca3f8b5d849a87149f106e3d5d))
+
+
+### Bug Fixes
+
+* cache DataFrames to temp tables in bigframes.bigquery.ml methods to avoid time travel (#2318) ([d99383195ac3f1683842cfe472cca5a914b04d8e](https://github.com/googleapis/google-cloud-python/commit/d99383195ac3f1683842cfe472cca5a914b04d8e))
+
 ## [2.30.0](https://github.com/googleapis/google-cloud-python/compare/bigframes-v2.29.0...bigframes-v2.30.0) (2025-12-03)


GEMINI.md

Lines changed: 3 additions & 2 deletions
@@ -16,7 +16,8 @@ We use `nox` to instrument our tests.
 nox -r -s unit-3.13 -- -k <name of test>
 ```

-- To run system tests, you can execute::
+- Ignore this step if you lack access to Google Cloud resources. To run system
+  tests, you can execute::

 # Run all system tests
 $ nox -r -s system
@@ -26,7 +27,7 @@ We use `nox` to instrument our tests.

 - The codebase must have better coverage than it had previously after each
   change. You can test coverage via `nox -s unit system cover` (takes a long
-  time).
+  time). Omit `system` if you lack access to cloud resources.

 ## Code Style


LICENSE

Lines changed: 26 additions & 0 deletions
@@ -318,3 +318,29 @@ SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+---
+
+Files: The bigframes_vendored.sqlglot module.
+
+MIT License
+
+Copyright (c) 2025 Toby Mao
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.rst

Lines changed: 1 addition & 0 deletions
@@ -82,6 +82,7 @@ It also contains code derived from the following third-party packages:
 * `Python <https://www.python.org/>`_
 * `scikit-learn <https://scikit-learn.org/>`_
 * `XGBoost <https://xgboost.readthedocs.io/en/stable/>`_
+* `SQLGlot <https://sqlglot.com/sqlglot.html>`_

 For details, see the `third_party
 <https://github.com/googleapis/python-bigquery-dataframes/tree/main/third_party/bigframes_vendored>`_

bigframes/bigquery/_operations/ml.py

Lines changed: 4 additions & 0 deletions
@@ -39,6 +39,10 @@ def _to_sql(df_or_sql: Union[pd.DataFrame, dataframe.DataFrame, str]) -> str:
     else:
         bf_df = cast(dataframe.DataFrame, df_or_sql)

+    # Cache dataframes to make sure base table is not a snapshot.
+    # Cached dataframe creates a full copy, never uses snapshot.
+    # This is a workaround for internal issue b/310266666.
+    bf_df.cache()
     sql, _, _ = bf_df._to_sql_query(include_index=False)
     return sql


bigframes/core/array_value.py

Lines changed: 83 additions & 9 deletions
@@ -16,7 +16,6 @@
 from dataclasses import dataclass
 import datetime
 import functools
-import itertools
 import typing
 from typing import Iterable, List, Mapping, Optional, Sequence, Tuple

@@ -267,21 +266,96 @@ def compute_values(self, assignments: Sequence[ex.Expression]):
         )

     def compute_general_expression(self, assignments: Sequence[ex.Expression]):
+        """
+        Applies arbitrary column expressions to the current execution block.
+
+        This method transforms the logical plan by applying a sequence of expressions that
+        preserve the length of the input columns. It supports both scalar operations
+        and window functions. Each expression is assigned a unique internal column identifier.
+
+        Args:
+            assignments (Sequence[ex.Expression]): A sequence of expression objects
+                representing the transformations to apply to the columns.
+
+        Returns:
+            Tuple[ArrayValue, Tuple[str, ...]]: A tuple containing:
+                - An `ArrayValue` wrapping the new root node of the updated logical plan.
+                - A tuple of strings representing the unique column IDs generated for
+                  each expression in the assignments.
+        """
         named_exprs = [
             nodes.ColumnDef(expr, ids.ColumnId.unique()) for expr in assignments
         ]
         # TODO: Push this to rewrite later to go from block expression to planning form
-        # TODO: Jointly fragmentize expressions to more efficiently reuse common sub-expressions
-        fragments = tuple(
-            itertools.chain.from_iterable(
-                expression_factoring.fragmentize_expression(expr)
-                for expr in named_exprs
-            )
-        )
+        new_root = expression_factoring.apply_col_exprs_to_plan(self.node, named_exprs)
+
         target_ids = tuple(named_expr.id for named_expr in named_exprs)
-        new_root = expression_factoring.push_into_tree(self.node, fragments, target_ids)
         return (ArrayValue(new_root), target_ids)

+    def compute_general_reduction(
+        self,
+        assignments: Sequence[ex.Expression],
+        by_column_ids: typing.Sequence[str] = (),
+        *,
+        dropna: bool = False,
+    ):
+        """
+        Applies arbitrary aggregation expressions to the block, optionally grouped by keys.
+
+        This method handles reduction operations (e.g., sum, mean, count) that collapse
+        multiple input rows into a single scalar value per group. If grouping keys are
+        provided, the operation is performed per group; otherwise, it is a global reduction.
+
+        Note: Intermediate aggregations (those that are inputs to further aggregations)
+        must be windowizable. Notably excluded are approx quantile, top count ops.
+
+        Args:
+            assignments (Sequence[ex.Expression]): A sequence of aggregation expressions
+                to be calculated.
+            by_column_ids (typing.Sequence[str], optional): A sequence of column IDs
+                to use as grouping keys. Defaults to an empty tuple (global reduction).
+            dropna (bool, optional): If True, rows containing null values in the
+                `by_column_ids` columns will be filtered out before the reduction
+                is applied. Defaults to False.
+
+        Returns:
+            ArrayValue:
+                The new root node representing the aggregation/group-by result.
+        """
+        plan = self.node
+
+        # shortcircuit to keep things simple if all aggs are simple
+        # TODO: Fully unify paths once rewriters are strong enough to simplify complexity from full path
+        def _is_direct_agg(agg_expr):
+            return isinstance(agg_expr, agg_expressions.Aggregation) and all(
+                isinstance(child, (ex.DerefOp, ex.ScalarConstantExpression))
+                for child in agg_expr.children
+            )
+
+        if all(_is_direct_agg(agg) for agg in assignments):
+            agg_defs = tuple((agg, ids.ColumnId.unique()) for agg in assignments)
+            return ArrayValue(
+                nodes.AggregateNode(
+                    child=self.node,
+                    aggregations=agg_defs,  # type: ignore
+                    by_column_ids=tuple(map(ex.deref, by_column_ids)),
+                    dropna=dropna,
+                )
+            )
+
+        if dropna:
+            for col_id in by_column_ids:
+                plan = nodes.FilterNode(plan, ops.notnull_op.as_expr(col_id))
+
+        named_exprs = [
+            nodes.ColumnDef(expr, ids.ColumnId.unique()) for expr in assignments
+        ]
+        # TODO: Push this to rewrite later to go from block expression to planning form
+        new_root = expression_factoring.apply_agg_exprs_to_plan(
+            plan, named_exprs, grouping_keys=[ex.deref(by) for by in by_column_ids]
+        )
+        return ArrayValue(new_root)
+
     def project_to_id(self, expression: ex.Expression):
         array_val, ids = self.compute_values(
             [expression],
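
The short-circuit in `compute_general_reduction` treats an assignment as "direct" only when it is an aggregation whose children are all column references or constants. Anything else, such as an aggregation over a computed expression or a scalar expression combining several aggregations, goes to the general planning path. The sketch below illustrates just that predicate and the resulting path choice. The classes `Expr`, `Deref`, `Const`, `Scalar`, and `Agg` and the function `choose_path` are hypothetical stand-ins invented for this example; they are not the bigframes classes, and no logical plan is built.

```python
class Expr:
    """Hypothetical minimal expression node (stand-in, not a bigframes class)."""

    def __init__(self, *children):
        self.children = children


class Deref(Expr):
    """Stand-in for a column reference."""

    def __init__(self, column_id):
        super().__init__()
        self.column_id = column_id


class Const(Expr):
    """Stand-in for a scalar constant."""

    def __init__(self, value):
        super().__init__()
        self.value = value


class Scalar(Expr):
    """Stand-in for a row-wise scalar operation such as add or divide."""

    def __init__(self, op, *children):
        super().__init__(*children)
        self.op = op


class Agg(Expr):
    """Stand-in for an aggregation such as sum or count."""

    def __init__(self, op, *children):
        super().__init__(*children)
        self.op = op


def is_direct_agg(expr):
    # Mirrors the shape of _is_direct_agg in the diff: an aggregation whose
    # inputs are only column references or constants.
    return isinstance(expr, Agg) and all(
        isinstance(child, (Deref, Const)) for child in expr.children
    )


def choose_path(assignments):
    # Path labels are illustrative names, not identifiers from the codebase.
    if all(is_direct_agg(expr) for expr in assignments):
        return "single_aggregate_node"
    return "general_planner"


total = Agg("sum", Deref("a"))
mean_as_ratio = Scalar("div", Agg("sum", Deref("a")), Agg("count", Deref("a")))
centered_sum = Agg("sum", Scalar("sub", Deref("a"), Agg("mean", Deref("a"))))

print(choose_path([total]))
print(choose_path([total, mean_as_ratio]))
print(choose_path([centered_sum]))
```

One non-direct assignment is enough to send the whole batch down the general path, which matches the `all(...)` check in the diff.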
