
Commit 378ebe4

Merge branch 'main' into output_schema
2 parents e9700a2 + 090ce8e

File tree: 92 files changed (+2737, -1265 lines)


.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ repos:
         exclude: "^third_party"
         args: ["--check-untyped-defs", "--explicit-package-bases", "--ignore-missing-imports"]
   - repo: https://github.com/biomejs/pre-commit
-    rev: v2.0.2
+    rev: v2.2.4
     hooks:
       - id: biome-check
         files: '\.(js|css)$'

CHANGELOG.md

Lines changed: 31 additions & 0 deletions
@@ -4,6 +4,37 @@
 [1]: https://pypi.org/project/bigframes/#history

+## [2.20.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v2.19.0...v2.20.0) (2025-09-16)
+
+### Features
+
+* Add `__dataframe__` interchange support ([#2063](https://github.com/googleapis/python-bigquery-dataframes/issues/2063)) ([3b46a0d](https://github.com/googleapis/python-bigquery-dataframes/commit/3b46a0d91eb379c61ced45ae0b25339281326c3d))
+* Add ai_generate_bool to the bigframes.bigquery package ([#2060](https://github.com/googleapis/python-bigquery-dataframes/issues/2060)) ([70d6562](https://github.com/googleapis/python-bigquery-dataframes/commit/70d6562df64b2aef4ff0024df6f57702d52dcaf8))
+* Add bigframes.bigquery.to_json_string ([#2076](https://github.com/googleapis/python-bigquery-dataframes/issues/2076)) ([41e8f33](https://github.com/googleapis/python-bigquery-dataframes/commit/41e8f33ceb46a7c2a75d1c59a4a3f2f9413d281d))
+* Add rank(pct=True) support ([#2084](https://github.com/googleapis/python-bigquery-dataframes/issues/2084)) ([c1e871d](https://github.com/googleapis/python-bigquery-dataframes/commit/c1e871d9327bf6c920d17e1476fed3088d506f5f))
+* Add StreamingDataFrame.to_bigtable and .to_pubsub start_timestamp parameter ([#2066](https://github.com/googleapis/python-bigquery-dataframes/issues/2066)) ([a63cbae](https://github.com/googleapis/python-bigquery-dataframes/commit/a63cbae24ff2dc191f0a53dced885bc95f38ec96))
+* Can call agg with some callables ([#2055](https://github.com/googleapis/python-bigquery-dataframes/issues/2055)) ([17a1ed9](https://github.com/googleapis/python-bigquery-dataframes/commit/17a1ed99ec8c6d3215d3431848814d5d458d4ff1))
+* Support astype to json ([#2073](https://github.com/googleapis/python-bigquery-dataframes/issues/2073)) ([6bd6738](https://github.com/googleapis/python-bigquery-dataframes/commit/6bd67386341de7a92ada948381702430c399406e))
+* Support pandas.Index as key for DataFrame.__setitem__() ([#2062](https://github.com/googleapis/python-bigquery-dataframes/issues/2062)) ([b3cf824](https://github.com/googleapis/python-bigquery-dataframes/commit/b3cf8248e3b8ea76637ded64fb12028d439448d1))
+* Support pd.cut() for array-like type ([#2064](https://github.com/googleapis/python-bigquery-dataframes/issues/2064)) ([21eb213](https://github.com/googleapis/python-bigquery-dataframes/commit/21eb213c5f0e0f696f2d1ca1f1263678d791cf7c))
+* Support to cast struct to json ([#2067](https://github.com/googleapis/python-bigquery-dataframes/issues/2067)) ([b0ff718](https://github.com/googleapis/python-bigquery-dataframes/commit/b0ff718a04fadda33cfa3613b1d02822cde34bc2))
+
+
+### Bug Fixes
+
+* Deflake ai_gen_bool multimodel test ([#2085](https://github.com/googleapis/python-bigquery-dataframes/issues/2085)) ([566a37a](https://github.com/googleapis/python-bigquery-dataframes/commit/566a37a30ad5677aef0c5f79bdd46bca2139cc1e))
+* Do not scroll page selector in anywidget `repr_mode` ([#2082](https://github.com/googleapis/python-bigquery-dataframes/issues/2082)) ([5ce5d63](https://github.com/googleapis/python-bigquery-dataframes/commit/5ce5d63fcb51bfb3df2769108b7486287896ccb9))
+* Fix the potential invalid VPC egress configuration ([#2068](https://github.com/googleapis/python-bigquery-dataframes/issues/2068)) ([cce4966](https://github.com/googleapis/python-bigquery-dataframes/commit/cce496605385f2ac7ab0becc0773800ed5901aa5))
+* Return a DataFrame containing query stats for all non-SELECT statements ([#2071](https://github.com/googleapis/python-bigquery-dataframes/issues/2071)) ([a52b913](https://github.com/googleapis/python-bigquery-dataframes/commit/a52b913d9d8794b4b959ea54744a38d9f2f174e7))
+* Use the remote and managed functions for bigframes results ([#2079](https://github.com/googleapis/python-bigquery-dataframes/issues/2079)) ([49b91e8](https://github.com/googleapis/python-bigquery-dataframes/commit/49b91e878de651de23649756259ee35709e3f5a8))
+
+
+### Performance Improvements
+
+* Avoid re-authenticating if credentials have already been fetched ([#2058](https://github.com/googleapis/python-bigquery-dataframes/issues/2058)) ([913de1b](https://github.com/googleapis/python-bigquery-dataframes/commit/913de1b31f3bb0b306846fddae5dcaff6be3cec4))
+* Improve apply axis=1 performance ([#2077](https://github.com/googleapis/python-bigquery-dataframes/issues/2077)) ([12e4380](https://github.com/googleapis/python-bigquery-dataframes/commit/12e438051134577e911c1a6ce9d5a5885a0b45ad))
+
 ## [2.19.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v2.18.0...v2.19.0) (2025-09-09)

940

bigframes/bigquery/__init__.py

Lines changed: 4 additions & 1 deletion
@@ -18,6 +18,7 @@
 import sys

+from bigframes.bigquery._operations import ai
 from bigframes.bigquery._operations.approx_agg import approx_top_count
 from bigframes.bigquery._operations.array import (
     array_agg,
@@ -50,6 +51,7 @@
     json_value,
     json_value_array,
     parse_json,
+    to_json_string,
 )
 from bigframes.bigquery._operations.search import create_vector_index, vector_search
 from bigframes.bigquery._operations.sql import sql_scalar
@@ -87,6 +89,7 @@
     json_value,
     json_value_array,
     parse_json,
+    to_json_string,
     # search ops
     create_vector_index,
     vector_search,
@@ -96,7 +99,7 @@
     struct,
 ]

-__all__ = [f.__name__ for f in _functions]
+__all__ = [f.__name__ for f in _functions] + ["ai"]

 _module = sys.modules[__name__]
 for f in _functions:
bigframes/bigquery/_operations/ai.py (new file)

Lines changed: 154 additions & 0 deletions

@@ -0,0 +1,154 @@
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""This module integrates BigQuery built-in AI functions for use with
Series/DataFrame objects, such as AI.GENERATE_BOOL:
https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-generate-bool"""

from __future__ import annotations

import json
from typing import Any, List, Literal, Mapping, Tuple

from bigframes import clients, dtypes, series
from bigframes.core import log_adapter
from bigframes.operations import ai_ops


@log_adapter.method_logger(custom_base_name="bigquery_ai")
def generate_bool(
    prompt: series.Series | List[str | series.Series] | Tuple[str | series.Series, ...],
    *,
    connection_id: str | None = None,
    endpoint: str | None = None,
    request_type: Literal["dedicated", "shared", "unspecified"] = "unspecified",
    model_params: Mapping[Any, Any] | None = None,
) -> series.Series:
    """
    Returns the AI analysis based on the prompt, which can be any combination of text and unstructured data.

    **Examples:**

        >>> import bigframes.pandas as bpd
        >>> import bigframes.bigquery as bbq
        >>> bpd.options.display.progress_bar = None
        >>> df = bpd.DataFrame({
        ...     "col_1": ["apple", "bear", "pear"],
        ...     "col_2": ["fruit", "animal", "animal"]
        ... })
        >>> bbq.ai.generate_bool((df["col_1"], " is a ", df["col_2"]))
        0    {'result': True, 'full_response': '{"candidate...
        1    {'result': True, 'full_response': '{"candidate...
        2    {'result': False, 'full_response': '{"candidat...
        dtype: struct<result: bool, full_response: string, status: string>[pyarrow]

        >>> bbq.ai.generate_bool((df["col_1"], " is a ", df["col_2"])).struct.field("result")
        0     True
        1     True
        2    False
        Name: result, dtype: boolean

    Args:
        prompt (series.Series | List[str|series.Series] | Tuple[str|series.Series, ...]):
            A mixture of Series and string literals that specifies the prompt to send to the model.
        connection_id (str, optional):
            Specifies the connection to use to communicate with the model. For example, `myproject.us.myconnection`.
            If not provided, the connection from the current session is used.
        endpoint (str, optional):
            Specifies the Vertex AI endpoint to use for the model. For example, `"gemini-2.5-flash"`. You can specify any
            generally available or preview Gemini model. If you specify the model name, BigQuery ML automatically identifies and
            uses the full endpoint of the model. If you don't specify an ENDPOINT value, BigQuery ML selects a recent stable
            version of Gemini to use.
        request_type (Literal["dedicated", "shared", "unspecified"]):
            Specifies the type of inference request to send to the Gemini model. The request type determines what quota the request uses.

            * "dedicated": the function only uses Provisioned Throughput quota. The function returns the error "Provisioned
              throughput is not purchased or is not active" if Provisioned Throughput quota isn't available.
            * "shared": the function only uses dynamic shared quota (DSQ), even if you have purchased Provisioned Throughput quota.
            * "unspecified": if you haven't purchased Provisioned Throughput quota, the function uses DSQ quota.
              If you have purchased Provisioned Throughput quota, the function uses the Provisioned Throughput quota first.
              If requests exceed the Provisioned Throughput quota, the overflow traffic uses DSQ quota.
        model_params (Mapping[Any, Any]):
            Provides additional parameters to the model. The MODEL_PARAMS value must conform to the generateContent request body format.

    Returns:
        bigframes.series.Series: A new struct Series with the result data. The struct contains these fields:

            * "result": a BOOL value containing the model's response to the prompt. The result is None if the request fails
              or is filtered by responsible AI.
            * "full_response": a STRING value containing the JSON response from the projects.locations.endpoints.generateContent
              call to the model. The generated text is in the text element.
            * "status": a STRING value that contains the API response status for the corresponding row. This value is empty
              if the operation was successful.
    """

    prompt_context, series_list = _separate_context_and_series(prompt)
    assert len(series_list) > 0

    operator = ai_ops.AIGenerateBool(
        prompt_context=tuple(prompt_context),
        connection_id=_resolve_connection_id(series_list[0], connection_id),
        endpoint=endpoint,
        request_type=request_type,
        model_params=json.dumps(model_params) if model_params else None,
    )

    return series_list[0]._apply_nary_op(operator, series_list[1:])


def _separate_context_and_series(
    prompt: series.Series | List[str | series.Series] | Tuple[str | series.Series, ...],
) -> Tuple[List[str | None], List[series.Series]]:
    """
    Returns two values. The first is the prompt with every Series replaced by None. The second is all the Series
    in the prompt. The original item order is kept.

    For example:
        Input:  ("str1", series1, "str2", "str3", series2)
        Output: ["str1", None, "str2", "str3", None], [series1, series2]
    """
    if not isinstance(prompt, (list, tuple, series.Series)):
        raise ValueError(f"Unsupported prompt type: {type(prompt)}")

    if isinstance(prompt, series.Series):
        if prompt.dtype == dtypes.OBJ_REF_DTYPE:
            # Multi-model support
            return [None], [prompt.blob.read_url()]
        return [None], [prompt]

    prompt_context: List[str | None] = []
    series_list: List[series.Series] = []

    for item in prompt:
        if isinstance(item, str):
            prompt_context.append(item)
        elif isinstance(item, series.Series):
            prompt_context.append(None)
            if item.dtype == dtypes.OBJ_REF_DTYPE:
                # Multi-model support
                item = item.blob.read_url()
            series_list.append(item)
        else:
            raise TypeError(f"Unsupported type in prompt: {type(item)}")

    if not series_list:
        raise ValueError("Please provide at least one Series in the prompt")

    return prompt_context, series_list


def _resolve_connection_id(series: series.Series, connection_id: str | None):
    return clients.get_canonical_bq_connection_id(
        connection_id or series._session._bq_connection,
        series._session._project,
        series._session._location,
    )
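
The heart of `generate_bool` is `_separate_context_and_series`, which splits a mixed prompt into a context template (with `None` placeholders) and the Series to interpolate. A self-contained sketch of that logic, using a stand-in `Series` class rather than the real `bigframes.series.Series` (and omitting the blob/multi-model branch):

```python
from typing import List, Optional, Tuple

class Series:
    """Stand-in for bigframes.series.Series, for illustration only."""
    def __init__(self, name: str):
        self.name = name

def separate_context_and_series(prompt) -> Tuple[List[Optional[str]], List[Series]]:
    context: List[Optional[str]] = []
    series_list: List[Series] = []
    for item in prompt:
        if isinstance(item, str):
            # String literals stay in the context template as-is.
            context.append(item)
        elif isinstance(item, Series):
            # Series are replaced by None placeholders and collected in order.
            context.append(None)
            series_list.append(item)
        else:
            raise TypeError(f"Unsupported type in prompt: {type(item)}")
    if not series_list:
        raise ValueError("Please provide at least one Series in the prompt")
    return context, series_list
```

This mirrors the docstring example: `("str1", series1, "str2", "str3", series2)` becomes `["str1", None, "str2", "str3", None]` plus `[series1, series2]`.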

bigframes/bigquery/_operations/json.py

Lines changed: 34 additions & 0 deletions
@@ -430,6 +430,40 @@ def json_value_array(
     return input._apply_unary_op(ops.JSONValueArray(json_path=json_path))


+def to_json_string(
+    input: series.Series,
+) -> series.Series:
+    """Converts a series to a JSON-formatted STRING value.
+
+    **Examples:**
+
+        >>> import bigframes.pandas as bpd
+        >>> import bigframes.bigquery as bbq
+        >>> bpd.options.display.progress_bar = None
+
+        >>> s = bpd.Series([1, 2, 3])
+        >>> bbq.to_json_string(s)
+        0    1
+        1    2
+        2    3
+        dtype: string
+
+        >>> s = bpd.Series([{"int": 1, "str": "pandas"}, {"int": 2, "str": "numpy"}])
+        >>> bbq.to_json_string(s)
+        0    {"int":1,"str":"pandas"}
+        1    {"int":2,"str":"numpy"}
+        dtype: string
+
+    Args:
+        input (bigframes.series.Series):
+            The Series to be converted.
+
+    Returns:
+        bigframes.series.Series: A new Series with the JSON-formatted STRING value.
+    """
+    return input._apply_unary_op(ops.ToJSONString())
+
+
 @utils.preview(name="The JSON-related API `parse_json`")
 def parse_json(
     input: series.Series,
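
The doctest output above shows the shape of the strings `to_json_string` produces: scalars serialize bare, structs serialize as compact JSON objects with no whitespace. A rough local analogue using Python's `json.dumps` (assumption: BigQuery's TO_JSON_STRING emits no spaces after `:` or `,`, which matches the example output):

```python
import json

def to_json_string_local(value):
    """Approximate the compact JSON strings shown in the to_json_string doctest."""
    return json.dumps(value, separators=(",", ":"))
```

For example, `to_json_string_local({"int": 1, "str": "pandas"})` yields the same `{"int":1,"str":"pandas"}` seen in the struct example.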

bigframes/core/agg_expressions.py

Lines changed: 66 additions & 1 deletion
@@ -22,7 +22,7 @@
 from typing import Callable, Mapping, TypeVar

 from bigframes import dtypes
-from bigframes.core import expression
+from bigframes.core import expression, window_spec
 import bigframes.core.identifiers as ids
 import bigframes.operations.aggregations as agg_ops

@@ -149,3 +149,68 @@ def replace_args(
         self, larg: expression.Expression, rarg: expression.Expression
     ) -> BinaryAggregation:
         return BinaryAggregation(self.op, larg, rarg)
+
+
+@dataclasses.dataclass(frozen=True)
+class WindowExpression(expression.Expression):
+    analytic_expr: Aggregation
+    window: window_spec.WindowSpec
+
+    @property
+    def column_references(self) -> typing.Tuple[ids.ColumnId, ...]:
+        return tuple(
+            itertools.chain.from_iterable(
+                map(lambda x: x.column_references, self.inputs)
+            )
+        )
+
+    @functools.cached_property
+    def is_resolved(self) -> bool:
+        return all(input.is_resolved for input in self.inputs)
+
+    @property
+    def output_type(self) -> dtypes.ExpressionType:
+        return self.analytic_expr.output_type
+
+    @property
+    def inputs(
+        self,
+    ) -> typing.Tuple[expression.Expression, ...]:
+        return (self.analytic_expr, *self.window.expressions)
+
+    @property
+    def free_variables(self) -> typing.Tuple[str, ...]:
+        return tuple(
+            itertools.chain.from_iterable(map(lambda x: x.free_variables, self.inputs))
+        )
+
+    @property
+    def is_const(self) -> bool:
+        return all(child.is_const for child in self.inputs)
+
+    def transform_children(
+        self: WindowExpression,
+        t: Callable[[expression.Expression], expression.Expression],
+    ) -> WindowExpression:
+        return WindowExpression(
+            self.analytic_expr.transform_children(t),
+            self.window.transform_exprs(t),
+        )
+
+    def bind_variables(
+        self: WindowExpression,
+        bindings: Mapping[str, expression.Expression],
+        allow_partial_bindings: bool = False,
+    ) -> WindowExpression:
+        return self.transform_children(
+            lambda x: x.bind_variables(bindings, allow_partial_bindings)
+        )
+
+    def bind_refs(
+        self: WindowExpression,
+        bindings: Mapping[ids.ColumnId, expression.Expression],
+        allow_partial_bindings: bool = False,
+    ) -> WindowExpression:
+        return self.transform_children(
+            lambda x: x.bind_refs(bindings, allow_partial_bindings)
+        )
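
`WindowExpression` is a frozen dataclass, so `bind_variables` and `bind_refs` cannot mutate it; both delegate to `transform_children`, which rebuilds the node with a callable applied to each child. A toy version of that pattern (simplified stand-in node types, not the real bigframes classes):

```python
import dataclasses
from typing import Callable, Tuple

@dataclasses.dataclass(frozen=True)
class Ref:
    """Stand-in leaf expression holding a column name."""
    name: str

@dataclasses.dataclass(frozen=True)
class WindowNode:
    """Stand-in composite expression over a tuple of children."""
    children: Tuple[Ref, ...]

    def transform_children(self, t: Callable[[Ref], Ref]) -> "WindowNode":
        # Rebuild rather than mutate: frozen dataclasses forbid assignment.
        return WindowNode(tuple(t(c) for c in self.children))
```

A rebind then becomes a one-liner, e.g. `node.transform_children(lambda r: Ref(mapping.get(r.name, r.name)))`, leaving the original tree untouched.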

bigframes/core/block_transforms.py

Lines changed: 7 additions & 0 deletions
@@ -417,6 +417,7 @@ def rank(
     ascending: bool = True,
     grouping_cols: tuple[str, ...] = (),
     columns: tuple[str, ...] = (),
+    pct: bool = False,
 ):
     if method not in ["average", "min", "max", "first", "dense"]:
         raise ValueError(
@@ -459,6 +460,12 @@ def rank(
                 ),
                 skip_reproject_unsafe=(col != columns[-1]),
             )
+            if pct:
+                block, max_id = block.apply_window_op(
+                    rownum_id, agg_ops.max_op, windows.unbound(grouping_keys=grouping_cols)
+                )
+                block, rownum_id = block.project_expr(ops.div_op.as_expr(rownum_id, max_id))
+
         rownum_col_ids.append(rownum_id)

     # Step 2: Apply aggregate to groups of like input values.
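
The `pct` branch above rescales each rank by the maximum rank over its window, turning ranks into percentages. A plain-Python sketch of that arithmetic (assuming "min"-method ranks for ties, one of the supported methods; the real code does this with window ops on the block rather than in local Python):

```python
def pct_rank(values):
    """Return ranks divided by the maximum rank, as rank(pct=True) does."""
    sorted_vals = sorted(values)
    # "min" method: tied values share the rank of their first occurrence (1-based).
    ranks = [sorted_vals.index(v) + 1 for v in values]
    max_rank = max(ranks)
    return [r / max_rank for r in ranks]
```

For `[10, 20, 20, 30]` this gives `[0.25, 0.5, 0.5, 1.0]`, matching pandas `Series.rank(method="min", pct=True)`.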
