
Commit 92870fd

Merge remote-tracking branch 'origin/main' into refactor-isnull-op
2 parents 30e1eb6 + dba2a6e commit 92870fd

195 files changed (+10120 / -1268 lines)

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
@@ -39,6 +39,6 @@ repos:
     rev: v1.15.0
     hooks:
     -   id: mypy
-        additional_dependencies: [types-requests, types-tabulate, pandas-stubs<=2.2.3.241126]
+        additional_dependencies: [types-requests, types-tabulate, types-PyYAML, pandas-stubs<=2.2.3.241126]
         exclude: "^third_party"
         args: ["--check-untyped-defs", "--explicit-package-bases", "--ignore-missing-imports"]

CHANGELOG.md

Lines changed: 76 additions & 0 deletions
@@ -4,6 +4,82 @@
 
 [1]: https://pypi.org/project/bigframes/#history
 
+## [2.9.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v2.8.0...v2.9.0) (2025-06-30)
+
+
+### Features
+
+* Add `bpd.read_arrow` to convert an Arrow object into a bigframes DataFrame ([#1855](https://github.com/googleapis/python-bigquery-dataframes/issues/1855)) ([633bf98](https://github.com/googleapis/python-bigquery-dataframes/commit/633bf98fde33264be4fc9d7454e541c560589152))
+* Add experimental polars execution ([#1747](https://github.com/googleapis/python-bigquery-dataframes/issues/1747)) ([daf0c3b](https://github.com/googleapis/python-bigquery-dataframes/commit/daf0c3b349fb1e85e7070c54a2d3f5460f5e40c9))
+* Add size op support in local engine ([#1865](https://github.com/googleapis/python-bigquery-dataframes/issues/1865)) ([942e66c](https://github.com/googleapis/python-bigquery-dataframes/commit/942e66c483c9afbb680a7af56c9e9a76172a33e1))
+* Create `deploy_remote_function` and `deploy_udf` functions to immediately deploy functions to BigQuery ([#1832](https://github.com/googleapis/python-bigquery-dataframes/issues/1832)) ([c706759](https://github.com/googleapis/python-bigquery-dataframes/commit/c706759b85359b6d23ce3449f6ab138ad2d22f9d))
+* Support index item assign in Series ([#1868](https://github.com/googleapis/python-bigquery-dataframes/issues/1868)) ([c5d251a](https://github.com/googleapis/python-bigquery-dataframes/commit/c5d251a1d454bb4ef55ea9905faeadd646a23b14))
+* Support item assignment in series ([#1859](https://github.com/googleapis/python-bigquery-dataframes/issues/1859)) ([25684ff](https://github.com/googleapis/python-bigquery-dataframes/commit/25684ff60367f49dd318d4677a7438abdc98bff9))
+* Support local execution of comparison ops ([#1849](https://github.com/googleapis/python-bigquery-dataframes/issues/1849)) ([1c45ccb](https://github.com/googleapis/python-bigquery-dataframes/commit/1c45ccb133091aa85bc34450704fc8cab3d9296b))
+
+
+### Bug Fixes
+
+* Fix bug selecting column repeatedly ([#1858](https://github.com/googleapis/python-bigquery-dataframes/issues/1858)) ([cc339e9](https://github.com/googleapis/python-bigquery-dataframes/commit/cc339e9938129cac896460e3a794b3ec8479fa4a))
+* Fix bug with DataFrame.agg for string values ([#1870](https://github.com/googleapis/python-bigquery-dataframes/issues/1870)) ([81e4d64](https://github.com/googleapis/python-bigquery-dataframes/commit/81e4d64c5a3bd8d30edaf909d0bef2d1d1a51c01))
+* Generate GoogleSQL instead of legacy SQL data types for `dry_run=True` from `bpd._read_gbq_colab` with local pandas DataFrame ([#1867](https://github.com/googleapis/python-bigquery-dataframes/issues/1867)) ([fab3c38](https://github.com/googleapis/python-bigquery-dataframes/commit/fab3c387b2ad66043244fa813a366e613b41c60f))
+* Revert dict back to protobuf in the iam binding update ([#1838](https://github.com/googleapis/python-bigquery-dataframes/issues/1838)) ([9fb3cb4](https://github.com/googleapis/python-bigquery-dataframes/commit/9fb3cb444607df6736d383a2807059bca470c453))
+
+
+### Documentation
+
+* Add data visualization samples for public doc ([#1847](https://github.com/googleapis/python-bigquery-dataframes/issues/1847)) ([15e1277](https://github.com/googleapis/python-bigquery-dataframes/commit/15e1277b1413de18a5e36f72959a99701d6df08b))
+* Changed broken logo ([#1866](https://github.com/googleapis/python-bigquery-dataframes/issues/1866)) ([e3c06b4](https://github.com/googleapis/python-bigquery-dataframes/commit/e3c06b4a07d0669a42460d081f1582b681ae3dd5))
+* Update ai.forecast notebook ([#1844](https://github.com/googleapis/python-bigquery-dataframes/issues/1844)) ([1863538](https://github.com/googleapis/python-bigquery-dataframes/commit/186353888db537b561ee994256f998df361b4071))
+
+## [2.8.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v2.7.0...v2.8.0) (2025-06-23)
+
+
+### ⚠ BREAKING CHANGES
+
+* add required param 'engine' to multimodal functions ([#1834](https://github.com/googleapis/python-bigquery-dataframes/issues/1834))
+
+### Features
+
+* Add `bpd.options.compute.maximum_result_rows` option to limit client data download ([#1829](https://github.com/googleapis/python-bigquery-dataframes/issues/1829)) ([e22a3f6](https://github.com/googleapis/python-bigquery-dataframes/commit/e22a3f61a02cc1b7a5155556e5a07a1a2fea1d82))
+* Add `bpd.options.display.repr_mode = "anywidget"` to create an interactive display of the results ([#1820](https://github.com/googleapis/python-bigquery-dataframes/issues/1820)) ([be0a3cf](https://github.com/googleapis/python-bigquery-dataframes/commit/be0a3cf7711dadc68d8366ea90b99855773e2a2e))
+* Add DataFrame.ai.forecast() support ([#1828](https://github.com/googleapis/python-bigquery-dataframes/issues/1828)) ([7bc7f36](https://github.com/googleapis/python-bigquery-dataframes/commit/7bc7f36fc20d233f4cf5ed688cc5dcaf100ce4fb))
+* Add describe() method to Series ([#1827](https://github.com/googleapis/python-bigquery-dataframes/issues/1827)) ([a4205f8](https://github.com/googleapis/python-bigquery-dataframes/commit/a4205f882012820c034cb15d73b2768ec4ad3ac8))
+* Add required param 'engine' to multimodal functions ([#1834](https://github.com/googleapis/python-bigquery-dataframes/issues/1834)) ([37666e4](https://github.com/googleapis/python-bigquery-dataframes/commit/37666e4c137d52c28ab13477dfbcc6e92b913334))
+
+
+### Performance Improvements
+
+* Produce simpler sql ([#1836](https://github.com/googleapis/python-bigquery-dataframes/issues/1836)) ([cf9c22a](https://github.com/googleapis/python-bigquery-dataframes/commit/cf9c22a09c4e668a598fa1dad0f6a07b59bc6524))
+
+
+### Documentation
+
+* Add ai.forecast notebook ([#1840](https://github.com/googleapis/python-bigquery-dataframes/issues/1840)) ([2430497](https://github.com/googleapis/python-bigquery-dataframes/commit/24304972fdbdfd12c25c7f4ef5a7b280f334801a))
+
+## [2.7.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v2.6.0...v2.7.0) (2025-06-16)
+
+
+### Features
+
+* Add bbq.json_query_array and warn bbq.json_extract_array deprecated ([#1811](https://github.com/googleapis/python-bigquery-dataframes/issues/1811)) ([dc9eb27](https://github.com/googleapis/python-bigquery-dataframes/commit/dc9eb27fa75e90c2c95a0619551bf67aea6ef63b))
+* Add bbq.json_value_array and deprecate bbq.json_extract_string_array ([#1818](https://github.com/googleapis/python-bigquery-dataframes/issues/1818)) ([019051e](https://github.com/googleapis/python-bigquery-dataframes/commit/019051e453d81769891aa398475ebd04d1826e81))
+* Add groupby cumcount ([#1798](https://github.com/googleapis/python-bigquery-dataframes/issues/1798)) ([18f43e8](https://github.com/googleapis/python-bigquery-dataframes/commit/18f43e8b58e03a27b021bce07566a3d006ac3679))
+* Support custom build service account in `remote_function` ([#1796](https://github.com/googleapis/python-bigquery-dataframes/issues/1796)) ([e586151](https://github.com/googleapis/python-bigquery-dataframes/commit/e586151df81917b49f702ae496aaacbd02931636))
+
+
+### Bug Fixes
+
+* Correct read_csv behaviours with use_cols, names, index_col ([#1804](https://github.com/googleapis/python-bigquery-dataframes/issues/1804)) ([855031a](https://github.com/googleapis/python-bigquery-dataframes/commit/855031a316a6957731a5d1c5e59dedb9757d9f7a))
+* Fix single row broadcast with null index ([#1803](https://github.com/googleapis/python-bigquery-dataframes/issues/1803)) ([080eb7b](https://github.com/googleapis/python-bigquery-dataframes/commit/080eb7be3cde591e08cad0d5c52c68cc0b25ade8))
+
+
+### Documentation
+
+* Document how to use ai.map() for information extraction ([#1808](https://github.com/googleapis/python-bigquery-dataframes/issues/1808)) ([b586746](https://github.com/googleapis/python-bigquery-dataframes/commit/b5867464a5bf30300dcfc069eda546b11f03146c))
+* Rearrange README.rst to include a short code sample ([#1812](https://github.com/googleapis/python-bigquery-dataframes/issues/1812)) ([f6265db](https://github.com/googleapis/python-bigquery-dataframes/commit/f6265dbb8e22de81bb59c7def175cd325e85c041))
+* Use pandas API instead of pandas-like or pandas-compatible ([#1825](https://github.com/googleapis/python-bigquery-dataframes/issues/1825)) ([aa32369](https://github.com/googleapis/python-bigquery-dataframes/commit/aa323694e161f558bc5e60490c2f21008961e2ca))
+
 ## [2.6.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v2.5.0...v2.6.0) (2025-06-09)
 
 

README.rst

Lines changed: 2 additions & 1 deletion
@@ -6,7 +6,8 @@ BigQuery DataFrames (BigFrames)
 BigQuery DataFrames (also known as BigFrames) provides a Pythonic DataFrame
 and machine learning (ML) API powered by the BigQuery engine.
 
-* ``bigframes.pandas`` provides a pandas-compatible API for analytics.
+* `bigframes.pandas` provides a pandas API for analytics. Many workloads can be
+  migrated from pandas to bigframes by just changing a few imports.
 * ``bigframes.ml`` provides a scikit-learn-like API for ML.
 
 BigQuery DataFrames is an open-source package.

bigframes/_config/bigquery_options.py

Lines changed: 24 additions & 0 deletions
@@ -22,6 +22,7 @@
 import google.auth.credentials
 import requests.adapters
 
+import bigframes._importing
 import bigframes.enums
 import bigframes.exceptions as bfe
 
@@ -94,6 +95,7 @@ def __init__(
         requests_transport_adapters: Sequence[
             Tuple[str, requests.adapters.BaseAdapter]
         ] = (),
+        enable_polars_execution: bool = False,
     ):
         self._credentials = credentials
         self._project = project
@@ -113,6 +115,9 @@ def __init__(
             client_endpoints_override = {}
 
         self._client_endpoints_override = client_endpoints_override
+        if enable_polars_execution:
+            bigframes._importing.import_polars()
+        self._enable_polars_execution = enable_polars_execution
 
     @property
     def application_name(self) -> Optional[str]:
@@ -424,3 +429,22 @@ def requests_transport_adapters(
                 SESSION_STARTED_MESSAGE.format(attribute="requests_transport_adapters")
             )
         self._requests_transport_adapters = value
+
+    @property
+    def enable_polars_execution(self) -> bool:
+        """If True, will use polars to execute some simple query plans locally."""
+        return self._enable_polars_execution
+
+    @enable_polars_execution.setter
+    def enable_polars_execution(self, value: bool):
+        if self._session_started and self._enable_polars_execution != value:
+            raise ValueError(
+                SESSION_STARTED_MESSAGE.format(attribute="enable_polars_execution")
+            )
+        if value is True:
+            msg = bfe.format_message(
+                "Polars execution is an experimental feature, and may not be stable. Must have polars installed."
+            )
+            warnings.warn(msg, category=bfe.PreviewWarning)
+            bigframes._importing.import_polars()
+        self._enable_polars_execution = value
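
The new `enable_polars_execution` setter follows the same pattern as the other `BigQueryOptions` properties: once a session has started, changing the value raises `ValueError`, and enabling it emits a `PreviewWarning` and eagerly imports polars so that a missing or outdated install fails at assignment time. The sketch below models only the guard and warning logic in plain Python so it runs without bigframes; `OptionsSketch`, the local `PreviewWarning` class, and the `SESSION_STARTED_MESSAGE` wording are hypothetical stand-ins, not the library's code.

```python
import warnings


class PreviewWarning(UserWarning):
    """Hypothetical stand-in for bigframes.exceptions.PreviewWarning."""


# Hypothetical wording; the real SESSION_STARTED_MESSAGE text is not part of this diff.
SESSION_STARTED_MESSAGE = "Cannot change option {attribute} after the session has started."


class OptionsSketch:
    """Hypothetical model of the enable_polars_execution guard in BigQueryOptions."""

    def __init__(self, enable_polars_execution: bool = False):
        self._session_started = False
        self._enable_polars_execution = enable_polars_execution

    @property
    def enable_polars_execution(self) -> bool:
        return self._enable_polars_execution

    @enable_polars_execution.setter
    def enable_polars_execution(self, value: bool) -> None:
        # After the session starts, only an actual change is rejected;
        # re-assigning the current value is still allowed.
        if self._session_started and self._enable_polars_execution != value:
            raise ValueError(
                SESSION_STARTED_MESSAGE.format(attribute="enable_polars_execution")
            )
        if value is True:
            warnings.warn(
                "Polars execution is an experimental feature, and may not be stable.",
                category=PreviewWarning,
            )
            # The real setter also calls bigframes._importing.import_polars() here,
            # so a missing or outdated polars fails immediately; omitted in this sketch.
        self._enable_polars_execution = value
```

In user code the option would presumably be set before the first query runs, for example `bpd.options.bigquery.enable_polars_execution = True` after `import bigframes.pandas as bpd`.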

bigframes/_config/compute_options.py

Lines changed: 39 additions & 30 deletions
@@ -55,29 +55,7 @@ class ComputeOptions:
         {'test2': 'abc', 'test3': False}
 
     Attributes:
-        maximum_bytes_billed (int, Options):
-            Limits the bytes billed for query jobs. Queries that will have
-            bytes billed beyond this limit will fail (without incurring a
-            charge). If unspecified, this will be set to your project default.
-            See `maximum_bytes_billed`: https://cloud.google.com/python/docs/reference/bigquery/latest/google.cloud.bigquery.job.QueryJobConfig#google_cloud_bigquery_job_QueryJobConfig_maximum_bytes_billed.
-
-        enable_multi_query_execution (bool, Options):
-            If enabled, large queries may be factored into multiple smaller queries
-            in order to avoid generating queries that are too complex for the query
-            engine to handle. However this comes at the cost of increase cost and latency.
-
-        extra_query_labels (Dict[str, Any], Options):
-            Stores additional custom labels for query configuration.
-
-        semantic_ops_confirmation_threshold (int, optional):
-            .. deprecated:: 1.42.0
-                Semantic operators are deprecated. Please use AI operators instead
-
-        semantic_ops_threshold_autofail (bool):
-            .. deprecated:: 1.42.0
-                Semantic operators are deprecated. Please use AI operators instead
-
-        ai_ops_confirmation_threshold (int, optional):
+        ai_ops_confirmation_threshold (int | None):
             Guards against unexpected processing of large amount of rows by semantic operators.
             If the number of rows exceeds the threshold, the user will be asked to confirm
             their operations to resume. The default value is 0. Set the value to None
@@ -87,26 +65,57 @@ class ComputeOptions:
             Guards against unexpected processing of large amount of rows by semantic operators.
             When set to True, the operation automatically fails without asking for user inputs.
 
-        allow_large_results (bool):
+        allow_large_results (bool | None):
             Specifies whether query results can exceed 10 GB. Defaults to False. Setting this
             to False (the default) restricts results to 10 GB for potentially faster execution;
             BigQuery will raise an error if this limit is exceeded. Setting to True removes
             this result size limit.
+
+        enable_multi_query_execution (bool | None):
+            If enabled, large queries may be factored into multiple smaller queries
+            in order to avoid generating queries that are too complex for the query
+            engine to handle. However this comes at the cost of increase cost and latency.
+
+        extra_query_labels (Dict[str, Any] | None):
+            Stores additional custom labels for query configuration.
+
+        maximum_bytes_billed (int | None):
+            Limits the bytes billed for query jobs. Queries that will have
+            bytes billed beyond this limit will fail (without incurring a
+            charge). If unspecified, this will be set to your project default.
+            See `maximum_bytes_billed`: https://cloud.google.com/python/docs/reference/bigquery/latest/google.cloud.bigquery.job.QueryJobConfig#google_cloud_bigquery_job_QueryJobConfig_maximum_bytes_billed.
+
+        maximum_result_rows (int | None):
+            Limits the number of rows in an execution result. When converting
+            a BigQuery DataFrames object to a pandas DataFrame or Series (e.g.,
+            using ``.to_pandas()``, ``.peek()``, ``.__repr__()``, direct
+            iteration), the data is downloaded from BigQuery to the client
+            machine. This option restricts the number of rows that can be
+            downloaded. If the number of rows to be downloaded exceeds this
+            limit, a ``bigframes.exceptions.MaximumResultRowsExceeded``
+            exception is raised.
+
+        semantic_ops_confirmation_threshold (int | None):
+            .. deprecated:: 1.42.0
+                Semantic operators are deprecated. Please use AI operators instead
+
+        semantic_ops_threshold_autofail (bool):
+            .. deprecated:: 1.42.0
+                Semantic operators are deprecated. Please use AI operators instead
     """
 
-    maximum_bytes_billed: Optional[int] = None
+    ai_ops_confirmation_threshold: Optional[int] = 0
+    ai_ops_threshold_autofail: bool = False
+    allow_large_results: Optional[bool] = None
     enable_multi_query_execution: bool = False
     extra_query_labels: Dict[str, Any] = dataclasses.field(
         default_factory=dict, init=False
     )
+    maximum_bytes_billed: Optional[int] = None
+    maximum_result_rows: Optional[int] = None
     semantic_ops_confirmation_threshold: Optional[int] = 0
     semantic_ops_threshold_autofail = False
 
-    ai_ops_confirmation_threshold: Optional[int] = 0
-    ai_ops_threshold_autofail: bool = False
-
-    allow_large_results: Optional[bool] = None
-
     def assign_extra_query_labels(self, **kwargs: Any) -> None:
         """
         Assigns additional custom labels for query configuration. The method updates the
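
The new `maximum_result_rows` docstring says that downloading results (via `.to_pandas()`, `.peek()`, `__repr__` or iteration) raises `bigframes.exceptions.MaximumResultRowsExceeded` when the row count exceeds the limit, and the changelog exposes it as `bpd.options.compute.maximum_result_rows`. The diff does not show where that check is enforced, so the following is only a minimal sketch of the documented behaviour, assuming a strict "greater than" comparison and treating `None` as "no limit". `ComputeOptionsSketch`, `check_download_size`, and the local exception class are hypothetical.

```python
import dataclasses
from typing import Any, Dict, Optional


class MaximumResultRowsExceeded(RuntimeError):
    """Hypothetical local stand-in for bigframes.exceptions.MaximumResultRowsExceeded."""


@dataclasses.dataclass
class ComputeOptionsSketch:
    # A subset of the ComputeOptions fields from the diff, with the same defaults.
    allow_large_results: Optional[bool] = None
    extra_query_labels: Dict[str, Any] = dataclasses.field(
        default_factory=dict, init=False
    )
    maximum_bytes_billed: Optional[int] = None
    maximum_result_rows: Optional[int] = None


def check_download_size(options: ComputeOptionsSketch, num_rows: int) -> None:
    """Raise before downloading if the result has more rows than allowed."""
    limit = options.maximum_result_rows
    if limit is not None and num_rows > limit:
        raise MaximumResultRowsExceeded(
            f"Result has {num_rows} rows, which exceeds maximum_result_rows={limit}."
        )
```

Mirroring `extra_query_labels` with `default_factory=dict, init=False`, as in the diff, gives each options object its own dictionary and keeps the field out of the generated constructor.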

bigframes/_config/display_options.py

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ class DisplayOptions:
     max_columns: int = 20
     max_rows: int = 25
     progress_bar: Optional[str] = "auto"
-    repr_mode: Literal["head", "deferred"] = "head"
+    repr_mode: Literal["head", "deferred", "anywidget"] = "head"
 
     max_info_columns: int = 100
     max_info_rows: Optional[int] = 200000
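
Widening `repr_mode` to `Literal["head", "deferred", "anywidget"]` informs static type checkers only; Python does not enforce a `Literal` annotation at runtime. If a runtime check were wanted, one hypothetical approach is to derive the allowed values from the annotation itself with `typing.get_args`, so the check cannot drift from the type. `ReprMode` and `validate_repr_mode` are illustrative names, not bigframes code.

```python
from typing import Literal, get_args

# Mirrors the widened annotation from the diff.
ReprMode = Literal["head", "deferred", "anywidget"]


def validate_repr_mode(value: str) -> str:
    """Reject any value not listed in the ReprMode Literal."""
    allowed = get_args(ReprMode)  # ("head", "deferred", "anywidget")
    if value not in allowed:
        raise ValueError(f"repr_mode must be one of {allowed}, got {value!r}")
    return value
```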

bigframes/_importing.py

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import importlib
+from types import ModuleType
+
+from packaging import version
+
+# Keep this in sync with setup.py
+POLARS_MIN_VERSION = version.Version("1.7.0")
+
+
+def import_polars() -> ModuleType:
+    polars_module = importlib.import_module("polars")
+    imported_version = version.Version(polars_module.build_info()["version"])
+    if imported_version < POLARS_MIN_VERSION:
+        raise ImportError(
+            f"Imported polars version: {imported_version} is below the minimum version: {POLARS_MIN_VERSION}"
+        )
+    return polars_module

bigframes/bigquery/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -43,6 +43,7 @@
     json_query_array,
     json_set,
     json_value,
+    json_value_array,
     parse_json,
 )
 from bigframes.bigquery._operations.search import create_vector_index, vector_search
@@ -71,6 +72,7 @@
     "json_query_array",
     "json_set",
     "json_value",
+    "json_value_array",
     "parse_json",
     # search ops
     "create_vector_index",

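The `bigframes.bigquery` change only re-exports `json_value_array`, which the changelog lists under 2.7.0 alongside the deprecation of `json_extract_string_array` (#1818). Judging by its name, it presumably wraps BigQuery's `JSON_VALUE_ARRAY` SQL function, which extracts a JSON array of scalar values as an `ARRAY<STRING>` and returns `NULL` when the selected value is not an array of scalars. The pure-Python approximation below is illustrative only: `json_value_array_sketch` is hypothetical, it supports just `$` and dotted member paths, and its handling of JSON `null` elements is an assumption rather than documented BigQuery behaviour.

```python
import json
from typing import List, Optional


def json_value_array_sketch(json_text: str, path: str = "$") -> Optional[List[str]]:
    """Rough local approximation of BigQuery's JSON_VALUE_ARRAY.

    Returns None (standing in for SQL NULL) if the path is missing or the
    selected value is not an array of strings, numbers, or booleans.
    """
    value = json.loads(json_text)
    # "$" selects the root; "$.a.b" walks nested object members.
    keys = [k for k in path.lstrip("$").split(".") if k]
    for key in keys:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    if not isinstance(value, list):
        return None
    result = []
    for item in value:
        if isinstance(item, str):
            result.append(item)  # strings come back without their JSON quotes
        elif isinstance(item, (bool, int, float)):
            result.append(json.dumps(item))  # e.g. 1 -> "1", True -> "true"
        else:
            # Nested objects/arrays (and, in this sketch, JSON null) yield None.
            return None
    return result
```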