Commit 98b300c

Merge remote-tracking branch 'github/main' into polars_semi
2 parents 5449696 + 8ebfa57 commit 98b300c

223 files changed, with 12753 additions and 2546 deletions.

.kokoro/continuous/doctest.cfg

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 # Only run this nox session.
 env_vars: {
     key: "NOX_SESSION"
-    value: "doctest cleanup"
+    value: "cleanup doctest"
 }
 
 env_vars: {

.kokoro/presubmit/doctest.cfg

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 # Only run this nox session.
 env_vars: {
     key: "NOX_SESSION"
-    value: "doctest cleanup"
+    value: "cleanup doctest"
 }
 
 env_vars: {

CHANGELOG.md

Lines changed: 85 additions & 0 deletions
@@ -4,6 +4,91 @@
 
 [1]: https://pypi.org/project/bigframes/#history
 
+## [2.6.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v2.5.0...v2.6.0) (2025-06-09)
+
+
+### Features
+
+* Add blob.transcribe function ([#1773](https://github.com/googleapis/python-bigquery-dataframes/issues/1773)) ([86159a7](https://github.com/googleapis/python-bigquery-dataframes/commit/86159a7d24102574c26764a056478757844e2eca))
+* Implement ai.classify() ([#1781](https://github.com/googleapis/python-bigquery-dataframes/issues/1781)) ([8af26d0](https://github.com/googleapis/python-bigquery-dataframes/commit/8af26d07cf3e8b22e0c69dd0172352fadc1857d8))
+* Implement item() for Series and Index ([#1792](https://github.com/googleapis/python-bigquery-dataframes/issues/1792)) ([d2154c8](https://github.com/googleapis/python-bigquery-dataframes/commit/d2154c82fa0fed6e89c47db747d3c9cd57f9c618))
+* Implement ST_ISCLOSED geography function ([#1789](https://github.com/googleapis/python-bigquery-dataframes/issues/1789)) ([36bc179](https://github.com/googleapis/python-bigquery-dataframes/commit/36bc179ee7ef9b0b6799f98f8fac3f64d91412af))
+* Implement ST_LENGTH geography function ([#1791](https://github.com/googleapis/python-bigquery-dataframes/issues/1791)) ([c5b7fda](https://github.com/googleapis/python-bigquery-dataframes/commit/c5b7fdae74a22e581f7705bc0cf5390e928f4425))
+* Support isin with bigframes.pandas.Index arg ([#1779](https://github.com/googleapis/python-bigquery-dataframes/issues/1779)) ([e480d29](https://github.com/googleapis/python-bigquery-dataframes/commit/e480d29f03636fa9824404ef90c510701e510195))
+
+
+### Bug Fixes
+
+* Address `read_csv` with both `index_col` and `use_cols` behavior inconsistency with pandas ([#1785](https://github.com/googleapis/python-bigquery-dataframes/issues/1785)) ([ba7c313](https://github.com/googleapis/python-bigquery-dataframes/commit/ba7c313c8d308e3ff3f736b60978cb7a51715209))
+* Allow KMeans model init parameter as k-means++ alias ([#1790](https://github.com/googleapis/python-bigquery-dataframes/issues/1790)) ([0b59cf1](https://github.com/googleapis/python-bigquery-dataframes/commit/0b59cf1008613770fa1433c6da395e755c86fe22))
+* Replace function now can handle bpd.NA value. ([#1786](https://github.com/googleapis/python-bigquery-dataframes/issues/1786)) ([7269512](https://github.com/googleapis/python-bigquery-dataframes/commit/7269512a28eb42029447d5380c764353278a74e1))
+
+
+### Documentation
+
+* Adjust strip method examples to match latest pandas ([#1797](https://github.com/googleapis/python-bigquery-dataframes/issues/1797)) ([817b0c0](https://github.com/googleapis/python-bigquery-dataframes/commit/817b0c0c5dc481598fbfdbe40fd925fb38f3a066))
+* Fix docstrings to improve html rendering of code examples ([#1788](https://github.com/googleapis/python-bigquery-dataframes/issues/1788)) ([38d9b73](https://github.com/googleapis/python-bigquery-dataframes/commit/38d9b7376697f8e19124e5d1f5fccda82d920b92))
+
+## [2.5.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v2.4.0...v2.5.0) (2025-05-30)
+
+
+### ⚠ BREAKING CHANGES
+
+* the updated `ai.map()` parameter list is not backward-compatible
+
+### Features
+
+* Add `bpd.options.bigquery.requests_transport_adapters` option ([#1755](https://github.com/googleapis/python-bigquery-dataframes/issues/1755)) ([bb45db8](https://github.com/googleapis/python-bigquery-dataframes/commit/bb45db8afdffa1417f11c050d40d4ec6d15b8654))
+* Add bbq.json_query and warn bbq.json_extract deprecated ([#1756](https://github.com/googleapis/python-bigquery-dataframes/issues/1756)) ([ec81dd2](https://github.com/googleapis/python-bigquery-dataframes/commit/ec81dd2228697d5bf193d86396cf7f3212e0289d))
+* Add bpd.options.reset() method ([#1743](https://github.com/googleapis/python-bigquery-dataframes/issues/1743)) ([36c359d](https://github.com/googleapis/python-bigquery-dataframes/commit/36c359d2521089e186a412d353daf9de6cfbc8f4))
+* Add DataFrame.round method ([#1742](https://github.com/googleapis/python-bigquery-dataframes/issues/1742)) ([3ea6043](https://github.com/googleapis/python-bigquery-dataframes/commit/3ea6043be7025fa7a11cca27b02f5505bbc9b129))
+* Add deferred data uploading ([#1720](https://github.com/googleapis/python-bigquery-dataframes/issues/1720)) ([1f6442e](https://github.com/googleapis/python-bigquery-dataframes/commit/1f6442e576c35ec784ccf9cab3d081d46e45a5ce))
+* Add deprecation warning to Gemini-1.5-X, text-embedding-004, and remove legacy models in notebooks and docs ([#1723](https://github.com/googleapis/python-bigquery-dataframes/issues/1723)) ([80aad9a](https://github.com/googleapis/python-bigquery-dataframes/commit/80aad9af794c2e06d1608c879f459a836fd4448b))
+* Add structured output for ai map, ai filter and ai join ([#1746](https://github.com/googleapis/python-bigquery-dataframes/issues/1746)) ([133ac6b](https://github.com/googleapis/python-bigquery-dataframes/commit/133ac6b0e1f1e7a12844a4b6fd5b26df59f7ef37))
+* Add support for df.loc[list, column(s)] ([#1761](https://github.com/googleapis/python-bigquery-dataframes/issues/1761)) ([768a757](https://github.com/googleapis/python-bigquery-dataframes/commit/768a7570845c4eb88f495d7f3c0f3158accdc231))
+* Include bq schema and query string in dry run results ([#1752](https://github.com/googleapis/python-bigquery-dataframes/issues/1752)) ([bb51147](https://github.com/googleapis/python-bigquery-dataframes/commit/bb511475b74cc253230725846098a9045be2e324))
+* Support `inplace=True` in `rename` and `rename_axis` ([#1744](https://github.com/googleapis/python-bigquery-dataframes/issues/1744)) ([734cc65](https://github.com/googleapis/python-bigquery-dataframes/commit/734cc652e435dc5d97a23411735aa51b7824e381))
+* Support `unique()` for Index ([#1750](https://github.com/googleapis/python-bigquery-dataframes/issues/1750)) ([27fac78](https://github.com/googleapis/python-bigquery-dataframes/commit/27fac78cb5654e5655aec861062837a7d4f3f679))
+* Support astype conversions to and from JSON dtypes ([#1716](https://github.com/googleapis/python-bigquery-dataframes/issues/1716)) ([8ef4de1](https://github.com/googleapis/python-bigquery-dataframes/commit/8ef4de10151717f88364a909b29fa7600e959ada))
+* Support dict param for dataframe.agg() ([#1772](https://github.com/googleapis/python-bigquery-dataframes/issues/1772)) ([f9c29c8](https://github.com/googleapis/python-bigquery-dataframes/commit/f9c29c85053d8111a74ce382490daed36f8bb35b))
+* Support dtype parameter in read_csv for bigquery engine ([#1749](https://github.com/googleapis/python-bigquery-dataframes/issues/1749)) ([50dca4c](https://github.com/googleapis/python-bigquery-dataframes/commit/50dca4c706d78673b03f90eccf776118247ba30b))
+* Use read api for some peek ops ([#1731](https://github.com/googleapis/python-bigquery-dataframes/issues/1731)) ([108f4d2](https://github.com/googleapis/python-bigquery-dataframes/commit/108f4d259e1bcfbe6c7aa3c3c3f8f605cf7615ee))
+
+
+### Bug Fixes
+
+* Fix clip int series with float bounds ([#1739](https://github.com/googleapis/python-bigquery-dataframes/issues/1739)) ([d451aef](https://github.com/googleapis/python-bigquery-dataframes/commit/d451aefd2181aef250c3b48cceac09063081cab2))
+* Fix error with self-merge operations ([#1774](https://github.com/googleapis/python-bigquery-dataframes/issues/1774)) ([e5fe143](https://github.com/googleapis/python-bigquery-dataframes/commit/e5fe14339b4a40ab4a25657ee0453e4108cf8bba))
+* Fix the default value for na_value for numpy conversions ([#1766](https://github.com/googleapis/python-bigquery-dataframes/issues/1766)) ([0629cac](https://github.com/googleapis/python-bigquery-dataframes/commit/0629cac7f9a9370a72c1ae25e014eb478a4c8c08))
+* Include location in Session-based temporary storage manager DDL queries ([#1780](https://github.com/googleapis/python-bigquery-dataframes/issues/1780)) ([acba032](https://github.com/googleapis/python-bigquery-dataframes/commit/acba0321cafeb49f3e560a364ebbf3d15fb8af88))
+* Prevent creating unnecessary client objects in multithreaded environments ([#1757](https://github.com/googleapis/python-bigquery-dataframes/issues/1757)) ([1cf9f5e](https://github.com/googleapis/python-bigquery-dataframes/commit/1cf9f5e8dba733ee26d15fc5edc44c81e094e9a0))
+* Reduce bigquery table modification via DML for to_gbq ([#1737](https://github.com/googleapis/python-bigquery-dataframes/issues/1737)) ([545cdca](https://github.com/googleapis/python-bigquery-dataframes/commit/545cdcac1361607678c2574f0f31eb43950073e5))
+* Stop ignoring arguments to `MatrixFactorization.score(X, y)` ([#1726](https://github.com/googleapis/python-bigquery-dataframes/issues/1726)) ([55c07e9](https://github.com/googleapis/python-bigquery-dataframes/commit/55c07e9d4315949c37ffa3e03c8fedc6daf17faf))
+* Support JSON and STRUCT for bbq.sql_scalar ([#1754](https://github.com/googleapis/python-bigquery-dataframes/issues/1754)) ([190390b](https://github.com/googleapis/python-bigquery-dataframes/commit/190390b804c2131c2eaa624d7f025febb7784b01))
+* Support str.replace re.compile with flags ([#1736](https://github.com/googleapis/python-bigquery-dataframes/issues/1736)) ([f8d2cd2](https://github.com/googleapis/python-bigquery-dataframes/commit/f8d2cd24281415f4a8f9193b676f5483128cd173))
+
+
+### Performance Improvements
+
+* Faster local data comparison using identity ([#1738](https://github.com/googleapis/python-bigquery-dataframes/issues/1738)) ([2858b1e](https://github.com/googleapis/python-bigquery-dataframes/commit/2858b1efb4fe74097dcb17c086ee1dc18e53053c))
+* Optimize repr for unordered gbq table ([#1778](https://github.com/googleapis/python-bigquery-dataframes/issues/1778)) ([2bc4fbc](https://github.com/googleapis/python-bigquery-dataframes/commit/2bc4fbc78eba4bb2ee335e0475700a7ca5bc84d7))
+* Use JOB_CREATION_OPTIONAL when `allow_large_results=False` ([#1763](https://github.com/googleapis/python-bigquery-dataframes/issues/1763)) ([15f3f2a](https://github.com/googleapis/python-bigquery-dataframes/commit/15f3f2aa42cfe4a2233f62c5f8906e7f7658f9fa))
+
+
+### Dependencies
+
+* Avoid `gcsfs==2025.5.0` ([#1762](https://github.com/googleapis/python-bigquery-dataframes/issues/1762)) ([68d5e2c](https://github.com/googleapis/python-bigquery-dataframes/commit/68d5e2cbef3510cadc7e9dd199117c1e3b02d19f))
+
+
+### Documentation
+
+* Add llm output_schema notebook ([#1732](https://github.com/googleapis/python-bigquery-dataframes/issues/1732)) ([b2261cc](https://github.com/googleapis/python-bigquery-dataframes/commit/b2261cc07cd58b51d212f9bf495c5022e587f816))
+* Add MatrixFactorization to the table of contents ([#1725](https://github.com/googleapis/python-bigquery-dataframes/issues/1725)) ([611e43b](https://github.com/googleapis/python-bigquery-dataframes/commit/611e43b156483848a5470f889fb7b2b473ecff4d))
+* Fix typo for "population" in the `GeminiTextGenerator.predict(..., output_schema={...})` sample notebook ([#1748](https://github.com/googleapis/python-bigquery-dataframes/issues/1748)) ([bd07e05](https://github.com/googleapis/python-bigquery-dataframes/commit/bd07e05d26820313c052eaf41c267a1ab20b4fc6))
+* Integrations notebook extracts token from `bqclient._http.credentials` instead of `bqclient._credentials` ([#1784](https://github.com/googleapis/python-bigquery-dataframes/issues/1784)) ([6e63eca](https://github.com/googleapis/python-bigquery-dataframes/commit/6e63eca29f20d83435878273604816ce7595c396))
+* Updated multimodal notebook instructions ([#1745](https://github.com/googleapis/python-bigquery-dataframes/issues/1745)) ([1df8ca6](https://github.com/googleapis/python-bigquery-dataframes/commit/1df8ca6312ee428d55c2091a00c73b13d9a6b193))
+* Use partial ordering mode in the quickstart sample ([#1734](https://github.com/googleapis/python-bigquery-dataframes/issues/1734)) ([476b7dd](https://github.com/googleapis/python-bigquery-dataframes/commit/476b7dd7c2639cb6804272d06aa5c1db666819da))
+
 ## [2.4.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v2.3.0...v2.4.0) (2025-05-12)
 

README.rst

Lines changed: 50 additions & 22 deletions
@@ -1,16 +1,60 @@
-BigQuery DataFrames
-===================
+BigQuery DataFrames (BigFrames)
+===============================
 
 |GA| |pypi| |versions|
 
-BigQuery DataFrames provides a Pythonic DataFrame and machine learning (ML) API
-powered by the BigQuery engine.
+BigQuery DataFrames (also known as BigFrames) provides a Pythonic DataFrame
+and machine learning (ML) API powered by the BigQuery engine.
 
 * ``bigframes.pandas`` provides a pandas-compatible API for analytics.
 * ``bigframes.ml`` provides a scikit-learn-like API for ML.
 
-BigQuery DataFrames is an open-source package. You can run
-``pip install --upgrade bigframes`` to install the latest version.
+BigQuery DataFrames is an open-source package.
+
+**Version 2.0 introduces breaking changes for improved security and performance. See below for details.**
+
+Getting started with BigQuery DataFrames
+----------------------------------------
+
+The easiest way to get started is to try the
+`BigFrames quickstart <https://cloud.google.com/bigquery/docs/dataframes-quickstart>`_
+in a `notebook in BigQuery Studio <https://cloud.google.com/bigquery/docs/notebooks-introduction>`_.
+
+To use BigFrames in your local development environment,
+
+1. Run ``pip install --upgrade bigframes`` to install the latest version.
+
+2. Set up `Application default credentials <https://cloud.google.com/docs/authentication/set-up-adc-local-dev-environment>`_
+   for your local development environment.
+
+3. Create a `GCP project with the BigQuery API enabled <https://cloud.google.com/bigquery/docs/sandbox>`_.
+
+4. Use the ``bigframes`` package to query data.
+
+.. code-block:: python
+
+    import bigframes.pandas as bpd
+
+    bpd.options.bigquery.project = your_gcp_project_id
+    df = bpd.read_gbq("bigquery-public-data.usa_names.usa_1910_2013")
+    print(
+        df.groupby("name")
+        .agg({"number": "sum"})
+        .sort_values("number", ascending=False)
+        .head(10)
+        .to_pandas()
+    )
+
+
+Documentation
+-------------
+
+To learn more about BigQuery DataFrames, visit these pages
+
+* `Introduction to BigQuery DataFrames (BigFrames) <https://cloud.google.com/bigquery/docs/bigquery-dataframes-introduction>`_
+* `Sample notebooks <https://github.com/googleapis/python-bigquery-dataframes/tree/main/notebooks>`_
+* `API reference <https://cloud.google.com/python/docs/reference/bigframes/latest/summary_overview>`_
+* `Source code (GitHub) <https://github.com/googleapis/python-bigquery-dataframes>`_
 
 ⚠️ Warning: Breaking Changes in BigQuery DataFrames v2.0
 --------------------------------------------------------
@@ -44,22 +88,6 @@ To learn about these changes and how to migrate to version 2.0, see the
 .. |versions| image:: https://img.shields.io/pypi/pyversions/bigframes.svg
    :target: https://pypi.org/project/bigframes/
 
-Documentation
--------------
-
-* `BigQuery DataFrames source code (GitHub) <https://github.com/googleapis/python-bigquery-dataframes>`_
-* `BigQuery DataFrames sample notebooks <https://github.com/googleapis/python-bigquery-dataframes/tree/main/notebooks>`_
-* `BigQuery DataFrames API reference <https://cloud.google.com/python/docs/reference/bigframes/latest/summary_overview>`_
-* `BigQuery DataFrames supported pandas APIs <https://cloud.google.com/python/docs/reference/bigframes/latest/supported_pandas_apis>`_
-
-
-Getting started with BigQuery DataFrames
-----------------------------------------
-Read `Introduction to BigQuery DataFrames <https://cloud.google.com/bigquery/docs/bigquery-dataframes-introduction>`_
-and try the `BigQuery DataFrames quickstart <https://cloud.google.com/bigquery/docs/dataframes-quickstart>`_
-to get up and running in just a few minutes.
-
-
 License
 -------

bigframes/_config/__init__.py

Lines changed: 9 additions & 0 deletions
@@ -56,12 +56,21 @@ class Options:
     """Global options affecting BigQuery DataFrames behavior."""
 
     def __init__(self):
+        self.reset()
+
+    def reset(self) -> Options:
+        """Reset the option settings to defaults.
+
+        Returns:
+            bigframes._config.Options: Options object with default values.
+        """
         self._local = ThreadLocalConfig()
 
         # BigQuery options are special because they can only be set once per
         # session, so we need an indicator as to whether we are using the
         # thread-local session or the global session.
         self._bigquery_options = bigquery_options.BigQueryOptions()
+        return self
 
     def _init_bigquery_thread_local(self):
         """Initialize thread-local options, based on current global options."""

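The `reset()` change in this file routes `__init__` through a method that rebuilds all state and returns `self`. A minimal sketch of that pattern follows, assuming a hypothetical stand-in class: `SketchOptions` and `display_max_rows` are illustrative names that do not appear in bigframes, and the default of 25 is made up for the example.

```python
class SketchOptions:
    """Hypothetical stand-in for an options object; not the bigframes class."""

    def __init__(self) -> None:
        # Delegate to reset() so the defaults are defined in exactly one place.
        self.reset()

    def reset(self) -> "SketchOptions":
        """Restore every setting to its default and return self for chaining."""
        self.display_max_rows = 25  # illustrative default, assumed for this sketch
        self.ordering_mode = "strict"
        return self


opts = SketchOptions()
opts.display_max_rows = 100
opts.ordering_mode = "partial"

# reset() hands back the same object rather than a fresh copy.
same = opts.reset()
```

Returning `self` keeps existing references to the options object valid after a reset, which matters for a module-level singleton such as `bpd.options`.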
bigframes/_config/bigquery_options.py

Lines changed: 46 additions & 1 deletion
@@ -16,10 +16,11 @@
 
 from __future__ import annotations
 
-from typing import Literal, Optional
+from typing import Literal, Optional, Sequence, Tuple
 import warnings
 
 import google.auth.credentials
+import requests.adapters
 
 import bigframes.enums
 import bigframes.exceptions as bfe
@@ -90,6 +91,9 @@ def __init__(
         allow_large_results: bool = False,
         ordering_mode: Literal["strict", "partial"] = "strict",
         client_endpoints_override: Optional[dict] = None,
+        requests_transport_adapters: Sequence[
+            Tuple[str, requests.adapters.BaseAdapter]
+        ] = (),
         enable_polars_execution: bool = False,
     ):
         self._credentials = credentials
@@ -101,6 +105,7 @@ def __init__(
         self._kms_key_name = kms_key_name
         self._skip_bq_connection_check = skip_bq_connection_check
         self._allow_large_results = allow_large_results
+        self._requests_transport_adapters = requests_transport_adapters
         self._session_started = False
         # Determines the ordering strictness for the session.
         self._ordering_mode = _validate_ordering_mode(ordering_mode)
@@ -382,6 +387,46 @@ def client_endpoints_override(self, value: dict):
 
         self._client_endpoints_override = value
 
+    @property
+    def requests_transport_adapters(
+        self,
+    ) -> Sequence[Tuple[str, requests.adapters.BaseAdapter]]:
+        """Transport adapters for requests-based REST clients such as the
+        google-cloud-bigquery package.
+
+        For more details, see the explanation in `requests guide to transport
+        adapters
+        <https://requests.readthedocs.io/en/latest/user/advanced/#transport-adapters>`_.
+
+        **Examples:**
+
+        Increase the connection pool size using the requests `HTTPAdapter
+        <https://requests.readthedocs.io/en/latest/api/#requests.adapters.HTTPAdapter>`_.
+
+            >>> import bigframes.pandas as bpd
+            >>> bpd.options.bigquery.requests_transport_adapters = (
+            ...     ("http://", requests.adapters.HTTPAdapter(pool_maxsize=100)),
+            ...     ("https://", requests.adapters.HTTPAdapter(pool_maxsize=100)),
+            ... )  # doctest: +SKIP
+
+        Returns:
+            Sequence[Tuple[str, requests.adapters.BaseAdapter]]:
+                Prefixes and corresponding transport adapters to `mount
+                <https://requests.readthedocs.io/en/latest/api/#requests.Session.mount>`_
+                in requests-based REST clients.
+        """
+        return self._requests_transport_adapters
+
+    @requests_transport_adapters.setter
+    def requests_transport_adapters(
+        self, value: Sequence[Tuple[str, requests.adapters.BaseAdapter]]
+    ) -> None:
+        if self._session_started and self._requests_transport_adapters != value:
+            raise ValueError(
+                SESSION_STARTED_MESSAGE.format(attribute="requests_transport_adapters")
+            )
+        self._requests_transport_adapters = value
+
     @property
     def enable_polars_execution(self) -> bool:
         """If True, will use polars to execute some simple query plans locally."""

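The new setter rejects a change to `requests_transport_adapters` once a session has started, while still accepting re-assignment of an equal value. The guard can be sketched in isolation as below. This is an illustration under stated assumptions, not the bigframes implementation: `SketchBigQueryOptions` is a hypothetical stand-in, the message text is invented because the real `SESSION_STARTED_MESSAGE` is not shown in this diff, and a plain `object()` takes the place of a `requests.adapters.BaseAdapter` so the sketch runs without `requests` installed.

```python
from typing import Sequence, Tuple

# Assumed wording; the real SESSION_STARTED_MESSAGE text is not part of this diff.
SESSION_STARTED_MESSAGE = "Cannot change '{attribute}' once a session has started."


class SketchBigQueryOptions:
    """Hypothetical stand-in that mirrors only the guarded setter."""

    def __init__(
        self, requests_transport_adapters: Sequence[Tuple[str, object]] = ()
    ) -> None:
        self._requests_transport_adapters = requests_transport_adapters
        self._session_started = False

    @property
    def requests_transport_adapters(self) -> Sequence[Tuple[str, object]]:
        return self._requests_transport_adapters

    @requests_transport_adapters.setter
    def requests_transport_adapters(
        self, value: Sequence[Tuple[str, object]]
    ) -> None:
        # Re-assigning an equal value is allowed; only a real change is rejected.
        if self._session_started and self._requests_transport_adapters != value:
            raise ValueError(
                SESSION_STARTED_MESSAGE.format(attribute="requests_transport_adapters")
            )
        self._requests_transport_adapters = value


adapter = object()  # placeholder for a requests.adapters.BaseAdapter instance
options = SketchBigQueryOptions()
options.requests_transport_adapters = (("https://", adapter),)

options._session_started = True  # simulate a session having been created
options.requests_transport_adapters = (("https://", adapter),)  # equal value: accepted

try:
    options.requests_transport_adapters = ()
    changed_after_start = True
except ValueError:
    changed_after_start = False
```

Comparing with `!=` before raising lets code that re-applies the same configuration run after the session exists, at the cost of relying on adapter equality, which for plain objects falls back to identity.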