
Commit 7d48abd

Merge branch 'main' into shuowei-anywidget-sort-by-col-name

2 parents a04c92b + da06439

6 files changed: +164 additions, -40 deletions


README.rst

Lines changed: 18 additions & 40 deletions
@@ -6,15 +6,25 @@ BigQuery DataFrames (BigFrames)
 |GA| |pypi| |versions|

 BigQuery DataFrames (also known as BigFrames) provides a Pythonic DataFrame
-and machine learning (ML) API powered by the BigQuery engine.
+and machine learning (ML) API powered by the BigQuery engine. It provides modules
+for many use cases, including:

-* `bigframes.pandas` provides a pandas API for analytics. Many workloads can be
+* `bigframes.pandas <https://dataframes.bigquery.dev/reference/api/bigframes.pandas.html>`_
+  is a pandas API for analytics. Many workloads can be
   migrated from pandas to bigframes by just changing a few imports.
-* ``bigframes.ml`` provides a scikit-learn-like API for ML.
+* `bigframes.ml <https://dataframes.bigquery.dev/reference/index.html#ml-apis>`_
+  is a scikit-learn-like API for ML.
+* `bigframes.bigquery.ai <https://dataframes.bigquery.dev/reference/api/bigframes.bigquery.ai.html>`_
+  is a collection of powerful AI methods, powered by Gemini.

-BigQuery DataFrames is an open-source package.
+BigQuery DataFrames is an `open-source package <https://github.com/googleapis/python-bigquery-dataframes>`_.

-**Version 2.0 introduces breaking changes for improved security and performance. See below for details.**
+.. |GA| image:: https://img.shields.io/badge/support-GA-gold.svg
+   :target: https://github.com/googleapis/google-cloud-python/blob/main/README.rst#general-availability
+.. |pypi| image:: https://img.shields.io/pypi/v/bigframes.svg
+   :target: https://pypi.org/project/bigframes/
+.. |versions| image:: https://img.shields.io/pypi/pyversions/bigframes.svg
+   :target: https://pypi.org/project/bigframes/

 Getting started with BigQuery DataFrames
 ----------------------------------------
@@ -38,7 +48,8 @@ To use BigFrames in your local development environment,

    import bigframes.pandas as bpd

-   bpd.options.bigquery.project = your_gcp_project_id
+   bpd.options.bigquery.project = your_gcp_project_id  # Optional in BQ Studio.
+   bpd.options.bigquery.ordering_mode = "partial"  # Recommended for performance.
    df = bpd.read_gbq("bigquery-public-data.usa_names.usa_1910_2013")
    print(
        df.groupby("name")
@@ -48,49 +59,16 @@ To use BigFrames in your local development environment,
        .to_pandas()
    )
-
 Documentation
 -------------

 To learn more about BigQuery DataFrames, visit these pages

 * `Introduction to BigQuery DataFrames (BigFrames) <https://cloud.google.com/bigquery/docs/bigquery-dataframes-introduction>`_
 * `Sample notebooks <https://github.com/googleapis/python-bigquery-dataframes/tree/main/notebooks>`_
-* `API reference <https://cloud.google.com/python/docs/reference/bigframes/latest/summary_overview>`_
+* `API reference <https://dataframes.bigquery.dev/>`_
 * `Source code (GitHub) <https://github.com/googleapis/python-bigquery-dataframes>`_

-⚠️ Warning: Breaking Changes in BigQuery DataFrames v2.0
---------------------------------------------------------
-
-Version 2.0 introduces breaking changes for improved security and performance. Key default behaviors have changed, including:
-
-* **Large Results (>10GB):** The default value for ``allow_large_results`` has changed to ``False``.
-  Methods like ``to_pandas()`` will now fail if the query result's compressed data size exceeds 10GB,
-  unless large results are explicitly permitted.
-* **Remote Function Security:** The library no longer automatically lets the Compute Engine default service
-  account become the identity of the Cloud Run functions. If that is desired, it has to be indicated by passing
-  ``cloud_function_service_account="default"``. And network ingress now defaults to ``"internal-only"``.
-* **@remote_function Argument Passing:** Arguments other than ``input_types``, ``output_type``, and ``dataset``
-  to ``remote_function`` must now be passed using keyword syntax, as positional arguments are no longer supported.
-* **@udf Argument Passing:** Arguments ``dataset`` and ``name`` to ``udf`` are now mandatory.
-* **Endpoint Connections:** Automatic fallback to locational endpoints in certain regions is removed.
-* **LLM Updates (Gemini Integration):** Integrations now default to the ``gemini-2.0-flash-001`` model.
-  PaLM2 support has been removed; please migrate any existing PaLM2 usage to Gemini. **Note:** The current default
-  model will be removed in Version 3.0.
-
-**Important:** If you are not ready to adapt to these changes, please pin your dependency to a version less than 2.0
-(e.g., ``bigframes==1.42.0``) to avoid disruption.
-
-To learn about these changes and how to migrate to version 2.0, see the
-`updated introduction guide <https://cloud.google.com/bigquery/docs/bigquery-dataframes-introduction>`_.
-
-.. |GA| image:: https://img.shields.io/badge/support-GA-gold.svg
-   :target: https://github.com/googleapis/google-cloud-python/blob/main/README.rst#general-availability
-.. |pypi| image:: https://img.shields.io/pypi/v/bigframes.svg
-   :target: https://pypi.org/project/bigframes/
-.. |versions| image:: https://img.shields.io/pypi/pyversions/bigframes.svg
-   :target: https://pypi.org/project/bigframes/
-
 License
 -------

bigframes/core/compile/sqlglot/aggregations/unary_compiler.py

Lines changed: 63 additions & 0 deletions
@@ -386,6 +386,17 @@ def _(
     return apply_window_if_present(sge.func("MIN", column.expr), window)


+@UNARY_OP_REGISTRATION.register(agg_ops.NuniqueOp)
+def _(
+    op: agg_ops.NuniqueOp,
+    column: typed_expr.TypedExpr,
+    window: typing.Optional[window_spec.WindowSpec] = None,
+) -> sge.Expression:
+    return apply_window_if_present(
+        sge.func("COUNT", sge.Distinct(expressions=[column.expr])), window
+    )
+
+
 @UNARY_OP_REGISTRATION.register(agg_ops.PopVarOp)
 def _(
     op: agg_ops.PopVarOp,
@@ -400,6 +411,58 @@ def _(
     return apply_window_if_present(expr, window)


+@UNARY_OP_REGISTRATION.register(agg_ops.ProductOp)
+def _(
+    op: agg_ops.ProductOp,
+    column: typed_expr.TypedExpr,
+    window: typing.Optional[window_spec.WindowSpec] = None,
+) -> sge.Expression:
+    # Need to short-circuit, as taking the log of zero is illegal SQL.
+    is_zero = sge.EQ(this=column.expr, expression=sge.convert(0))
+
+    # There is no product SQL aggregate function, so implement it as a sum of
+    # logs followed by exponentiation. Note: the log and power bases must be
+    # equal; this implementation uses the natural log.
+    logs = (
+        sge.Case()
+        .when(is_zero, sge.convert(0))
+        .else_(sge.func("LN", sge.func("ABS", column.expr)))
+    )
+    logs_sum = apply_window_if_present(sge.func("SUM", logs), window)
+    magnitude = sge.func("EXP", logs_sum)
+
+    # The sign cannot be recovered from the logs, so determine the parity of
+    # the count of negative inputs instead.
+    is_negative = (
+        sge.Case()
+        .when(
+            sge.LT(this=sge.func("SIGN", column.expr), expression=sge.convert(0)),
+            sge.convert(1),
+        )
+        .else_(sge.convert(0))
+    )
+    negative_count = apply_window_if_present(sge.func("SUM", is_negative), window)
+    negative_count_parity = sge.Mod(
+        this=negative_count, expression=sge.convert(2)
+    )  # 1 if the result should be negative, otherwise 0.
+
+    any_zeroes = apply_window_if_present(sge.func("LOGICAL_OR", is_zero), window)
+
+    float_result = (
+        sge.Case()
+        .when(any_zeroes, sge.convert(0))
+        .else_(
+            sge.Mul(
+                this=magnitude,
+                expression=sge.If(
+                    this=sge.EQ(this=negative_count_parity, expression=sge.convert(1)),
+                    true=sge.convert(-1),
+                    false=sge.convert(1),
+                ),
+            )
+        )
+    )
+    return float_result
+
+
 @UNARY_OP_REGISTRATION.register(agg_ops.QcutOp)
 def _(
     op: agg_ops.QcutOp,
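The `ProductOp` lowering above emulates a missing `PRODUCT` aggregate as `EXP(SUM(LN(ABS(x))))`, with separate handling for zeros and for the sign. The same strategy can be sketched in plain Python (`product_via_logs` is a hypothetical helper for illustration, not part of the library):

```python
import math

def product_via_logs(values):
    # Mirror the SQL strategy: short-circuit on zeros, sum natural logs of
    # absolute values to get the magnitude, and recover the sign from the
    # parity of the count of negative inputs.
    if any(v == 0 for v in values):
        return 0.0
    log_sum = sum(math.log(abs(v)) for v in values)
    negative_count = sum(1 for v in values if v < 0)
    sign = -1.0 if negative_count % 2 == 1 else 1.0
    return sign * math.exp(log_sum)

print(product_via_logs([2, -3, 4]))  # approximately -24.0 (floating point)
print(product_via_logs([5, 0, 7]))  # 0.0
```

Note the result is a float even for integer inputs, and `EXP`/`LN` round-tripping introduces small floating-point error, which is why the compiled expression is named `float_result`.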
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+WITH `bfcte_0` AS (
+  SELECT
+    `int64_col`
+  FROM `bigframes-dev`.`sqlglot_test`.`scalar_types`
+), `bfcte_1` AS (
+  SELECT
+    COUNT(DISTINCT `int64_col`) AS `bfcol_1`
+  FROM `bfcte_0`
+)
+SELECT
+  `bfcol_1` AS `int64_col`
+FROM `bfcte_1`
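The snapshot above compiles `NuniqueOp` to `COUNT(DISTINCT ...)`. Its semantics can be checked with SQLite as a lightweight stand-in for BigQuery (a sketch with made-up sample data, not part of the test suite):

```python
import sqlite3

# COUNT(DISTINCT col) counts each non-NULL value exactly once, which is the
# nunique semantics the compiler targets.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scalar_types (int64_col INTEGER)")
conn.executemany(
    "INSERT INTO scalar_types VALUES (?)",
    [(1,), (2,), (2,), (None,), (3,)],
)
(nunique,) = conn.execute(
    "SELECT COUNT(DISTINCT int64_col) FROM scalar_types"
).fetchone()
print(nunique)  # 3 -- duplicates collapse, NULL is ignored
```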
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+WITH `bfcte_0` AS (
+  SELECT
+    `int64_col`
+  FROM `bigframes-dev`.`sqlglot_test`.`scalar_types`
+), `bfcte_1` AS (
+  SELECT
+    CASE
+      WHEN LOGICAL_OR(`int64_col` = 0)
+      THEN 0
+      ELSE EXP(SUM(CASE WHEN `int64_col` = 0 THEN 0 ELSE LN(ABS(`int64_col`)) END)) * IF(MOD(SUM(CASE WHEN SIGN(`int64_col`) < 0 THEN 1 ELSE 0 END), 2) = 1, -1, 1)
+    END AS `bfcol_1`
+  FROM `bfcte_0`
+)
+SELECT
+  `bfcol_1` AS `int64_col`
+FROM `bfcte_1`
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+WITH `bfcte_0` AS (
+  SELECT
+    `int64_col`,
+    `string_col`
+  FROM `bigframes-dev`.`sqlglot_test`.`scalar_types`
+), `bfcte_1` AS (
+  SELECT
+    *,
+    CASE
+      WHEN LOGICAL_OR(`int64_col` = 0) OVER (PARTITION BY `string_col`)
+      THEN 0
+      ELSE EXP(
+        SUM(CASE WHEN `int64_col` = 0 THEN 0 ELSE LN(ABS(`int64_col`)) END) OVER (PARTITION BY `string_col`)
+      ) * IF(
+        MOD(
+          SUM(CASE WHEN SIGN(`int64_col`) < 0 THEN 1 ELSE 0 END) OVER (PARTITION BY `string_col`),
+          2
+        ) = 1,
+        -1,
+        1
+      )
+    END AS `bfcol_2`
+  FROM `bfcte_0`
+)
+SELECT
+  `bfcol_2` AS `agg_int64`
+FROM `bfcte_1`
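In the windowed snapshot above, every row receives the product of its whole `string_col` partition. A pure-Python analogue of that window semantics, operating on hypothetical `(group, value)` pairs rather than the library's real code path:

```python
import math
from collections import defaultdict

def partitioned_product(rows):
    # First pass: per-partition log-sum of |x|, count of negatives, zero flag,
    # matching the three windowed aggregates in the generated SQL.
    stats = defaultdict(lambda: {"log_sum": 0.0, "neg": 0, "zero": False})
    for grp, x in rows:
        s = stats[grp]
        if x == 0:
            s["zero"] = True
        else:
            s["log_sum"] += math.log(abs(x))
            if x < 0:
                s["neg"] += 1
    # Second pass: emit one value per input row, like a window function.
    out = []
    for grp, _ in rows:
        s = stats[grp]
        if s["zero"]:
            out.append(0.0)
        else:
            sign = -1.0 if s["neg"] % 2 == 1 else 1.0
            out.append(sign * math.exp(s["log_sum"]))
    return out

print(partitioned_product([("a", 2), ("a", -3), ("b", 0), ("b", 5)]))
# approximately [-6.0, -6.0, 0.0, 0.0]
```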

tests/unit/core/compile/sqlglot/aggregations/test_unary_compiler.py

Lines changed: 28 additions & 0 deletions
@@ -412,6 +412,15 @@ def test_min(scalar_types_df: bpd.DataFrame, snapshot):
     snapshot.assert_match(sql_window_partition, "window_partition_out.sql")


+def test_nunique(scalar_types_df: bpd.DataFrame, snapshot):
+    col_name = "int64_col"
+    bf_df = scalar_types_df[[col_name]]
+    agg_expr = agg_ops.NuniqueOp().as_expr(col_name)
+    sql = _apply_unary_agg_ops(bf_df, [agg_expr], [col_name])
+
+    snapshot.assert_match(sql, "out.sql")
+
+
 def test_pop_var(scalar_types_df: bpd.DataFrame, snapshot):
     col_names = ["int64_col", "bool_col"]
     bf_df = scalar_types_df[col_names]
@@ -434,6 +443,25 @@ def test_pop_var(scalar_types_df: bpd.DataFrame, snapshot):
     snapshot.assert_match(sql_window, "window_out.sql")


+def test_product(scalar_types_df: bpd.DataFrame, snapshot):
+    col_name = "int64_col"
+    bf_df = scalar_types_df[[col_name]]
+    agg_expr = agg_ops.ProductOp().as_expr(col_name)
+    sql = _apply_unary_agg_ops(bf_df, [agg_expr], [col_name])
+
+    snapshot.assert_match(sql, "out.sql")
+
+    bf_df_str = scalar_types_df[[col_name, "string_col"]]
+    window_partition = window_spec.WindowSpec(
+        grouping_keys=(expression.deref("string_col"),),
+    )
+    sql_window_partition = _apply_unary_window_op(
+        bf_df_str, agg_expr, window_partition, "agg_int64"
+    )
+
+    snapshot.assert_match(sql_window_partition, "window_partition_out.sql")
+
+
 def test_qcut(scalar_types_df: bpd.DataFrame, snapshot):
     if sys.version_info < (3, 12):
         pytest.skip(

0 commit comments
