
Commit 2b6f3f5

feat: add code samples for dbt bigframes integration
1 parent 52c8233 commit 2b6f3f5

File tree

4 files changed: +152 -0 lines changed


dbt_bigframes_integration/.dbt.yml

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
dbt_sample_project:
  outputs:
    dev: # The target environment name (e.g., dev, prod)
      compute_region: us-central1 # Region used for compute operations
      dataset: dbt_sample_dataset # BigQuery dataset where dbt will create models
      gcs_bucket: dbt_sample_bucket # GCS bucket to store output files
      location: US # BigQuery dataset location
      method: oauth # Authentication method
      priority: interactive # Job priority: "interactive" or "batch"
      project: bigframes-dev # GCP project ID
      threads: 1 # Number of threads dbt can use for running models in parallel
      type: bigquery # Specifies the dbt adapter
  target: dev # The default target environment
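
For quick experimentation outside of dbt, roughly the same connection settings can be applied to an ad-hoc BigFrames session through its global options. A minimal sketch, assuming the same project and location as the profile above (this snippet is illustrative and not part of the commit):

import bigframes.pandas as bpd

# Mirror the profile's connection settings for an interactive session.
bpd.options.bigquery.project = "bigframes-dev"  # `project` in .dbt.yml
bpd.options.bigquery.location = "US"            # `location` in .dbt.yml

# Subsequent reads use this configuration.
df = bpd.read_gbq("bigquery-public-data.epa_historical_air_quality.temperature_hourly_summary")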
dbt_bigframes_integration/dbt_project.yml

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@

# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models.
name: 'dbt_sample_project'
version: '1.0.0'

# This setting configures which "profile" dbt uses for this project.
profile: 'dbt_sample_project'

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets: # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"


# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this example config, we tell dbt to build all models in the example/
# directory as views. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro, as sketched below.
models:
  dbt_sample_project:
    # Optional: These settings (e.g., submission_method, notebook_template_id,
    # etc.) can also be defined directly in the Python model using dbt.config.
    submission_method: bigframes
    # Configs prefixed with + apply to all files under models/example/.
    example:
      +materialized: view
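
As the comments above note, project-level settings can be overridden per model. In a Python model, the counterpart of the `{{ config(...) }}` macro is dbt.config(...). A minimal hypothetical sketch (this model body is for illustration only, not part of the commit):

def model(dbt, session):
    # Override the project-level `view` materialization for this one model.
    dbt.config(
        submission_method="bigframes",
        materialized="table",
    )
    # Hypothetical upstream reference, for illustration.
    return dbt.ref("dbt_bigframes_code_sample_1")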
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@

# This example demonstrates a common pattern: transforming raw BigQuery data
# into a processed table using dbt in BigFrames mode.
#
# Key defaults when using BigFrames in dbt:
# - The default materialization is 'table' unless specified otherwise.
# - The default timeout for the job is 3600 seconds (60 minutes).
# - If no runtime template is provided, dbt will automatically create and reuse
#   a default one.
#
# This code sample shows a basic pattern for reading a BigQuery public dataset,
# processing it using pandas-like operations, and outputting a cleaned table.


def model(dbt, session):
    # Optional: override settings from dbt_project.yml. When both are set,
    # dbt.config takes precedence over dbt_project.yml.
    # Use BigFrames mode to execute the Python model.
    dbt.config(submission_method="bigframes")

    # Define the BigQuery table path from which to read data.
    table = "bigquery-public-data.epa_historical_air_quality.temperature_hourly_summary"

    # Define the specific columns to select from the BigQuery table.
    columns = ["state_name", "county_name", "date_local", "time_local", "sample_measurement"]

    # Read data from the specified BigQuery table into a BigFrames DataFrame.
    # BigFrames lets you interact with BigQuery tables using a pandas-like API.
    df = session.read_gbq(table, columns=columns)

    # Sort the DataFrame by the selected columns. This prepares the data for
    # `drop_duplicates` to ensure consistent duplicate removal.
    df = df.sort_values(columns).drop_duplicates(columns)

    # Group the DataFrame by 'state_name', 'county_name', and 'date_local'. For
    # each group, calculate the minimum and maximum of the 'sample_measurement'
    # column. The result is a BigFrames DataFrame with a MultiIndex.
    result = df.groupby(["state_name", "county_name", "date_local"])["sample_measurement"] \
        .agg(["min", "max"])

    # Rename the aggregate columns and convert the MultiIndex of the 'result'
    # DataFrame into regular columns. This flattens the DataFrame so
    # 'state_name', 'county_name', and 'date_local' become regular columns again.
    result = result.rename(columns={'min': 'min_temperature', 'max': 'max_temperature'}) \
        .reset_index()

    # Return the processed BigFrames DataFrame.
    # In a dbt Python model, this DataFrame will be materialized as a table.
    return result
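
To sanity-check the transformation interactively before wiring it into dbt, essentially the same pipeline can be run with plain bigframes.pandas. A sketch, assuming an environment already authenticated to GCP (not part of this commit):

import bigframes.pandas as bpd

table = "bigquery-public-data.epa_historical_air_quality.temperature_hourly_summary"
columns = ["state_name", "county_name", "date_local", "time_local", "sample_measurement"]

df = bpd.read_gbq(table, columns=columns)
df = df.sort_values(columns).drop_duplicates(columns)
result = (
    df.groupby(["state_name", "county_name", "date_local"])["sample_measurement"]
    .agg(["min", "max"])
    .rename(columns={"min": "min_temperature", "max": "max_temperature"})
    .reset_index()
)
print(result.head())  # Preview a few rows without materializing a table.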
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@

# This example demonstrates how to build an incremental model.
#
# It applies lightweight, row-level logic to update or insert records into a
# target BigQuery table. If the target table already exists, dbt will perform a
# merge based on the specified unique keys; otherwise, it will create a new
# table automatically.
#
# It also defines and applies a BigFrames UDF to add a descriptive summary
# column based on temperature data.


import bigframes.pandas as bpd


def model(dbt, session):
    # Optional: override settings from dbt_project.yml.
    # When both are set, dbt.config takes precedence over dbt_project.yml.
    dbt.config(
        # Use BigFrames mode to execute the Python model.
        submission_method="bigframes",
        # Materialize as an incremental model.
        materialized='incremental',
        # Use the MERGE strategy to update rows during incremental runs.
        incremental_strategy='merge',
        # Composite key to match existing rows for updates.
        unique_key=["state_name", "county_name", "date_local"],
    )

    # Reference an upstream dbt model or table as a DataFrame input.
    df = dbt.ref("dbt_bigframes_code_sample_1")

    # Define a BigFrames UDF to generate a temperature description.
    @bpd.udf(dataset='dbt_sample_dataset', name='describe_udf')
    def describe(
        max_temperature: float,
        min_temperature: float,
    ) -> str:
        is_hot = max_temperature > 85.0
        is_cold = min_temperature < 50.0

        if is_hot and is_cold:
            return "Expect both hot and cold conditions today."
        if is_hot:
            return "Overall, it's a hot day."
        if is_cold:
            return "Overall, it's a cold day."
        return "Comfortable throughout the day."

    # Apply the UDF using combine and store the result in a "describe" column.
    df["describe"] = df["max_temperature"].combine(df["min_temperature"], describe)

    # Return the transformed DataFrame as the final dbt model output.
    return df
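
For intuition on the combine call above: Series.combine(other, func) applies func elementwise to aligned pairs of values from the two Series. A minimal sketch with plain pandas, which has the same semantics (illustration only, not part of the commit):

import pandas as pd

def describe(max_temperature: float, min_temperature: float) -> str:
    is_hot = max_temperature > 85.0
    is_cold = min_temperature < 50.0
    if is_hot and is_cold:
        return "Expect both hot and cold conditions today."
    if is_hot:
        return "Overall, it's a hot day."
    if is_cold:
        return "Overall, it's a cold day."
    return "Comfortable throughout the day."

max_t = pd.Series([90.0, 70.0, 88.0])
min_t = pd.Series([40.0, 55.0, 60.0])

summaries = max_t.combine(min_t, describe)
# summaries[0] == "Expect both hot and cold conditions today."
# summaries[1] == "Comfortable throughout the day."
# summaries[2] == "Overall, it's a hot day."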
