Commit 991a462

chore: update db benchmark config/add notebook load testing session/update readme (#1041)
* chore: update db benchmark config
* update project
1 parent af6aa9c commit 991a462

47 files changed: 337 additions and 124 deletions

.kokoro/continuous/notebook.cfg

Lines changed: 0 additions & 5 deletions
@@ -6,11 +6,6 @@ env_vars: {
     value: "notebook"
 }

-env_vars: {
-    key: "BENCHMARK_AND_PUBLISH"
-    value: "true"
-}
-
 env_vars: {
     key: "GOOGLE_CLOUD_PROJECT"
     value: "bigframes-testing"

.kokoro/load/notebook.cfg

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+# Format: //devtools/kokoro/config/proto/build.proto
+
+# Only run this nox session.
+env_vars: {
+    key: "NOX_SESSION"
+    value: "notebook"
+}
+
+env_vars: {
+    key: "BENCHMARK_AND_PUBLISH"
+    value: "true"
+}
+
+env_vars: {
+    key: "GOOGLE_CLOUD_PROJECT"
+    value: "bigframes-testing"
+}
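
Review note: this commit moves `BENCHMARK_AND_PUBLISH` from the continuous config to the new load config, but the diff does not show how the nox session reads it. A minimal sketch of one way to interpret such a flag, assuming the value reaches the session as an environment variable and that only the string `"true"` enables it; `env_flag` is a hypothetical helper, not a function from this repository.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Interpret an environment variable as a boolean flag (hypothetical helper).

    Assumption: only the string "true" (case-insensitive) turns the flag on;
    an unset variable falls back to `default`.
    """
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() == "true"


# With the load config above the flag would be set; with the continuous
# config after this commit it would be absent and fall back to False.
benchmark_and_publish = env_flag("BENCHMARK_AND_PUBLISH")
```

Defaulting to `False` keeps a run without the variable (such as a local one) from publishing by accident.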

tests/benchmark/README.md

Lines changed: 22 additions & 0 deletions
@@ -11,6 +11,28 @@ This section lists the benchmarks currently available, with descriptions and lin
 - **TPC-H Benchmark**: Based on the TPC-H standards, this benchmark evaluates transaction processing capabilities. It is adapted from code found in the Polars repository, specifically tailored to test and compare these capabilities. Details are available on the [Polars Benchmark GitHub repository](https://github.com/pola-rs/polars-benchmark).
 - **Notebooks**: These Jupyter notebooks showcase BigFrames' key features and patterns, and also enable performance benchmarking. Explore them at the [BigFrames Notebooks repository](https://github.com/googleapis/python-bigquery-dataframes/tree/main/notebooks).

+## Benchmark Configuration Using `config.jsonl` Files
+
+For each benchmark, a corresponding `config.jsonl` file exists in the same folder or its parent folder. These configuration files allow users to control various benchmark parameters without modifying the code directly. By updating the relevant `config.jsonl` file in the specific benchmark's folder, you can easily configure settings such as:
+- **benchmark_suffix**: A suffix appended to the benchmark name for identification purposes.
+- **ordered**: Controls the mode for BigFrames, specifying whether to use ordered (`true`) or unordered mode (`false`).
+- **project_id**: The Google Cloud project ID where the benchmark dataset or table is located.
+- **dataset_id**: The dataset ID for querying during the benchmark.
+- **table_id**: This is **required** for benchmarks like `dbbenchmark` that target a specific table, but is **not configurable** for benchmarks like `TPC-H`, which use multiple tables with fixed names.
+
+### Example `config.jsonl` Files
+
+#### `dbbenchmark` Example
+```jsonl
+{"benchmark_suffix": "50g_ordered", "project_id": "your-google-cloud-project", "dataset_id": "dbbenchmark", "table_id": "G1_1e9_1e2_5_0", "ordered": true}
+{"benchmark_suffix": "50g_unordered", "project_id": "your-google-cloud-project", "dataset_id": "dbbenchmark", "table_id": "G1_1e9_1e2_5_0", "ordered": false}
+```
+
+#### `TPC-H` Example
+```jsonl
+{"benchmark_suffix": "10t_unordered", "project_id": "your-google-cloud-project", "dataset_id": "tpch_0010t", "ordered": false}
+```
+
 ## Usage Examples
 Our benchmarking process runs internally on a daily basis to continuously monitor the performance of BigFrames. However, there are occasions when you might need to conduct benchmarking locally to test specific changes or new features.
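
Review note: the README hunk above lists the `config.jsonl` keys but does not show how a record is read or checked. A minimal sketch, assuming one JSON object per line as in the README examples. `parse_config_lines` and `REQUIRED_KEYS` are hypothetical names, not part of the repository's `utils` module, and the choice of which keys are mandatory is an assumption drawn from the README wording rather than from the actual loader.

```python
import json
from typing import Any

# Keys named in the README hunk. Treating these four as always required is an
# assumption of this sketch; the real loader may apply defaults instead.
REQUIRED_KEYS = {"benchmark_suffix", "project_id", "dataset_id", "ordered"}


def parse_config_lines(text: str, require_table_id: bool = False) -> list[dict[str, Any]]:
    """Parse JSONL text into one dict per benchmark run (hypothetical helper)."""
    required = REQUIRED_KEYS | ({"table_id"} if require_table_id else set())
    configs = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        missing = required - set(record)
        if missing:
            raise ValueError(f"line {line_number}: missing keys {sorted(missing)}")
        configs.append(record)
    return configs


example = (
    '{"benchmark_suffix": "50g_ordered", "project_id": "your-google-cloud-project", '
    '"dataset_id": "dbbenchmark", "table_id": "G1_1e9_1e2_5_0", "ordered": true}\n'
)
for config in parse_config_lines(example, require_table_id=True):
    print(config["benchmark_suffix"], config["ordered"])
```

Passing `require_table_id=True` mirrors the `dbbenchmark` case, where the README marks `table_id` as required. A TPC-H style record without `table_id` parses under the default.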

Lines changed: 2 additions & 2 deletions
@@ -1,2 +1,2 @@
-{"benchmark_suffix": "50g_ordered", "table_id": "G1_1e9_1e2_5_0", "ordered": true}
-{"benchmark_suffix": "50g_unordered", "table_id": "G1_1e9_1e2_5_0", "ordered": false}
+{"benchmark_suffix": "50g_ordered", "project_id": "bigframes-dev-perf", "dataset_id": "dbbenchmark", "table_id": "G1_1e9_1e2_5_0", "ordered": true}
+{"benchmark_suffix": "50g_unordered", "project_id": "bigframes-dev-perf", "dataset_id": "dbbenchmark", "table_id": "G1_1e9_1e2_5_0", "ordered": false}

tests/benchmark/db_benchmark/groupby/q1.py

Lines changed: 14 additions & 2 deletions
@@ -18,9 +18,21 @@
 import bigframes_vendored.db_benchmark.groupby_queries as vendored_dbbenchmark_groupby_queries

 if __name__ == "__main__":
-    table_id, session, suffix = utils.get_dbbenchmark_configuration()
+    (
+        project_id,
+        dataset_id,
+        table_id,
+        session,
+        suffix,
+    ) = utils.get_configuration(include_table_id=True)
     current_path = pathlib.Path(__file__).absolute()

     utils.get_execution_time(
-        vendored_dbbenchmark_groupby_queries.q1, current_path, suffix, table_id, session
+        vendored_dbbenchmark_groupby_queries.q1,
+        current_path,
+        suffix,
+        project_id,
+        dataset_id,
+        table_id,
+        session,
     )
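
Review note: the call sites now pass `project_id`, `dataset_id`, and `table_id` as three separate arguments. The vendored query bodies are not part of this diff, so how they combine the three is not shown. A minimal sketch, assuming the query ultimately needs a BigQuery-style `project.dataset.table` reference; `qualified_table_id` is a hypothetical helper used only for illustration.

```python
def qualified_table_id(project_id: str, dataset_id: str, table_id: str) -> str:
    """Join the three identifiers into a `project.dataset.table` reference.

    Hypothetical helper: the vendored queries may build this string differently.
    """
    parts = (project_id, dataset_id, table_id)
    if not all(parts):
        raise ValueError("project_id, dataset_id and table_id must all be non-empty")
    return ".".join(parts)


# Placeholder values taken from the README example in this commit.
print(qualified_table_id("your-google-cloud-project", "dbbenchmark", "G1_1e9_1e2_5_0"))
# prints "your-google-cloud-project.dbbenchmark.G1_1e9_1e2_5_0"
```

Keeping the identifiers separate until this point lets the same query run against another project or dataset by editing `config.jsonl` alone.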

tests/benchmark/db_benchmark/groupby/q10.py

Lines changed: 9 additions & 1 deletion
@@ -18,13 +18,21 @@
 import bigframes_vendored.db_benchmark.groupby_queries as vendored_dbbenchmark_groupby_queries

 if __name__ == "__main__":
-    table_id, session, suffix = utils.get_dbbenchmark_configuration()
+    (
+        project_id,
+        dataset_id,
+        table_id,
+        session,
+        suffix,
+    ) = utils.get_configuration(include_table_id=True)
     current_path = pathlib.Path(__file__).absolute()

     utils.get_execution_time(
         vendored_dbbenchmark_groupby_queries.q10,
         current_path,
         suffix,
+        project_id,
+        dataset_id,
         table_id,
         session,
     )

tests/benchmark/db_benchmark/groupby/q2.py

Lines changed: 14 additions & 2 deletions
@@ -18,9 +18,21 @@
 import bigframes_vendored.db_benchmark.groupby_queries as vendored_dbbenchmark_groupby_queries

 if __name__ == "__main__":
-    table_id, session, suffix = utils.get_dbbenchmark_configuration()
+    (
+        project_id,
+        dataset_id,
+        table_id,
+        session,
+        suffix,
+    ) = utils.get_configuration(include_table_id=True)
     current_path = pathlib.Path(__file__).absolute()

     utils.get_execution_time(
-        vendored_dbbenchmark_groupby_queries.q2, current_path, suffix, table_id, session
+        vendored_dbbenchmark_groupby_queries.q2,
+        current_path,
+        suffix,
+        project_id,
+        dataset_id,
+        table_id,
+        session,
     )

tests/benchmark/db_benchmark/groupby/q3.py

Lines changed: 14 additions & 2 deletions
@@ -18,9 +18,21 @@
 import bigframes_vendored.db_benchmark.groupby_queries as vendored_dbbenchmark_groupby_queries

 if __name__ == "__main__":
-    table_id, session, suffix = utils.get_dbbenchmark_configuration()
+    (
+        project_id,
+        dataset_id,
+        table_id,
+        session,
+        suffix,
+    ) = utils.get_configuration(include_table_id=True)
     current_path = pathlib.Path(__file__).absolute()

     utils.get_execution_time(
-        vendored_dbbenchmark_groupby_queries.q3, current_path, suffix, table_id, session
+        vendored_dbbenchmark_groupby_queries.q3,
+        current_path,
+        suffix,
+        project_id,
+        dataset_id,
+        table_id,
+        session,
     )

tests/benchmark/db_benchmark/groupby/q4.py

Lines changed: 14 additions & 2 deletions
@@ -18,9 +18,21 @@
 import bigframes_vendored.db_benchmark.groupby_queries as vendored_dbbenchmark_groupby_queries

 if __name__ == "__main__":
-    table_id, session, suffix = utils.get_dbbenchmark_configuration()
+    (
+        project_id,
+        dataset_id,
+        table_id,
+        session,
+        suffix,
+    ) = utils.get_configuration(include_table_id=True)
     current_path = pathlib.Path(__file__).absolute()

     utils.get_execution_time(
-        vendored_dbbenchmark_groupby_queries.q4, current_path, suffix, table_id, session
+        vendored_dbbenchmark_groupby_queries.q4,
+        current_path,
+        suffix,
+        project_id,
+        dataset_id,
+        table_id,
+        session,
     )

tests/benchmark/db_benchmark/groupby/q5.py

Lines changed: 14 additions & 2 deletions
@@ -18,9 +18,21 @@
 import bigframes_vendored.db_benchmark.groupby_queries as vendored_dbbenchmark_groupby_queries

 if __name__ == "__main__":
-    table_id, session, suffix = utils.get_dbbenchmark_configuration()
+    (
+        project_id,
+        dataset_id,
+        table_id,
+        session,
+        suffix,
+    ) = utils.get_configuration(include_table_id=True)
     current_path = pathlib.Path(__file__).absolute()

     utils.get_execution_time(
-        vendored_dbbenchmark_groupby_queries.q5, current_path, suffix, table_id, session
+        vendored_dbbenchmark_groupby_queries.q5,
+        current_path,
+        suffix,
+        project_id,
+        dataset_id,
+        table_id,
+        session,
     )
