@rambleraptor
Contributor

@rambleraptor rambleraptor commented Jun 5, 2025

Rationale for this change

This PR brings BigQuery Metastore support to Python after it was merged into the Java implementation.

This allows Iceberg catalog functionality to be backed by BigQuery. It supports creating/deleting/listing namespaces (datasets in BigQuery terminology), creating/deleting/listing tables, and registering tables.

This is my first PR of size to iceberg-python, so any advice would be appreciated!
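As a rough illustration of the namespace operations listed above, here is a minimal sketch written against the generic PyIceberg `Catalog` method names (`create_namespace`, `list_namespaces`, `drop_namespace`). The helper `namespace_round_trip` and the `InMemoryStandIn` class are hypothetical and exist only so the snippet runs without GCP access. They are not part of this PR, and a real run would pass in a catalog configured for BigQuery instead.

```python
from typing import Any, List, Tuple


def namespace_round_trip(catalog: Any, namespace: str) -> List[Tuple[str, ...]]:
    """Create a namespace, capture the namespace listing, then drop it again."""
    catalog.create_namespace(namespace)
    try:
        # PyIceberg catalogs return namespace identifiers as tuples of strings.
        return list(catalog.list_namespaces())
    finally:
        # Always clean up so repeated runs start from the same state.
        catalog.drop_namespace(namespace)


class InMemoryStandIn:
    """Hypothetical stand-in that mimics only the three methods used above."""

    def __init__(self) -> None:
        self._namespaces = set()

    def create_namespace(self, namespace: str, properties: Any = None) -> None:
        self._namespaces.add((namespace,))

    def list_namespaces(self, namespace: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
        return sorted(self._namespaces)

    def drop_namespace(self, namespace: str) -> None:
        self._namespaces.discard((namespace,))
```

With the BigQuery-backed catalog, each namespace in this flow would correspond to a BigQuery dataset.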

Are these changes tested?

Integration and unit tests included.

Are there any user-facing changes?

Introduces a new Catalog type.

@rambleraptor rambleraptor marked this pull request as draft June 5, 2025 23:02
@rambleraptor rambleraptor marked this pull request as ready for review June 5, 2025 23:03
@djouallah

@rambleraptor User here. I recently tried BigQuery Metastore after it was announced that it had added a REST API interface. I'm just wondering: why add support to PyIceberg if it already exposes a REST API?

@jayceslesar
Contributor

jayceslesar commented Jun 28, 2025

It might be wise to also include the changes from apache/iceberg#13052 (found via apache/iceberg#12808 (comment)).

Edit: I see now that this looks to be supported via gcp.credentials-location. Maybe unify on the name upstream has planned, gcp.bigquery.credential-file?

Contributor

Are special credentials needed to run this in CI?

Contributor Author

@rambleraptor rambleraptor Aug 4, 2025

For the integration tests, yes. I'm unclear how credentials work on the current AWS test, so I could use some pointers on that. If we need real cloud resources to run these integration tests, they should be owned by Iceberg.

Unlike AWS + mock_aws, GCP doesn't have a full mock implementation that we can use.
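Since there is no drop-in equivalent of mock_aws for GCP, one common workaround in unit tests is to stub the client object directly. The sketch below uses the standard library's `unittest.mock` for that. It mirrors only the `list_datasets()` call and the `dataset_id` attribute of the google-cloud-bigquery client; `list_dataset_ids` and `make_stub_client` are hypothetical helper names, and this is not necessarily how the tests in this PR are written.

```python
from typing import Any, List
from unittest.mock import MagicMock


def list_dataset_ids(client: Any) -> List[str]:
    """Return the dataset IDs reported by the client's list_datasets() call."""
    return [item.dataset_id for item in client.list_datasets()]


def make_stub_client(dataset_ids: List[str]) -> MagicMock:
    """Build a stub that mimics only the list_datasets() surface of the client."""
    client = MagicMock(name="bigquery_client")
    # Each returned item exposes just the dataset_id attribute used above.
    client.list_datasets.return_value = [MagicMock(dataset_id=d) for d in dataset_ids]
    return client
```

A stub like this keeps unit tests offline, but it only verifies the calls the test author anticipated, which is why integration tests against real resources are still discussed here.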

@kevinjqliu kevinjqliu added this to the PyIceberg 0.11.0 milestone Jul 22, 2025

@talatuyarer talatuyarer left a comment

@rambleraptor I left a few comments. Overall, I tested this catalog in my local environment and it works as expected. LGTM. Please address the comments before merging.

@kevinjqliu
Contributor

We merged a lot of library updates this morning. Could you rebase the PR?

@rambleraptor
Contributor Author

@kevinjqliu rebased!

@kevinjqliu
Contributor

Thanks for the PR, @rambleraptor.
I have some concerns about adding a vendor-specific catalog implementation to the repo. Is it possible to reuse the REST client, or the JDBC client?
I know we already have the Glue and DynamoDB catalogs, but if possible, I'd like not to add another vendor catalog. With S3 Tables, we did not end up merging the vendor-specific catalog implementation (#1404) since S3 Tables provided support for the REST interface.
Are there plans to provide the REST interface in the future?

I see that the BigQuery Metastore catalog was merged on the java side (apache/iceberg#12808).
I would like to hear other opinions from the community before proceeding.

@Fokko
Contributor

Fokko commented Aug 19, 2025

Since the Java side has already merged this, I think it makes sense to move this forward here as well. From @talatuyarer I understand that the BigQuery Metastore doesn't have a REST endpoint (yet), in contrast to BigLake (confusing naming :)

@Fokko Fokko changed the title from "Added BigQuery Metastore Catalog" to "Add BigQuery Metastore Catalog" Aug 19, 2025
@rambleraptor
Contributor Author

@Fokko @kevinjqliu yeah, that's the perfect description of the situation. Our product naming is...less than ideal.

@talatuyarer

Thank you @Fokko and @kevinjqliu

Yes, there are two independent products. BQ does not support the REST catalog. BigQuery Metastore, which we implement in this PR, has been a GA product since this January: https://cloud.google.com/blog/products/data-analytics/introducing-bigquery-metastore-fully-managed-metadata-service

@rambleraptor
Contributor Author

I added a pytest.mark.skipif to ensure that integration tests are not run if credentials are not present.

I also filed #2368 to get the integration tests running + clean up the GCP integration tests in general.
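For readers unfamiliar with the pattern, a credential-gated skip could look roughly like the sketch below. It assumes credentials are supplied through the GOOGLE_APPLICATION_CREDENTIALS environment variable, which is a simplification (Application Default Credentials can also come from other sources), and the actual condition used in this PR may differ. The names `gcp_credentials_available`, `requires_gcp_credentials`, and the test function are hypothetical.

```python
import os

import pytest


def gcp_credentials_available() -> bool:
    """Return True if GOOGLE_APPLICATION_CREDENTIALS points at an existing file."""
    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    return bool(path) and os.path.isfile(path)


# The condition is evaluated once at import time; the marker can be reused
# on every integration test that needs real GCP access.
requires_gcp_credentials = pytest.mark.skipif(
    not gcp_credentials_available(),
    reason="GCP credentials not available",
)


@requires_gcp_credentials
def test_create_namespace_integration() -> None:
    # Hypothetical placeholder; a real test would exercise the catalog here.
    ...
```

Skipped tests show up as skipped in the pytest summary rather than as failures, so a CI run without credentials stays green.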

@Fokko
Contributor

Fokko commented Aug 21, 2025

It looks like some tests are still running:


>   ???
E   OSError: google::cloud::Status(UNAVAILABLE: Retry policy exhausted, with a last message of Could not create a OAuth2 access token to authenticate the request. The request was not sent, as such an access token is required to complete the request successfully. Learn more about Google Cloud authentication at https://cloud.google.com/docs/authentication. The underlying error message was: PerformWork() - CURL error [6]=Couldn't resolve host name error_info={reason=, domain=, metadata={gcloud-cpp.retry.on-entry=false, gcloud-cpp.retry.function=GetObjectMetadata, gcloud-cpp.retry.reason=retry-policy-exhausted, gcloud-cpp.retry.original-message=Could not create a OAuth2 access token to authenticate the request. The request was not sent, as such an access token is required to complete the request successfully. Learn more about Google Cloud authentication at https://cloud.google.com/docs/authentication. The underlying error message was: PerformWork() - CURL error [6]=Couldn't resolve host name}})

pyarrow/error.pxi:92: OSError
------------------------------ Captured log setup ------------------------------
INFO     werkzeug:_internal.py:97 127.0.0.1 - - [21/Aug/2025 21:17:50] "PUT /test_bucket HTTP/1.1" 200 -
=========================== short test summary info ============================
FAILED tests/catalog/test_bigquery_metastore.py::test_create_table_with_database_location - OSError: google::cloud::Status(UNAVAILABLE: Retry policy exhausted, with a last message of Could not create a OAuth2 access token to authenticate the request. The request was not sent, as such an access token is required to complete the request successfully. Learn more about Google Cloud authentication at https://cloud.google.com/docs/authentication. The underlying error message was: PerformWork() - CURL error [6]=Couldn't resolve host name error_info={reason=, domain=, metadata={gcloud-cpp.retry.on-entry=false, gcloud-cpp.retry.function=GetObjectMetadata, gcloud-cpp.retry.reason=retry-policy-exhausted, gcloud-cpp.retry.original-message=Could not create a OAuth2 access token to authenticate the request. The request was not sent, as such an access token is required to complete the request successfully. Learn more about Google Cloud authentication at https://cloud.google.com/docs/authentication. The underlying error message was: PerformWork() - CURL error [6]=Couldn't resolve host name}})
FAILED tests/catalog/test_bigquery_metastore.py::test_drop_table_with_database_location - OSError: google::cloud::Status(UNAVAILABLE: Retry policy exhausted, with a last message of Could not create a OAuth2 access token to authenticate the request. The request was not sent, as such an access token is required to complete the request successfully. Learn more about Google Cloud authentication at https://cloud.google.com/docs/authentication. The underlying error message was: PerformWork() - CURL error [6]=Couldn't resolve host name error_info={reason=, domain=, metadata={gcloud-cpp.retry.on-entry=false, gcloud-cpp.retry.function=GetObjectMetadata, gcloud-cpp.retry.reason=retry-policy-exhausted, gcloud-cpp.retry.original-message=Could not create a OAuth2 access token to authenticate the request. The request was not sent, as such an access token is required to complete the request successfully. Learn more about Google Cloud authentication at https://cloud.google.com/docs/authentication. The underlying error message was: PerformWork() - CURL error [6]=Couldn't resolve host name}})

@rambleraptor
Contributor Author

Tests should be fixed!

@Fokko Fokko merged commit 835dbe1 into apache:main Aug 26, 2025
10 checks passed

6 participants