-
Notifications
You must be signed in to change notification settings - Fork 3k
GCP: Add Google Authentication Support #13212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi @nastra, when you have time Could you review this ? |
gcp/src/main/java/org/apache/iceberg/gcp/auth/GoogleAuthManager.java
Outdated
Show resolved
Hide resolved
gcp/src/main/java/org/apache/iceberg/gcp/auth/GoogleAuthManager.java
Outdated
Show resolved
Hide resolved
gcp/src/main/java/org/apache/iceberg/gcp/auth/GoogleAuthManager.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adutra Thank you for reviewing
gcp/src/main/java/org/apache/iceberg/gcp/auth/GoogleAuthManager.java
Outdated
Show resolved
Hide resolved
gcp/src/main/java/org/apache/iceberg/gcp/auth/GoogleAuthManager.java
Outdated
Show resolved
Hide resolved
…-specific in this manner.
gcp/src/main/java/org/apache/iceberg/gcp/auth/GoogleAuthManager.java
Outdated
Show resolved
Hide resolved
gcp/src/test/java/org/apache/iceberg/gcp/auth/TestGoogleAuthSession.java
Outdated
Show resolved
Hide resolved
gcp/src/test/java/org/apache/iceberg/gcp/auth/TestGoogleAuthSession.java
Outdated
Show resolved
Hide resolved
| */ | ||
| public static final int GCS_DELETE_BATCH_SIZE_DEFAULT = 50; | ||
|
|
||
| public static final String GCP_CREDENTIALS_PATH_PROPERTY = "gcp.auth.credentials-path"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm currently not sure whether these properties should be here vs in GoogleAuthManager. The properties in this class are typically related to the FileIO implementation of GCS, whereas these new properties aren't
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved these configurations to GoogleAuthManager. May be we should rename GCPProperties.java as GCSProperties.java ?
gcp/src/main/java/org/apache/iceberg/gcp/auth/GoogleAuthManager.java
Outdated
Show resolved
Hide resolved
gcp/src/test/java/org/apache/iceberg/gcp/auth/TestGoogleAuthManager.java
Outdated
Show resolved
Hide resolved
gcp/src/test/java/org/apache/iceberg/gcp/auth/TestGoogleAuthManager.java
Outdated
Show resolved
Hide resolved
gcp/src/test/java/org/apache/iceberg/gcp/auth/TestGoogleAuthManager.java
Outdated
Show resolved
Hide resolved
gcp/src/test/java/org/apache/iceberg/gcp/auth/TestGoogleAuthSession.java
Outdated
Show resolved
Hide resolved
gcp/src/test/java/org/apache/iceberg/gcp/auth/TestGoogleAuthSession.java
Outdated
Show resolved
Hide resolved
gcp/src/test/java/org/apache/iceberg/gcp/auth/TestGoogleAuthSession.java
Show resolved
Hide resolved
gcp/src/main/java/org/apache/iceberg/gcp/auth/GoogleAuthSession.java
Outdated
Show resolved
Hide resolved
gcp/src/main/java/org/apache/iceberg/gcp/auth/GoogleAuthManager.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adutra @nastra - we are helping review a similar PR in PyIceberg, and I wanted to bring this discussion here first.
With the AuthManager, we now have a pluggable way of introducing authentication mechanisms to Iceberg clients. Given that's the case, do we want to encourage vendors to implement and manager their AuthManager separately from this project, or encourage them to contribute and have those code implementations maintained in the parent repositry?
Dremio seems to have taken the former approach: https://github.com/dremio/iceberg-auth-manager
Whereas there's precedent for us supporting AWS SigV4 authentication on the project itself: https://github.com/apache/iceberg/blob/main/aws/src/main/java/org/apache/iceberg/aws/RESTSigV4AuthManager.java
Would this warrant a mailing list discussion?
|
@sungwy thanks for raising the question!
Once you have a pluggable API, you cannot prevent users from implementing it :-) IOW, there will necessarily be vendor-specific implementations living outside of Iceberg repo eventually. But any implementation that is generic enough to be useful to most Iceberg users deserves being integrated into Iceberg. I think we need to strike a balance between the two possibilities and decide on a case-by-case basis. For example, this PR does introduce a vendor-specific Auth Manager, since it can only work when running on Google's infra. But since it's fairly generic, I don't see any problem in having this Auth Manager in Iceberg.
That is not true: Dremio open-sourced this project recently, but it does not contain anything Dremio-specific. It is a greenfield project that aims at providing an alternate implementation of the Auth Manager API for OAuth, offering support for many more features than the ones supported by the built-in OAuth manager. It is actually Dremio's intention to donate this project, but indeed that conversation didn't officially start yet.
Yes, definitely! I will start this discussion shortly. |
gcp/src/main/java/org/apache/iceberg/gcp/auth/GoogleAuthManager.java
Outdated
Show resolved
Hide resolved
gcp/src/main/java/org/apache/iceberg/gcp/auth/GoogleAuthManager.java
Outdated
Show resolved
Hide resolved
gcp/src/main/java/org/apache/iceberg/gcp/auth/GoogleAuthManager.java
Outdated
Show resolved
Hide resolved
gcp/src/main/java/org/apache/iceberg/gcp/auth/GoogleAuthSession.java
Outdated
Show resolved
Hide resolved
gcp/src/main/java/org/apache/iceberg/gcp/auth/GoogleAuthSession.java
Outdated
Show resolved
Hide resolved
gcp/src/main/java/org/apache/iceberg/gcp/auth/GoogleAuthSession.java
Outdated
Show resolved
Hide resolved
| return newHeaders.equals(request.headers()) | ||
| ? request | ||
| : ImmutableHTTPRequest.builder().from(request).headers(newHeaders).build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would we need this? I don't think we reuse requests and if we did, the putIfAbsent would resolve it. Seems like unnecessary complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to keep this because we re-use headers. https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/rest/HTTPClient.java#L256
If someone add Authorizationheader it can broke workflow. I dont want to allow that behaviour.
danielcweeks
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @talatuyarer
## Summary - Upgrade to iceberg 1.10.0 to grab column [stats](apache/iceberg#10659), and some CVE's: [CVE](apache/iceberg#13561) (and parquet, avro transitively), and [BigQueryMetastoreCatalog](apache/iceberg#12808), [Google Auth](apache/iceberg#13212). - Column stats is the key feature here - we rely on extracting the puffin files and grabbing stats metadata. <img width="1342" height="461" alt="Screenshot 2025-09-20 at 4 30 35 PM" src="https://github.com/user-attachments/assets/bc8eeb80-6ff7-4abe-8ffb-a0eebf48bc4e" /> ## Checklist - [ ] Added Unit Tests - [ ] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Per-partition, per-column statistics extraction with optional persistence to a new data-quality metrics KV store; platform APIs can produce a metrics-specific KV store. * **Breaking Changes** * Extraction API signatures and summary/key formats changed; thrift summary shapes updated; config token renamed "groupbys" → "group_bys"; /api/summary-series now returns null. * **Refactor** * Large-scale test package reorganizations and import consolidations across the codebase. <!-- end of auto-generated comment: release notes by coderabbit.ai --> <!-- av pr metadata This information is embedded by the av CLI when creating PRs to track the status of stacks when using Aviator. Please do not delete or edit this section of the PR. ``` {"parent":"main","parentHead":"","trunk":"main"} ``` --> --------- Co-authored-by: thomaschow <thomaschow369@gmail.com>
* Initial Version of GCPAuthManager
This PR introduces a dedicated GoogleAuthManager for the Iceberg REST client, enabling robust and streamlined authentication with Google Cloud Platform (GCP) services by delegating authentication to GCP's auth library.
Why add a specific GoogleAuthManager?
Currently, Iceberg's AuthManager and the generic OAuth2Manager provide a good baseline for authentication. However, connecting to services on Google Cloud often uses specific patterns that a generic manager doesn't fully simplify. This new GoogleAuthManager makes life easier for folks using Iceberg REST catalogs behind GCP Auth Service
The existing OAuth2Manager is great for standard OAuth2 flows (like client credentials) but it does not support authorization code workflow.
The GoogleAuthManager introduces authentication methods that are more convenient for GCP users, differing from the generic OAuth2 flow:
While our current OAuth2Manager can be used with tokens from service accounts, it doesn't natively understand refresh token workflow and the ADC discovery process or how to directly consume a service account JSON file for its credentials.
This dedicated manager ensures that Iceberg can authenticate with GCP-backed REST catalog services in the most efficient, secure, and user-friendly manner, aligning with common GCP practices. The GoogleAuthManager bridges this gap for a more seamless Iceberg Rest experience on GCP.