-
Notifications
You must be signed in to change notification settings - Fork 413
fix: sanitize invalid Avro field names in manifest file #2245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@kevinjqliu When you have a moment could you review this? |
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM this is great, thanks for fixing this bug!
I left a few nit comments on just general structure of the some tests.
if you like, we can also add this test for duckdb reading from pyiceberg (like the original issue)
https://github.com/kevinjqliu/iceberg-python/pull/16/files#diff-7f3dd1244d08ce27c003cd091da10aa049f7bb0c7d5397acb4ec69767036accdR1204-R1244
|
following up to #2123 (comment) and #2123 (comment) I see that spark produces |
Co-authored-by: Kevin Liu <kevinjqliu@users.noreply.github.com>
|
@kevinjqliu Thanks for the feedback! In regards to the adding the spark integration test, I've been running into some issues with running integration tests locally that use Spark
Do you know what could be causing this? |
Looks like CI passed, so it could be with your local setup. I put that in chatgpt, it says what java version are you using? |
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for working on this.
This is quite close. I added a comment about actually using duckdb to read. And also a nit about consolidating some of the tests so that its easier to maintain in the future
Fokko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kris-gaudel for working on this! I agree with @kevinjqliu wrt maybe consolidating some of the tests.
I have Java 22 active, but I will dry downgrading my version and see if that fixes it. Thank you for looking into it! |
|
i didnt even know there's a java 22 already, haha |
|
Ok so I downgraded my Java version and the integration tests run now, but I'm still having some issues with integration tests failing due to some spark issues. Edit: Turns out some settings with Apache Arrow I was tweaking were to blame, they pass locally now! |
|
nice! @kris-gaudel I think we're almost good to go Could you address these 2 comments when you get a chance? |
|
@kevinjqliu Done! Lmk if I missed anything 😄 |
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thank you for cleaning up and adding the additional integration tests
<!--
Thanks for opening a pull request!
-->
<!-- In the case this PR will resolve an issue, please replace
${GITHUB_ISSUE_ID} below with the actual Github issue id. -->
<!-- Closes #${GITHUB_ISSUE_ID} -->
Closes apache#2123
# Rationale for this change
Fixing sanitization behaviour to match specification and Java
implementation
# Are these changes tested?
Yes - Unit and integration tests
# Are there any user-facing changes?
Yes - Field names will be sanitized to be Avro compatible if not already
<!-- In the case of user-facing changes, please add the changelog label.
-->
---------
Co-authored-by: Kevin Liu <kevinjqliu@users.noreply.github.com>
Closes #2123
Rationale for this change
Fixing sanitization behaviour to match specification and Java implementation
Are these changes tested?
Yes - Unit and integration tests
Are there any user-facing changes?
Yes - Field names will be sanitized to be Avro compatible if not already