Upgrade Iceberg to 1.9.2 and Avro to 1.12.0#784
the-other-tim-brown merged 17 commits into apache:main
Conversation
Tracking the CI issues in this issue #787
The CI issue is resolved, please rebase your PR
a4b2a8c to afea83c
| "type": ["null", "bytes"], | ||
| "default": null |
Let's not modify the test schema as part of this change. It seems like it is not required.
i believe this is needed for the new parquet library version, but let me revert and see
yep, fails with
Error: org.apache.xtable.hudi.ITHudiConversionSource.insertAndUpsertData(HoodieTableType, PartitionConfig)[3] -- Time elapsed: 10.34 s <<< ERROR!
org.apache.hudi.exception.HoodieInsertException: Failed to bulk insert for commit time 20260119224246020
at org.apache.hudi.table.action.commit.JavaBulkInsertCommitActionExecutor.execute(JavaBulkInsertCommitActionExecutor.java:63)
at org.apache.hudi.table.HoodieJavaCopyOnWriteTable.bulkInsert(HoodieJavaCopyOnWriteTable.java:118)
at org.apache.hudi.table.HoodieJavaCopyOnWriteTable.bulkInsert(HoodieJavaCopyOnWriteTable.java:85)
at org.apache.hudi.client.HoodieJavaWriteClient.bulkInsert(HoodieJavaWriteClient.java:169)
at org.apache.hudi.client.HoodieJavaWriteClient.bulkInsert(HoodieJavaWriteClient.java:158)
at org.apache.xtable.TestJavaHudiTable.insertRecordsWithCommitAlreadyStarted(TestJavaHudiTable.java:195)
at org.apache.xtable.hudi.ITHudiConversionSource.insertAndUpsertData(ITHudiConversionSource.java:245)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.tryRemoveAndExec(ForkJoinPool.java:1062)
at java.base/java.util.concurrent.ForkJoinPool.awaitJoin(ForkJoinPool.java:1688)
at java.base/java.util.concurrent.ForkJoinTask.doJoin(ForkJoinTask.java:397)
at java.base/java.util.concurrent.ForkJoinTask.get(ForkJoinTask.java:1004)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
Caused by: java.lang.RuntimeException: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: org.apache.avro.AvroTypeException: Invalid default for field bytes_field: "" not a "bytes"
at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at org.apache.hudi.table.action.commit.JavaBulkInsertHelper.bulkInsert(JavaBulkInsertHelper.java:131)
at org.apache.hudi.table.action.commit.JavaBulkInsertHelper.bulkInsert(JavaBulkInsertHelper.java:84)
at org.apache.hudi.table.action.commit.JavaBulkInsertCommitActionExecutor.execute(JavaBulkInsertCommitActionExecutor.java:58)
... 17 more
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: org.apache.avro.AvroTypeException: Invalid default for field bytes_field: "" not a "bytes"
at org.apache.hudi.execution.JavaLazyInsertIterable.computeNext(JavaLazyInsertIterable.java:71)
at org.apache.hudi.execution.JavaLazyInsertIterable.computeNext(JavaLazyInsertIterable.java:37)
at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:119)
... 21 more
Caused by: org.apache.hudi.exception.HoodieException: org.apache.avro.AvroTypeException: Invalid default for field bytes_field: "" not a "bytes"
at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:75)
at org.apache.hudi.execution.JavaLazyInsertIterable.computeNext(JavaLazyInsertIterable.java:67)
... 23 more
Caused by: org.apache.avro.AvroTypeException: Invalid default for field bytes_field: "" not a "bytes"
at org.apache.avro.Schema.validateDefault(Schema.java:1719)
at org.apache.avro.Schema$Field.<init>(Schema.java:578)
at org.apache.avro.Schema$Field.<init>(Schema.java:614)
at org.apache.hudi.avro.HoodieAvroUtils.addMetadataFields(HoodieAvroUtils.java:291)
at org.apache.hudi.io.HoodieWriteHandle.<init>(HoodieWriteHandle.java:96)
at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:89)
at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:76)
at org.apache.hudi.io.CreateHandleFactory.create(CreateHandleFactory.java:45)
at org.apache.hudi.execution.CopyOnWriteInsertHandler.consume(CopyOnWriteInsertHandler.java:85)
at org.apache.hudi.execution.CopyOnWriteInsertHandler.consume(CopyOnWriteInsertHandler.java:42)
at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:69)
... 24 more
https://github.com/apache/incubator-xtable/actions/runs/21153333020/job/60833552403?pr=784
This looks like an avro issue, not parquet. Is the avro upgrade required?
oops yea, i meant new avro library version. let me double check if the upgrade is necessary
the avro upgrade is required: Iceberg 1.9.2 relies on Avro 1.12.0 APIs, and the schema changes here in basic_schema.avsc are needed for compatibility with Avro 1.12.0's stricter default value handling
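To illustrate the rule the updated field definition satisfies, here is a minimal sketch (not code from the PR; the record and class names are made up): Avro validates a union field's default against the first branch of the union, so a field typed ["null", "bytes"] must use null as its default.

```java
import org.apache.avro.Schema;

// Hypothetical illustration of the union-default rule the new basic_schema.avsc
// entry follows; it is not part of the PR. With default validation enabled
// (the Schema.Parser behavior in recent Avro releases), a non-null default on a
// ["null", "bytes"] field is rejected because it does not match the first
// union branch, while "default": null parses cleanly.
public class UnionDefaultSketch {
  public static void main(String[] args) {
    String valid = "{\"type\":\"record\",\"name\":\"Rec\",\"fields\":["
        + "{\"name\":\"bytes_field\",\"type\":[\"null\",\"bytes\"],\"default\":null}]}";
    String invalid = "{\"type\":\"record\",\"name\":\"Rec\",\"fields\":["
        + "{\"name\":\"bytes_field\",\"type\":[\"null\",\"bytes\"],\"default\":\"\"}]}";

    Schema ok = new Schema.Parser().parse(valid);
    System.out.println("parsed field: " + ok.getField("bytes_field"));

    try {
      new Schema.Parser().parse(invalid);
    } catch (org.apache.avro.AvroTypeException e) {
      // e.g. "Invalid default for field bytes_field: ..."
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```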
 * Validates that the metadata for the table is properly created/updated. {@link
 * ITConversionController} validates that the table and its data can be properly read.
 */
@Execution(ExecutionMode.SAME_THREAD)
this is so that the GitHub CI won't run into concurrency issues, i see this is done for TestDeltaSync as well
What is the concurrency issue? This wasn't required on the lower version of Iceberg
I remember it's about the InMemory catalog the tests are using.
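As a side note, here is a minimal sketch (class and method names are hypothetical, not from the PR) of what the annotation in the diff does: with JUnit 5 parallel execution enabled, @Execution(ExecutionMode.SAME_THREAD) keeps all tests in the class on a single thread so they cannot race on shared state such as a single in-memory catalog.

```java
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.parallel.Execution;
import org.junit.jupiter.api.parallel.ExecutionMode;

// Hypothetical example: when the surrounding suite runs tests concurrently,
// SAME_THREAD forces every test in this class onto the same thread, so the
// methods below never execute at the same time and cannot race on a shared
// resource such as one in-memory catalog instance.
@Execution(ExecutionMode.SAME_THREAD)
class SharedCatalogSequentialTest {

  @Test
  void writesTableMetadata() {
    // would touch the shared catalog here
  }

  @Test
  void readsTableMetadata() {
    // never runs concurrently with writesTableMetadata()
  }
}
```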
| "type": ["null", "bytes"], | ||
| "default": null |
There was a problem hiding this comment.
i believe this is needed for the new parquet library version, but let me revert and see
CI is green @the-other-tim-brown, could you take another look? i tried removing as many unrelated changes as possible.
@kevinjqliu why not upgrade to the latest iceberg? Can you update the PR description with why each version was chosen?
Thanks for the review @the-other-tim-brown. I was going to update #783 on why 1.9.2 is picked.
@kevinjqliu for the hive support, is there a new import we should use to keep the version consistent? Or is it removed completely?
unfortunately it was removed completely
I added a note in the PR description on why we need the avro version upgrade. I want to keep this PR scoped to just upgrading the iceberg library version. LMK what you think.
I think there are also other updates required to bump the Iceberg version. Let me take a look.
Thanks for taking a look @jbonofre. We'd probably want to update the LICENSE amongst other tasks. @the-other-tim-brown do you have any recommendations on how to move this forward?
@jbonofre is there more to be done for upgrading Iceberg?
@the-other-tim-brown I think we are good now in terms of dependencies alignment. I will do a pass on
thanks @jbonofre! @the-other-tim-brown please take a look and let me know if there's anything else to do to move this forward.
Thanks for the review @the-other-tim-brown @jbonofre.
What is the purpose of the pull request
Upgrades Iceberg and Avro libraries to newer versions to benefit from bug fixes, performance improvements, and new features.
Version Changes
Iceberg: upgraded to 1.9.2
Avro: upgraded to 1.12.0
Note: iceberg-hive-runtime is pinned to 1.7.2 as it was removed in Iceberg 1.8.0. The Hive runtime functionality has been restructured in newer Iceberg versions.
Verify this pull request
This pull request is already covered by existing tests, such as (please describe tests).