Skip to content

Conversation

@voonhous
Copy link
Member

@voonhous voonhous commented Dec 21, 2025

Describe the issue this Pull Request addresses

Reference issue: #14283

Remove methods that were migrated to HoodieSchemaUtils, consolidate remaining Avro-specific utilities, update documentation. The scope here only covers removal of Avro.Schema from TableSchemaResolver.

NOTE: Merge this after #17656

Key Changes:

  1. Changing method signatures of TableSchemaResolver to HoodieSchema and removing deprecated unused methods.

Summary and Changelog

  1. Swap out Avro.Schema to HoodieSchema for all methods in TableSchemaResolver
  2. Cleanup tests

Impact

None

Risk Level

Low

Documentation Update

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions bot added the size:XL PR with lines of changes > 1000 label Dec 21, 2025
@voonhous voonhous force-pushed the phase-18-HoodieAvroUtils-removal-p2 branch 3 times, most recently from d39bdd2 to c393e06 Compare December 26, 2025 09:54
@voonhous voonhous changed the base branch from master to phase-18-HoodieAvroUtils-removal-p3 December 26, 2025 09:55
@github-actions github-actions bot added size:L PR with lines of changes in (300, 1000] and removed size:XL PR with lines of changes > 1000 labels Dec 26, 2025
@voonhous
Copy link
Member Author

@the-other-tim-brown I have changed the base to #17656 so that it's easier to review like you suggested :)

It's ready for review!

@voonhous voonhous force-pushed the phase-18-HoodieAvroUtils-removal-p2 branch 2 times, most recently from 0325df7 to 4b72160 Compare December 27, 2025 08:57
@voonhous voonhous force-pushed the phase-18-HoodieAvroUtils-removal-p3 branch from c393e06 to 2749bf0 Compare December 29, 2025 19:40
@github-actions github-actions bot added size:XL PR with lines of changes > 1000 and removed size:L PR with lines of changes in (300, 1000] labels Dec 29, 2025
@voonhous voonhous changed the base branch from phase-18-HoodieAvroUtils-removal-p3 to master January 3, 2026 07:55
@voonhous voonhous force-pushed the phase-18-HoodieAvroUtils-removal-p2 branch from 4b72160 to a2a1e0e Compare January 3, 2026 18:31
@github-actions github-actions bot added size:L PR with lines of changes in (300, 1000] and removed size:XL PR with lines of changes > 1000 labels Jan 3, 2026
@voonhous voonhous force-pushed the phase-18-HoodieAvroUtils-removal-p2 branch 4 times, most recently from 56f15a5 to a6a47c6 Compare January 3, 2026 18:58
@voonhous voonhous linked an issue Jan 3, 2026 that may be closed by this pull request
@voonhous voonhous force-pushed the phase-18-HoodieAvroUtils-removal-p2 branch 3 times, most recently from 57f86f3 to 9ebc86c Compare January 8, 2026 02:48
.getFields()
.stream()
.map(f -> new FieldSchema(f.name(), f.schema().getType().getName(), f.doc()))
.map(f -> new FieldSchema(f.name(), f.schema().getType().toAvroType().getName(), f.doc()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get the name without converting to avro type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap, will remove.

? getTableSchemaFromCommitMetadata(instantOpt.get(), includeMetadataFields)
: getTableSchemaFromLatestCommitMetadata(includeMetadataFields))
.or(() ->
metaClient.getTableConfig().getTableCreateSchema()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how large of a change is it to modify the getTableCreateSchema to return HoodieSchema?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be too big, doing it now.


for (String col : colNames) {
if (avroSchema.getField(col) == null) {
if (schema.getField(col) == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update this to isPresent instead of null check

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

@Test
void testGetTableSchema() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this now covered by other tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a compatibility test comparing between the types that return Avro.Schema and HoodieSchema, ensuring that they return the equivalent results.

Since we are changing all methods to return HoodieSchema, there is no Avro.Schema to compare against, hence test is removed as there is nothing to compare against.

private def fetchActualSchema(): HoodieSchema = {
val tableMetaClient = createMetaClient(spark, tempBasePath)
new TableSchemaResolver(tableMetaClient).getTableSchema()
new TableSchemaResolver(tableMetaClient).getTableSchema(false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need to change right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, i was debugging, let me revert.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

.getFields()
.stream()
.map(f -> new FieldSchema(f.name(), f.schema().getType().getName(), f.doc()))
.map(f -> new FieldSchema(f.name(), f.schema().getType().toAvroType().getName(), f.doc()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get the name without converting to avro type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap, removing.

@voonhous voonhous force-pushed the phase-18-HoodieAvroUtils-removal-p2 branch from 9ebc86c to 2c7bd01 Compare January 8, 2026 09:06
@voonhous voonhous force-pushed the phase-18-HoodieAvroUtils-removal-p2 branch from 2c7bd01 to 314210e Compare January 8, 2026 13:18
@hudi-bot
Copy link
Collaborator

hudi-bot commented Jan 8, 2026

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

.getFields()
.stream()
.map(f -> new FieldSchema(f.name(), f.schema().getType().getName(), f.doc()))
.map(f -> new FieldSchema(f.name(), f.schema().getType().name(), f.doc()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For fields that are logical types in avro, we will likely be returning their underlying data type. What impact will that have on the sync now that we would return the logical type name?

Copy link
Contributor

@the-other-tim-brown the-other-tim-brown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@voonhous, apologies but I think we may need to keep the toAvro step for field type names to keep compatibility between releases for the meta syncs

* Gets full schema (user + metadata) for a hoodie table.
*
* @return HoodieSchema for this table
* @return Avro schema for this table
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

.getFields()
.stream()
.map(f -> new FieldSchema(f.name(), f.schema().getType().getName(), f.doc()))
.map(f -> new FieldSchema(f.name(), f.schema().getType().name(), f.doc()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:L PR with lines of changes in (300, 1000]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Phase 18: HoodieAvroUtils Method Removal

3 participants