-
Notifications
You must be signed in to change notification settings - Fork 2.5k
feat(schema): Phase 18 - HoodieAvroUtils removal (Part 3) #17659
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
feat(schema): Phase 18 - HoodieAvroUtils removal (Part 3) #17659
Conversation
d39bdd2 to
c393e06
Compare
|
@the-other-tim-brown I have changed the base to #17656 so that it's easier to review like you suggested :) It's ready for review! |
0325df7 to
4b72160
Compare
c393e06 to
2749bf0
Compare
4b72160 to
a2a1e0e
Compare
56f15a5 to
a6a47c6
Compare
57f86f3 to
9ebc86c
Compare
| .getFields() | ||
| .stream() | ||
| .map(f -> new FieldSchema(f.name(), f.schema().getType().getName(), f.doc())) | ||
| .map(f -> new FieldSchema(f.name(), f.schema().getType().toAvroType().getName(), f.doc())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get the name without converting to avro type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeap!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeap, will remove.
| ? getTableSchemaFromCommitMetadata(instantOpt.get(), includeMetadataFields) | ||
| : getTableSchemaFromLatestCommitMetadata(includeMetadataFields)) | ||
| .or(() -> | ||
| metaClient.getTableConfig().getTableCreateSchema() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how large of a change is it to modify the getTableCreateSchema to return HoodieSchema?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't be too big, doing it now.
|
|
||
| for (String col : colNames) { | ||
| if (avroSchema.getField(col) == null) { | ||
| if (schema.getField(col) == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update this to isPresent instead of null check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| } | ||
|
|
||
| @Test | ||
| void testGetTableSchema() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this now covered by other tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a compatibility test comparing between the types that return Avro.Schema and HoodieSchema, ensuring that they return the equivalent results.
Since we are changing all methods to return HoodieSchema, there is no Avro.Schema to compare against, hence test is removed as there is nothing to compare against.
| private def fetchActualSchema(): HoodieSchema = { | ||
| val tableMetaClient = createMetaClient(spark, tempBasePath) | ||
| new TableSchemaResolver(tableMetaClient).getTableSchema() | ||
| new TableSchemaResolver(tableMetaClient).getTableSchema(false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't need to change right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, i was debugging, let me revert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| .getFields() | ||
| .stream() | ||
| .map(f -> new FieldSchema(f.name(), f.schema().getType().getName(), f.doc())) | ||
| .map(f -> new FieldSchema(f.name(), f.schema().getType().toAvroType().getName(), f.doc())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get the name without converting to avro type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeap, removing.
9ebc86c to
2c7bd01
Compare
2c7bd01 to
314210e
Compare
| .getFields() | ||
| .stream() | ||
| .map(f -> new FieldSchema(f.name(), f.schema().getType().getName(), f.doc())) | ||
| .map(f -> new FieldSchema(f.name(), f.schema().getType().name(), f.doc())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For fields that are logical types in avro, we will likely be returning their underlying data type. What impact will that have on the sync now that we would return the logical type name?
the-other-tim-brown
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@voonhous, apologies but I think we may need to keep the toAvro step for field type names to keep compatibility between releases for the meta syncs
| * Gets full schema (user + metadata) for a hoodie table. | ||
| * | ||
| * @return HoodieSchema for this table | ||
| * @return Avro schema for this table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| .getFields() | ||
| .stream() | ||
| .map(f -> new FieldSchema(f.name(), f.schema().getType().getName(), f.doc())) | ||
| .map(f -> new FieldSchema(f.name(), f.schema().getType().name(), f.doc())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question here
Describe the issue this Pull Request addresses
Reference issue: #14283
Remove methods that were migrated to
HoodieSchemaUtils, consolidate remaining Avro-specific utilities, update documentation. The scope here only covers removal ofAvro.SchemafromTableSchemaResolver.NOTE: Merge this after #17656
Key Changes:
TableSchemaResolvertoHoodieSchemaand removing deprecated unused methods.Summary and Changelog
Avro.SchematoHoodieSchemafor all methods in TableSchemaResolverImpact
None
Risk Level
Low
Documentation Update
Contributor's checklist