@gabeiglio
Contributor

Rationale for this change

This follows the Java implementation's approach to reading partition specs whose source field was dropped.

Are these changes tested?

Yes, added one integration test and one unit test.

Are there any user-facing changes?

No

@gabeiglio gabeiglio marked this pull request as draft August 21, 2025 17:19
@gabeiglio gabeiglio marked this pull request as ready for review August 21, 2025 17:40
    source_field = schema.find_field(field.source_id)
    result_type = field.transform.result_type(source_field.field_type)
    nested_fields.append(NestedField(field.field_id, field.name, result_type, required=source_field.required))
else:
Contributor Author

@gabeiglio gabeiglio Aug 21, 2025

I wanted to get some opinions on allowing this only for VoidTransform fields, since as of now we can drop columns without dropping the partition first.

Contributor

Unfortunately, dropping is not an option. V1 tables do not have field IDs encoded in the struct; it is purely position-based. See the spec for details. Dropping a field would change the positions, potentially resulting in data integrity issues.
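
To illustrate the positional concern, here is a minimal sketch in plain Python. It is not PyIceberg code: the field names, the stored tuple, and the helper `read_positional` are all hypothetical, and a V1 partition struct is modeled simply as a tuple of values matched to spec fields by position.

```python
# Hypothetical illustration, not PyIceberg code.
# A V1 partition struct carries no field IDs, so values are matched by position.
spec_fields = ["day", "bucket"]      # partition fields, in spec order (made-up names)
stored_tuple = ("2025-08-21", 7)     # values as written positionally in an old file


def read_positional(fields, values):
    """Pair field names with stored values by position only."""
    return dict(zip(fields, values))


# Reading with the original spec gives the intended mapping.
original = read_positional(spec_fields, stored_tuple)

# If "day" were removed from the spec, "bucket" would shift to position 0
# and pick up the value that was written for "day".
after_drop = read_positional(["bucket"], stored_tuple)

# Keeping a placeholder in position 0 (the role a void transform plays)
# leaves "bucket" aligned with its original value.
with_placeholder = read_positional(["day_void", "bucket"], stored_tuple)
```

In this sketch `after_drop` maps `bucket` to the date string, which is the kind of silent misread the comment above warns about, while `with_placeholder` keeps `bucket` at 7.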

Contributor Author

@gabeiglio gabeiglio Aug 21, 2025

@Fokko I was referring more to the case where the field's transform is not a VoidTransform; in that case we should potentially fail here. Java does not allow dropping a schema column if a non-void partition field references it, but here there is no guarantee that a partition field without a source field is void.

Contributor Author

@Fokko gentle ping, as this might have gone off the radar :)

Contributor

I think this is fine, if I understand the concern correctly. The worry is that we're allowing an unknown type for non-void transforms when the source field is missing.

On the Java side, when we're reading partition specs from the metadata, we set the allow-missing-fields option to true, which skips the validation if the source field is missing (here).

This validation is only hit when we're constructing new partition specs. When we're reading, it's fine.
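
The read-versus-construct distinction can be sketched roughly as below. This is a self-contained toy model, not the PyIceberg or Java implementation: `SchemaField`, `PartitionField`, `partition_struct`, and the `allow_missing_fields` parameter are hypothetical names chosen to mirror the discussion, and the transform's result type is stubbed to the source field's type.

```python
from dataclasses import dataclass
from typing import Dict, List

UNKNOWN = "unknown"  # stand-in for an unknown type (assumption for this sketch)


@dataclass(frozen=True)
class SchemaField:
    field_id: int
    name: str
    field_type: str
    required: bool


@dataclass(frozen=True)
class PartitionField:
    source_id: int
    field_id: int
    name: str


def partition_struct(
    fields: List[PartitionField],
    schema_by_id: Dict[int, SchemaField],
    allow_missing_fields: bool,
) -> List[SchemaField]:
    """Build the partition struct, one entry per partition field, in spec order."""
    result = []
    for pf in fields:
        source = schema_by_id.get(pf.source_id)
        if source is not None:
            # Stub: a real transform would derive the result type from the source type.
            result.append(SchemaField(pf.field_id, pf.name, source.field_type, source.required))
        elif allow_missing_fields:
            # Read path: keep the position, but mark the type as unknown and optional.
            result.append(SchemaField(pf.field_id, pf.name, UNKNOWN, False))
        else:
            # Construct path: a new spec must reference an existing source field.
            raise ValueError(f"Cannot find source field {pf.source_id} for partition field {pf.name}")
    return result


# Schema where the source column with ID 1 has been dropped.
schema = {2: SchemaField(2, "id", "long", True)}
spec = [PartitionField(1, 1000, "ts_day"), PartitionField(2, 1001, "id_part")]

read_result = partition_struct(spec, schema, allow_missing_fields=True)
```

With `allow_missing_fields=True` the dropped source yields an unknown-typed entry in its original position; with `allow_missing_fields=False` the same call raises `ValueError`, which corresponds to the validation that only applies when constructing new specs.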

Contributor

Sorry, I think I misunderstood the question, but nothing is being dropped here, so that all looks good.

@gabeiglio gabeiglio force-pushed the read-dropped-fields branch from d93868c to 65b14f9 Compare December 4, 2025 14:06
@gabeiglio
Contributor Author

@geruh @Fokko thank you for the comments! I have addressed them in this new commit.

Contributor

@Fokko Fokko left a comment

Thanks @gabeiglio for fixing this, and thanks @geruh for the review 🙌

@Fokko Fokko merged commit 8ed913b into apache:main Dec 4, 2025
8 checks passed
3 participants