-
Notifications
You must be signed in to change notification settings - Fork 412
Set the ManifestEntryStatus #2408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
A faulty refactor: apache@59742e0#diff-e9df836e263024ba54e2706853fb25c00269fbfe3726b440ba57f4a929c995dcR927 This produces incorrect metadata, but was hidden by the writer.
3f04eb9 to
1d1a68d
Compare
|
nit: do we want to add a test for this? |
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
status=ManifestEntryStatus.ADDED was previously added in _wrap_append. We missed this line when replacing _wrap_append, everywhere else looks good.
|
Thanks for the suggestion @stevie9868 of adding a test, and after a second look, I believe the code is in a dead branch. Looks like we exclusively write new data with inheritance enabled, so the branch is actually never hit 👍 I've refactored the code a bit by pruning the dead branch. |
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Did a manual check by running V1: {"status":1,"snapshot_id":3550151653852953585,"data_file":{"file_path":"file:///private/var/folders/q_/gj3w0fts15n9l1dz58630jg40000gp/T/pytest-of-fokko.driesprong/pytest-48/test_sql0/default/test_merge_manifest/data/00000-0-fc7ceb8e-f7da-4932-be94-5157b701ec4e.parquet","file_format":"PARQUET","partition":{},"record_count":3,"file_size_in_bytes":4406,"block_size_in_bytes":67108864,"column_sizes":{"array":[{"key":1,"value":49},{"key":2,"value":78},{"key":3,"value":128},{"key":4,"value":94},{"key":5,"value":118},{"key":6,"value":94},{"key":7,"value":118},{"key":8,"value":118},{"key":9,"value":118},{"key":10,"value":94},{"key":11,"value":78},{"key":12,"value":109}]},"value_counts":{"array":[{"key":1,"value":3},{"key":2,"value":3},{"key":3,"value":3},{"key":4,"value":3},{"key":5,"value":3},{"key":6,"value":3},{"key":7,"value":3},{"key":8,"value":3},{"key":9,"value":3},{"key":10,"value":3},{"key":11,"value":3},{"key":12,"value":3}]},"null_value_counts":{"array":[{"key":1,"value":1},{"key":2,"value":1},{"key":3,"value":1},{"key":4,"value":1},{"key":5,"value":1},{"key":6,"value":1},{"key":7,"value":1},{"key":8,"value":1},{"key":9,"value":1},{"key":10,"value":1},{"key":11,"value":1},{"key":12,"value":1}]},"nan_value_counts":{"array":[]},"lower_bounds":{"array":[{"key":1,"value":"\u0000"},{"key":2,"value":"a"},{"key":3,"value":"aaaaaaaaaaaaaaaa"},{"key":4,"value":"\u0001\u0000\u0000\u0000"},{"key":5,"value":"\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000"},{"key":6,"value":"\u0000\u0000\u0000�"},{"key":7,"value":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000�"},{"key":8,"value":"\u0000�jÊ8ñ\u0005\u0000"},{"key":9,"value":"\u0000�jÊ8ñ\u0005\u0000"},{"key":10,"value":"�K\u0000\u0000"},{"key":11,"value":"\u0001"},{"key":12,"value":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"}]},"upper_bounds":{"array":[{"key":1,"value":"\u0001"},{"key":2,"value":"z"},{"key":3,"value":"zzzzzzzzzzzzzzz{"},{"key":4,"value":"\t\u0000\u0000\u0000"},{"key":5,"value":"\t\u0000\u0000\u0000\u0000\u0000\u0000\u0000"},{"key":6,"value":"fff?"},{"key":7,"value":"ÍÌÌÌÌÌì?"},{"key":8,"value":"\u0000»\r«Ûõ\u0005\u0000"},{"key":9,"value":"\u0000»\r«Ûõ\u0005\u0000"},{"key":10,"value":"ÙK\u0000\u0000"},{"key":11,"value":"\u0012"},{"key":12,"value":"\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011"}]},"key_metadata":null,"split_offsets":{"array":[4]},"sort_order_id":null}}
{"status":0,"snapshot_id":4327648111228143457,"data_file":{"file_path":"file:///private/var/folders/q_/gj3w0fts15n9l1dz58630jg40000gp/T/pytest-of-fokko.driesprong/pytest-48/test_sql0/default/test_merge_manifest/data/00000-0-5a3f45d7-3963-4e61-bb86-348120b264c2.parquet","file_format":"PARQUET","partition":{},"record_count":3,"file_size_in_bytes":4406,"block_size_in_bytes":67108864,"column_sizes":{"array":[{"key":1,"value":49},{"key":2,"value":78},{"key":3,"value":128},{"key":4,"value":94},{"key":5,"value":118},{"key":6,"value":94},{"key":7,"value":118},{"key":8,"value":118},{"key":9,"value":118},{"key":10,"value":94},{"key":11,"value":78},{"key":12,"value":109}]},"value_counts":{"array":[{"key":1,"value":3},{"key":2,"value":3},{"key":3,"value":3},{"key":4,"value":3},{"key":5,"value":3},{"key":6,"value":3},{"key":7,"value":3},{"key":8,"value":3},{"key":9,"value":3},{"key":10,"value":3},{"key":11,"value":3},{"key":12,"value":3}]},"null_value_counts":{"array":[{"key":1,"value":1},{"key":2,"value":1},{"key":3,"value":1},{"key":4,"value":1},{"key":5,"value":1},{"key":6,"value":1},{"key":7,"value":1},{"key":8,"value":1},{"key":9,"value":1},{"key":10,"value":1},{"key":11,"value":1},{"key":12,"value":1}]},"nan_value_counts":{"array":[]},"lower_bounds":{"array":[{"key":1,"value":"\u0000"},{"key":2,"value":"a"},{"key":3,"value":"aaaaaaaaaaaaaaaa"},{"key":4,"value":"\u0001\u0000\u0000\u0000"},{"key":5,"value":"\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000"},{"key":6,"value":"\u0000\u0000\u0000�"},{"key":7,"value":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000�"},{"key":8,"value":"\u0000�jÊ8ñ\u0005\u0000"},{"key":9,"value":"\u0000�jÊ8ñ\u0005\u0000"},{"key":10,"value":"�K\u0000\u0000"},{"key":11,"value":"\u0001"},{"key":12,"value":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"}]},"upper_bounds":{"array":[{"key":1,"value":"\u0001"},{"key":2,"value":"z"},{"key":3,"value":"zzzzzzzzzzzzzzz{"},{"key":4,"value":"\t\u0000\u0000\u0000"},{"key":5,"value":"\t\u0000\u0000\u0000\u0000\u0000\u0000\u0000"},{"key":6,"value":"fff?"},{"key":7,"value":"ÍÌÌÌÌÌì?"},{"key":8,"value":"\u0000»\r«Ûõ\u0005\u0000"},{"key":9,"value":"\u0000»\r«Ûõ\u0005\u0000"},{"key":10,"value":"ÙK\u0000\u0000"},{"key":11,"value":"\u0012"},{"key":12,"value":"\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011"}]},"key_metadata":null,"split_offsets":{"array":[4]},"sort_order_id":null}}
{"status":0,"snapshot_id":5751151500317885948,"data_file":{"file_path":"file:///private/var/folders/q_/gj3w0fts15n9l1dz58630jg40000gp/T/pytest-of-fokko.driesprong/pytest-48/test_sql0/default/test_merge_manifest/data/00000-0-906a39db-b725-4ee3-a223-8522be204d67.parquet","file_format":"PARQUET","partition":{},"record_count":3,"file_size_in_bytes":4406,"block_size_in_bytes":67108864,"column_sizes":{"array":[{"key":1,"value":49},{"key":2,"value":78},{"key":3,"value":128},{"key":4,"value":94},{"key":5,"value":118},{"key":6,"value":94},{"key":7,"value":118},{"key":8,"value":118},{"key":9,"value":118},{"key":10,"value":94},{"key":11,"value":78},{"key":12,"value":109}]},"value_counts":{"array":[{"key":1,"value":3},{"key":2,"value":3},{"key":3,"value":3},{"key":4,"value":3},{"key":5,"value":3},{"key":6,"value":3},{"key":7,"value":3},{"key":8,"value":3},{"key":9,"value":3},{"key":10,"value":3},{"key":11,"value":3},{"key":12,"value":3}]},"null_value_counts":{"array":[{"key":1,"value":1},{"key":2,"value":1},{"key":3,"value":1},{"key":4,"value":1},{"key":5,"value":1},{"key":6,"value":1},{"key":7,"value":1},{"key":8,"value":1},{"key":9,"value":1},{"key":10,"value":1},{"key":11,"value":1},{"key":12,"value":1}]},"nan_value_counts":{"array":[]},"lower_bounds":{"array":[{"key":1,"value":"\u0000"},{"key":2,"value":"a"},{"key":3,"value":"aaaaaaaaaaaaaaaa"},{"key":4,"value":"\u0001\u0000\u0000\u0000"},{"key":5,"value":"\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000"},{"key":6,"value":"\u0000\u0000\u0000�"},{"key":7,"value":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000�"},{"key":8,"value":"\u0000�jÊ8ñ\u0005\u0000"},{"key":9,"value":"\u0000�jÊ8ñ\u0005\u0000"},{"key":10,"value":"�K\u0000\u0000"},{"key":11,"value":"\u0001"},{"key":12,"value":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"}]},"upper_bounds":{"array":[{"key":1,"value":"\u0001"},{"key":2,"value":"z"},{"key":3,"value":"zzzzzzzzzzzzzzz{"},{"key":4,"value":"\t\u0000\u0000\u0000"},{"key":5,"value":"\t\u0000\u0000\u0000\u0000\u0000\u0000\u0000"},{"key":6,"value":"fff?"},{"key":7,"value":"ÍÌÌÌÌÌì?"},{"key":8,"value":"\u0000»\r«Ûõ\u0005\u0000"},{"key":9,"value":"\u0000»\r«Ûõ\u0005\u0000"},{"key":10,"value":"ÙK\u0000\u0000"},{"key":11,"value":"\u0012"},{"key":12,"value":"\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011"}]},"key_metadata":null,"split_offsets":{"array":[4]},"sort_order_id":null}}
{"status":0,"snapshot_id":2322795646469314858,"data_file":{"file_path":"file:///private/var/folders/q_/gj3w0fts15n9l1dz58630jg40000gp/T/pytest-of-fokko.driesprong/pytest-48/test_sql0/default/test_merge_manifest/data/00000-0-c96d54d7-25e7-4a57-b3fd-b206ce551c68.parquet","file_format":"PARQUET","partition":{},"record_count":3,"file_size_in_bytes":4406,"block_size_in_bytes":67108864,"column_sizes":{"array":[{"key":1,"value":49},{"key":2,"value":78},{"key":3,"value":128},{"key":4,"value":94},{"key":5,"value":118},{"key":6,"value":94},{"key":7,"value":118},{"key":8,"value":118},{"key":9,"value":118},{"key":10,"value":94},{"key":11,"value":78},{"key":12,"value":109}]},"value_counts":{"array":[{"key":1,"value":3},{"key":2,"value":3},{"key":3,"value":3},{"key":4,"value":3},{"key":5,"value":3},{"key":6,"value":3},{"key":7,"value":3},{"key":8,"value":3},{"key":9,"value":3},{"key":10,"value":3},{"key":11,"value":3},{"key":12,"value":3}]},"null_value_counts":{"array":[{"key":1,"value":1},{"key":2,"value":1},{"key":3,"value":1},{"key":4,"value":1},{"key":5,"value":1},{"key":6,"value":1},{"key":7,"value":1},{"key":8,"value":1},{"key":9,"value":1},{"key":10,"value":1},{"key":11,"value":1},{"key":12,"value":1}]},"nan_value_counts":{"array":[]},"lower_bounds":{"array":[{"key":1,"value":"\u0000"},{"key":2,"value":"a"},{"key":3,"value":"aaaaaaaaaaaaaaaa"},{"key":4,"value":"\u0001\u0000\u0000\u0000"},{"key":5,"value":"\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000"},{"key":6,"value":"\u0000\u0000\u0000�"},{"key":7,"value":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000�"},{"key":8,"value":"\u0000�jÊ8ñ\u0005\u0000"},{"key":9,"value":"\u0000�jÊ8ñ\u0005\u0000"},{"key":10,"value":"�K\u0000\u0000"},{"key":11,"value":"\u0001"},{"key":12,"value":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"}]},"upper_bounds":{"array":[{"key":1,"value":"\u0001"},{"key":2,"value":"z"},{"key":3,"value":"zzzzzzzzzzzzzzz{"},{"key":4,"value":"\t\u0000\u0000\u0000"},{"key":5,"value":"\t\u0000\u0000\u0000\u0000\u0000\u0000\u0000"},{"key":6,"value":"fff?"},{"key":7,"value":"ÍÌÌÌÌÌì?"},{"key":8,"value":"\u0000»\r«Ûõ\u0005\u0000"},{"key":9,"value":"\u0000»\r«Ûõ\u0005\u0000"},{"key":10,"value":"ÙK\u0000\u0000"},{"key":11,"value":"\u0012"},{"key":12,"value":"\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011"}]},"key_metadata":null,"split_offsets":{"array":[4]},"sort_order_id":null}}
{"status":0,"snapshot_id":776868969565186897,"data_file":{"file_path":"file:///private/var/folders/q_/gj3w0fts15n9l1dz58630jg40000gp/T/pytest-of-fokko.driesprong/pytest-48/test_sql0/default/test_merge_manifest/data/00000-0-6d2aa946-dafa-4cd1-ba43-9092082c78ac.parquet","file_format":"PARQUET","partition":{},"record_count":3,"file_size_in_bytes":4406,"block_size_in_bytes":67108864,"column_sizes":{"array":[{"key":1,"value":49},{"key":2,"value":78},{"key":3,"value":128},{"key":4,"value":94},{"key":5,"value":118},{"key":6,"value":94},{"key":7,"value":118},{"key":8,"value":118},{"key":9,"value":118},{"key":10,"value":94},{"key":11,"value":78},{"key":12,"value":109}]},"value_counts":{"array":[{"key":1,"value":3},{"key":2,"value":3},{"key":3,"value":3},{"key":4,"value":3},{"key":5,"value":3},{"key":6,"value":3},{"key":7,"value":3},{"key":8,"value":3},{"key":9,"value":3},{"key":10,"value":3},{"key":11,"value":3},{"key":12,"value":3}]},"null_value_counts":{"array":[{"key":1,"value":1},{"key":2,"value":1},{"key":3,"value":1},{"key":4,"value":1},{"key":5,"value":1},{"key":6,"value":1},{"key":7,"value":1},{"key":8,"value":1},{"key":9,"value":1},{"key":10,"value":1},{"key":11,"value":1},{"key":12,"value":1}]},"nan_value_counts":{"array":[]},"lower_bounds":{"array":[{"key":1,"value":"\u0000"},{"key":2,"value":"a"},{"key":3,"value":"aaaaaaaaaaaaaaaa"},{"key":4,"value":"\u0001\u0000\u0000\u0000"},{"key":5,"value":"\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000"},{"key":6,"value":"\u0000\u0000\u0000�"},{"key":7,"value":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000�"},{"key":8,"value":"\u0000�jÊ8ñ\u0005\u0000"},{"key":9,"value":"\u0000�jÊ8ñ\u0005\u0000"},{"key":10,"value":"�K\u0000\u0000"},{"key":11,"value":"\u0001"},{"key":12,"value":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"}]},"upper_bounds":{"array":[{"key":1,"value":"\u0001"},{"key":2,"value":"z"},{"key":3,"value":"zzzzzzzzzzzzzzz{"},{"key":4,"value":"\t\u0000\u0000\u0000"},{"key":5,"value":"\t\u0000\u0000\u0000\u0000\u0000\u0000\u0000"},{"key":6,"value":"fff?"},{"key":7,"value":"ÍÌÌÌÌÌì?"},{"key":8,"value":"\u0000»\r«Ûõ\u0005\u0000"},{"key":9,"value":"\u0000»\r«Ûõ\u0005\u0000"},{"key":10,"value":"ÙK\u0000\u0000"},{"key":11,"value":"\u0012"},{"key":12,"value":"\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011\u0011"}]},"key_metadata":null,"split_offsets":{"array":[4]},"sort_order_id":null}}V2: The only thing I notice is that the snapshot-id doesn't need to be materialized for the V2 entry with |
Co-authored-by: Kevin Liu <kevinjqliu@users.noreply.github.com>
A faulty refactor:
59742e0#diff-e9df836e263024ba54e2706853fb25c00269fbfe3726b440ba57f4a929c995dcR927
This produces incorrect metadata, but was hidden by the writer.
Rationale for this change
Are these changes tested?
Are there any user-facing changes?