-
Notifications
You must be signed in to change notification settings - Fork 134
Support types other than String and Int for partition columns #1154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
timsaucer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! This is a very nice addition. I have one concern about the breaking change without giving users time to adjust. Otherwise I would like to get this into DF48 release.
python/datafusion/context.py
Outdated
| name: str, | ||
| path: str | pathlib.Path, | ||
| table_partition_cols: list[tuple[str, str]] | None = None, | ||
| table_partition_cols: list[tuple[str, pa.DataType]] | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid a breaking change, we could make the type hint
table_partition_cols: list[tuple[str, pa.DataType]] | list[tuple[str, str]] | None = None
And then in the python code just below we can do type checking of the table_partition_cols and coerce to pyarrow data type. Alternatively we could overload the method signature and mark the old one as deprecated for a couple of releases. Either way we could avoid a breaking change without giving users the opportunity to upgrade at their schedule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair, I opted for overloading and raising the deprecation warning
timsaucer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for an excellent contribution!
Which issue does this PR close?
Closes #1155
Rationale for this change
Following datafusion PR4221, this PR exposes the functionality in the python bindings
What changes are included in this PR?
Change of the
table_partition_colsarguments inPySessionContext.register_listing_table,PySessionContext.register_parquet,PySessionContext.register_avro,PySessionContext.register_json,PySessionContext.read_parquet,PySessionContext.read_avro,PySessionContext.read_jsonAre there any user-facing changes?
Yes, the signature of the functions above is changed