-
Notifications
You must be signed in to change notification settings - Fork 414
Added FsspecFileIO method for OSS, virtual hosted style default to true, standardized key configurations for OSS
#1788
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pyiceberg/io/fsspec.py
Outdated
| "aws_secret_access_key": properties.get(OSS_ACCESS_KEY_SECRET), | ||
| "aws_session_token": properties.get(OSS_SESSION_TOKEN), | ||
| } | ||
| config_kwargs = {"s3": {"addressing_style": "virtual"}, "signature_version": "v4"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should wel also make virtual addressing configurable? Similar to S3_FORCE_VIRTUAL_ADDRESSING?
| config_kwargs = {"s3": {"addressing_style": "virtual"}, "signature_version": "v4"} | |
| config_kwargs = {"s3": {"addressing_style": "virtual"}, "signature_version": "v4"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the init function for OSS is already separated from S3, and users aren't able to interact with OSS without virtual addressing set to true anyway, I think it makes more sense to just set it to true by default to make it simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we've already crossed that bridge with PyArrowFileIO: https://github.com/apache/iceberg-python/pull/1788/files#diff-8d5e63f2a87ead8cebe2fd8ac5dcf2198d229f01e16bb9e06e21f7277c328abdL423-L424
I think it would also be good to add a configuration option for it. we don't know if people are using it, and just removing it feels wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, I'm not entirely sure which config_kwargs keys supported by s3fs that can work with OSS connection. The documentation for boto3 compatibility only mentioned setting several keys: https://www.alibabacloud.com/help/en/oss/developer-reference/use-amazon-s3-sdks-to-access-oss#section-jmf-a67-hat
Do you think it'll make sense to copy the config_kwargs setting for S3 but with oss.config-example parameter?

This pull request introduced
FsspecFileIofor OSS configuration method as a backup whenPyArrowFileIOfail. UsingS3FileSystemclass, the method should work as long as the virtual hosted style is invoked. For both method, virtual hosted style is set to true as default.Also, since OSS configuration has its own section, I think it makes more sense to standardize the configuration keys with
oss.certain-configinstead of still relying withs3key.