@helmiazizm
Contributor

This pull request introduces FsspecFileIO support for the OSS configuration as a fallback when PyArrowFileIO fails. Using the S3FileSystem class, this should work as long as virtual-hosted-style addressing is used. For both methods, virtual-hosted-style addressing is enabled by default.

Also, since the OSS configuration has its own section, I think it makes more sense to standardize the configuration keys as oss.certain-config instead of still relying on the s3 keys.
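
A minimal sketch of what this could look like, not the code in this PR. The `oss.*` key strings, the `build_oss_kwargs` helper, and the example endpoint are assumptions for illustration; only the `aws_secret_access_key` and `aws_session_token` entries and the `config_kwargs` value appear in the diff.

```python
from typing import Any, Dict, Tuple

# Hypothetical property keys following the proposed oss.* naming.
OSS_ENDPOINT = "oss.endpoint"
OSS_ACCESS_KEY_ID = "oss.access-key-id"
OSS_ACCESS_KEY_SECRET = "oss.access-key-secret"
OSS_SESSION_TOKEN = "oss.session-token"


def build_oss_kwargs(properties: Dict[str, str]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Translate oss.* properties into the two dicts s3fs expects."""
    client_kwargs = {
        "endpoint_url": properties.get(OSS_ENDPOINT),
        "aws_access_key_id": properties.get(OSS_ACCESS_KEY_ID),
        "aws_secret_access_key": properties.get(OSS_ACCESS_KEY_SECRET),
        "aws_session_token": properties.get(OSS_SESSION_TOKEN),
    }
    # Virtual-hosted-style addressing is always on in this sketch.
    config_kwargs = {"s3": {"addressing_style": "virtual"}, "signature_version": "v4"}
    return client_kwargs, config_kwargs


client_kwargs, config_kwargs = build_oss_kwargs(
    {OSS_ENDPOINT: "https://oss.example.com", OSS_ACCESS_KEY_SECRET: "secret"}
)
# These could then be passed on as
# s3fs.S3FileSystem(client_kwargs=client_kwargs, config_kwargs=config_kwargs).
```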

@helmiazizm
Contributor Author

Local test result for s3fs.S3FileSystem

[image]

"aws_secret_access_key": properties.get(OSS_ACCESS_KEY_SECRET),
"aws_session_token": properties.get(OSS_SESSION_TOKEN),
}
config_kwargs = {"s3": {"addressing_style": "virtual"}, "signature_version": "v4"}
Contributor

Should we also make virtual addressing configurable, similar to S3_FORCE_VIRTUAL_ADDRESSING?

Suggested change
config_kwargs = {"s3": {"addressing_style": "virtual"}, "signature_version": "v4"}

Contributor Author

Since the init function for OSS is already separate from the S3 one, and users can't interact with OSS without virtual addressing enabled anyway, I think it's simpler to just enable it by default.

Contributor

But we've already crossed that bridge with PyArrowFileIO: https://github.com/apache/iceberg-python/pull/1788/files#diff-8d5e63f2a87ead8cebe2fd8ac5dcf2198d229f01e16bb9e06e21f7277c328abdL423-L424

I think it would also be good to add a configuration option for it. We don't know if people are using it, and just removing it feels wrong.
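
A rough sketch of such an option, assuming a hypothetical `oss.force-virtual-addressing` key (named by analogy with S3_FORCE_VIRTUAL_ADDRESSING) that defaults to true. The helper name is also an assumption.

```python
from typing import Any, Dict

# Hypothetical key; not defined anywhere in this PR.
OSS_FORCE_VIRTUAL_ADDRESSING = "oss.force-virtual-addressing"


def oss_config_kwargs(properties: Dict[str, str]) -> Dict[str, Any]:
    """Build config_kwargs, keeping virtual addressing on unless disabled."""
    raw = properties.get(OSS_FORCE_VIRTUAL_ADDRESSING, "true")
    force_virtual = str(raw).strip().lower() == "true"

    config_kwargs: Dict[str, Any] = {"signature_version": "v4"}
    if force_virtual:
        config_kwargs["s3"] = {"addressing_style": "virtual"}
    # When disabled, addressing_style is left unset so the client default applies.
    return config_kwargs
```

With this shape the default behaviour stays the same as the hard-coded value, and anyone who needs a different addressing style can opt out explicitly.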

Contributor Author

To be honest, I'm not entirely sure which of the config_kwargs keys supported by s3fs work with an OSS connection. The documentation on boto3 compatibility only mentions setting a few keys: https://www.alibabacloud.com/help/en/oss/developer-reference/use-amazon-s3-sdks-to-access-oss#section-jmf-a67-hat

Do you think it would make sense to copy the config_kwargs settings for S3, but with oss.config-example parameters?
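
One possible reading of this, sketched under assumptions: look up each setting under an `oss.` prefix first and fall back to the matching `s3.` key, so existing configurations keep working. The helper name and the key suffixes used here are hypothetical.

```python
from typing import Dict, Optional


def get_with_fallback(properties: Dict[str, str], name: str) -> Optional[str]:
    """Return the oss.<name> value if set, otherwise the s3.<name> value."""
    for prefix in ("oss.", "s3."):
        value = properties.get(prefix + name)
        if value is not None:
            return value
    return None


# The oss.* key wins when both are present; the s3.* key is used otherwise.
props = {"s3.session-token": "old", "oss.session-token": "new", "s3.region": "r1"}
token = get_with_fallback(props, "session-token")
region = get_with_fallback(props, "region")
```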

@helmiazizm helmiazizm requested a review from Fokko March 17, 2025 02:20
@helmiazizm helmiazizm closed this Apr 10, 2025