Support wasb:// and wasbs://
#1663
|
There is also an open issue on the Regarding fsspec/adlfs#493, is the protocol identical? |
|
I am not sure, but I have been testing it with Azurite locally and it works as expected. I am going to try using it in the cloud. |
|
@christophediprima Thanks for testing that, appreciate it. We also test against |
|
My team and I have been testing it on Azure Blob Storage and we had no issues. What kind of tests do you have in mind? |
|
Looks like we have a few ADLS integration tests against the Azurite Docker container (iceberg-python/tests/io/test_fsspec.py, line 298 in b86d7d5). Perhaps we can extend these to include wasb and wasbs. |
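For reference, extending the existing tests across schemes could look roughly like the sketch below. It is only an illustration under stated assumptions: `build_location` and the test class are hypothetical names, no FileIO or Azurite container is involved, and it uses the standard library's `unittest.subTest` so it runs on its own (in the repository's pytest suite, `pytest.mark.parametrize` would be the equivalent).

```python
import unittest
from urllib.parse import urlparse

# The four schemes discussed in this thread.
ADLS_SCHEMES = ("abfs", "abfss", "wasb", "wasbs")


def build_location(scheme: str, container: str, key: str) -> str:
    """Hypothetical helper: build a test location for the given scheme."""
    return f"{scheme}://{container}/{key}"


class AdlsSchemeTest(unittest.TestCase):
    def test_location_round_trips_for_every_scheme(self) -> None:
        for scheme in ADLS_SCHEMES:
            # subTest reports each scheme separately, like a parameterized test.
            with self.subTest(scheme=scheme):
                location = build_location(scheme, "warehouse", "foo/bar.txt")
                parsed = urlparse(location)
                self.assertEqual(parsed.scheme, scheme)
                self.assertEqual(parsed.netloc, "warehouse")
                self.assertEqual(parsed.path, "/foo/bar.txt")
```

A real integration test would replace the `urlparse` assertions with reads and writes through the FileIO under test.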
# Rationale for this change
Starting from version 20, PyArrow supports the ADLS filesystem. This PR adds
PyArrow Azure support to PyIceberg.
PyArrow is the [default
IO](https://github.com/apache/iceberg-python/blob/main/pyiceberg/io/__init__.py#L366-L369)
for PyIceberg catalogs. In an Azure environment it handles a wider spectrum
of auth strategies than Fsspec, including, for instance, [Managed
Identities](https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/overview).
Also, prior to PR
#1663 (which is not merged
yet), there was no support for wasb(s) with Fsspec.
See the corresponding issue for more details:
#2112
# Are these changes tested?
Tests are added under tests/io/test_pyarrow.py.
# Are there any user-facing changes?
There are no breaking API changes. Direct impact of the PR: the PyArrow
FileIO in PyIceberg supports the Azure cloud environment. Examples of impact
for end users:
- PyIceberg is usable in services that use the Managed Identities auth strategy.
- PyIceberg is usable with wasb(s) schemes in Azure.
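
As an illustration of what a wasb(s) location carries, here is a minimal sketch that splits an Azure URI into its parts using only the standard library. The `AzureLocation` type, the `parse_azure_location` function, and the example account and container names are assumptions made for this sketch; it is not the parsing code used by PyIceberg or PyArrow.

```python
from typing import NamedTuple
from urllib.parse import urlparse

# Schemes mentioned in this PR; the set itself is an assumption of the sketch.
AZURE_SCHEMES = {"abfs", "abfss", "wasb", "wasbs"}


class AzureLocation(NamedTuple):
    scheme: str
    container: str
    account_host: str  # empty when the URI has no "@host" part
    path: str


def parse_azure_location(location: str) -> AzureLocation:
    """Hypothetical helper: split scheme://container@host/path into parts."""
    uri = urlparse(location)
    if uri.scheme not in AZURE_SCHEMES:
        raise ValueError(f"Unsupported scheme: {uri.scheme!r}")
    # The container precedes "@"; the storage account host follows it.
    container, _, account_host = uri.netloc.partition("@")
    return AzureLocation(uri.scheme, container, account_host, uri.path.lstrip("/"))
```

For example, `parse_azure_location("wasbs://warehouse@myaccount.blob.core.windows.net/db/table")` yields the container `warehouse`, the host `myaccount.blob.core.windows.net`, and the path `db/table`.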
---------
Co-authored-by: Kevin Liu <kevinjqliu@users.noreply.github.com>
Co-authored-by: Kevin Liu <kevin.jq.liu@gmail.com>
|
depends on fsspec/adlfs#493 |
|
I have a local change that parameterizes all the ADLS integration tests with abfs, abfss, wasb, and wasbs. It is currently failing with (note the wrong path) |
|
pushed the parameterized test here for reference. i changed all reference of the protocol for adls to use the |
|
added the monkey patch solution here for reference. we can also wait for fsspec/adlfs#493 to land |
|
fsspec/adlfs#512 added the ability to override protocol but for older versions of adlfs, we would still need to monkey patch |
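
To make the monkey-patch idea concrete, below is a minimal sketch of the general technique: temporarily extending a filesystem class's `protocol` attribute so that its path stripping recognizes extra schemes. `StandInBlobFileSystem` is a stand-in written for this example, not the adlfs class, and `extra_protocols` is a hypothetical helper; the patch referenced above may differ in its details.

```python
from contextlib import contextmanager
from typing import Iterator, Tuple


class StandInBlobFileSystem:
    """Stand-in for a filesystem class; NOT the real adlfs implementation."""

    protocol: Tuple[str, ...] = ("abfs", "abfss")

    @classmethod
    def strip_protocol(cls, path: str) -> str:
        # Remove a known "scheme://" prefix; unknown schemes are left untouched.
        for proto in cls.protocol:
            prefix = f"{proto}://"
            if path.startswith(prefix):
                return path[len(prefix):]
        return path


@contextmanager
def extra_protocols(fs_cls: type, *protocols: str) -> Iterator[type]:
    """Hypothetical helper: patch extra protocols onto fs_cls, then restore."""
    original = fs_cls.protocol
    # A protocol attribute may be a single string or a tuple of strings.
    known = (original,) if isinstance(original, str) else tuple(original)
    fs_cls.protocol = known + tuple(p for p in protocols if p not in known)
    try:
        yield fs_cls
    finally:
        fs_cls.protocol = original  # always undo the patch
```

Scoping the patch with a context manager keeps the override from leaking into other code that uses the same class.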
Closes #2271 and #1606
This will work as soon as this is merged: fsspec/adlfs#493