@lhoestq (Contributor) commented May 13, 2025

Rationale for this change

Add support for the Hugging Face filesystem in fsspec, which uses hf:// paths.
This allows importing HF datasets.

Authentication is done using the "hf.token" property.
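
As an illustration of the `"hf.token"` property, here is a minimal sketch of how a token might be passed to the fsspec-based FileIO. It assumes a PyIceberg build that includes this change plus the `huggingface_hub` package; the helper names `hf_properties` and `open_hf_input` are hypothetical, and the `py-io-impl` key and `load_file_io` call reflect my understanding of PyIceberg's FileIO loading, not something stated in this PR.

```python
from typing import Dict, Optional


def hf_properties(token: Optional[str] = None) -> Dict[str, str]:
    """Build FileIO properties for hf:// paths (hypothetical helper).

    "hf.token" is the property named in this PR. "py-io-impl" is assumed
    to be the key PyIceberg uses to pin a FileIO implementation.
    """
    props = {"py-io-impl": "pyiceberg.io.fsspec.FsspecFileIO"}
    if token:
        # Omit the property for public datasets that need no authentication.
        props["hf.token"] = token
    return props


def open_hf_input(location: str, properties: Dict[str, str]):
    """Return an input file for an hf:// location (hypothetical helper).

    The import is deferred so the module loads even when pyiceberg or
    huggingface_hub is not installed.
    """
    from pyiceberg.io import load_file_io

    file_io = load_file_io(properties, location)
    return file_io.new_input(location)
```

Leaving the token out keeps the properties usable for public datasets; private repositories would need a valid token.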

Are these changes tested?

I tried locally but haven't added tests in test_fsspec.py (lmk if it's a requirement)

Are there any user-facing changes?

No changes, it simply adds support for hf:// URLs

@Fokko (Contributor) commented May 14, 2025

Hey @lhoestq Thanks for raising this PR. I think this is super interesting!

I think the PR needs a couple more things:

@lhoestq (Contributor Author) commented May 14, 2025

I updated pyproject.toml and added some docs :)

PS: I also added the "hf" extra in pyproject.toml, lmk if this is fine

@kevinjqliu (Contributor) left a comment

LGTM! Thanks for adding this.

- **hdfs**: `PyArrowFileIO`
- **abfs**, **abfss**: `FsspecFileIO`
- **oss**: `PyArrowFileIO`
- **hf**: `FsspecFileIO`

Contributor

is there a way to allow PyArrowFileIO as well?

Contributor Author

There is no HF filesystem implementation in Arrow C++ yet, unfortunately! But hopefully soon.
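
The scheme defaults quoted in the docs diff above can be sketched as a simple lookup. This is only an illustration built from the four bullet lines shown in this conversation, not PyIceberg's actual resolution code; the table name, the function name, and the example path are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical excerpt: only the schemes quoted in the docs diff above.
DEFAULT_FILE_IO = {
    "hdfs": "PyArrowFileIO",
    "abfs": "FsspecFileIO",
    "abfss": "FsspecFileIO",
    "oss": "PyArrowFileIO",
    "hf": "FsspecFileIO",  # no Arrow C++ filesystem for HF yet, so fsspec only
}


def default_file_io(location: str) -> str:
    """Return the documented default FileIO name for a location's scheme."""
    scheme = urlparse(location).scheme
    try:
        return DEFAULT_FILE_IO[scheme]
    except KeyError:
        raise ValueError(f"No default FileIO listed for scheme {scheme!r}") from None


# Hypothetical path, used only to show the lookup.
print(default_file_io("hf://datasets/some-user/some-dataset/data.parquet"))
```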

@kevinjqliu (Contributor) commented

#24 34.52 E: Failed to fetch http://deb.debian.org/debian-security/pool/updates/main/o/openjdk-11/openjdk-11-jdk-headless_11.0.26%2b4-1%7edeb11u1_amd64.deb Error reading from server - read (104: Connection reset by peer) [IP: 146.75.30.132 80]

just retriggered CI

@Fokko (Contributor) commented May 16, 2025

Also retriggered the CI 😄

@lhoestq (Contributor Author) commented May 16, 2025

All green! Thanks.

@kevinjqliu changed the title from "Enable Hugging Face filesystem" to "Add Hugging Face filesystem support to fsspec" on May 16, 2025
@kevinjqliu merged commit 55b75ca into apache:main on May 16, 2025
11 checks passed
@kevinjqliu (Contributor) commented

Thanks @lhoestq for the contribution and @Fokko for the review :)

amitgilad3 pushed a commit to amitgilad3/iceberg-python that referenced this pull request Jul 7, 2025
gabeiglio pushed a commit to Netflix/iceberg-python that referenced this pull request Aug 13, 2025