Merged
7 changes: 7 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,7 @@
# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
version: 2
updates:
- package-ecosystem: "github-actions"
directory: "/" # Location of package manifests
schedule:
interval: "weekly"
23 changes: 12 additions & 11 deletions .github/workflows/CI.yml
@@ -19,7 +19,7 @@ jobs:
matrix:
version:
- '1.9'
- '1' # add back when 1.10 is out
- '1'
- 'nightly'
os:
- ubuntu-latest
@@ -44,18 +44,19 @@ jobs:
name: Documentation
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: julia-actions/setup-julia@v1
- uses: actions/checkout@v4.2.1
- uses: julia-actions/setup-julia@v2
with:
version: '1'
- uses: julia-actions/julia-buildpkg@v1
- uses: julia-actions/julia-docdeploy@v1
- run: |
julia --project=docs -e '
using Pkg
Pkg.develop(PackageSpec(path=pwd()))
Pkg.instantiate()'
- run: julia --color=yes --project=docs docs/make.jl
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
DOCUMENTER_KEY: ${{ secrets.DOCUMENTER_KEY }}
- run: |
julia --project=docs -e '
using Documenter: DocMeta, doctest
using HuggingFaceDatasets
DocMeta.setdocmeta!(HuggingFaceDatasets, :DocTestSetup, :(using HuggingFaceDatasets); recursive=true)
doctest(HuggingFaceDatasets)'
JULIA_CONDAPKG_OPENSSL_VERSION: "ignore"


7 changes: 6 additions & 1 deletion .github/workflows/CompatHelper.yml
@@ -15,7 +15,7 @@ jobs:
run: which julia
continue-on-error: true
- name: Install Julia, but only if it is not already available in the PATH
uses: julia-actions/setup-julia@v1
uses: julia-actions/setup-julia@v2
with:
version: '1'
arch: ${{ runner.arch }}
@@ -41,5 +41,10 @@ jobs:
shell: julia --color=yes {0}
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
# This repo uses Documenter, so we can reuse our [Documenter SSH key](https://documenter.juliadocs.org/stable/man/hosting/walkthrough/).
# If we didn't have one of those setup, we could configure a dedicated ssh deploy key `COMPATHELPER_PRIV` following https://juliaregistries.github.io/CompatHelper.jl/dev/#Creating-SSH-Key.
# Either way, we need an SSH key if we want the PRs that CompatHelper creates to be able to trigger CI workflows themselves.
# That is because GITHUB_TOKEN's can't trigger other workflows (see https://docs.github.com/en/actions/security-for-github-actions/security-guides/automatic-token-authentication#using-the-github_token-in-a-workflow).
# Check if you have a deploy key setup using these docs: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/reviewing-your-deploy-keys.
COMPATHELPER_PRIV: ${{ secrets.DOCUMENTER_KEY }}
# COMPATHELPER_PRIV: ${{ secrets.COMPATHELPER_PRIV }}
2 changes: 1 addition & 1 deletion .github/workflows/TagBot.yml
@@ -6,7 +6,7 @@ on:
workflow_dispatch:
inputs:
lookback:
default: 3
default: "3"
permissions:
actions: read
checks: read
9 changes: 2 additions & 7 deletions CondaPkg.toml
@@ -1,10 +1,5 @@
channels = ["conda-forge"]

[deps]
# h5py = ""
# pillow = ">=9.1, <10"
# pyarrow = "==6.0.0"
datasets = ">=2.12, <3"
numpy = ">=1.20, <2"
datasets = ">=3.0, <4"
numpy = ">=2.0, <3"
pillow = ""

2 changes: 1 addition & 1 deletion Project.toml
@@ -12,7 +12,7 @@ PythonCall = "6099a3de-0909-46bc-b1f4-468b9a2dfc0d"

[compat]
CondaPkg = "0.2"
DLPack = "0.1"
DLPack = "0.3"
ImageCore = "0.9, 0.10"
MLUtils = "0.4.1"
PythonCall = "0.9"
18 changes: 10 additions & 8 deletions README.md
@@ -23,31 +23,33 @@ HuggingFaceDatasets.jl provides wrappers around types from the `datasets` python
Check out the [examples/](https://github.com/JuliaGenAI/HuggingFaceDatasets.jl/tree/main/examples) folder for usage examples.

```julia
julia> using HuggingFaceDatasets

julia> train_data = load_dataset("mnist", split = "train")
Dataset({
features: ['image', 'label'],
num_rows: 60000
})

# Indexing starts with 1.
# Python types are returned by default.
julia> train_data[1]
Python: {'image': <PIL.PngImagePlugin.PngImageFile image mode=L size=28x28 at 0x7F04DE661CD0>, 'label': 5}
Python: {'image': <PIL.PngImagePlugin.PngImageFile image mode=L size=28x28 at 0x3340B0290>, 'label': 5}

julia> length(train_data)
60000

# Now we set the julia format
julia> train_data = load_dataset("mnist", split = "train").with_format("julia");

# Returned observations are now julia objects
julia> train_data[1]
julia> train_data[1] # Returned observations are now julia objects
Dict{String, Any} with 2 entries:
"label" => 5
"image" => Gray{N0f8}[Gray{N0f8}(0.0) Gray{N0f8}(0.0)Gray{N0f8}(0.0) Gray{N0f8}(0.0); Gray{N0f8}(0.0) Gray{N0f8}(0.0)Gray{N0f8}(0.0) Gray{N0f8}(0.0); … ; Gray{N0f8}(0.0) Gray{N0f8}(0.0) ……
"image" => Gray{N0f8}[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]

julia> train_data[1:2]
Dict{String, Vector} with 2 entries:
"label" => [5, 0]
"image" => ReinterpretArray{Gray{N0f8}, 2, UInt8, Matrix{UInt8}, false}[[Gray{N0f8}(0.0) Gray{N0f8}(0.0)Gray{N0f8}(0.0) Gray{N0f8}(0.0); Gray{N0f8}(0.0) Gray{N0f8}(0.0)Gray{N0f8}(0.0) Gra
"image" => ReinterpretArray{Gray{N0f8}, 2, UInt8, Matrix{UInt8}, false}[[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0
```

## Troubleshooting

- If you have problems resolving the CondaPkg environment, try setting `ENV["JULIA_CONDAPKG_OPENSSL_VERSION"] = "ignore"` before loading the package. See the [CondaPkg preferences documentation](https://github.com/JuliaPy/CondaPkg.jl?tab=readme-ov-file#preferences) for details.
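A minimal sketch of that workaround, assuming a fresh Julia session in which neither PythonCall nor HuggingFaceDatasets has been loaded yet. It uses the value `"ignore"`, which is what this PR's `__init__` and CI workflow set; the ordering matters because CondaPkg reads the preference when it resolves the Conda environment.

```julia
# Set the CondaPkg preference first: it has to be in place before the
# Conda environment is resolved, which happens when PythonCall is loaded.
ENV["JULIA_CONDAPKG_OPENSSL_VERSION"] = "ignore"

# Only now load the package (and, transitively, PythonCall/CondaPkg).
using HuggingFaceDatasets
```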
100 changes: 0 additions & 100 deletions docs/Manifest.toml

This file was deleted.

1 change: 1 addition & 0 deletions docs/make.jl
@@ -15,6 +15,7 @@ makedocs(;
),
pages=[
"Home" => "index.md",
"API" => "api.md",
],
)

11 changes: 4 additions & 7 deletions docs/src/api.md
@@ -1,12 +1,9 @@
# API

## Index

```@index
Pages = ["api.md"]
```@meta
CurrentModule = HuggingFaceDatasets
CollapsedDocStrings = true
```

## Docs
# API

```@autodocs
Modules = [HuggingFaceDatasets]
30 changes: 17 additions & 13 deletions docs/src/index.md
@@ -26,26 +26,30 @@ HuggingFaceDatasets.jl provides wrappers around types from the `datasets` python
Check out the `examples/` folder for usage examples.

```julia
# Returned observations are now julia objects
julia> using HuggingFaceDatasets

julia> train_data = load_dataset("mnist", split = "train")
Dataset(<py Dataset({
Dataset({
features: ['image', 'label'],
num_rows: 60000
})>, identity)
})

# Indexing starts with 1.
# By default, python types are returned.
julia> train_data[1]
Python dict: {'image': <PIL.PngImagePlugin.PngImageFile image mode=L size=28x28 at 0x2B64E2E90>, 'label': 5}
Python: {'image': <PIL.PngImagePlugin.PngImageFile image mode=L size=28x28 at 0x3340B0290>, 'label': 5}

julia> set_format!(train_data, "julia")
Dataset(<py Dataset({
features: ['image', 'label'],
num_rows: 60000
})>, HuggingFaceDatasets.py2jl)
julia> length(train_data)
60000

# Now we have julia types
julia> train_data[1]
julia> train_data = load_dataset("mnist", split = "train").with_format("julia");

julia> train_data[1] # Returned observations are now julia objects
Dict{String, Any} with 2 entries:
"label" => 5
"image" => UInt8[0x00 0x00 … 0x00 0x00; 0x00 0x00 … 0x00 0x00; … ; 0x00 0x00 … 0x00 0x00; 0x00 0x00 … 0x00 0x00]
"image" => Gray{N0f8}[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]

julia> train_data[1:2]
Dict{String, Vector} with 2 entries:
"label" => [5, 0]
"image" => ReinterpretArray{Gray{N0f8}, 2, UInt8, Matrix{UInt8}, false}[[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0…
```
5 changes: 3 additions & 2 deletions src/HuggingFaceDatasets.jl
@@ -3,7 +3,7 @@ module HuggingFaceDatasets
using PythonCall
using MLUtils: getobs, numobs
import MLUtils
using DLPack
using DLPack: DLPack
using ImageCore

const datasets = PythonCall.pynew()
@@ -37,8 +37,9 @@ include("load_dataset.jl")
export load_dataset

function __init__()
ENV["JULIA_CONDAPKG_OPENSSL_VERSION"] = "ignore"
# Since it is illegal in PythonCall to import a python module in a module, we need to do this here.
# https://cjdoris.github.io/PythonCall.jl/dev/pythoncall-reference/#PythonCall.pycopy!
# https://juliapy.github.io/PythonCall.jl/dev/pythoncall-reference/#PythonCall.Core.pycopy!
PythonCall.pycopy!(datasets, pyimport("datasets"))
PythonCall.pycopy!(PIL, pyimport("PIL"))
pyimport("PIL.PngImagePlugin")
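The comment in `__init__` refers to PythonCall's documented pattern for keeping Python modules in `const` globals: create an empty placeholder with `PythonCall.pynew()` at module level and fill it with `PythonCall.pycopy!` at load time. A minimal sketch of that pattern follows; `LazyMath` and `py_sqrt` are hypothetical names, and Python's standard-library `math` module is used only so that no extra Conda package is needed.

```julia
module LazyMath

using PythonCall

# Empty placeholder. A Python object created at precompile time would not be
# valid when the module is later loaded, so nothing is imported here.
const pymath = PythonCall.pynew()

function __init__()
    # Runs each time the module is loaded, once the Python runtime is available.
    # pycopy! fills the existing placeholder in place, so `pymath` stays `const`.
    PythonCall.pycopy!(pymath, pyimport("math"))
end

# Call the Python function and convert its result to a Julia Float64.
py_sqrt(x::Real) = pyconvert(Float64, pymath.sqrt(x))

end # module

println(LazyMath.py_sqrt(16.0))  # prints 4.0
```

The same shape applies to the `datasets` and `PIL` globals in the diff: the `const` binding is created once, and only its Python contents are populated when the package is loaded.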