@dawidborycki
Contributor

Before submitting a pull request for a new Learning Path, please review Create a Learning Path

  • I have reviewed Create a Learning Path

Please do not include any confidential information in your contribution. This includes confidential microarchitecture details and unannounced product information.

  • I have checked my contribution for confidential information

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the Creative Commons Attribution 4.0 International License.

Copy link

@GemmaParis left a comment

The first chapter is the "theory" chapter; it is perhaps too long, but I like your style of writing and the content it conveys. Let's see what the LP team thinks of this. The rest looks great! I have flagged the need to update from "Arm Compute Library" to "Arm Kleidi kernels". See comments.

1. Cross-platform support. ORT runs on Windows, Linux, macOS, and mobile operating systems like Android and iOS. It has first-class support for both x86 and Arm64 architectures, making it ideal for deployment on devices ranging from cloud servers to Raspberry Pi boards and smartphones.

2. Hardware acceleration. ORT integrates with a wide range of execution providers (EPs) that tap into hardware capabilities:
* Arm NEON / Arm Compute Library for efficient CPU execution on Arm64.

I would say "Arm Kleidi kernels accelerated with Arm Neon, SVE2 and SME2, for efficient CPU execution on Arm64"

A typical ONNX workflow looks like this:
1. Train the model. You first use your preferred framework (e.g., PyTorch, TensorFlow, or scikit-learn) to design and train a model. At this stage, you benefit from the flexibility and ecosystem of the framework of your choice.
2. Export to ONNX. Once trained, the model is exported into the ONNX format using built-in converters (such as torch.onnx.export for PyTorch). This produces a portable .onnx file describing the network architecture, weights, and metadata.
3. Run inference with ONNX Runtime. The ONNX model can now be executed on different devices using ONNX Runtime. On Arm64 hardware, ONNX Runtime takes advantage of the Arm Compute Library and NEON instructions, while on Android devices it can leverage NNAPI for mobile accelerators.


"Arm Kleidi kernels accelerated with NEON, SVE2 and SME2 instructions"

dawidborycki and others added 2 commits December 22, 2025 17:06
Added draft status to ONNX topic and updated metadata.
@pareenaverma
Contributor

merging into main for tech review

@pareenaverma pareenaverma merged commit 8001c6e into ArmDeveloperEcosystem:main Jan 13, 2026
2 of 3 checks passed
## Choosing the hardware
You can choose a variety of hardware, including:
* Edge boards (Linux/Arm64) - Raspberry Pi 4/5 (64-bit OS), Jetson (Arm64 CPU; GPU via CUDA if using NVIDIA stack), Arm servers (e.g., AWS Graviton).
* Apple Silicon (macOS/Arm64) - Great for development; deploy to Arm64 Linux later.

Hi @dawidborycki - I'm modifying the hardware selection for the host development machine to be an Arm Linux based machine. I tested this on macOS and got:

```
(venv) parver01@KWJY1XP2MT ~ % pip3 install onnx onnxruntime onnxscript netron numpy
Collecting onnx
  Using cached onnx-1.20.1-cp312-abi3-macosx_12_0_universal2.whl.metadata (8.4 kB)
ERROR: Could not find a version that satisfies the requirement onnxruntime (from versions: none)
ERROR: No matching distribution found for onnxruntime
```

Our end goal is a camera-to-solution Sudoku app that runs efficiently on Arm64 devices (e.g., Raspberry Pi or Android phones). ONNX is the glue: we'll train the digit recognizer in PyTorch, export it to ONNX, and run it anywhere with ONNX Runtime (CPU EP on edge devices, NNAPI EP on Android). Everything around the model (grid detection, perspective rectification, and solving) stays deterministic and lightweight.

## Objective
In this step, we will generate a custom dataset of Sudoku puzzles and their digit crops, which we’ll use to train a digit recognition model. Starting from a Hugging Face parquet dataset that provides paired puzzle/solution strings, we transform raw boards into realistic, book-style Sudoku pages, apply camera-like augmentations to mimic mobile captures, and automatically slice each page into 81 labeled cell images. This yields a large, diverse, perfectly labeled set of digits (0–9 with 0 = blank) without manual annotation. By the end, you’ll have a structured dataset ready to train a lightweight model in the next section.
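The final slicing step (a rectified page into 81 labeled cell crops) can be sketched with plain NumPy indexing. The function name, the square-page assumption, and the 81-character label string layout below are assumptions for illustration, not the Learning Path's actual code:

```python
import numpy as np


def slice_into_cells(page: np.ndarray, labels: str):
    """Split a rectified square board image into 81 (cell_image, digit) pairs.

    `page` is a grayscale board whose side length is divisible by 9, and
    `labels` is an 81-character puzzle string in row-major order with
    0 meaning a blank cell (layout assumed from the description above).
    """
    assert page.shape[0] == page.shape[1] and page.shape[0] % 9 == 0
    assert len(labels) == 81
    cell = page.shape[0] // 9
    pairs = []
    for r in range(9):
        for c in range(9):
            crop = page[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell]
            pairs.append((crop, int(labels[r * 9 + c])))
    return pairs


# Usage with a dummy 180x180 page and an all-blank puzzle string:
cells = slice_into_cells(np.zeros((180, 180), dtype=np.uint8), "0" * 81)
assert len(cells) == 81 and cells[0][0].shape == (20, 20)
```

Because the pages are rendered synthetically, the cell positions are known exactly, which is what makes the 81 labels per page free of manual annotation.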

Hi @dawidborycki, can you please point me to the Hugging Face parquet dataset that you used? I'll modify the instructions to point the developer to the exact dataset that needs to be downloaded before running prepare data.
