# AGENTS.md

This file provides guidance to coding agents, such as Claude Code (claude.ai/code), when working with code in this repository.

## Overview

ExecuTorch is PyTorch's unified solution for deploying AI models on-device (smartphones to microcontrollers). It uses ahead-of-time (AOT) compilation to prepare PyTorch models for edge deployment via an export → compile → execute workflow, producing `.pte` (PyTorch ExecuTorch) files for on-device execution.

**Production Status**: Powers billions of users at Meta across Instagram, WhatsApp, Quest 3, and Ray-Ban Meta Smart Glasses.

## Build System

### Initial Setup

See `docs/source/using-executorch-building-from-source.md` for environment setup and installation.

### Common CMake Build Options

Configure with `-D<OPTION>=ON/OFF`:
- `EXECUTORCH_BUILD_XNNPACK` - XNNPACK backend (CPU optimization)
- `EXECUTORCH_BUILD_COREML` - CoreML backend (Apple Neural Engine)
- `EXECUTORCH_BUILD_MPS` - Metal Performance Shaders (Apple GPU)
- `EXECUTORCH_BUILD_QNN` - Qualcomm backend
- `EXECUTORCH_BUILD_VULKAN` - Vulkan backend (cross-platform GPU)
- `EXECUTORCH_BUILD_KERNELS_OPTIMIZED` - Optimized kernels (vs portable)
- `EXECUTORCH_BUILD_KERNELS_QUANTIZED` - Quantized kernel support
- `EXECUTORCH_BUILD_EXTENSION_MODULE` - Module wrapper (simplified C++ API)
- `EXECUTORCH_BUILD_EXTENSION_DATA_LOADER` - Data loader implementations
- `EXECUTORCH_BUILD_PYBIND` - Python bindings
- `EXECUTORCH_BUILD_TESTS` - Build tests
- `EXECUTORCH_OPTIMIZE_SIZE` - Size optimization (-Os vs -O2)
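
For example, a host build that enables the XNNPACK backend, optimized kernels, and the Module extension could be configured as sketched below (the `cmake-out` directory name and the flag selection are illustrative; see the building-from-source guide for the authoritative steps):

```bash
# Sketch: configure and build into cmake-out with a few of the options above
cmake -B cmake-out \
  -DEXECUTORCH_BUILD_XNNPACK=ON \
  -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
  -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
  -DCMAKE_BUILD_TYPE=Release
cmake --build cmake-out -j"$(nproc)"
```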

### Building Models

Use Makefile targets for common model builds:

```bash
# Build model runners with specific backends
make llama-cpu # Llama with CPU
make voxtral-cuda # Voxtral with CUDA backend
make whisper-metal # Whisper with Metal backend (macOS)
make gemma3-cuda # Gemma3 with CUDA
make help # Show all available targets

# Clean build artifacts
make clean
```

Model binaries are written to `cmake-out/examples/models/<model>/`.

## Testing

### Python Tests

```bash
# Run all Python tests from repo root
pytest

# Test-specific paths are configured in pytest.ini
# Tests cover: backends, codegen, devtools, examples, exir, kernels, runtime
```

### C++ Tests

```bash
# Build and run all C++ tests
./test/run_oss_cpp_tests.sh

# Run specific test directory
./test/run_oss_cpp_tests.sh runtime/core/test/

# Build size test (runtime + portable kernels)
sh test/build_size_test.sh
```

### Single Test Execution

For focused testing during development:
```bash
# Python: Run specific test file
pytest path/to/test_file.py

# Python: Run specific test function
pytest path/to/test_file.py::test_function_name

# C++: Build and run single directory (see run_oss_cpp_tests.sh)
```

## Code Architecture

### Core Runtime (`runtime/`)

**Minimal C++ runtime** - Loads and executes `.pte` programs. No operators/backends included.

- `runtime/core/` - Core types: `Tensor`, `EValue`, `Error`, `Result`, `ArrayRef`
- `portable_type/` - Portable C++ types compatible with bare-metal (no stdlib dynamic allocation)
- `exec_aten/` - Dual-mode header: uses either ATen types or ExecuTorch portable types
- `runtime/executor/` - `Program` and `Method` execution interfaces
- `runtime/kernel/` - Kernel registration and management
- `runtime/platform/` - Platform Abstraction Layer (PAL) for architecture-specific code
- `runtime/backend/` - Backend delegate runtime APIs

### Export and Lowering (`exir/`)

**EXIR (EXport Intermediate Representation)** - AOT library for model capture and lowering.

- `exir/capture/` - Program capture via `torch.export`
- `exir/dialects/` - Op sets for different IR dialects (ATen, Edge, Backend)
- `exir/passes/` - Compiler passes for optimization and transformation
- `exir/backend/` - Backend delegate AOT APIs for partitioning and lowering
- `exir/emit/` - Converts ExportedProgram to ExecuTorch execution instructions
- `exir/_serialize/` - Serializes to final `.pte` artifact

**Key Concepts**:
- **ATen dialect**: Full PyTorch operator set
- **Edge dialect**: Reduced operator set (Core ATen) for edge devices
- **Backend dialect**: Device-specific lowered representation

### Kernels (`kernels/`)

Operator implementations registered with runtime:

- `kernels/portable/` - **Reference implementations** of Core ATen ops (portable C++, no SIMD)
- `functions.yaml` - Defines portable kernel signatures and metadata
- `kernels/optimized/` - **Optimized implementations** (SIMD, architecture-specific)
- `kernels/quantized/` - Quantized kernel implementations
- `kernels/prim_ops/` - Primitive ops for control flow and symbolic operations
- `kernels/aten/` - ATen-compatible kernels (when building with PyTorch)

**Selective Build**: Use YAML files to specify only needed operators, reducing binary size.
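
As a rough illustration of the YAML shape (a sketch only; the authoritative schema is whatever `kernels/portable/functions.yaml` and the codegen tooling define), an entry binds an operator signature to a C++ kernel function:

```yaml
# Sketch of a functions.yaml-style entry; exact fields are defined by the codegen tooling.
- op: add.out
  kernels:
    - arg_meta: null
      kernel_name: torch::executor::add_out
```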

### Backends (`backends/`)

Hardware-specific delegate implementations. Each backend has:
1. **Partitioner** - Splits graph into device-compatible subgraphs
2. **Quantizer** - Backend-specific quantization (optional)
3. **Runtime** - Executes lowered graph on target hardware

**Available Backends**:
- `backends/xnnpack/` - XNNPACK (optimized CPU kernels)
- `backends/apple/coreml/` - CoreML (Apple Neural Engine)
- `backends/apple/mps/` - MPS (Metal Performance Shaders)
- `backends/qualcomm/` - Qualcomm (Hexagon DSP/NPU)
- `backends/arm/` - ARM (Ethos-U NPU, VGF)
- `backends/vulkan/` - Vulkan (cross-platform GPU)
- `backends/mediatek/`, `backends/samsung/`, `backends/cadence/`, etc.

### Extensions (`extension/`)

Higher-level libraries built on runtime:

- `extension/module/` - **Module wrapper** - Simplified C++ API (`Module::load()`, `Module::forward()`)
- `extension/llm/` - LLM-specific runtime, quantization, and optimization
- `runner/` - Text/multimodal runner API for LLMs
- `tokenizers/` - Tokenizer implementations
- `custom_ops/` - Custom ops for LLM optimization
- `extension/data_loader/` - FileDataLoader, BufferDataLoader implementations
- `extension/tensor/` - TensorPtr and tensor creation utilities
- `extension/pybindings/` - Python runtime API
- `extension/runner_util/` - Helpers for C++ execution tools

### Developer Tools (`devtools/`)

- `devtools/etdump/` - **ETDump**: Runtime profiling/debugging data format
- `devtools/etrecord/` - **ETRecord**: AOT debug artifact
- `devtools/inspector/` - Python API to inspect ETDump and ETRecord
- `devtools/bundled_program/` - Model validation tool with test I/O
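
A minimal sketch of the typical Inspector flow, assuming an ETDump and ETRecord have already been generated (file names and argument names are illustrative; check `devtools/inspector` for the current API):

```python
# Sketch: correlate runtime ETDump profiling data with the AOT ETRecord.
from executorch.devtools import Inspector

inspector = Inspector(etdump_path="etdump.bin", etrecord="etrecord.bin")
inspector.print_data_tabular()  # per-event latency and debug data
```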

### Schema (`schema/`)

FlatBuffer schema defining `.pte` file format (`program.fbs`).

### Code Generation (`codegen/`)

Tools to autogenerate kernel bindings between operators and runtime.

## Key Workflows

### Model Export to .pte

```python
import torch
from executorch.exir import to_edge_transform_and_lower
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

# 1. Export PyTorch model
model = MyModel().eval()
example_inputs = (torch.randn(1, 3, 224, 224),)
exported_program = torch.export.export(model, example_inputs)

# 2. Lower to backend (switch backends by changing partitioner)
program = to_edge_transform_and_lower(
    exported_program,
    partitioner=[XnnpackPartitioner()],  # or CoreMLPartitioner(), QnnPartitioner()
).to_executorch()

# 3. Save .pte file
with open("model.pte", "wb") as f:
    f.write(program.buffer)
```

### C++ Runtime Execution

```cpp
#include <executorch/extension/module/module.h>
#include <executorch/extension/tensor/tensor.h>

// Using the Module API (simplified); Module and make_tensor_ptr live in
// the executorch::extension namespace.
using namespace executorch::extension;

Module module("model.pte");
// input_data: caller-provided buffer matching the model's expected input shape
auto input = make_tensor_ptr({1, 3, 224, 224}, input_data);
auto outputs = module.forward(input);

// Or using lower-level Program/Method API
// See runtime/executor for Program and Method classes
```

### LLM Export and Execution

```bash
# Export LLM
python -m executorch.extension.llm.export.export_llm \
    --model llama3_2 \
    --output llama.pte
```

```cpp
// C++ execution; `config` holds generation options (see the LLM runner headers)
#include <executorch/extension/llm/runner/text_llm_runner.h>

auto runner = create_llama_runner("llama.pte", "tiktoken.bin");
runner->generate("Hello", config);
```

## Code Style

### C++ Guidelines

- **C++17 standard**
- **Function names**: `lower_snake_case()` (PyTorch convention, not Google style)
- **File names**: `lower_snake_case.cpp` (not `.cc` or `PascalCase.cpp`)
- **Headers**: Use `#pragma once` (not include guards)
- **Includes**: Use `<angle brackets>` always (not `"quotes"`)
- **Documentation**: Doxygen syntax (`/** ... */` or `///`) with `@param`, `@retval`
- **TODOs**: Reference issue numbers `TODO(#123): description` (not usernames)
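
Taken together, a header following these rules might look like the sketch below (the file and function are illustrative only; the `Error` type comes from `runtime/core/`):

```cpp
// Illustrative only: demonstrates the naming, include, and documentation conventions above.
#pragma once

#include <cstddef>

#include <executorch/runtime/core/error.h>

/**
 * Scales every element of `data` in place.
 *
 * @param data Buffer to scale.
 * @param size Number of elements in `data`.
 * @param factor Multiplier applied to each element.
 * @retval Error::Ok on success.
 */
executorch::runtime::Error scale_in_place(float* data, std::size_t size, float factor);

// TODO(#123): add a vectorized overload.
```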

### Portability Restrictions

ExecuTorch targets bare-metal systems. **Do not use**:
- **Exceptions** - Not supported on many microcontrollers
- **RTTI/dynamic_cast** - Adds binary bloat
- **Dynamic allocation** - `std::vector`, `std::string`, `new`, `malloc()`
- **Threading** - `std::thread`, `thread_local`, mutexes (except in optional libraries)
- **Iostreams** - Stream I/O via `<iostream>`/`<fstream>`

**Allowed stdlib**:
- Compile-time utilities: `std::move`, `std::forward`
- Metaprogramming: `std::is_floating_point<>`, `std::enable_if<>`
- Pure functions: `<cmath>`, `<cstring>` (avoid allocation functions like `strdup()`)
- Placement `new` (with manual destruction)
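
For instance, the allowed placement-`new` pattern (object constructed into pre-allocated storage, destroyed manually) looks like this sketch:

```cpp
// Sketch: construct into caller-provided storage instead of the heap, then
// destroy manually; this is the pattern the guidelines allow on bare-metal targets.
#include <new>  // placement new

struct Counter {
  explicit Counter(int start) : value(start) {}
  int value;
};

void placement_new_example() {
  alignas(Counter) unsigned char storage[sizeof(Counter)];
  Counter* counter = new (storage) Counter(42);  // constructs in place, no heap call
  ++counter->value;
  counter->~Counter();  // manual destruction; storage is ordinary stack memory
}
```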

### Python Style

Follow PyTorch core project conventions. Use `lintrunner`:

```bash
lintrunner init # Setup
lintrunner # Check
lintrunner -a # Auto-fix
```

## File Layout Constraints

**IMPORTANT**: The repo directory must be named exactly `executorch` (not `executorch-1`, `et`, etc.) due to include path requirements (`#include <executorch/...>`). See CMakeLists.txt:290-298 for enforcement.
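
In practice this just means cloning into a directory literally named `executorch`, e.g.:

```bash
# The target directory must be named "executorch" so that #include <executorch/...> resolves
git clone https://github.com/pytorch/executorch.git executorch
```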

## Backend Development

When adding a backend delegate:
1. Implement partitioner (split graph into subgraphs)
2. Implement runtime delegate (execute lowered graph)
3. Optional: Custom quantizer for backend-specific quantization
4. Add CMake integration in `backends/<name>/CMakeLists.txt`
5. Update backend list in root `CMakeLists.txt`

See `docs/source/backend-delegates-integration.md` for a detailed guide.
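
At the AOT level, steps 1-3 roughly follow the `exir/backend` interfaces. The sketch below is illustrative only (class and module names are taken from the delegate docs and may not match the current API exactly; real implementations fill in the bodies):

```python
# Rough sketch of a delegate's AOT pieces; the real interfaces live in exir/backend
# and are described in docs/source/backend-delegates-integration.md.
from executorch.exir.backend.backend_details import BackendDetails, PreprocessResult
from executorch.exir.backend.partitioner import Partitioner, PartitionResult


class MyBackend(BackendDetails):
    @staticmethod
    def preprocess(edge_program, compile_specs) -> PreprocessResult:
        # A real backend compiles the partitioned subgraph to its binary format here.
        return PreprocessResult(processed_bytes=b"")


class MyPartitioner(Partitioner):
    def partition(self, exported_program) -> PartitionResult:
        # Tag the nodes this backend can execute; untagged nodes stay on the CPU.
        raise NotImplementedError
```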

## Memory and Binary Size

- **Selective Build**: Specify operators via YAML to reduce binary size
- **Memory Planning**: AOT memory allocation strategies
- **Size Optimization**: Use `-DEXECUTORCH_OPTIMIZE_SIZE=ON` for `-Os`
- **Strip Logging**: `-DEXECUTORCH_ENABLE_LOGGING=OFF` removes log strings
- **Disable Verification**: `-DEXECUTORCH_ENABLE_PROGRAM_VERIFICATION=OFF` saves ~20KB
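
For example, a size-focused configuration might combine these options (a sketch; verify option names against the root CMakeLists.txt):

```bash
# Sketch: size-oriented configure step combining the flags above
cmake -B cmake-out \
  -DEXECUTORCH_OPTIMIZE_SIZE=ON \
  -DEXECUTORCH_ENABLE_LOGGING=OFF \
  -DEXECUTORCH_ENABLE_PROGRAM_VERIFICATION=OFF
```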

## CI and Testing Requirements

- CI runs automatically on PRs: https://hud.pytorch.org/hud/pytorch/executorch/main <!-- @lint-ignore -->
- All tests must pass before merge
- Add tests for new features/bug fixes
- C++ tests use GTest framework
- Python tests use pytest

## Additional Resources

- Full documentation: https://pytorch.org/executorch
- API reference: https://pytorch.org/executorch/main/api-section.html
- Backend integration guide: docs/source/backend-delegates-integration.md
- Portable C++ programming: docs/source/portable-cpp-programming.md
- Contributing guide: CONTRIBUTING.md