Add WASM SIMD128 path for f_pixel::diff()#129

Merged
kornelski merged 1 commit into ImageOptim:main from imazen:wasm-simd128-upstream
Feb 11, 2026

lilith commented Feb 8, 2026

Summary

  • Add a wasm32 + simd128 implementation of f_pixel::diff() using safe core::arch::wasm32 intrinsics
  • Add wasm32 + simd128 to the repr(C, align(16)) cfg on f_pixel
  • Exclude wasm32 + simd128 from the scalar fallback cfg(not(...))

The implementation uses the safe f32x4() constructor (no unsafe block, no raw pointers) and follows the same arithmetic pattern as the existing SSE and NEON paths. No new dependencies — everything is from core::arch::wasm32 (stable since Rust 1.54).
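
For readers unfamiliar with the pattern, here is a minimal sketch of what such a path can look like. It is not the merged code: `Pixel` and `diff_scalar` are hypothetical stand-ins for `f_pixel` and its scalar fallback, and the arithmetic (per-channel maximum of the squared difference against a black and a white background) is an assumption about the existing scalar formula. Only the `core::arch::wasm32` calls are real intrinsics. On targets without wasm32 + simd128, `diff()` falls through to the scalar version.

```rust
// Hypothetical stand-in for f_pixel; the field order (a, r, g, b) is an assumption.
#[derive(Clone, Copy, Debug, PartialEq)]
#[cfg_attr(
    all(target_arch = "wasm32", target_feature = "simd128"),
    repr(C, align(16))
)]
pub struct Pixel {
    pub a: f32,
    pub r: f32,
    pub g: f32,
    pub b: f32,
}

impl Pixel {
    /// Scalar reference (assumed formula): for each color channel, take the
    /// larger of the squared differences on a black and on a white background.
    pub fn diff_scalar(&self, other: &Pixel) -> f32 {
        let alphas = other.a - self.a;
        let (br, bg, bb) = (self.r - other.r, self.g - other.g, self.b - other.b);
        let (wr, wg, wb) = (br + alphas, bg + alphas, bb + alphas);
        (br * br).max(wr * wr) + (bg * bg).max(wg * wg) + (bb * bb).max(wb * wb)
    }

    /// SIMD128 path: the same arithmetic on four lanes at once.
    #[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
    pub fn diff(&self, other: &Pixel) -> f32 {
        use core::arch::wasm32::*;
        // Pack ARGB into v128 values with the safe constructor (no pointer load).
        let px = f32x4(self.a, self.r, self.g, self.b);
        let py = f32x4(other.a, other.r, other.g, other.b);
        let alphas = f32x4_splat(other.a - self.a);
        let black = f32x4_sub(px, py);
        let white = f32x4_add(black, alphas);
        let max = f32x4_max(f32x4_mul(black, black), f32x4_mul(white, white));
        // Horizontal sum of the R, G, B lanes; lane 0 (alpha) is ignored.
        f32x4_extract_lane::<1>(max)
            + f32x4_extract_lane::<2>(max)
            + f32x4_extract_lane::<3>(max)
    }

    /// Fallback for every other target.
    #[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
    pub fn diff(&self, other: &Pixel) -> f32 {
        self.diff_scalar(other)
    }
}
```

The `simd128` target feature is off by default for wasm32, so the SIMD branch is only compiled when the feature is enabled at build time, for example with `-C target-feature=+simd128`.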

Benchmark

256x256 full quantization pipeline in wasmtime (5 iterations, release mode):

| Variant | Avg/iter | Speedup |
|---------|----------|---------|
| Scalar  | 260ms    |         |
| SIMD128 | 135ms    | 1.9x    |

Changes

One file: src/pal.rs — 27 lines added, 2 lines modified (cfg guards).

Add a wasm32+simd128 implementation of the hot diff() function using
safe core::arch::wasm32 intrinsics (f32x4 constructor, no unsafe).

Translates the existing SSE/NEON pattern:
- f32x4() to pack ARGB into a v128 (safe, no pointer load)
- f32x4_sub/add/mul/max for packed arithmetic
- f32x4_extract_lane + scalar add for horizontal RGB sum

Also adds wasm32+simd128 to the repr(C, align(16)) cfg and excludes
it from the scalar fallback guard.

Measured ~1.9x end-to-end speedup on a 256x256 quantization benchmark
running in wasmtime (scalar: 260ms/iter → simd128: 135ms/iter).

kornelski merged commit b1df2d2 into ImageOptim:main on Feb 11, 2026
1 check passed