Add WASM SIMD128 path for f_pixel::diff() #129
Merged
kornelski merged 1 commit into ImageOptim:main on Feb 11, 2026
Add a wasm32+simd128 implementation of the hot diff() function using safe core::arch::wasm32 intrinsics (f32x4 constructor, no unsafe).

Translates the existing SSE/NEON pattern:
- f32x4() to pack ARGB into a v128 (safe, no pointer load)
- f32x4_sub/add/mul/max for packed arithmetic
- f32x4_extract_lane + scalar add for horizontal RGB sum

Also adds wasm32+simd128 to the repr(C, align(16)) cfg and excludes it from the scalar fallback guard.

Measured ~1.9x end-to-end speedup on a 256x256 quantization benchmark running in wasmtime (scalar: 260 ms/iter → simd128: 135 ms/iter).
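The pattern described in the commit message can be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: the `Pixel` struct, its field layout, and the `diff_scalar` helper are hypothetical stand-ins for `f_pixel`, and `f32x4_splat` is used here to broadcast the alpha difference even though the commit message does not name it. The cfg-gated fallback lets the same file build on targets without simd128.

```rust
/// Hypothetical stand-in for f_pixel: four f32 channels in ARGB order.
/// The 16-byte alignment is only requested when the SIMD path is compiled in.
#[cfg_attr(
    all(target_arch = "wasm32", target_feature = "simd128"),
    repr(C, align(16))
)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Pixel {
    pub a: f32,
    pub r: f32,
    pub g: f32,
    pub b: f32,
}

impl Pixel {
    /// Scalar reference: per channel, take the larger of the squared
    /// difference on black and on white, then sum over R, G, B.
    pub fn diff_scalar(&self, other: &Pixel) -> f32 {
        let alphas = other.a - self.a;
        let black = [self.r - other.r, self.g - other.g, self.b - other.b];
        black
            .iter()
            .map(|&d| (d * d).max((d + alphas) * (d + alphas)))
            .sum()
    }

    /// SIMD path, compiled only for wasm32 with simd128 enabled.
    #[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
    pub fn diff(&self, other: &Pixel) -> f32 {
        use core::arch::wasm32::*;

        // Pack ARGB into a v128 with the safe constructor (no pointer load).
        let px = f32x4(self.a, self.r, self.g, self.b);
        let py = f32x4(other.a, other.r, other.g, other.b);

        // Broadcast the alpha difference to all four lanes (assumption: splat).
        let alphas = f32x4_splat(other.a - self.a);

        let onblack = f32x4_sub(px, py);
        let onwhite = f32x4_add(onblack, alphas);
        let max = f32x4_max(
            f32x4_mul(onblack, onblack),
            f32x4_mul(onwhite, onwhite),
        );

        // Horizontal sum of the R, G, B lanes; lane 0 (alpha) is skipped.
        f32x4_extract_lane::<1>(max)
            + f32x4_extract_lane::<2>(max)
            + f32x4_extract_lane::<3>(max)
    }

    /// Fallback for every other target: the guard excludes wasm32+simd128.
    #[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
    pub fn diff(&self, other: &Pixel) -> f32 {
        self.diff_scalar(other)
    }
}
```

The SIMD branch is only selected when the crate is built for a wasm32 target with the feature switched on, for example via `-C target-feature=+simd128` in `RUSTFLAGS`. On any other target the sketch compiles to the scalar version.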
Summary
- Add a wasm32 + simd128 implementation of f_pixel::diff() using safe core::arch::wasm32 intrinsics
- Add wasm32 + simd128 to the repr(C, align(16)) cfg on f_pixel
- Exclude wasm32 + simd128 from the scalar fallback cfg(not(...))

The implementation uses the safe f32x4() constructor (no unsafe block, no raw pointers) and follows the same arithmetic pattern as the existing SSE and NEON paths. No new dependencies: everything is from core::arch::wasm32 (stable since Rust 1.54).

Benchmark
256x256 full quantization pipeline in wasmtime (5 iterations, release mode): scalar 260 ms/iter, simd128 135 ms/iter (~1.9x end-to-end speedup).
Changes
One file: src/pal.rs, 27 lines added, 2 lines modified (cfg guards).