Open
Description
Currently, on hardware other than AMD and Intel (e.g., ARM), the build uses a generic build script.
This script assumes:
const REGISTER_SIZE = 16
const REGISTER_COUNT = 16
const CACHELINE_SIZE = 64
const SIMD_NATIVE_INTEGERS = true
If any of these assumptions is violated, dependent libraries (e.g., LoopVectorization) are likely to produce suboptimal code. If the numbers undershoot the hardware, some performance is left on the table, but the code is still likely to perform reasonably well.
If the numbers overshoot, the performance consequences could be dire: register spills galore.
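To make the undershoot/overshoot point concrete, here is a rough sketch of how these constants could feed into vector widths and unrolling decisions. It assumes `REGISTER_SIZE` is measured in bytes; `lanes` and `fits_in_registers` are hypothetical helpers for illustration, not functions from any existing package.

```julia
# Values copied from the generic build script quoted above.
# Assumption: REGISTER_SIZE is in bytes (16 bytes = 128 bits).
const REGISTER_SIZE = 16
const REGISTER_COUNT = 16

# Hypothetical helper: number of elements of type T that fit in one SIMD register.
lanes(::Type{T}, register_size::Integer = REGISTER_SIZE) where {T} =
    register_size ÷ sizeof(T)

# Hypothetical helper: can a kernel keep `nvectors` vectors live at once
# without exceeding the number of available registers?
fits_in_registers(nvectors::Integer, register_count::Integer = REGISTER_COUNT) =
    nvectors <= register_count

lanes(Float64)            # 2 Float64 lanes under the 16-byte assumption
lanes(Float64, 64)        # 8 lanes if the hardware really has 64-byte registers
fits_in_registers(12)     # true when 16 registers are assumed
fits_in_registers(12, 8)  # false if the hardware only has 8: the extra vectors spill
```

If the real registers are wider than assumed, the first case just leaves lanes unused. If a kernel is unrolled for 16 registers and the hardware has fewer, the last case is where the spills come from.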
I believe some ARM CPUs do not have SIMD Float64, so perhaps this should be handled somehow.
Ideally, we'd use a library like CpuId.jl to query hardware info, like we do for AMD and Intel.
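Until real hardware queries are available, one possible stopgap is to pick defaults per architecture from `Sys.ARCH`. The sketch below is only an illustration: `generic_simd_defaults` and its field names are hypothetical, the cache line size of 64 is an assumption that does not hold on every chip, and `SIMD_NATIVE_INTEGERS` is left out.

```julia
# Hypothetical fallback table keyed on the architecture symbol reported by Julia.
# All values are assumptions for illustration, not queried from the hardware.
function generic_simd_defaults(arch::Symbol = Sys.ARCH)
    if arch === :aarch64
        # AArch64 Advanced SIMD: 32 registers of 128 bits, Float64 lanes supported.
        return (register_size = 16, register_count = 32,
                cacheline_size = 64, simd_float64 = true)
    elseif arch in (:arm, :armv7l)
        # 32-bit ARM NEON: 16 registers of 128 bits, no Float64 vector arithmetic.
        return (register_size = 16, register_count = 16,
                cacheline_size = 64, simd_float64 = false)
    else
        # Everything else: the values the generic build script currently assumes.
        return (register_size = 16, register_count = 16,
                cacheline_size = 64, simd_float64 = true)
    end
end

defaults = generic_simd_defaults()  # uses Sys.ARCH of the running Julia process
```

This only addresses the Float64 and register-count concerns at the architecture level; it cannot distinguish individual CPU models the way a CPUID-style query would.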
AStupidBear