Looking at the instruction timing view for code using SALU floating-point instructions on RDNA4, I get the impression that these instructions take 3 cycles to issue and have a latency of 1 cycle. Based on the ISA docs, however, I would expect them to take 1 cycle to issue and have a latency of 3 cycles; otherwise the SALU_CYCLE_2 and SALU_CYCLE_3 delays for s_delay_alu would be pointless.
Here I would expect all instructions to take 1 cycle to issue, with an additional 2-cycle delay before 146 and 148.
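
To make the two readings concrete, here is a toy Python model of a single in-order issue slot. It is only a sketch of my own: the function name `issue_starts`, the `deps` encoding, and the rule that a result becomes usable `latency` cycles after its producer starts issuing are all assumptions for illustration, not something taken from the ISA docs or from the profiler.

```python
def issue_starts(deps, issue_cycles, latency):
    """Return the cycle on which each instruction begins issuing.

    deps[i] is the index of the earlier instruction whose result
    instruction i reads, or None if it has no dependency.
    (Hypothetical encoding, chosen only for this sketch.)
    """
    starts = []
    next_free = 0  # first cycle on which the issue slot is free again
    for dep in deps:
        start = next_free
        if dep is not None:
            # Assumption: a result is usable `latency` cycles after
            # its producer started issuing.
            start = max(start, starts[dep] + latency)
        starts.append(start)
        # The issue slot stays busy for `issue_cycles` cycles.
        next_free = start + issue_cycles
    return starts


independent = [None, None, None, None]  # no instruction reads another's result
chain = [None, 0, 1, 2]                 # each instruction reads the previous one

for name, deps in (("independent", independent), ("dependent chain", chain)):
    print(name)
    print("  issue=1, latency=3:", issue_starts(deps, 1, 3))
    print("  issue=3, latency=1:", issue_starts(deps, 3, 1))
```

In this simplified model the two readings give identical start cycles on a fully dependent chain and only differ when independent instructions sit between the dependent ones, so a timing view of back-to-back dependent instructions would not by itself tell them apart.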