S390x: emit new instructions added in z17#12319
S390x: emit new instructions added in z17#12319cfallin merged 3 commits intobytecodealliance:mainfrom
Conversation
ae1c56a to
fac9929
Compare
Subscribe to Label ActionDetailsThis issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:area:x64", "cranelift:meta", "isle"Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
|
I'm going to shift review of this over to @uweigand |
|
or, well, I can't officially do that, but @uweigand I'm happy to rubber-stamp once you've approved |
uweigand
left a comment
There was a problem hiding this comment.
Looks mostly good to me, but see inline comments.
In addition to what is implemented here, we now could implement vector integer division for 32-bit and 64-bit integer vectors - but there is currently no ISLE to even express this.
| (rule 16 (lower (has_type (and (vxrs_ext3_enabled) (vr128_ty ty)) (band (band y z) (bnot x)))) | ||
| (vec_eval ty 0b00000010 x y z)) | ||
| (rule 17 (lower (has_type (and (vxrs_ext3_enabled) (vr128_ty ty)) (band (band x (bnot y)) z))) | ||
| (vec_eval ty 0b00000010 y x z)) |
There was a problem hiding this comment.
Not sure what if any canonicalization is done at the ISLE level here, but these four don't cover all possible combinations. E.g. (band x (band (bnot y) z)) is not covered.
I guess a more fundamental question is which combinations we should be covering. For example, why cover and-not with three inputs but not or-not?
There was a problem hiding this comment.
I stopped because I realized this would add hundreds of rules to get correct, and ran out of steam pretty quickly. I think it's an open question: what should we encode? what 3-input binary operations are actually used?
There was a problem hiding this comment.
I made a python script to handle this, and committed the results. My thinking is this:
- This only has to happen once
- It would be better to have the results interspersed in the file near their 2-input counterparts for better
| ; block0: ; offset 0x0 | ||
| ; .byte 0xe7, 0x8a | ||
| ; .byte 0x80, 0x02 | ||
| ; .byte 0x9f, 0x88 |
There was a problem hiding this comment.
We should also add z17 insns to the disassembler, but that is of course a different patch.
fac9929 to
73c5595
Compare
a32e45e to
898cc0e
Compare
uweigand
left a comment
There was a problem hiding this comment.
This looks all good to me now, except for the srem implementation - see inline comment.
This emits & tests a bunch of instructions:
* from Miscellaneous-Instruction-Extensions Facility 4:
* CLZ, 64bit
* CTZ, 64bit
* from Vector-Enhancements Facility 3:
* 32x4, 64x2 & 128x1 variants of the following:
* Divide
* Remainder
* 64x2 & 128x1 multiply variants
* 128x1 vaiants of:
* Compare
* CLZ
* CTZ
* Max
* Min
* Average
* Negation
* Evaluate
Co-authored-by: Jimmy Brisson <jbrisson@linux.ibm.com>
Now that s390x implements blendv as well, we should refer to the instruction without the x86 prefix.
898cc0e to
8ae8e8f
Compare
uweigand
left a comment
There was a problem hiding this comment.
Thanks, this version LGTM now!
cfallin
left a comment
There was a problem hiding this comment.
Rubber-stamping based on Ulrich's review -- thanks!
Z17 (arch15) includes some instructions that allow us to encode some more complicated operations in fewer instructions. This PR adds support to cranelift-codegen to emit these newer instructions when appropriate.
Further, Z17 includes a VBLEND instruction that mimics the same instruction on x64. Since this is no longer an x64-exclusive instruction type, I've renamed the appropriate stuff within cranelift codegen to reflect that this is not ISA-specific anymore.