Commit 65bcaec

Merge pull request #436 from rustcoreutils/cc
Cc
2 parents 4629056 + a1ae17b commit 65bcaec

17 files changed (+3257, -1091 lines)

cc/README.md

Lines changed: 1 addition & 0 deletions
@@ -99,6 +99,7 @@ Not yet implemented:
- multi-register returns (for structs larger than 8 bytes)
- -fverbose-asm
- Complex initializers
- constant expression evaluation
- VLAs (variable-length arrays)
- top builtins to implement:
__builtin_expect

cc/TODO.md

Lines changed: 179 additions & 0 deletions
@@ -498,3 +498,182 @@ _Noreturn void exit(int status);
2. Type system: mark function type as noreturn
3. Semantic: warn if function can return
4. Codegen: can omit function epilogue, enable optimizations

---

## Optimization Passes

### Overview

The compiler uses a sparse-style SSA IR, which is well suited to classical optimizations. Passes should be run iteratively until a fixed point is reached, i.e. until a full round of passes leaves the IR unchanged.
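
A minimal sketch of such a fixed-point driver in Rust follows. It assumes a hypothetical pass signature that mutates the IR and reports whether anything changed. The `max_rounds` cap is an added safeguard against passes that keep undoing each other; the TODO does not specify it. `shrink` and `make_even` are toy stand-ins for real passes.

```rust
/// Hypothetical pass signature: mutate the IR in place, return true if anything changed.
type Pass<Ir> = fn(&mut Ir) -> bool;

/// Run every pass in order, repeating whole rounds until one round changes nothing.
/// Returns Some(rounds) on convergence, or None if `max_rounds` was exhausted.
fn run_to_fixed_point<Ir>(ir: &mut Ir, passes: &[Pass<Ir>], max_rounds: usize) -> Option<usize> {
    for round in 1..=max_rounds {
        let mut changed = false;
        for &pass in passes {
            // `|=` (not `||`) so later passes still run after an earlier one made a change.
            changed |= pass(ir);
        }
        if !changed {
            return Some(round);
        }
    }
    None
}

// Toy "passes" over an integer stand-in for the IR, for demonstration only.
fn shrink(x: &mut i64) -> bool {
    if *x > 10 { *x -= 10; true } else { false }
}

fn make_even(x: &mut i64) -> bool {
    if *x % 2 != 0 { *x -= 1; true } else { false }
}
```

Start with `ir = 25` and passes `[shrink, make_even]`. Round 1 produces 14 and round 2 produces 4. Round 3 changes nothing, so the driver returns `Some(3)`.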

### Pass 1: SCCP - Sparse Conditional Constant Propagation

**Status:** Not implemented

**What it does:**
- Propagate constants through the CFG, but only along paths that are reachable
- If a branch's condition is a known constant, mark only the taken successor as reachable
- Lattice per SSA value: `{UNDEF, CONST(c), UNKNOWN}`, ordered UNDEF (nothing known yet) → CONST(c) → UNKNOWN (not a constant); a value only ever moves down
- Forward dataflow over the CFG; a φ merges only the operands that arrive on edges already marked executable

**Simpler alternative:** Global constant propagation without conditional pruning (every block is treated as reachable).
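
The lattice and its interaction with branches and φ-nodes can be sketched in Rust as below. This is a hedged illustration, not the compiler's code. `Lattice`, `meet`, `eval_phi` and `branch_targets` are hypothetical names, and the worklist loop that drives them over the CFG is omitted.

```rust
/// SCCP lattice for one SSA value, using the TODO's names:
/// UNDEF (top, nothing known yet) > CONST(c) > UNKNOWN (bottom, not a constant).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Lattice {
    Undef,
    Const(i64),
    Unknown,
}

/// Meet: combine two facts about the same value. Results only ever move down the lattice.
fn meet(a: Lattice, b: Lattice) -> Lattice {
    use Lattice::*;
    match (a, b) {
        (Undef, x) | (x, Undef) => x,
        (Const(x), Const(y)) if x == y => Const(x),
        _ => Unknown,
    }
}

/// Evaluate a φ: only operands arriving on executable CFG edges contribute.
fn eval_phi(incoming: &[(Lattice, bool)]) -> Lattice {
    incoming
        .iter()
        .filter(|(_, executable)| *executable)
        .fold(Lattice::Undef, |acc, &(v, _)| meet(acc, v))
}

/// Which edges of `br cond, then, else` become executable: (then_edge, else_edge).
fn branch_targets(cond: Lattice) -> (bool, bool) {
    match cond {
        Lattice::Undef => (false, false), // not known yet; revisit when cond changes
        Lattice::Const(c) => (c != 0, c == 0),
        Lattice::Unknown => (true, true),
    }
}
```

Consider a φ whose second operand arrives on an edge that a constant branch never takes. It stays `Const(3)` instead of dropping to `Unknown`. That is the "conditional" part of SCCP, and the simpler alternative gives it up.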

### Pass 2: CFG Simplification

**Status:** Not implemented

**What it does:**
- Convert branches with a constant condition into unconditional jumps
- Merge straight-line blocks: A → B, where B is A's only successor and A is B's only predecessor
- Remove jumps to jumps: retarget a jump whose destination block contains nothing but another jump

**Implementation:**
- After DCE, walk the CFG
- If a branch/switch has a constant condition → replace it with a direct jump to the taken target (and drop the corresponding φ operands in the untaken target)
- If a block contains only an unconditional jump (no φ-nodes, no side effects) → redirect its predecessors to its successor, or merge it into its single predecessor
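
Below is a minimal Rust sketch of the first two rewrites: branch folding and single-predecessor merging. It rests on several assumptions. The `Term`/`Block` types are hypothetical, and instructions are kept opaque. φ-nodes are ignored, which is safe for merging because a block with a single predecessor only has trivial φs. Unreachable-block removal is assumed to mark dead blocks in `alive`.

```rust
/// Block terminator in a hypothetical CFG representation.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Term {
    Jump(usize),
    /// `cond` is Some(c) when an earlier pass (e.g. SCCP) proved the condition constant.
    Branch { cond: Option<i64>, then_bb: usize, else_bb: usize },
    Ret,
}

#[derive(Clone, Debug)]
struct Block {
    insns: Vec<String>, // instructions kept opaque for this sketch
    term: Term,
}

/// Turn constant-condition branches into unconditional jumps.
fn fold_const_branches(blocks: &mut [Block]) -> bool {
    let mut changed = false;
    for b in blocks.iter_mut() {
        if let Term::Branch { cond: Some(c), then_bb, else_bb } = b.term {
            b.term = Term::Jump(if c != 0 { then_bb } else { else_bb });
            changed = true;
        }
    }
    changed
}

/// Count CFG predecessors, ignoring blocks that were already removed.
fn pred_counts(blocks: &[Block], alive: &[bool]) -> Vec<usize> {
    let mut preds = vec![0; blocks.len()];
    for (i, b) in blocks.iter().enumerate() {
        if !alive[i] {
            continue;
        }
        match b.term {
            Term::Jump(t) => preds[t] += 1,
            Term::Branch { then_bb, else_bb, .. } => {
                preds[then_bb] += 1;
                preds[else_bb] += 1;
            }
            Term::Ret => {}
        }
    }
    preds
}

/// Merge A -> B when B is A's only successor and A is B's only predecessor.
/// Block 0 is the entry and is never merged away.
fn merge_blocks(blocks: &mut [Block], alive: &mut [bool]) -> bool {
    let mut changed = false;
    for a in 0..blocks.len() {
        while alive[a] {
            let b = match blocks[a].term {
                Term::Jump(b) => b,
                _ => break,
            };
            if b == a || b == 0 || pred_counts(blocks, alive)[b] != 1 {
                break;
            }
            let succ = blocks[b].clone();
            blocks[a].insns.extend(succ.insns);
            blocks[a].term = succ.term;
            alive[b] = false;
            changed = true;
        }
    }
    changed
}
```

Recomputing predecessor counts on every merge keeps the sketch short but is quadratic. A real pass would update the counts incrementally.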

### Pass 3: Copy Propagation & SSA Cleanup

**Status:** Not implemented

**What it does:**
- `t1 = x; y = t1;` → `y = x`
- φ simplifications: `t = φ(x, x, x)` → `t = x` (likewise `t = φ(x, t)`, where the only other operand is `t` itself)

**Implementation:**
- Use SSA def-use chains
- For each `t = x` where x is an SSA value and t isn't address-taken, replace uses of t with x
- For φ-nodes whose incoming operands are all the same value → replace uses with that value and remove the φ
- Run DCE afterwards to delete the now-dead copies
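
The sketch below shows how the replacement map could be computed. It is written in Rust against a hypothetical `Def` enum that keys SSA values by `u32` ids, and is not the compiler's data model. It iterates because simplifying one φ can make another one trivial. Rewriting the actual uses and running DCE are left out.

```rust
use std::collections::HashMap;

/// Hypothetical SSA definitions, reduced to what this pass inspects.
enum Def {
    Copy(u32),     // t = x
    Phi(Vec<u32>), // t = φ(a, b, ...)
    Other,         // any other instruction; never replaced here
}

/// Follow a replacement chain (t -> x -> ...) to its final value.
fn resolve(repl: &HashMap<u32, u32>, mut v: u32) -> u32 {
    while let Some(&next) = repl.get(&v) {
        v = next;
    }
    v
}

/// Map each copy or trivial φ to the value that should replace its uses.
/// Repeats until nothing changes, because simplifying one φ can make another trivial.
fn copy_propagate(defs: &HashMap<u32, Def>) -> HashMap<u32, u32> {
    let mut repl: HashMap<u32, u32> = HashMap::new();
    loop {
        let mut changed = false;
        for (&t, def) in defs {
            if repl.contains_key(&t) {
                continue;
            }
            let target = match def {
                Def::Copy(x) => Some(resolve(&repl, *x)),
                Def::Phi(ops) => {
                    // Resolve operands and drop self-references: φ(x, t) is just x.
                    let vals: Vec<u32> = ops
                        .iter()
                        .map(|&o| resolve(&repl, o))
                        .filter(|&o| o != t)
                        .collect();
                    match vals.first() {
                        Some(&first) if vals.iter().all(|&o| o == first) => Some(first),
                        _ => None, // zero or several distinct incoming values
                    }
                }
                Def::Other => None,
            };
            if let Some(x) = target {
                if x != t {
                    repl.insert(t, x);
                    changed = true;
                }
            }
        }
        if !changed {
            return repl;
        }
    }
}
```

Each new entry maps a value to a final, not-yet-replaced value, so replacement chains cannot form cycles. The outer loop terminates because every productive round adds a key.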

### Pass 4: Local CSE / Value Numbering

**Status:** Not implemented

**What it does:**
- Inside a block, recognize when the same pure expression is computed again
- `t1 = a + b; t2 = a + b;` → `t2 = t1`

**Implementation:**
- For each block: maintain a map `(opcode, operand1, operand2, type) → SSA value`
- When the expression is already in the map, reuse the prior value; for commutative ops, order the operands canonically first so `a + b` and `b + a` match
- Limit to pure operations: arithmetic, logical ops, comparisons
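
A short Rust sketch of the per-block map follows. The `Insn`, `Opcode` and `Ty` types are hypothetical stand-ins for the real IR. The sketch returns a replacement map rather than rewriting instructions in place. Operands are passed through earlier replacements, so a chain like `(b + a) * x` after `(a + b) * x` is also caught.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Opcode { Add, Sub, Mul, And, Lt }

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Ty { I32, I64 }

/// Hypothetical pure instruction `dst = op lhs, rhs`; operands are SSA value ids.
#[derive(Clone, Copy, Debug)]
struct Insn { dst: u32, op: Opcode, lhs: u32, rhs: u32, ty: Ty }

fn is_commutative(op: Opcode) -> bool {
    matches!(op, Opcode::Add | Opcode::Mul | Opcode::And)
}

/// Local CSE over one basic block. Returns a map from redundant values to the
/// earlier value that computes the same expression.
fn local_cse(block: &[Insn]) -> HashMap<u32, u32> {
    let mut seen: HashMap<(Opcode, u32, u32, Ty), u32> = HashMap::new();
    let mut repl: HashMap<u32, u32> = HashMap::new();
    for insn in block {
        // Apply replacements found earlier in the block.
        let lhs = *repl.get(&insn.lhs).unwrap_or(&insn.lhs);
        let rhs = *repl.get(&insn.rhs).unwrap_or(&insn.rhs);
        // Sort operands of commutative ops so `a + b` and `b + a` share a key.
        let (l, r) = if is_commutative(insn.op) && rhs < lhs { (rhs, lhs) } else { (lhs, rhs) };
        let first = *seen.entry((insn.op, l, r, insn.ty)).or_insert(insn.dst);
        if first != insn.dst {
            repl.insert(insn.dst, first);
        }
    }
    repl
}
```

The type is part of the key so that, for example, a 32-bit and a 64-bit add of the same operands are not merged.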

### Pass 5: GVN - Global Value Numbering

**Status:** Not implemented (optional, bigger investment)

**What it does:**
- Deduplicate computations across blocks, not just within one
- A global version of CSE that respects control flow: a computation can only be replaced by an equivalent one in a dominating block

**Implementation:**
- Walk blocks in dominator-tree order
- Assign "value numbers" to expressions
- Replace an expression with an earlier one that has the same value number, provided the earlier definition dominates it
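
The sketch below is the dominator-tree, hash-based form of value numbering, written in Rust. It is a simplification of full GVN. It assumes a precomputed dominator tree (`dom_children`, a hypothetical field) and names expressions by their SSA operand ids. It does not find congruences through φ-nodes. Each subtree gets its own copy of the table, so sibling blocks never reuse each other's values.

```rust
use std::collections::HashMap;

/// Hypothetical pure instruction: `dst = lhs <op> rhs`, operands are SSA value ids.
#[derive(Clone, Copy)]
struct Insn {
    dst: u32,
    op: char,
    lhs: u32,
    rhs: u32,
}

/// Hypothetical function view: instructions per block plus a precomputed dominator tree.
struct Func {
    blocks: Vec<Vec<Insn>>,
    dom_children: Vec<Vec<usize>>,
}

type Table = HashMap<(char, u32, u32), u32>;

/// Visit `bb` and its dominator-tree children. `avail` holds expressions computed in
/// blocks that dominate `bb`; sibling subtrees never see each other's entries.
fn gvn_walk(f: &Func, bb: usize, avail: &Table, repl: &mut HashMap<u32, u32>) {
    let mut scope = avail.clone(); // simple but quadratic; a scoped hash table avoids the copies
    for insn in &f.blocks[bb] {
        let l = *repl.get(&insn.lhs).unwrap_or(&insn.lhs);
        let r = *repl.get(&insn.rhs).unwrap_or(&insn.rhs);
        let first = *scope.entry((insn.op, l, r)).or_insert(insn.dst);
        if first != insn.dst {
            repl.insert(insn.dst, first); // a dominating definition already computes it
        }
    }
    for &child in &f.dom_children[bb] {
        gvn_walk(f, child, &scope, repl);
    }
}

/// Returns a map from redundant SSA values to the dominating value that replaces them.
fn gvn(f: &Func) -> HashMap<u32, u32> {
    let mut repl: HashMap<u32, u32> = HashMap::new();
    gvn_walk(f, 0, &Table::new(), &mut repl);
    repl
}
```

Take a diamond CFG (0 → 1, 0 → 2, 1/2 → 3). An expression computed in block 0 is reused in blocks 1, 2 and 3. One computed only in block 1 is not reused in block 3, because block 1 does not dominate block 3.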

### Pass 6: Conservative Function Inlining

**Status:** Not implemented

**What it does:**
- Inline small, non-recursive, non-varargs functions into their callers
- Exposes new optimization opportunities (e.g. constant arguments become visible to SCCP in the caller)

**Constraints:**
- Only inline static/internal functions
- Limit by IR size: e.g. ≤ N instructions, ≤ M basic blocks
- Reject varargs, VLAs, alloca, and unusual control flow (e.g. setjmp, computed goto)

**After inlining:** Re-run InstCombine → SCCP → DCE
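
The constraints above amount to a simple eligibility check. Here is a hedged Rust sketch. `FnSummary` and its fields are hypothetical. `MAX_INSNS` and `MAX_BLOCKS` are arbitrary placeholders for the TODO's unspecified N and M. Recursion is detected by reachability in a call graph given as adjacency lists.

```rust
/// Hypothetical per-function summary gathered from the IR.
struct FnSummary {
    is_static: bool,
    is_recursive: bool,
    is_varargs: bool,
    uses_vla_or_alloca: bool,
    has_unusual_control_flow: bool, // e.g. setjmp or computed goto
    num_insns: usize,
    num_blocks: usize,
}

/// Placeholder values for the TODO's unspecified N and M.
const MAX_INSNS: usize = 40;
const MAX_BLOCKS: usize = 8;

/// `calls[f]` lists the functions that `f` calls directly. `f` is recursive if it can
/// reach itself through the call graph (direct or mutual recursion).
fn is_recursive(calls: &[Vec<usize>], f: usize) -> bool {
    let mut seen = vec![false; calls.len()];
    let mut stack = calls[f].clone();
    while let Some(g) = stack.pop() {
        if g == f {
            return true;
        }
        if !seen[g] {
            seen[g] = true;
            stack.extend(&calls[g]);
        }
    }
    false
}

fn should_inline(s: &FnSummary) -> bool {
    s.is_static
        && !s.is_recursive
        && !s.is_varargs
        && !s.uses_vla_or_alloca
        && !s.has_unusual_control_flow
        && s.num_insns <= MAX_INSNS
        && s.num_blocks <= MAX_BLOCKS
}
```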

### Pass 7: LICM - Loop-Invariant Code Motion

**Status:** Not implemented

**What it does:**
- Hoist pure, loop-invariant computations out of the loop into its preheader

**Implementation:**
- Detect natural loops via back edges (an edge whose target dominates its source) and dominators
- Only hoist arithmetic/logical ops whose operands are defined outside the loop; don't hoist ops that can trap (e.g. division) unless the loop is known to execute
- Don't hoist loads/stores without alias analysis
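
The loop-detection and invariance checks could look like the following Rust sketch. It assumes an immediate-dominator array (`idom`), predecessor lists and a value-to-defining-block map, none of which the TODO specifies. It checks a single instruction. A full LICM would repeat the check after each hoist so that chains of invariant instructions move together.

```rust
use std::collections::{HashMap, HashSet};

/// `idom[b]` is b's immediate dominator; the entry block is its own idom.
fn dominates(idom: &[usize], d: usize, mut b: usize) -> bool {
    loop {
        if b == d {
            return true;
        }
        if idom[b] == b {
            return false; // reached the entry without meeting d
        }
        b = idom[b];
    }
}

/// Natural loop of the back edge `latch -> header`: the header plus every block that
/// reaches `latch` without passing through `header` (found by walking predecessors).
fn natural_loop(preds: &[Vec<usize>], header: usize, latch: usize) -> HashSet<usize> {
    let mut body = HashSet::from([header]);
    let mut stack = vec![latch];
    while let Some(b) = stack.pop() {
        if body.insert(b) {
            stack.extend(&preds[b]);
        }
    }
    body
}

/// A pure instruction is hoistable if none of its operands is defined inside the loop.
/// Values missing from `def_block` (arguments, constants) count as defined outside.
fn is_invariant(operands: &[u32], def_block: &HashMap<u32, usize>, body: &HashSet<usize>) -> bool {
    operands
        .iter()
        .all(|v| def_block.get(v).map_or(true, |b| !body.contains(b)))
}
```

In a CFG `0 → 1 → 2 → 1`, `1 → 3`, block 1 dominates block 2, so `2 → 1` is a back edge. Its natural loop is `{1, 2}`.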

### Pass 8: Loop Canonicalization & Strength Reduction

**Status:** Not implemented (low priority)

**What it does:**
- Normalize induction variables to the canonical `for (i = 0; i < n; ++i)` form
- Replace multiplications by an induction variable with additions

**Implementation:**
- Identify induction variables: a φ-node in the loop header with one incoming value from the preheader and one from the latch
- Handle simple patterns: `i = φ(i0, i + c)`
- Turn `base + i * c` patterns into a new induction variable `p = φ(base, p + c)` that is incremented each iteration
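
Here is a hedged Rust sketch of both halves. The first part recognizes the `i = φ(i0, i + c)` pattern in a hypothetical `Def` representation. The second shows, at source level, that replacing `base + i * c` with a running `p += c` yields the same sequence. Wrapping arithmetic mirrors unsigned C semantics and keeps the two versions equal even on overflow.

```rust
use std::collections::HashMap;

/// Hypothetical SSA definitions, just enough to recognize `i = φ(i0, i + c)`.
enum Def {
    Phi { from_preheader: u32, from_latch: u32 },
    AddConst(u32, i64), // dst = src + c
    Other,
}

/// If `v` is a simple induction variable, return (initial value, step).
fn simple_iv(defs: &HashMap<u32, Def>, v: u32) -> Option<(u32, i64)> {
    match defs.get(&v)? {
        Def::Phi { from_preheader, from_latch } => match defs.get(from_latch)? {
            Def::AddConst(src, c) if *src == v => Some((*from_preheader, *c)),
            _ => None,
        },
        _ => None,
    }
}

/// Source-level view of the rewrite. Before: one multiply per iteration.
fn addrs_naive(base: u64, c: u64, n: u64) -> Vec<u64> {
    (0..n).map(|i| base.wrapping_add(i.wrapping_mul(c))).collect()
}

/// After: a new induction variable p = φ(base, p + c) replaces `base + i * c`.
fn addrs_reduced(base: u64, c: u64, n: u64) -> Vec<u64> {
    let mut out = Vec::new();
    let mut p = base; // incoming value from the preheader
    for _ in 0..n {
        out.push(p);
        p = p.wrapping_add(c); // value flowing back along the latch
    }
    out
}
```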

### Suggested Pass Pipeline

```
IR generation + SSA construction

Early InstCombine (constant folding + algebraic)

SCCP (or simple global const prop)

DCE + unreachable block removal

CFG simplification

Copy propagation + φ simplification → DCE

Local CSE → InstCombine

[Later] GVN → DCE/CFG simplify

[Later] Conservative inlining → re-run passes 2-7

[Later] LICM + loop opts → final InstCombine + DCE
```

### Implementation Priority

| Priority | Pass | Complexity | Impact |
|----------|------|------------|--------|
| 1 | CFG simplify | Low | Medium |
| 2 | Copy/φ cleanup | Low | Medium |
| 3 | Local CSE | Medium | Medium |
| 4 | SCCP | Medium | High |
| 5 | GVN | High | Medium |
| 6 | Inlining | High | High |
| 7 | LICM | Medium | Medium |
| 8 | Loop opts | High | Low |

---

## Assembly Peephole Optimizations

### Overview

Peephole optimizations on the generated assembly, applied to the LIR after code generation and before it is emitted. These are low-complexity micro-optimizations that delete redundant instructions or replace them with shorter encodings.

### Potential Optimizations

| Pattern | Optimization |
|---------|--------------|
| `mov %rax, %rax` | Delete (no-op move; note that `mov %eax, %eax` is not a no-op, since it zero-extends into `%rax`) |
| `mov %rax, %rbx; mov %rbx, %rax` | Delete the second (useless copy-back) |
| `add $0, %rax` | Delete (no-op add), only if the flags it sets are not read afterwards |
| `sub $0, %rax` | Delete (no-op sub), only if the flags it sets are not read afterwards |
| `imul $1, %rax, %rax` | Delete (multiply by 1), only if the flags it sets are not read afterwards |
| `xor %rax, %rax; mov $0, %rax` | Delete the `mov` (the register is already zero) |
| `cmp $0, %rax; je L` | `test %rax, %rax; je L` (shorter encoding, same flags for the jump) |
| `mov $imm, %rax; add %rax, %rbx` | `add $imm, %rbx`, if imm fits in a sign-extended 32-bit immediate and `%rax` is dead afterwards |

### Implementation Approach

1. Walk the LIR instruction list before emission
2. Pattern-match on instruction sequences (2-3 instruction windows)
3. Replace matches with the optimized sequences
4. Run multiple passes until no changes
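
The approach above can be sketched in Rust over a tiny, hypothetical LIR subset written in AT&T operand order. Four of the table's rewrites are covered. The `add $0` deletion is handled conservatively: it fires only when the next instruction overwrites the flags without reading them. The `mov $imm` fold is left out because it needs register liveness, which this sketch does not model.

```rust
/// Tiny hypothetical LIR subset in AT&T operand order (`src, dst`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Reg { Rax, Rbx }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Op { Reg(Reg), Imm(i64) }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Insn {
    Mov(Op, Reg), // 64-bit mov
    Add(Op, Reg),
    Cmp(Op, Reg),
    Test(Reg, Reg),
    Je(&'static str),
}

/// Instructions that overwrite the status flags without reading them.
fn clobbers_flags(i: Insn) -> bool {
    matches!(i, Insn::Add(..) | Insn::Cmp(..) | Insn::Test(..))
}

/// One left-to-right scan with 1- and 2-instruction windows; returns true on any change.
fn peephole_once(code: &mut Vec<Insn>) -> bool {
    let mut out = Vec::with_capacity(code.len());
    let mut changed = false;
    let mut i = 0;
    while i < code.len() {
        let cur = code[i];
        let next = code.get(i + 1).copied();
        match (cur, next) {
            // mov %r, %r (64-bit) is a no-op.
            (Insn::Mov(Op::Reg(s), d), _) if s == d => {
                changed = true;
                i += 1;
            }
            // mov %a, %b; mov %b, %a: the copy-back is redundant.
            (Insn::Mov(Op::Reg(a), b), Some(Insn::Mov(Op::Reg(b2), a2))) if b2 == b && a2 == a => {
                out.push(cur);
                changed = true;
                i += 2;
            }
            // cmp $0, %r; je L  ->  test %r, %r; je L
            (Insn::Cmp(Op::Imm(0), r), Some(Insn::Je(l))) => {
                out.push(Insn::Test(r, r));
                out.push(Insn::Je(l));
                changed = true;
                i += 2;
            }
            // add $0, %r: delete only if the next instruction overwrites the flags anyway.
            (Insn::Add(Op::Imm(0), _), Some(n)) if clobbers_flags(n) => {
                changed = true;
                i += 1;
            }
            _ => {
                out.push(cur);
                i += 1;
            }
        }
    }
    *code = out;
    changed
}

/// Repeat until a scan makes no changes.
fn peephole(code: &mut Vec<Insn>) {
    while peephole_once(code) {}
}
```

Running `peephole` on `mov %rax,%rax; mov %rax,%rbx; mov %rbx,%rax; add $0,%rbx; cmp $0,%rbx; je L1` leaves `mov %rax,%rbx; test %rbx,%rbx; je L1`.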
