Merged

Cc #436

cc/README.md: 1 change (1 addition, 0 deletions)
@@ -99,6 +99,7 @@ Not yet implemented:
- multi-register returns (for structs larger than 8 bytes)
- -fverbose-asm
- Complex initializers
- constant expression evaluation
- VLAs (variable-length arrays)
- top builtins to implement:
__builtin_expect

cc/TODO.md: 179 changes (179 additions, 0 deletions)
@@ -498,3 +498,182 @@ _Noreturn void exit(int status);
2. Type system: mark function type as noreturn
3. Semantic: warn if function can return
4. Codegen: can omit function epilogue, enable optimizations

---

## Optimization Passes

### Overview

The compiler uses a sparse-style SSA IR, which is well suited to classical optimizations. The plan is to run the passes iteratively until a fixed point is reached, i.e. until a full round leaves the IR unchanged.

### Pass 1: SCCP - Sparse Conditional Constant Propagation

**Status:** Not implemented

**What it does:**
- Propagate constants through the CFG, only along reachable paths
- If a branch's condition is constant, mark only that successor reachable
- Lattice: `{UNDEF, CONST(c), UNKNOWN}`
- Forward dataflow over the CFG, re-evaluating φs as more of their incoming edges become reachable

**Simpler alternative:** Global constant propagation without conditional pruning.
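The core of SCCP is the lattice meet applied at φs. A minimal C sketch, assuming a hypothetical `Lattice` cell per SSA value and a per-edge "executable" flag (neither exists in the compiler yet): operands arriving over edges not yet marked executable are skipped, which is what separates SCCP from plain global constant propagation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lattice cell for one SSA value.
 * LAT_UNDEF   = no information yet (top)
 * LAT_CONST   = known to be exactly `value`
 * LAT_UNKNOWN = may hold several values (bottom / overdefined) */
typedef enum { LAT_UNDEF, LAT_CONST, LAT_UNKNOWN } LatKind;

typedef struct {
    LatKind kind;
    long value; /* meaningful only when kind == LAT_CONST */
} Lattice;

/* Meet of two cells: UNDEF is the identity, equal constants stay
 * constant, anything else falls to UNKNOWN. */
static Lattice lat_meet(Lattice a, Lattice b) {
    if (a.kind == LAT_UNDEF) return b;
    if (b.kind == LAT_UNDEF) return a;
    if (a.kind == LAT_CONST && b.kind == LAT_CONST && a.value == b.value)
        return a;
    return (Lattice){ LAT_UNKNOWN, 0 };
}

/* Evaluate phi(ops[0..n)); edge_live[i] says whether the i-th incoming
 * CFG edge is executable. Dead edges contribute nothing. */
static Lattice eval_phi(const Lattice *ops, const bool *edge_live, int n) {
    Lattice acc = { LAT_UNDEF, 0 };
    for (int i = 0; i < n; i++)
        if (edge_live[i])
            acc = lat_meet(acc, ops[i]);
    return acc;
}
```

With both edges live, `φ(1, 2)` drops to `UNKNOWN`; if the branch feeding the second edge has been folded away, the same φ stays `CONST(1)`.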

### Pass 2: CFG Simplification

**Status:** Not implemented

**What it does:**
- Convert constant-condition branches into unconditional jumps
- Merge simple blocks: A → B where A's only successor is B and B's only predecessor is A
- Thread jumps to jumps: retarget edges that land on a block containing only an unconditional jump (fallthrough simplification)

**Implementation:**
- After DCE, walk the CFG
- If branch/switch has constant condition → replace with direct jump
- If a block contains only a jump (no φs, no side effects) → redirect its predecessors to its successor, or merge it
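A minimal sketch of constant-branch folding and jump threading, using a hypothetical array-of-blocks CFG (`Block`, `TermKind`, and the helper names are illustrative, not the compiler's types). Block merging is omitted, and threading refuses to retarget an edge into a block with φs, since that would require rewriting the φ operands.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical minimal CFG model: each block ends in one terminator. */
typedef enum { T_JUMP, T_BRANCH, T_RET } TermKind;

typedef struct {
    TermKind kind;
    int target;         /* T_JUMP: successor */
    int if_true;        /* T_BRANCH: successors */
    int if_false;
    bool cond_is_const; /* T_BRANCH: condition already folded to a constant */
    long cond_value;
    int n_insns;        /* non-terminator instructions (0 = block only jumps) */
    bool has_phi;
} Block;

/* A branch on a constant becomes an unconditional jump. */
static int fold_const_branches(Block *b, int n) {
    int changed = 0;
    for (int i = 0; i < n; i++) {
        if (b[i].kind == T_BRANCH && b[i].cond_is_const) {
            b[i].target = b[i].cond_value ? b[i].if_true : b[i].if_false;
            b[i].kind = T_JUMP;
            changed++;
        }
    }
    return changed;
}

/* Follow a chain of empty forwarding blocks. The hop limit guards
 * against cycles of empty blocks (e.g. an empty infinite loop). */
static int final_target(const Block *b, int t, int n) {
    for (int hops = 0; hops < n; hops++) {
        const Block *bb = &b[t];
        if (bb->kind != T_JUMP || bb->n_insns != 0 || bb->has_phi)
            break;              /* t does real work: stop here */
        if (b[bb->target].has_phi)
            break;              /* skipping t would need phi operand edits */
        t = bb->target;
    }
    return t;
}

/* Retarget every outgoing edge past empty forwarding blocks. */
static int thread_jumps(Block *b, int n) {
    int changed = 0;
    for (int i = 0; i < n; i++) {
        int *edges[2];
        int ne = 0;
        if (b[i].kind == T_JUMP) edges[ne++] = &b[i].target;
        if (b[i].kind == T_BRANCH) {
            edges[ne++] = &b[i].if_true;
            edges[ne++] = &b[i].if_false;
        }
        for (int e = 0; e < ne; e++) {
            int t = final_target(b, *edges[e], n);
            if (t != *edges[e]) { *edges[e] = t; changed++; }
        }
    }
    return changed;
}
```

For `B0: if (1) goto B1 else B3` where `B1` is an empty block jumping to `B2`, folding turns `B0` into a jump to `B1`, and threading retargets it straight to `B2`; `B3` becomes unreachable and is left for DCE.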

### Pass 3: Copy Propagation & SSA Cleanup

**Status:** Not implemented

**What it does:**
- `t1 = x; y = t1;` → `y = x`
- φ simplifications: `t = φ(x, x, x)` → `t = x`

**Implementation:**
- Use SSA def-use chains
- For each copy `t = x` where x is an SSA value and t is not address-taken, replace all uses of t with x
- For φ-nodes whose incoming operands are all the same value (ignoring operands that refer to the φ itself) → replace uses and remove the φ
- Run DCE afterwards
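A sketch of copy and trivial-φ elimination over a flat instruction list, assuming hypothetical `Insn` records and a `repl` array that the caller initializes so that `repl[v] == v`. Redundant definitions are only flagged; deleting them is left to DCE, as the plan above suggests. Because a φ can use a value defined later (through a back edge), the caller repeats the sweep until it returns 0.

```c
#include <assert.h>

#define MAX_ARGS 4

typedef enum { OP_COPY, OP_PHI, OP_ADD } Op;

/* Hypothetical SSA instruction: value `dst` is defined by `op` over
 * `args`. Values are small integers that index `repl`. */
typedef struct {
    Op op;
    int dst;
    int nargs;
    int args[MAX_ARGS];
    int dead; /* set when the definition becomes redundant */
} Insn;

/* Follow replacements to the canonical value. */
static int resolve(const int *repl, int v) {
    while (repl[v] != v) v = repl[v];
    return v;
}

/* One sweep: rewrite operands through `repl`, then record copies and
 * trivial phis as replacements. Returns definitions made redundant. */
static int copy_prop(Insn *code, int n, int *repl) {
    int removed = 0;
    for (int i = 0; i < n; i++) {
        Insn *in = &code[i];
        if (in->dead) continue;
        for (int a = 0; a < in->nargs; a++)
            in->args[a] = resolve(repl, in->args[a]);

        int same = -1, trivial = 1;
        if (in->op == OP_COPY) {
            same = in->args[0];
        } else if (in->op == OP_PHI) {
            for (int a = 0; a < in->nargs; a++) {
                int v = in->args[a];
                if (v == in->dst || v == same) continue; /* self-ref or repeat */
                if (same != -1) { trivial = 0; break; }  /* two distinct values */
                same = v;
            }
        } else {
            trivial = 0;
        }
        if (trivial && same != -1) {
            repl[in->dst] = same; /* later uses of dst now see `same` */
            in->dead = 1;         /* DCE deletes it */
            removed++;
        }
    }
    return removed;
}
```

For `v2 = copy v0; v3 = φ(v2, v2, v3); v4 = add v3, v1`, the sweeps reduce the add to `add v0, v1` and flag both the copy and the φ as dead.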

### Pass 4: Local CSE / Value Numbering

**Status:** Not implemented

**What it does:**
- Within a block, recognize when the same pure expression is recomputed
- `t1 = a + b; t2 = a + b;` → `t2 = t1`

**Implementation:**
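- For each block, maintain a map `(opcode, operand1, operand2, type) → SSA value`
- When an expression is already in the map, reuse the prior value; canonicalize operand order for commutative ops so `a + b` and `b + a` match
- Limit to pure operations: arithmetic, logical ops, comparisons (no loads, stores, or calls)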
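A sketch of local value numbering for a single block, assuming hypothetical `Insn` records whose operands are SSA value numbers and a `repl` array initialized by the caller to `repl[v] == v`. A linear-scan table stands in for a hash map to keep the example short; loads are treated as impure and never merged.

```c
#include <assert.h>

typedef enum { OP_ADD, OP_MUL, OP_SUB, OP_LT, OP_LOAD } Op;

/* Hypothetical IR instruction: dst = op(a, b) at the given type id. */
typedef struct {
    Op op;
    int type; /* values of different types never share a number */
    int a, b; /* operand SSA values */
    int dst;  /* result SSA value */
} Insn;

typedef struct { Op op; int type, a, b, value; } Expr;

#define MAX_EXPRS 64

static int is_pure(Op op) { return op != OP_LOAD; }
static int is_commutative(Op op) { return op == OP_ADD || op == OP_MUL; }

/* Local CSE over one basic block. On a hit, repl[dst] receives the
 * earlier value. Returns the number of redundant instructions found. */
static int local_cse(const Insn *code, int n, int *repl) {
    Expr table[MAX_EXPRS];
    int nt = 0, found = 0;
    for (int i = 0; i < n; i++) {
        const Insn *in = &code[i];
        if (!is_pure(in->op)) continue;

        /* Use already-replaced operands so chains of CSE compose. */
        int a = repl[in->a], b = repl[in->b];
        if (is_commutative(in->op) && a > b) { int t = a; a = b; b = t; }

        int hit = -1;
        for (int k = 0; k < nt; k++)
            if (table[k].op == in->op && table[k].type == in->type &&
                table[k].a == a && table[k].b == b) {
                hit = table[k].value;
                break;
            }
        if (hit >= 0) {
            repl[in->dst] = hit;
            found++;
        } else if (nt < MAX_EXPRS) {
            table[nt++] = (Expr){ in->op, in->type, a, b, in->dst };
        }
    }
    return found;
}
```

Given `v2 = v0 + v1; v3 = v1 + v0; v6 = v3 * v0; v7 = v2 * v0`, the sketch maps `v3` to `v2`, and then `v7` to `v6`, because both multiplies see the same canonical operands.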

### Pass 5: GVN - Global Value Numbering

**Status:** Not implemented (optional, bigger investment)

**What it does:**
- Deduplicate computations across blocks, not just within one
- A global version of CSE that respects control flow: a value can only be reused where its definition dominates the redundant computation

**Implementation:**
- Walk blocks in dominator-tree order
- Assign "value numbers" to expressions
- Expressions that receive the same number are equivalent; replace the dominated one with the dominating one
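The dominator-order walk can be sketched with a scoped table: expressions recorded in a block stay visible while visiting the blocks it dominates and are popped afterwards. This is dominator-based value numbering rather than full GVN (it never proves two φs congruent), and the `Block`/`Insn` layout, with dominator-tree children precomputed, is a hypothetical stand-in for the real IR.

```c
#include <assert.h>

#define MAX_BLOCKS 8
#define MAX_INSNS  8
#define MAX_SCOPE  64

/* Hypothetical pure binary instruction: dst = op(a, b). */
typedef struct { int op, a, b, dst; } Insn;

typedef struct {
    int n_insns;
    Insn insns[MAX_INSNS];
    int n_kids;
    int kids[MAX_BLOCKS]; /* children in the dominator tree */
} Block;

typedef struct { int op, a, b, value; } Entry;

/* Stack of expressions visible from the block being visited. */
static Entry scope[MAX_SCOPE];
static int scope_len;

static int lookup(int op, int a, int b) {
    for (int k = scope_len - 1; k >= 0; k--)
        if (scope[k].op == op && scope[k].a == a && scope[k].b == b)
            return scope[k].value;
    return -1;
}

/* Preorder walk of the dominator tree rooted at `id`. On a hit,
 * repl[dst] receives the dominating value. Returns hits found. */
static int gvn_block(const Block *blocks, int id, int *repl) {
    int mark = scope_len, found = 0;
    const Block *bb = &blocks[id];
    for (int i = 0; i < bb->n_insns; i++) {
        const Insn *in = &bb->insns[i];
        int a = repl[in->a], b = repl[in->b];
        int v = lookup(in->op, a, b);
        if (v >= 0) {
            repl[in->dst] = v;
            found++;
        } else if (scope_len < MAX_SCOPE) {
            scope[scope_len++] = (Entry){ in->op, a, b, in->dst };
        }
    }
    for (int k = 0; k < bb->n_kids; k++)
        found += gvn_block(blocks, bb->kids[k], repl);
    scope_len = mark; /* leaving this block's dominance region */
    return found;
}
```

For a diamond `B0 → {B1, B2} → B3`, an expression computed in `B0` is reused in `B1` and `B3`, while one first computed in `B2` is recomputed in `B3`, since `B2` does not dominate the join.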

### Pass 6: Conservative Function Inlining

**Status:** Not implemented

**What it does:**
- Inline small, non-recursive, non-varargs functions into callers
- Exposes new optimization opportunities

**Constraints:**
- Only inline static/internal functions
- Limit by IR size: e.g. ≤ N instructions, ≤ M basic blocks
- Reject varargs, VLAs, `alloca`, and irregular control flow

**After inlining:** Re-run InstCombine → SCCP → DCE
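The constraints above reduce to a predicate over a per-function summary. A sketch, assuming a hypothetical `FuncInfo` filled in by earlier analyses (call-graph recursion detection, block and instruction counts). The thresholds stand in for N and M and are placeholders, not tuned values; `irregular_cf` is an assumed flag for cases such as `setjmp`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of one function's IR. */
typedef struct {
    bool is_static;          /* internal linkage */
    bool is_varargs;
    bool uses_vla_or_alloca;
    bool irregular_cf;       /* assumed flag, e.g. calls setjmp */
    bool recursive;          /* member of a call-graph cycle */
    int n_insns;
    int n_blocks;
} FuncInfo;

/* Placeholder limits for "N instructions" and "M basic blocks". */
enum { INLINE_MAX_INSNS = 40, INLINE_MAX_BLOCKS = 4 };

static bool should_inline(const FuncInfo *f) {
    if (!f->is_static) return false; /* external callers need the body */
    if (f->is_varargs || f->uses_vla_or_alloca || f->irregular_cf) return false;
    if (f->recursive) return false;
    return f->n_insns <= INLINE_MAX_INSNS && f->n_blocks <= INLINE_MAX_BLOCKS;
}
```

Keeping the decision in one predicate makes it easy to tune the size limits later without touching the inliner itself.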

### Pass 7: LICM - Loop-Invariant Code Motion

**Status:** Not implemented

**What it does:**
- Hoist pure, loop-invariant computations out of the loop into its preheader

**Implementation:**
- Detect natural loops via back edges and dominators
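- Only hoist arithmetic/logical ops whose operands are defined outside the loop (or by already-hoisted instructions); skip ops that can trap, such as division, since the loop body might never execute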
- Don't hoist loads/stores without alias analysis
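A sketch of the invariance test, assuming the loop has already been found and each SSA value carries an `in_loop` flag (hypothetical names throughout). The outer loop repeats because hoisting one instruction can make its users invariant. Division is excluded because it can trap, and the preheader runs even when the loop body would not.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { OP_ADD, OP_MUL, OP_DIV, OP_LOAD, OP_PHI } Op;

/* Hypothetical loop-body instruction: dst = op(a, b). */
typedef struct {
    Op op;
    int dst, a, b;
    bool hoisted; /* marked for the preheader */
} Insn;

/* Pure, non-trapping ops only: loads need alias analysis, division can
 * trap, and phis belong to the loop header. */
static bool hoistable_op(Op op) {
    return op == OP_ADD || op == OP_MUL;
}

/* in_loop[v] is true if value v is defined inside the loop.
 * Returns the number of instructions marked for hoisting. */
static int licm(Insn *body, int n, bool *in_loop) {
    int hoisted = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < n; i++) {
            Insn *in = &body[i];
            if (in->hoisted || !hoistable_op(in->op)) continue;
            if (!in_loop[in->a] && !in_loop[in->b]) {
                in->hoisted = true;
                in_loop[in->dst] = false; /* now defined in the preheader */
                hoisted++;
                changed = true;
            }
        }
    }
    return hoisted;
}
```

With `v0` and `v1` defined before the loop, `v3 = v1 * v1` and then `v4 = v3 + v0` are hoisted, while `v5 = i + v4` (uses the induction φ) and `v6 = v0 / v1` stay in the loop.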

### Pass 8: Loop Canonicalization & Strength Reduction

**Status:** Not implemented (low priority)

**What it does:**
- Normalize induction variables: `for (i = 0; i < n; ++i)` style
- Replace multiplications by induction variables with additions (strength reduction)

**Implementation:**
- Identify induction variables: a φ node in the loop header with one incoming value from the preheader and one from the latch
- Handle simple patterns: `i = φ(i0, i + c)`
- Turn `base + i * c` patterns into a derived induction variable that is incremented by `c` on each iteration
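The transformation is easiest to see at the source level. A C sketch with hypothetical function names: both functions compute the same sum, but the second replaces `i * c` with a derived induction variable `t`, which in SSA form is the `t = φ(base, t + c)` companion of `i = φ(0, i + 1)`.

```c
#include <assert.h>

/* Before: base + i * c is recomputed with a multiply every iteration. */
static long sum_before(long base, long c, int n) {
    long sum = 0;
    for (int i = 0; i < n; ++i)
        sum += base + (long)i * c;
    return sum;
}

/* After: t tracks base + i * c and is bumped by c each iteration,
 * so the loop body no longer multiplies. */
static long sum_after(long base, long c, int n) {
    long sum = 0;
    long t = base;
    for (int i = 0; i < n; ++i) {
        sum += t;
        t += c;
    }
    return sum;
}
```

For `base = 100, c = 8, n = 10`, both return 1360.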

### Suggested Pass Pipeline

```
IR generation + SSA construction
Early InstCombine (constant folding + algebraic)
SCCP (or simple global const prop)
DCE + unreachable block removal
CFG simplification
Copy propagation + φ simplification → DCE
Local CSE → InstCombine
[Later] GVN → DCE/CFG simplify
[Later] Conservative inlining → re-run steps 2-7 above (InstCombine through local CSE)
[Later] LICM + loop opts → final InstCombine + DCE
```
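Iterating to a fixed point can be sketched as a driver loop, assuming each pass reports whether it changed the function. `Function`, `PassFn`, and the toy passes are hypothetical stand-ins; the round cap guards against passes that keep undoing each other.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the real IR: `pending` counts simplifications left. */
typedef struct Function { int pending; } Function;

/* A pass returns true if it changed the function. */
typedef bool (*PassFn)(Function *);

/* Toy pass: performs one simplification per call while work remains. */
static bool toy_simplify(Function *f) {
    if (f->pending > 0) { f->pending--; return true; }
    return false;
}

/* Toy pass that never changes anything. */
static bool toy_noop(Function *f) { (void)f; return false; }

/* Run the pass list until a whole round changes nothing or max_rounds
 * is hit. Returns the number of rounds, including the final quiet one. */
static int run_to_fixed_point(Function *f, const PassFn *passes, int n,
                              int max_rounds) {
    int rounds = 0;
    bool changed = true;
    while (changed && rounds < max_rounds) {
        changed = false;
        for (int i = 0; i < n; i++)
            if (passes[i](f))
                changed = true;
        rounds++;
    }
    return rounds;
}
```

With three pending simplifications, the driver runs four rounds: three that change something and one that confirms the fixed point.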

### Implementation Priority

| Priority | Pass | Complexity | Impact |
|----------|------|------------|--------|
| 1 | CFG simplify | Low | Medium |
| 2 | Copy/φ cleanup | Low | Medium |
| 3 | Local CSE | Medium | Medium |
| 4 | SCCP | Medium | High |
| 5 | GVN | High | Medium |
| 6 | Inlining | High | High |
| 7 | LICM | Medium | Medium |
| 8 | Loop opts | High | Low |

---

## Assembly Peephole Optimizations

### Overview

Post-codegen peephole optimizations on the generated assembly. These are low-complexity micro-optimizations that remove redundant instructions and shorten encodings.

### Potential Optimizations

| Pattern | Optimization |
|---------|--------------|
| `mov %rax, %rax` | Delete (no-op move; 64-bit only, since `mov %eax, %eax` zero-extends into %rax) |
| `mov %rax, %rbx; mov %rbx, %rax` | Delete second (useless copy-back) |
| `add $0, %rax` | Delete if flags are dead (`add` sets flags) |
| `sub $0, %rax` | Delete if flags are dead (`sub` sets flags) |
| `imul $1, %rax, %rax` | Delete if flags are dead (multiply by 1) |
| `mov $0, %rax` | `xor %eax, %eax` if flags are dead (shorter encoding; writing %eax zeroes all of %rax) |
| `cmp $0, %rax; je L` | `test %rax, %rax; je L` (shorter) |
| `mov $imm, %rax; add %rax, %rbx` | `add $imm, %rbx` if imm fits in a signed 32-bit immediate and %rax is dead afterward |

### Implementation Approach

1. Walk the LIR instructions before emission
2. Pattern match on instruction sequences (2-3 instruction windows)
3. Replace with optimized sequences
4. Repeat until a full pass makes no changes
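A sketch of steps 2-4 for a few table rows, over a hypothetical LIR-like `Insn` array in AT&T operand order (src, dst). It assumes a straight-line block with no labels inside it, and a `flags_live` bit produced by some flags-liveness analysis; neither is described as existing in the compiler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical LIR-like instruction, AT&T order: op src, dst. */
typedef enum { I_MOV, I_ADD, I_SUB, I_CMP, I_TEST, I_JE } Opc;
typedef enum { K_REG, K_IMM, K_LABEL } Kind;
typedef struct { Kind kind; int val; } Operand; /* reg no., immediate, or label id */
typedef struct {
    Opc op;
    Operand src, dst;
    bool flags_live; /* are this instruction's flags read later? */
} Insn;

static bool same_reg(Operand a, Operand b) {
    return a.kind == K_REG && b.kind == K_REG && a.val == b.val;
}

static bool is_imm(Operand o, int v) { return o.kind == K_IMM && o.val == v; }

/* One sweep over a straight-line block. Deleted instructions are dropped
 * by compacting in place; returns the number of rewrites. */
static int peephole_once(Insn *code, int *n) {
    int out = 0, rewrites = 0;
    for (int i = 0; i < *n; i++) {
        Insn in = code[i];
        /* mov %r, %r (64-bit registers): no-op */
        if (in.op == I_MOV && same_reg(in.src, in.dst)) { rewrites++; continue; }
        /* add/sub $0, %r: no-op unless its flags are read later */
        if ((in.op == I_ADD || in.op == I_SUB) && is_imm(in.src, 0) && !in.flags_live) {
            rewrites++;
            continue;
        }
        /* mov %a, %b; mov %b, %a: the second move copies back an equal value */
        if (in.op == I_MOV && out > 0) {
            Insn prev = code[out - 1];
            if (prev.op == I_MOV && same_reg(prev.src, in.dst) && same_reg(prev.dst, in.src)) {
                rewrites++;
                continue;
            }
        }
        /* cmp $0, %r -> test %r, %r: same flags for je, shorter encoding */
        if (in.op == I_CMP && is_imm(in.src, 0) && in.dst.kind == K_REG) {
            in.op = I_TEST;
            in.src = in.dst;
            rewrites++;
        }
        code[out++] = in;
    }
    *n = out;
    return rewrites;
}

/* Repeat until a sweep changes nothing. */
static void peephole(Insn *code, int *n) {
    while (peephole_once(code, n) > 0) { }
}
```

On `mov %rax, %rax; mov %rax, %rbx; add $0, %rbx; mov %rbx, %rax; cmp $0, %rax; je L`, with the flags of the `add` dead, this leaves `mov %rax, %rbx; test %rax, %rax; je L`.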