@@ -498,3 +498,182 @@ _Noreturn void exit(int status);
498 498 2. Type system: mark function type as noreturn
499 499 3. Semantic: warn if function can return
500 500 4. Codegen: can omit function epilogue, enable optimizations
501+
502+ ---
503+
504+ ## Optimization Passes
505+
506+ ### Overview
507+
508+ The compiler uses a sparse-style SSA IR, which is well-suited for classical optimizations. Passes run repeatedly until a full round makes no further change to the IR (a fixed point).
509+
510+ ### Pass 1: SCCP - Sparse Conditional Constant Propagation
511+
512+ **Status:** Not implemented
513+
514+ **What it does:**
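515+ - Propagate constants through the CFG, only along paths proven reachable
516+ - If a branch's condition is constant, mark only the taken successor reachable
517+ - Lattice: `{UNDEF, CONST(c), UNKNOWN}`, ordered top to bottom: `UNDEF` means no value seen yet, `UNKNOWN` means proven non-constant; a value only ever moves downward
518+ - Forward dataflow across the CFG; each φ merges only the operands arriving over reachable edges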
519+
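The lattice and the φ rule can be sketched as follows. This is a minimal illustration, not the compiler's implementation: the `Const` class, the `meet` and `eval_phi` helpers, and the dictionary-based value table are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Const:
    value: int

UNDEF = "UNDEF"      # top of the lattice: no value seen yet
UNKNOWN = "UNKNOWN"  # bottom of the lattice: proven non-constant

def meet(a, b):
    """Combine two lattice values; the result never moves back up."""
    if a == UNDEF:
        return b
    if b == UNDEF:
        return a
    if a == UNKNOWN or b == UNKNOWN:
        return UNKNOWN
    return a if a == b else UNKNOWN

def eval_phi(incoming, executable_preds, values):
    """incoming: list of (pred_block, operand_name) pairs for one φ.

    Operands arriving over edges not yet proven executable are ignored,
    which is what makes the propagation "conditional".
    """
    result = UNDEF
    for pred, operand in incoming:
        if pred in executable_preds:
            result = meet(result, values.get(operand, UNDEF))
    return result

values = {"x1": Const(4), "x2": Const(4), "x3": Const(7)}
phi = [("B1", "x1"), ("B2", "x2"), ("B3", "x3")]
only_two = eval_phi(phi, {"B1", "B2"}, values)        # B3 not reachable yet
all_three = eval_phi(phi, {"B1", "B2", "B3"}, values)
```

With only `B1` and `B2` reachable the φ stays constant; once `B3` becomes reachable it drops to `UNKNOWN`. A full pass would drive these helpers from two worklists (CFG edges and SSA def-use edges), which is omitted here.
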
520+ **Simpler alternative:** Global constant propagation without conditional pruning.
521+
522+ ### Pass 2: CFG Simplification
523+
524+ **Status:** Not implemented
525+
526+ **What it does:**
527+ - Convert constant-condition branches into unconditional jumps
528+ - Merge simple blocks: A → B where A's only successor is B and B's only predecessor is A
529+ - Remove jumps to jumps: retarget a jump whose destination block is itself only an unconditional jump
530+
531+ **Implementation:**
532+ - After DCE, walk the CFG
533+ - If branch/switch has constant condition → replace with direct jump
534+ - If a block contains only an unconditional jump (no φ, no side effects) → redirect its predecessors to the successor, or merge it with the successor
535+
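The steps above can be sketched on a toy CFG. The representation is an assumption made for this example only: each block is a dict with an `insts` list and a `term` tuple (`("jump", target)`, `("branch", cond, if_true, if_false)`, or `("ret",)`), and a constant condition is a Python `int`. The sketch also assumes merged blocks contain no φ nodes.

```python
def successors(term):
    if term[0] == "jump":
        return [term[1]]
    if term[0] == "branch":
        return [term[2], term[3]]
    return []

def fold_constant_branches(cfg):
    """Turn branches on a constant condition into unconditional jumps."""
    changed = False
    for block in cfg.values():
        term = block["term"]
        if term[0] == "branch" and isinstance(term[1], int):
            _, cond, if_true, if_false = term
            block["term"] = ("jump", if_true if cond else if_false)
            changed = True
    return changed

def remove_unreachable(cfg, entry):
    """Drop blocks that cannot be reached from the entry block."""
    seen, stack = set(), [entry]
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(successors(cfg[name]["term"]))
    for name in list(cfg):
        if name not in seen:
            del cfg[name]

def merge_straight_line_blocks(cfg, entry):
    """Merge A -> B when B is A's only successor and A is B's only predecessor."""
    progress = True
    while progress:
        progress = False
        preds = {name: [] for name in cfg}
        for name, block in cfg.items():
            for succ in successors(block["term"]):
                preds[succ].append(name)
        for a, block in cfg.items():
            term = block["term"]
            if term[0] != "jump":
                continue
            b = term[1]
            if b != a and b != entry and preds[b] == [a]:
                block["insts"] += cfg[b]["insts"]
                block["term"] = cfg[b]["term"]
                del cfg[b]
                progress = True
                break  # the predecessor map is stale, so rebuild it

cfg = {
    "entry": {"insts": ["store 1"], "term": ("branch", 1, "then", "else")},
    "then": {"insts": ["store 2"], "term": ("jump", "exit")},
    "else": {"insts": ["store 3"], "term": ("jump", "exit")},
    "exit": {"insts": ["store 4"], "term": ("ret",)},
}
fold_constant_branches(cfg)
remove_unreachable(cfg, "entry")
merge_straight_line_blocks(cfg, "entry")
```

After the three steps the example collapses into a single `entry` block that ends in `ret`. Rebuilding the predecessor map after every merge is quadratic; a real pass would update it incrementally.
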
536+ ### Pass 3: Copy Propagation & SSA Cleanup
537+
538+ **Status:** Not implemented
539+
540+ **What it does:**
541+ - `t1 = x; y = t1;` → `y = x`
542+ - φ simplifications: `t = φ(x, x, x)` → `t = x`
543+
544+ **Implementation:**
545+ - Use SSA def-use chains
546+ - For each `t = x` where x is an SSA value and t isn't address-taken, replace all uses of t with x
547+ - For φ-nodes whose incoming operands are all the same value → replace uses of the φ with that value and remove the φ
548+ - Run DCE afterwards
549+
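A minimal sketch of both rewrites, assuming a hypothetical flat instruction list of `(dest, op, args)` tuples in which every name is an SSA temporary that is not address-taken. The instructions made dead by the rewrite are dropped directly here, standing in for the follow-up DCE run.

```python
def propagate_copies(insts):
    """Replace uses of copies and trivial φs, then drop the dead definitions."""
    replacement = {}  # eliminated name -> value that replaces it

    def resolve(name):
        while name in replacement:
            name = replacement[name]
        return name

    changed = True
    while changed:
        changed = False
        for dest, op, args in insts:
            if dest in replacement:
                continue
            args = tuple(resolve(a) for a in args)
            if op == "copy" and args[0] != dest:
                replacement[dest] = args[0]
                changed = True
            elif op == "phi" and len(set(args)) == 1 and args[0] != dest:
                # every incoming operand is the same value
                replacement[dest] = args[0]
                changed = True

    return [
        (dest, op, tuple(resolve(a) for a in args))
        for dest, op, args in insts
        if dest not in replacement
    ]

insts = [
    ("t1", "copy", ("x",)),
    ("y", "copy", ("t1",)),
    ("p", "phi", ("t1", "y", "x")),
    ("z", "add", ("p", "y")),
]
cleaned = propagate_copies(insts)
```

In the example all three φ operands resolve to `x`, so the φ disappears along with both copies and only `z = add x, x` remains.
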
550+ ### Pass 4: Local CSE / Value Numbering
551+
552+ **Status:** Not implemented
553+
554+ **What it does:**
555+ - Inside a block, recognize when the same pure expression is computed again
556+ - `t1 = a + b; t2 = a + b;` → `t2 = t1`
557+
558+ **Implementation:**
559+ - For each block: maintain map `(opcode, operand1, operand2, type) → SSA value`
560+ - When an expression is already in the map, reuse the prior value instead of recomputing it
561+ - Limit to pure operations: arithmetic, logical ops, comparisons
562+
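The map-based scheme can be sketched like this. The tuple layout `(dest, op, lhs, rhs, type)` and the contents of `PURE_OPS` are assumptions for the example; a repeated expression is rewritten into a `copy`, which a later copy-propagation run can clean up.

```python
# Hypothetical set of side-effect-free opcodes eligible for CSE.
PURE_OPS = {"add", "sub", "mul", "and", "or", "xor", "eq", "lt"}

def local_cse(block):
    """block: list of (dest, op, lhs, rhs, type) for a single basic block."""
    available = {}  # (op, lhs, rhs, type) -> SSA value already holding it
    alias = {}      # dest of a duplicate -> the earlier equivalent value
    out = []
    for dest, op, lhs, rhs, ty in block:
        # Look through earlier duplicates so chained expressions still match.
        lhs, rhs = alias.get(lhs, lhs), alias.get(rhs, rhs)
        key = (op, lhs, rhs, ty)
        if op in PURE_OPS and key in available:
            alias[dest] = available[key]
            out.append((dest, "copy", available[key], None, ty))
            continue
        if op in PURE_OPS:
            available[key] = dest
        out.append((dest, op, lhs, rhs, ty))
    return out

block = [
    ("t1", "add", "a", "b", "i32"),
    ("t2", "add", "a", "b", "i32"),
    ("t3", "mul", "t2", "c", "i32"),
    ("t4", "mul", "t1", "c", "i32"),
    ("t5", "load", "p", None, "i32"),
    ("t6", "load", "p", None, "i32"),
]
result = local_cse(block)
```

Here `t2` and `t4` become copies, while the two loads are left alone because `load` is not in the pure set.
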
563+ ### Pass 5: GVN - Global Value Numbering
564+
565+ **Status:** Not implemented (optional, bigger investment)
566+
567+ **What it does:**
568+ - Deduplicate computations across blocks, not just inside one
569+ - A global version of CSE: a computation may be reused only where an equivalent earlier computation dominates it
570+
571+ **Implementation:**
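572+ - Walk the dominator tree in preorder, so every definition is visited before the blocks it dominates
573+ - Assign "value numbers" to expressions
574+ - Expressions that share a value number are merged into the dominating definition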
575+
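One simple way to realize this is hash-based value numbering over the dominator tree with a scoped table, sketched below. It is a deliberately reduced form of GVN rather than a full partition-based algorithm, and the `blocks` / `dom_children` dictionaries and the `PURE_OPS` set are hypothetical structures for the example.

```python
PURE_OPS = {"add", "sub", "mul", "and", "or", "xor", "eq", "lt"}

def gvn(blocks, dom_children, entry):
    """blocks: name -> list of (dest, op, lhs, rhs).
    dom_children: name -> names of blocks immediately dominated by it.
    Returns a map from each eliminated value to its dominating equivalent.
    """
    leader = {}

    def visit(name, inherited):
        table = dict(inherited)  # copy so siblings do not see each other's entries
        kept = []
        for dest, op, lhs, rhs in blocks[name]:
            lhs, rhs = leader.get(lhs, lhs), leader.get(rhs, rhs)
            key = (op, lhs, rhs)
            if op in PURE_OPS and key in table:
                leader[dest] = table[key]  # reuse the dominating computation
                continue
            if op in PURE_OPS:
                table[key] = dest
            kept.append((dest, op, lhs, rhs))
        blocks[name] = kept
        for child in dom_children.get(name, []):
            visit(child, table)

    visit(entry, {})
    # Final sweep: rewrite operands that were visited before their replacement
    # was known (for example φ operands arriving over back edges).
    for name, insts in blocks.items():
        blocks[name] = [
            (d, op, leader.get(l, l), leader.get(r, r)) for d, op, l, r in insts
        ]
    return leader

blocks = {
    "entry": [("t1", "add", "a", "b")],
    "left": [("t2", "add", "a", "b"), ("t3", "mul", "t2", "c")],
    "right": [("t5", "add", "a", "b"), ("t6", "mul", "t5", "c")],
    "join": [("t7", "add", "a", "b"), ("t8", "mul", "t1", "c")],
}
dom_children = {"entry": ["left", "right", "join"]}
leader = gvn(blocks, dom_children, "entry")
```

In the diamond above, every `a + b` collapses into `t1` because `entry` dominates all blocks. The three multiplications all survive: `left` and `right` do not dominate each other or `join`, so their table entries are not visible elsewhere.
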
576+ ### Pass 6: Conservative Function Inlining
577+
578+ **Status:** Not implemented
579+
580+ **What it does:**
581+ - Inline small, non-recursive, non-varargs functions into callers
582+ - Exposes new optimization opportunities
583+
584+ **Constraints:**
585+ - Only inline static/internal functions
586+ - Limit by IR size: e.g. ≤ N instructions, ≤ M basic blocks
587+ - Reject varargs, VLAs, alloca, and irregular control flow (e.g. `setjmp` or computed `goto`)
588+
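The constraints can be collected into a single eligibility check. In this sketch the `FuncInfo` record, its field names, and the two thresholds (standing in for N and M) are all hypothetical; the actual limits would need tuning.

```python
from dataclasses import dataclass

MAX_INSTRUCTIONS = 40  # hypothetical value for N
MAX_BLOCKS = 4         # hypothetical value for M

@dataclass
class FuncInfo:
    name: str
    is_static: bool
    is_varargs: bool
    uses_vla_or_alloca: bool
    num_instructions: int
    num_blocks: int
    callees: frozenset = frozenset()

def is_recursive(func, functions):
    """True if func can reach itself through the known call graph."""
    seen, stack = set(), list(func.callees)
    while stack:
        name = stack.pop()
        if name == func.name:
            return True
        if name not in seen and name in functions:
            seen.add(name)
            stack.extend(functions[name].callees)
    return False

def can_inline(func, functions):
    return (
        func.is_static
        and not func.is_varargs
        and not func.uses_vla_or_alloca
        and func.num_instructions <= MAX_INSTRUCTIONS
        and func.num_blocks <= MAX_BLOCKS
        and not is_recursive(func, functions)
    )

functions = {
    "square": FuncInfo("square", True, False, False, 3, 1),
    "log_msg": FuncInfo("log_msg", True, True, False, 10, 2),
    "even": FuncInfo("even", True, False, False, 8, 3, frozenset({"odd"})),
    "odd": FuncInfo("odd", True, False, False, 8, 3, frozenset({"even"})),
    "api": FuncInfo("api", False, False, False, 5, 1),
}
eligible = sorted(n for n, f in functions.items() if can_inline(f, functions))
```

Only `square` qualifies in the example: `log_msg` is varargs, `even` and `odd` are mutually recursive, and `api` is not static.
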
589+ **After inlining:** Re-run InstCombine → SCCP → DCE
590+
591+ ### Pass 7: LICM - Loop-Invariant Code Motion
592+
593+ **Status:** Not implemented
594+
595+ **What it does:**
596+ - Hoist computations that are pure and loop-invariant out of the loop body into the loop preheader
597+
598+ **Implementation:**
599+ - Detect natural loops via back edges and dominators
600+ - Only hoist arithmetic/logical ops whose operands are defined outside the loop or are themselves already hoisted
601+ - Don't hoist loads/stores without alias analysis
602+
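The invariance test can be sketched as a small fixed-point loop. The `(dest, op, operands)` tuples and the `PURE_OPS` set are assumptions for the example; loop detection itself is taken as given. Division is left out of the pure set on purpose, since hoisting an operation that can trap is only safe if it would have executed anyway.

```python
# Hypothetical set of hoistable opcodes; "div" is excluded because it can trap.
PURE_OPS = {"add", "sub", "mul", "and", "or", "xor", "eq", "lt"}

def find_hoistable(loop_insts):
    """loop_insts: (dest, op, operands) for every instruction in the loop.

    Returns the instructions to move to the preheader, in an order that
    keeps each one after the hoisted instructions it depends on.
    """
    defined_in_loop = {dest for dest, _, _ in loop_insts}
    invariant = set()
    hoisted = []
    changed = True
    while changed:
        changed = False
        for dest, op, operands in loop_insts:
            if dest in invariant or op not in PURE_OPS:
                continue
            if all(o not in defined_in_loop or o in invariant for o in operands):
                invariant.add(dest)
                hoisted.append((dest, op, operands))
                changed = True
    return hoisted

loop = [
    ("i", "phi", ("i0", "i_next")),
    ("t1", "mul", ("a", "b")),
    ("t2", "add", ("t1", "c")),
    ("t3", "add", ("i", "t2")),
    ("v", "load", ("p",)),
    ("i_next", "add", ("i", "one")),
]
hoisted = find_hoistable(loop)
```

In the example `t1` and `t2` are hoisted, in that order. `t3` and `i_next` depend on the induction variable, and the load stays in place because no alias information is available.
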
603+ ### Pass 8: Loop Canonicalization & Strength Reduction
604+
605+ **Status:** Not implemented (low priority)
606+
607+ **What it does:**
608+ - Normalize induction variables: `for (i = 0; i < n; ++i)` style
609+ - Replace multiplications by an induction variable with additions carried from one iteration to the next
610+
611+ **Implementation:**
612+ - Identify induction variables: a φ node in the loop header with one incoming value from the preheader and one from the latch
613+ - Handle simple patterns: `i = φ(i0, i + c)`
614+ - Turn `base + i * c` patterns into increments
615+
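The pattern match and the rewrite it enables can be illustrated as follows. Both helpers are hypothetical: `basic_induction_step` recognizes `i = φ(i0, i + c)` on a toy definition table, and the two address functions show in ordinary Python why `base + i * c` can be replaced by a running sum.

```python
def basic_induction_step(phi, defs):
    """phi: (dest, preheader_value, latch_value); defs: name -> (op, lhs, rhs).

    Returns the constant step c when the latch value is dest + c, else None.
    """
    dest, _, latch = phi
    op, lhs, rhs = defs.get(latch, (None, None, None))
    if op == "add" and lhs == dest and isinstance(rhs, int):
        return rhs
    if op == "add" and rhs == dest and isinstance(lhs, int):
        return lhs
    return None

def addresses_with_multiply(base, c, n):
    # Original form: one multiplication per iteration.
    return [base + i * c for i in range(n)]

def addresses_strength_reduced(base, c, n):
    # Reduced form: addr holds base + i*c and is bumped by c each iteration.
    out = []
    addr = base
    for _ in range(n):
        out.append(addr)
        addr += c
    return out

step = basic_induction_step(("i", "i0", "i_next"), {"i_next": ("add", "i", 1)})
before = addresses_with_multiply(100, 8, 4)
after = addresses_strength_reduced(100, 8, 4)
```

Both address functions produce the same sequence; the second does so without a multiplication in the loop body.
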
616+ ### Suggested Pass Pipeline
617+
618+ ```
619+ IR generation + SSA construction
620+ ↓
621+ Early InstCombine (constant folding + algebraic)
622+ ↓
623+ SCCP (or simple global const prop)
624+ ↓
625+ DCE + unreachable block removal
626+ ↓
627+ CFG simplification
628+ ↓
629+ Copy propagation + φ simplification → DCE
630+ ↓
631+ Local CSE → InstCombine
632+ ↓
633+ [Later] GVN → DCE/CFG simplify
634+ ↓
635+ [Later] Conservative inlining → re-run passes 2-7
636+ ↓
637+ [Later] LICM + loop opts → final InstCombine + DCE
638+ ```
639+
640+ ### Implementation Priority
641+
642+ | Priority | Pass | Complexity | Impact |
643+ |----------|------|------------|--------|
644+ | 1 | CFG simplify | Low | Medium |
645+ | 2 | Copy/φ cleanup | Low | Medium |
646+ | 3 | Local CSE | Medium | Medium |
647+ | 4 | SCCP | Medium | High |
648+ | 5 | GVN | High | Medium |
649+ | 6 | Inlining | High | High |
650+ | 7 | LICM | Medium | Medium |
651+ | 8 | Loop opts | High | Low |
652+
653+ ---
654+
655+ ## Assembly Peephole Optimizations
656+
657+ ### Overview
658+
659+ Peephole optimizations run after code generation, on the generated assembly. They are low-complexity, high-impact micro-optimizations that remove redundancies instruction selection leaves behind.
660+
661+ ### Potential Optimizations
662+
663+ | Pattern | Optimization |
664+ |---------|--------------|
665+ | `mov %rax, %rax` | Delete (no-op move) |
666+ | `mov %rax, %rbx; mov %rbx, %rax` | Delete second (useless copy-back) |
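667+ | `add $0, %rax` | Delete if the flags it sets are dead (otherwise a no-op add) |
668+ | `sub $0, %rax` | Delete if the flags it sets are dead (otherwise a no-op sub) |
669+ | `imul $1, %rax, %rax` | Delete if the flags it sets are dead (multiply by 1) |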
670+ | `xor %rax, %rax; mov $0, %rax` | Keep only xor (shorter encoding) |
671+ | `cmp $0, %rax; je L` | `test %rax, %rax; je L` (shorter) |
672+ | `mov $imm, %rax; add %rax, %rbx` | `add $imm, %rbx` if imm fits in a sign-extended 32-bit immediate and `%rax` is dead afterwards |
673+
674+ ### Implementation Approach
675+
676+ 1. Parse LIR instructions before emission
677+ 2. Pattern match on instruction sequences (2-3 instruction windows)
678+ 3. Replace with optimized sequences
679+ 4. Run multiple passes until no changes
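The four steps can be sketched as a small rewriting loop. Everything here is an assumption for illustration: instructions are `(mnemonic, src, dst)` tuples in AT&T operand order, registers are 64-bit, and `flags_dead` is a hypothetical stand-in for a real flag-liveness query. Only a few of the table's patterns are covered.

```python
def is_reg(operand):
    return isinstance(operand, str) and operand.startswith("%")

def peephole(insts, flags_dead=True):
    """Apply 1- and 2-instruction window rewrites until nothing changes."""
    changed = True
    while changed:
        changed = False
        out = []
        i = 0
        while i < len(insts):
            cur = insts[i]
            nxt = insts[i + 1] if i + 1 < len(insts) else None
            op, src, dst = cur
            if op == "mov" and is_reg(src) and src == dst:
                i += 1              # mov %r, %r does nothing
                changed = True
                continue
            if op in ("add", "sub") and src == "$0" and flags_dead:
                i += 1              # only removable when nobody reads the flags
                changed = True
                continue
            if op == "mov" and is_reg(src) and is_reg(dst) and nxt == ("mov", dst, src):
                out.append(cur)     # keep the copy, drop the copy-back
                i += 2
                changed = True
                continue
            if op == "cmp" and src == "$0" and is_reg(dst):
                out.append(("test", dst, dst))
                i += 1
                changed = True
                continue
            out.append(cur)
            i += 1
        insts = out
    return insts

program = [
    ("mov", "%rax", "%rax"),
    ("mov", "%rax", "%rbx"),
    ("mov", "%rbx", "%rax"),
    ("add", "$0", "%rcx"),
    ("cmp", "$0", "%rax"),
    ("je", "L1", None),
]
optimized = peephole(program)
```

The example shrinks from six instructions to three. Passing `flags_dead=False` keeps the `add $0`, which is the conservative choice when flag liveness is unknown.
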