|
1 | 1 | +++ |
2 | | -date = '2025-06-16T20:46:09+01:00' |
3 | | -draft = true |
| 2 | +date = '2025-06-18' |
| 3 | +draft = false |
4 | 4 | title = 'Project' |
5 | 5 | layout = "page" |
6 | 6 | +++ |
7 | 7 |
|
8 | | -This MSc group project is a collaboration between UCL and IBM focused on using AI - particularly Large Language Models (LLMs) — to assist in the modernisation of legacy software systems. |
| 8 | +# LLM-Based Legacy Refactoring |
| 9 | +*A modular, agentic system for detecting and fixing anti-patterns in fully-tested Java code* |
9 | 10 |
|
10 | | -Details TBD |
| 11 | +This MSc group project is a collaboration between UCL and IBM. |
| 12 | + |
| 13 | +## Overview |
| 14 | + |
| 15 | +Modernising legacy code is one of the most persistent and expensive challenges in software engineering. Rewriting from scratch is often infeasible, and manual refactoring at scale is slow, error-prone, and hard to standardise. We want to build an intelligent, safe alternative.
| 16 | + |
| 17 | +This project aims to develop an **LLM-powered refactoring pipeline** that detects and replaces anti-patterns with **modern, idiomatic Java**, operating only on code that is **fully covered by tests**. It will use a **multi-agent architecture** to ensure each stage - detection, transformation, explanation, and validation - is precise, modular, and auditable. |
| 18 | + |
| 19 | +We are experimenting with multiple LLMs (for example IBM Granite, and open models served locally through Ollama) to evaluate which are best suited to understanding, rewriting, and explaining legacy code.
| 20 | + |
| 21 | +--- |
| 22 | + |
| 23 | +## What We're Building |
| 24 | + |
| 25 | +We are developing a **local-first, file-scoped toolchain** for the safe refactoring of Java code. Our system: |
| 26 | + |
| 27 | +- Focuses on **anti-patterns** and **idiomatic code**, not just style issues. |
| 28 | +- Performs **only file-level changes**, preserving interfaces and class contracts. |
| 29 | +- Targets files with **100% test coverage**, so correctness can be automatically validated. |
| 30 | +- Produces **explainable transformations**, aiding developer understanding and review. |
| 31 | + |
| 32 | +The tool is designed to support **both automated and interactive workflows**: |
| 33 | + |
| 34 | +- You can **plug in a repository** and let the system run on all eligible files (i.e., those with full test coverage). |
| 35 | +- Alternatively, you can **manually prompt the tool on specific classes or files** to refactor targeted areas or experiment with different strategies. |
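The eligibility rule for the automated workflow can be sketched as a simple filter over a coverage report. This is only an illustration: the `FileCoverage` record and `EligibleFiles` class are hypothetical names, and we assume a coverage tool has already produced per-file line counts. Treating files with no executable lines as ineligible is also an assumption.

```java
import java.util.List;

/** Hypothetical per-file coverage summary, e.g. parsed from a coverage tool's report. */
record FileCoverage(String path, int coveredLines, int totalLines) {}

final class EligibleFiles {
    /** Returns the paths of files whose executable lines are all covered, sorted by path. */
    static List<String> select(List<FileCoverage> report) {
        return report.stream()
                // Assumption: a file with no executable lines gives no safety net, so skip it.
                .filter(f -> f.totalLines() > 0 && f.coveredLines() == f.totalLines())
                .map(FileCoverage::path)
                .sorted()
                .toList();
    }

    public static void main(String[] args) {
        List<FileCoverage> report = List.of(
                new FileCoverage("src/Order.java", 40, 40),
                new FileCoverage("src/Invoice.java", 38, 40));
        System.out.println(select(report)); // only src/Order.java is fully covered
    }
}
```

Comparing integer line counts avoids rounding issues that a floating-point percentage check could introduce.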
| 36 | + |
| 37 | +--- |
| 38 | + |
| 39 | +## System Architecture |
| 40 | + |
| 41 | +Our tool is built as a set of cooperating agents, following the **Model Context Protocol (MCP)**. Each agent is responsible for a distinct part of the pipeline. |
| 42 | + |
| 43 | +- **Coverage Agent**: Ensures the file is fully tested before any changes are made. |
| 44 | +- **Scanner Agent**: Detects known anti-patterns using heuristics and LLM support. |
| 45 | +- **Refactoring Agent**: Proposes improved code based on modern idioms. |
| 46 | +- **Code Transformer**: Applies changes safely, without breaking public APIs. |
| 47 | +- **Test Executor**: Runs tests to confirm correctness. |
| 48 | +- **Change Narrator**: Provides human-readable rationales for each transformation. |
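One way to picture how the agents listed above fit together is as an ordered pipeline in which any stage can reject a file. The sketch below is a minimal illustration under our own assumptions, not the project's implementation: `FileContext`, `Stage`, `RefactoringPipeline`, and `PipelineDemo` are hypothetical names, the stages are trivial stand-ins for real agents, and no MCP communication or LLM call is modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/** Immutable state handed from stage to stage: file path, current source, collected notes. */
record FileContext(String path, String source, List<String> notes) {
    FileContext {
        notes = List.copyOf(notes); // defensive copy keeps the record immutable
    }

    FileContext withNote(String note) {
        List<String> extended = new ArrayList<>(notes);
        extended.add(note);
        return new FileContext(path, source, extended);
    }
}

/** A named stage; an empty result means "reject this file and stop". */
record Stage(String name, Function<FileContext, Optional<FileContext>> step) {}

final class RefactoringPipeline {
    private final List<Stage> stages;

    RefactoringPipeline(List<Stage> stages) {
        this.stages = List.copyOf(stages);
    }

    /** Runs stages in order; on rejection returns the untouched original plus a note. */
    FileContext process(FileContext original) {
        FileContext current = original;
        for (Stage stage : stages) {
            Optional<FileContext> next = stage.step().apply(current);
            if (next.isEmpty()) {
                return original.withNote("stopped at " + stage.name());
            }
            current = next.get();
        }
        return current;
    }
}

final class PipelineDemo {
    /** Builds a two-stage pipeline with placeholder logic and runs it on a tiny snippet. */
    static FileContext demo(double coverage) {
        Stage coverageGate = new Stage("coverage",
                ctx -> coverage >= 1.0 ? Optional.of(ctx) : Optional.empty());
        Stage scanner = new Stage("scanner",
                ctx -> Optional.of(ctx.source().contains("catch (Exception e) {}")
                        ? ctx.withNote("empty catch block found")
                        : ctx));
        RefactoringPipeline pipeline = new RefactoringPipeline(List.of(coverageGate, scanner));
        return pipeline.process(new FileContext(
                "Legacy.java", "try { run(); } catch (Exception e) {}", List.of()));
    }

    public static void main(String[] args) {
        System.out.println(demo(1.0).notes()); // [empty catch block found]
        System.out.println(demo(0.8).notes()); // [stopped at coverage]
    }
}
```

Returning the original context on rejection mirrors the idea that a file is left untouched unless every stage succeeds.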
| 49 | + |
| 50 | +--- |
| 51 | + |
| 52 | +## Early Focus Areas |
| 53 | + |
| 54 | +We're concentrating on common structural and readability issues in legacy Java code, such as: |
| 55 | + |
| 56 | +- Hardcoded "magic" values |
| 57 | +- Vague or empty exception handling |
| 58 | +- Large, multi-purpose methods or classes ("god objects") that violate the Single Responsibility Principle (SRP)
| 59 | +- Deep nesting and conditional complexity |
| 60 | +- Outdated constructs that could use modern Java features |
| 61 | + |
| 62 | +These patterns are common in real-world codebases and make excellent candidates for safe, incremental improvement. |
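To make two of these patterns concrete, here is a small before/after pair covering magic values and vague exception handling. All class names, constants, and the pricing rule are invented for illustration. The refactored version keeps the same method signature and results, which is the kind of file-local, interface-preserving change described above, and `EquivalenceCheck` stands in very loosely for what an existing test suite would verify.

```java
/** Before: hypothetical legacy code with magic numbers and a catch-all handler. */
final class LegacyPricing {
    static int finalPrice(String rawPrice) {
        try {
            int price = Integer.parseInt(rawPrice);
            if (price > 10000) {
                return price - price * 15 / 100;
            }
            return price;
        } catch (Exception e) {
            return 0;
        }
    }
}

/** After: same signature and results, with named constants and a specific exception type. */
final class RefactoredPricing {
    private static final int BULK_THRESHOLD = 10_000;
    private static final int BULK_DISCOUNT_PERCENT = 15;
    private static final int INVALID_INPUT_PRICE = 0;

    static int finalPrice(String rawPrice) {
        int price;
        try {
            price = Integer.parseInt(rawPrice);
        } catch (NumberFormatException e) {
            // parseInt signals bad input (including null) with NumberFormatException,
            // so narrowing the catch does not change observable behaviour here.
            return INVALID_INPUT_PRICE;
        }
        if (price <= BULK_THRESHOLD) {
            return price;
        }
        return price - price * BULK_DISCOUNT_PERCENT / 100;
    }
}

/** Stand-in for the test suite: both versions must agree on every sample input. */
final class EquivalenceCheck {
    static boolean sameResults(String... inputs) {
        for (String input : inputs) {
            if (LegacyPricing.finalPrice(input) != RefactoredPricing.finalPrice(input)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameResults("500", "10000", "20000", "abc", "")); // true
    }
}
```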
| 63 | + |
| 64 | +--- |
| 65 | + |
| 66 | +## Why This Project |
| 67 | + |
| 68 | +Instead of aiming for full automation or rewriting code from scratch, this project focuses on **safe, explainable refactoring**, starting with well-tested code and keeping changes local and understandable. It's meant to help developers gradually improve legacy systems without losing trust in the process or control over the results. |
| 69 | + |
| 70 | +Modern LLMs can generate code, offer suggestions, and mimic stylistic patterns - but they still struggle with: |
| 71 | +- **Long, messy contexts** that span multiple responsibilities or abstractions |
| 72 | +- **Ambiguous control flow**, especially in legacy code with inconsistent structure |
| 73 | +- **Multi-step refactoring logic**, where intermediate intent isn't explicit |
| 74 | +- **Reliability** when small misinterpretations can silently break functionality |
| 75 | + |
| 76 | +Rather than relying on one-shot, monolithic LLM calls, we're exploring an **agent-based approach**: decomposing the task into smaller, purpose-driven agents that cooperate on detection, transformation, explanation, and validation.
| 77 | + |
| 78 | +By tightly scoping the system to operate **only on files with full test coverage**, and by enforcing **file-local edits that preserve interfaces**, we create a safer environment for experimentation - and a workflow developers can trust. The focus is on **incremental modernisation**, not magical automation. |