diff --git a/README.md b/README.md index c1e34f762..fdb66a053 100644 --- a/README.md +++ b/README.md @@ -212,11 +212,19 @@ flowchart TB %% MODEL LAYER %% ═══════════════════════════════════════════════════════════════════════ subgraph Models["Model Layer (VLMs)"] - direction LR - CLAUDE["Claude"] - GPT["GPT-4o"] - GEMINI["Gemini"] - QWEN["Qwen-VL"] + direction TB + subgraph APIModels["API Models"] + direction LR + CLAUDE["Claude"] + GPT["GPT-4o"] + GEMINI["Gemini"] + end + subgraph OpenSource["Open Source / Fine-tuned"] + direction LR + QWEN3["Qwen3-VL"] + UITARS["UI-TARS"] + OPENCUA["OpenCUA"] + end end %% ═══════════════════════════════════════════════════════════════════════ diff --git a/docs/architecture-evolution.md b/docs/architecture-evolution.md index 4426782a1..683229491 100644 --- a/docs/architecture-evolution.md +++ b/docs/architecture-evolution.md @@ -1,6 +1,6 @@ # OpenAdapt Architecture Evolution -**Version**: 2.0 +**Version**: 3.0 **Date**: January 2026 **Status**: Living Document @@ -8,131 +8,349 @@ ## Executive Summary -This document synthesizes OpenAdapt's original alpha vision with modern GUI agent state-of-the-art (SOTA) research. It defines the architectural principles, implementation status, and roadmap for OpenAdapt as the leading open-source demonstration-conditioned GUI automation framework. +This document traces the evolution of OpenAdapt from its original alpha vision through the modern modular implementation, synthesizing state-of-the-art GUI agent research into a unified framework. OpenAdapt's core innovation is **demonstration-conditioned automation**: "show, don't tell." --- ## Table of Contents -1. [Core Insight: Demonstration-Conditioned Automation](#1-core-insight-demonstration-conditioned-automation) +1. [Original Alpha Vision](#1-original-alpha-vision) 2. [The Abstraction Ladder](#2-the-abstraction-ladder) -3. [Three-Phase Architecture](#3-three-phase-architecture) -4. [Package Responsibilities](#4-package-responsibilities) -5. [Feedback Loops](#5-feedback-loops) -6. [Model Layer](#6-model-layer) -7. [Implementation Status](#7-implementation-status) -8. [Architecture Diagrams](#8-architecture-diagrams) -9. [Key Design Principles](#9-key-design-principles) -10. [Research Alignment](#10-research-alignment) -11. [Future Directions](#11-future-directions) +3. [Core Innovation: Demo-Conditioned Agents](#3-core-innovation-demo-conditioned-agents) +4. [Modern Architecture](#4-modern-architecture) +5. [SOTA GUI Agent Integration](#5-sota-gui-agent-integration) +6. [Package Responsibilities](#6-package-responsibilities) +7. [Feedback Loops](#7-feedback-loops) +8. [Implementation Status](#8-implementation-status) +9. [Architecture Evolution Diagrams](#9-architecture-evolution-diagrams) +10. [Future Directions](#10-future-directions) --- -## 1. Core Insight: Demonstration-Conditioned Automation +## 1. Original Alpha Vision -### The Fundamental Differentiator +### The Three-Stage Pipeline (2023) -OpenAdapt's fundamental differentiator is **demonstration-conditioned automation**: "show, don't tell." 
+OpenAdapt was conceived as a three-stage pipeline for AI-first process automation: -| Approach | Description | Example | -|----------|-------------|---------| -| **Prompt-Driven** (Traditional) | User describes what to do in natural language | "Book a flight from NYC to LA for next Tuesday" | -| **Demo-Conditioned** (OpenAdapt) | Agent learns from watching user perform the task | Record user booking a flight, replay with new parameters | +``` ++=====================+ +=====================+ +=====================+ +| | | | | | +| RECORDING | --> | ANALYSIS | --> | REPLAY | +| | | | | | +| Capture human | | Convert to | | Generate and | +| demonstrations: | | tokenized format | | replay synthetic | +| - Screenshots | | for LMM | | input via model | +| - User input | | processing | | completions | +| | | | | | ++=====================+ +=====================+ +=====================+ +``` -### Why This Matters +### Original Design Goals -1. **Reduced Ambiguity**: Demonstrations capture implicit knowledge that's hard to verbalize -2. **Grounded in Reality**: Agents learn from actual UI interactions, not abstract descriptions -3. **Lower Barrier to Entry**: Users don't need prompt engineering skills -4. **Validated Improvement**: 33% to 100% first-action accuracy with demo conditioning (internal benchmarks) +From the legacy README: -### The "Show, Don't Tell" Principle +> "The goal is similar to that of Robotic Process Automation (RPA), except that we use Large Multimodal Models instead of conventional RPA tools." -``` -Traditional Agent: - User: "Click the submit button" - Agent: [Which submit button? What context? What state?] +**Key Differentiators (Alpha)**: +1. **Model Agnostic** - Works with any LMM +2. **Auto-Prompted** - Learns from demonstration, not user prompts +3. **Grounded in Existing Processes** - Mitigates hallucinations +4. **Universal GUI Support** - Desktop, web, and virtualized (Citrix) +5. **Open Source** - MIT license + +### Legacy Monolithic Implementation -Demo-Conditioned Agent: - User: [Records clicking the blue "Submit Order" button after filling form] - Agent: [Learns the full context: form state, button appearance, preceding actions] +The alpha codebase (`legacy/openadapt/`) implemented: + +``` +openadapt/ + record.py # Screenshot/event capture + replay.py # Strategy-based playback + models.py # Recording, ActionEvent, Screenshot, WindowEvent + events.py # Event aggregation/processing + strategies/ + base.py # BaseReplayStrategy abstract class + naive.py # Direct literal replay + stateful.py # GPT-4 + OS-level window data + vanilla.py # Full VLM reasoning per step + visual.py # FastSAM segmentation + visual_browser.py # DOM-based segments + adapters/ + anthropic.py # Claude API integration + openai.py # GPT API integration + replicate.py # Open-source model hosting + privacy/ + base.py # Scrubbing provider interface + providers/ # Presidio, AWS Comprehend, Private AI ``` ---- +### The Strategy Pattern (Original) -## 2. The Abstraction Ladder +The original architecture used a `BaseReplayStrategy` abstract class: -OpenAdapt processes demonstrations through progressive abstraction levels, enabling generalization, transfer learning, and explainability. 
+```python +class BaseReplayStrategy(ABC): + """Base class for implementing replay strategies.""" -### Abstraction Levels + def __init__(self, recording: Recording) -> None: + self.recording = recording + self.action_events = [] + self.screenshots = [] + self.window_events = [] + @abstractmethod + def get_next_action_event( + self, + screenshot: Screenshot, + window_event: WindowEvent, + ) -> ActionEvent: + """Get the next action based on current observation.""" + pass + + def run(self) -> None: + """Execute the replay loop.""" + while True: + screenshot = Screenshot.take_screenshot() + window_event = WindowEvent.get_active_window_event() + action_event = self.get_next_action_event(screenshot, window_event) + if action_event: + playback.play_action_event(action_event, ...) ``` -Level 0 - LITERAL (Raw Events) - { press: "h", press: "i", press: " ", press: "b", press: "o", press: "b" } - | Reduction (aggregate consecutive events) - v +This pattern evolved into the modern policy/grounding separation. -Level 1 - SYMBOLIC (Semantic Actions) - { type: "hi bob" } +### Alpha Data Model - | Anonymization (extract parameters) - v +```python +class Recording: + """Container for a demonstration session.""" + id: int + timestamp: float + task_description: str + action_events: list[ActionEvent] + screenshots: list[Screenshot] + window_events: list[WindowEvent] + +class ActionEvent: + """A single user action (click, type, scroll, etc.).""" + name: str # "click", "type", "scroll", "press", "release" + timestamp: float + screenshot: Screenshot # Screenshot just before action + window_event: WindowEvent # Active window state + mouse_x, mouse_y: int # Mouse coordinates + key_char, key_name: str # Keyboard input + element_state: dict # Accessibility info + +class Screenshot: + """A captured screen image.""" + timestamp: float + png_data: bytes + image: PIL.Image +``` -Level 2 - TEMPLATE (Parameterized Actions) - { type: "hi " } +--- - | Process Mining (discover patterns) - v +## 2. The Abstraction Ladder -Level 3 - SEMANTIC (Intent Recognition) - { greet: user } +### Core Concept: Progressive Abstraction - | Goal Composition (high-level planning) - v +OpenAdapt processes demonstrations through ascending levels of abstraction, enabling generalization and transfer learning. -Level 4 - GOAL (Task Specification) - "Say hello to the customer" ``` ++=========================================================================+ +| | +| Level 4: GOAL (Task Specification) FUTURE | +| "Say hello to the customer" | +| | +| ^ | +| | Goal Composition (high-level planning) | +| | | ++=========================================================================+ +| | +| Level 3: SEMANTIC (Intent Recognition) FUTURE | +| { action: "greet", target: "user" } | +| | +| ^ | +| | Process Mining (discover patterns) | +| | | ++=========================================================================+ +| | +| Level 2: TEMPLATE (Parameterized Actions) PARTIAL | +| { type: "hi " } | +| | +| ^ | +| | Anonymization (extract parameters) | +| | | ++=========================================================================+ +| | +| Level 1: SYMBOLIC (Semantic Actions) IMPLEMENTED | +| { type: "hi bob" } | +| | +| ^ | +| | Reduction (aggregate consecutive events) | +| | | ++=========================================================================+ +| | +| Level 0: LITERAL (Raw Events) IMPLEMENTED | +| { press: "h" }, { press: "i" }, { press: " " }, { press: "b" }, ... 
| +| | ++=========================================================================+ +``` + +### Abstraction Level Details -### Abstraction Benefits +| Level | Name | Representation | Transformation | Status | +|-------|------|----------------|----------------|--------| +| 0 | **Literal** | Raw keypresses, mouse coords | None (raw capture) | **Implemented** | +| 1 | **Symbolic** | Aggregated actions (`type "hello"`) | Event reduction | **Implemented** | +| 2 | **Template** | Parameterized (`type ""`) | Regex extraction | **Partial** | +| 3 | **Semantic** | Intent-level (`greet user`) | LLM intent recognition | **Research** | +| 4 | **Goal** | Task description ("Welcome customer") | Goal composition | **Future** | + +### Why Abstraction Matters | Level | Enables | Example Use Case | |-------|---------|------------------| -| Literal | Exact replay | Debugging, audit trails | +| Literal | Exact replay, debugging | Audit trails, regression tests | | Symbolic | Human-readable logs | Training data visualization | | Template | Parameterized replay | Same task, different data | | Semantic | Cross-application transfer | Greeting in any messaging app | | Goal | Natural language control | "Greet the next customer" | -### Current Implementation Status +### Current Implementation + +**Literal to Symbolic** (`openadapt-capture`): +- Event aggregation in `events.py` +- Consecutive keypresses become `type` actions +- Mouse drags become `drag` actions +- Click sequences become `doubleclick` or `tripleclick` -- **Literal to Symbolic**: Implemented in `openadapt-capture` (event aggregation) -- **Symbolic to Template**: Partially implemented (regex-based extraction) -- **Template to Semantic**: Research stage (LLM-based intent recognition) -- **Semantic to Goal**: Future work (requires process mining) +**Symbolic to Template** (Partial): +- Regex-based parameter extraction +- User-defined placeholders + +**Template to Semantic** (Research): +- LLM-based intent recognition +- Pattern library discovery + +**Semantic to Goal** (Future): +- Process mining algorithms +- Cross-demo pattern extraction --- -## 3. Three-Phase Architecture +## 3. Core Innovation: Demo-Conditioned Agents -OpenAdapt operates in three distinct phases, each with dedicated packages and responsibilities. +### The Fundamental Differentiator -### Phase Overview +OpenAdapt's core insight is **demonstration-conditioned automation**: "show, don't tell." ``` -+------------------+ +------------------+ +------------------+ -| | | | | | -| DEMONSTRATE | --> | LEARN | --> | EXECUTE | -| | | | | | -| (Observation | | (Policy | | (Agent | -| Collection) | | Acquisition) | | Deployment) | -| | | | | | -+------------------+ +------------------+ +------------------+ ++-------------------------------------------------------------------+ +| TRADITIONAL APPROACH | ++-------------------------------------------------------------------+ +| | +| User: "Click the submit button" | +| | +| Agent: [Which submit button? What context? What state?] | +| [Multiple submit buttons on page?] 
| +| [Different applications have different buttons] | +| | +| Result: AMBIGUOUS -> Requires prompt engineering | +| | ++-------------------------------------------------------------------+ + ++-------------------------------------------------------------------+ +| DEMO-CONDITIONED APPROACH | ++-------------------------------------------------------------------+ +| | +| User: [Records clicking the blue "Submit Order" button | +| after filling out form fields] | +| | +| Agent: [Learns full context: | +| - Form state before action | +| - Button appearance and location | +| - Preceding actions in sequence | +| - Window/application context] | +| | +| Result: GROUNDED -> No prompt engineering needed | +| | ++-------------------------------------------------------------------+ +``` + +### Why Demo-Conditioning Works + +1. **Captures Implicit Knowledge**: Users demonstrate things they can't easily verbalize +2. **Grounded in Reality**: Actions tied to actual UI states, not abstract descriptions +3. **Reduces Ambiguity**: Visual context eliminates interpretation errors +4. **Lower Barrier**: No prompt engineering skills required + +### Empirical Results + +Demo conditioning improves first-action accuracy: + +| Approach | First-Action Accuracy | Notes | +|----------|----------------------|-------| +| Prompt-only | ~33% | Ambiguity in action selection | +| Demo-conditioned | ~100% | Full context from demonstration | + +### The "Show, Don't Tell" Principle + +```python +# Traditional: Prompt-driven +agent.execute("Click the submit button") +# -> Which submit button? What state? What context? + +# Demo-Conditioned: Demonstration-driven +demo = capture_demonstration() # User clicks specific submit button +agent = train_policy(demo) # Agent learns the full context +agent.execute(new_context) # Agent adapts to variations ``` --- +## 4. Modern Architecture + +### Evolution: Monolith to Meta-Package + +``` +ALPHA (2023-2024) MODERN (2025+) ++====================+ +====================+ +| | | openadapt | +| openadapt | | (meta-pkg) | +| (monolithic) | +=========+=========+ +| | | +| - record.py | +-----------------+-----------------+ +| - replay.py | | | | | | +| - strategies/ | +----+----+ +--+--+ +--+--+ +--+--+ +----+----+ +| - models.py | |capture | | ml | |evals| |viewer| |optional | +| - adapters/ | +---------+ +-----+ +-----+ +------+ +---------+ +| - privacy/ | +| - visualize.py | + grounding, retrieval, privacy +| | ++====================+ +``` + +### The Modern Three-Phase Architecture + +Building on the alpha vision, the modern architecture formalizes three phases: + +``` ++=======================+ +=======================+ +=======================+ +|| || || || || || +|| DEMONSTRATE || --> || LEARN || --> || EXECUTE || +|| || || || || || +|| (Observation || || (Policy || || (Agent || +|| Collection) || || Acquisition) || || Deployment) || +|| || || || || || +|| Packages: || || Packages: || || Packages: || +|| - capture || || - ml || || - evals || +|| - privacy || || - retrieval || || - grounding || +|| || || || || || ++=======================+ +=======================+ +=======================+ +``` + ### Phase 1: DEMONSTRATE (Observation Collection) **Purpose**: Capture rich trajectories from human demonstrations. 
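As a concrete illustration of what this phase produces, the sketch below assembles a minimal trajectory record from captured steps and serializes it to JSON. It uses only the standard library; the `Action`, `Observation`, and `Trajectory` names echo the exports listed for `openadapt-capture` elsewhere in this document, but the fields and methods shown here are assumptions for illustration, not the package's actual API.

```python
from dataclasses import dataclass, field, asdict
import json
import time


@dataclass
class Action:
    """A single user action, already aggregated to the symbolic level."""
    name: str                      # e.g. "click", "type", "scroll"
    timestamp: float
    mouse_x: int | None = None
    mouse_y: int | None = None
    text: str | None = None


@dataclass
class Observation:
    """What was on screen just before the action."""
    screenshot_path: str           # PNG stored alongside the metadata
    window_title: str
    a11y_tree: dict = field(default_factory=dict)


@dataclass
class Trajectory:
    """A demonstration: an ordered sequence of (observation, action) pairs."""
    task_description: str
    steps: list[tuple[Observation, Action]] = field(default_factory=list)

    def record_step(self, observation: Observation, action: Action) -> None:
        self.steps.append((observation, action))

    def to_json(self) -> str:
        return json.dumps(
            {
                "task_description": self.task_description,
                "steps": [
                    {"observation": asdict(obs), "action": asdict(act)}
                    for obs, act in self.steps
                ],
            },
            indent=2,
        )


if __name__ == "__main__":
    demo = Trajectory(task_description="Submit the order form")
    demo.record_step(
        Observation("step_000.png", "Orders - Web Browser"),
        Action(name="click", timestamp=time.time(), mouse_x=512, mouse_y=304),
    )
    demo.record_step(
        Observation("step_001.png", "Orders - Web Browser"),
        Action(name="type", timestamp=time.time(), text="bob@example.com"),
    )
    print(demo.to_json())
```

A real capture session additionally persists the screenshots themselves and batches metadata into Parquet, per the storage formats noted in the implementation status tables.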
@@ -148,672 +366,445 @@ OpenAdapt operates in three distinct phases, each with dedicated packages and re - Window metadata (title, bounds, process) - Audio transcription (optional) -**Privacy Integration**: -- Optional PII/PHI scrubbing before storage -- Configurable redaction levels - -**Storage Format**: -- JSON for metadata and events -- Parquet for efficient batch access -- PNG/JPEG for screenshots - **Packages**: `openadapt-capture`, `openadapt-privacy` ---- - ### Phase 2: LEARN (Policy Acquisition) **Purpose**: Transform demonstrations into executable agent policies. **Three Learning Paths**: -#### Path A: Retrieval-Augmented Prompting -- Index demonstrations in vector database -- At inference, retrieve similar demos as context -- Condition API agent (Claude, GPT, Gemini) on retrieved examples -- **Advantage**: Works with any VLM, no training required -- **Package**: `openadapt-retrieval` - -#### Path B: Fine-Tuning -- Train/fine-tune VLM on demonstration dataset -- Use LoRA for parameter-efficient training -- Deploy locally or via inference API -- **Advantage**: Specialized performance, privacy, lower inference cost -- **Package**: `openadapt-ml` - -#### Path C: Process Mining -- Extract reusable action patterns across demonstrations -- Build abstraction hierarchy (template, semantic, goal) -- Enable cross-task transfer learning -- **Status**: Research/Future -- **Package**: `openadapt-ml` (future) - -**Outputs**: -- Vector embeddings for retrieval -- Model checkpoints for fine-tuned models -- Process graphs for abstraction (future) - ---- +| Path | Mechanism | Advantage | Package | +|------|-----------|-----------|---------| +| **A: Retrieval-Augmented** | Index demos, retrieve similar | No training needed | `openadapt-retrieval` | +| **B: Fine-Tuning** | Train VLM on demo dataset | Specialized performance | `openadapt-ml` | +| **C: Process Mining** | Extract reusable patterns | Cross-task transfer | `openadapt-ml` (future) | ### Phase 3: EXECUTE (Agent Deployment) -**Purpose**: Run trained/conditioned agents to perform tasks autonomously. +**Purpose**: Run trained/conditioned agents autonomously. **Execution Loop**: - ``` while not task_complete: - 1. OBSERVE - - Capture current screenshot - - Extract accessibility tree - - Build observation state - - 2. GROUND - - Localize UI elements (bounding boxes) - - Apply Set-of-Mark (SoM) annotation - - Map elements to coordinates or IDs - - 3. PLAN - - Encode observation with VLM - - Condition on goal + history + retrieved demos - - Generate action prediction - - 4. ACT - - Parse action (click, type, scroll, etc.) - - Execute via input synthesis - - Record action for history - - 5. EVALUATE - - Check for success indicators - - Detect failure patterns - - Decide: continue, retry, or escalate + 1. OBSERVE - Capture screenshot + a11y tree + 2. GROUND - Localize UI elements (SoM, OmniParser) + 3. PLAN - VLM reasoning with demo context + 4. ACT - Execute via input synthesis + 5. 
EVALUATE - Check success, decide next step ``` -**Grounding Modes**: - -| Mode | Description | Accuracy | Use Case | -|------|-------------|----------|----------| -| **Direct** | VLM predicts raw (x, y) coordinates | Variable | Simple, fast | -| **Set-of-Mark (SoM)** | UI elements labeled with IDs, VLM selects ID | High | Complex UIs | -| **Hybrid** | SoM for elements, Direct for fine positioning | Highest | Production | - -**Packages**: `openadapt-grounding`, `openadapt-evals`, `openadapt-ml` +**Packages**: `openadapt-evals`, `openadapt-grounding`, `openadapt-ml` --- -## 4. Package Responsibilities +## 5. SOTA GUI Agent Integration -### Core Packages +### Policy/Grounding Separation -| Package | Phase | Responsibility | Key Exports | -|---------|-------|----------------|-------------| -| `openadapt-capture` | DEMONSTRATE | GUI recording, event capture, storage | `Recorder`, `CaptureSession`, `Action`, `Screenshot` | -| `openadapt-ml` | LEARN | Model training, inference, adapters | `Trainer`, `AgentPolicy`, `VLMAdapter` | -| `openadapt-evals` | EXECUTE | Benchmark evaluation, metrics | `BenchmarkAdapter`, `ApiAgent`, `evaluate_agent` | -| `openadapt-viewer` | Cross-cutting | HTML visualization, replay | `PageBuilder`, `HTMLBuilder`, `TrajectoryViewer` | - -### Optional Packages - -| Package | Phase | Responsibility | Key Exports | -|---------|-------|----------------|-------------| -| `openadapt-grounding` | EXECUTE | UI element localization | `OmniParser`, `Florence2`, `GeminiGrounder` | -| `openadapt-retrieval` | LEARN | Multimodal demo search | `DemoRetriever`, `VectorIndex`, `Embedder` | -| `openadapt-privacy` | DEMONSTRATE | PII/PHI scrubbing | `Scrubber`, `Redactor`, `PrivacyFilter` | - -### Package Dependency Matrix +From Claude Computer Use, UFO, and SeeAct research: ``` - capture ml evals viewer grounding retrieval privacy -openadapt-capture - - - - - - O -openadapt-ml R - - - O O - -openadapt-evals - R - O O O - -openadapt-viewer O O O - - - O -openadapt-grounding - - - - - - - -openadapt-retrieval R - - - - - - -openadapt-privacy - - - - - - - - -Legend: R = Required, O = Optional, - = None ++====================+ +====================+ +| | | | +| POLICY | --> | GROUNDING | +| | | | +| "What to do" | | "Where to do" | +| | | | +| - Observation | | - Element | +| encoding | | detection | +| - Action | | - Coordinate | +| selection | | mapping | +| - History | | - Bounding | +| context | | boxes | +| | | | ++====================+ +====================+ ``` ---- - -## 5. Feedback Loops +**OpenAdapt Implementation**: +- **Policy**: `openadapt-ml` adapters (Claude, GPT-4V, Qwen-VL) +- **Grounding**: `openadapt-grounding` providers (OmniParser, Florence2, Gemini) -OpenAdapt implements continuous improvement through three feedback loops. 
+### Set-of-Mark (SoM) Prompting -### System Diagram +From Microsoft's Set-of-Mark paper: ``` - DEMONSTRATE - | - | Human demonstrations - v -+--------------------------> LEARN <--------------------------+ -| | | -| | Trained policies | -| +--------------------------|---------------------+ | -| | v | | -| | +----------------> EXECUTE <--------------+ | | -| | | | | | | -| | | Retry on | Success/Failure | | | -| | | recoverable | outcomes | | | -| | | errors v | | | -| | | +-------+-------+ | | | -| | | | | | | | -| | +--------------+ EVALUATE +----------+ | | -| | | | | | -| | +-------+-------+ | | -| | | | | -| | | Execution traces | | -| | v | | -| | Demo library grows | | -| | | | | -| +--------------------------+ | | -| | | -| Failure analysis identifies gaps | | -| | | | -| v | | -| New demonstrations | | -| | | | -+--------------------+ | | - | | - Self-improvement loop | | - (execution traces -> training) | | - | | | - +----------------------+ | - | - Benchmark-driven development | - (eval results -> architecture improvements) | - | | - +--------------------------------+ +Original Screenshot SoM-Annotated Screenshot ++---------------------+ +---------------------+ +| [Login] [Help] | | [1] [2] | +| | -> | | +| Email: [________] | | Email: [3] | +| Pass: [________] | | Pass: [4] | +| [Submit] | | [5] | ++---------------------+ +---------------------+ + +Prompt: "Enter email in element [3], password in [4], click [5]" ``` -### Loop Details - -#### Loop 1: Demonstration Library Growth -- Successful executions are stored as new demonstrations -- Failed executions trigger gap analysis -- Human reviews and corrects failures -- Corrections become new training data +**OpenAdapt Implementation**: `openadapt-grounding.SoMPrompt` -#### Loop 2: Self-Improvement (Future) -- Agent traces its own execution -- Successful traces fine-tune the policy -- Automatic curriculum: easy to hard tasks -- Reduces need for human demonstrations over time +### Safety Gates -#### Loop 3: Benchmark-Driven Development -- Regular evaluation on standard benchmarks -- Failure modes inform architecture changes -- New capabilities tested before merge -- Regression detection prevents quality drops - ---- +From responsible AI patterns: -## 6. Model Layer - -OpenAdapt is model-agnostic, supporting multiple foundation models through a unified adapter interface. +``` ++------------------+ +------------------+ +------------------+ +| | | | | | +| OBSERVE | --> | VALIDATE | --> | ACT | +| | | | | | +| Get current | | - Check bounds | | Execute if | +| state | | - Verify perms | | validated | +| | | - Rate limit | | | ++------------------+ +--------+---------+ +------------------+ + | + v (rejected) + +------------------+ + | ESCALATE | + | Human review | + +------------------+ +``` -### Supported Models +**Status**: Planned in `openadapt-evals` safety module. 
-#### API Providers (Cloud) +### Research Alignment -| Provider | Model | Status | Best For | -|----------|-------|--------|----------| -| Anthropic | Claude 3.5 Sonnet | Implemented | General GUI tasks | -| OpenAI | GPT-4o | Implemented | Complex reasoning | -| Google | Gemini 2.0 Flash | Implemented | Cost-efficient | +| Research Paper | Key Contribution | OpenAdapt Integration | +|----------------|------------------|----------------------| +| **Claude Computer Use** (Anthropic, 2024) | Production VLM agent API | API adapter in `openadapt-ml` | +| **UFO** (Microsoft, 2024) | Windows agent architecture | Prompt patterns adopted | +| **OSWorld** (CMU, 2024) | Cross-platform benchmark | Benchmark adapter planned | +| **Set-of-Mark** (Microsoft, 2023) | Visual grounding via labels | Core grounding mode | +| **OmniParser** (Microsoft, 2024) | Pure-vision UI parsing | Provider in `openadapt-grounding` | +| **SeeAct** (OSU, 2024) | Grounded action generation | Action space design | +| **WebArena** (CMU, 2023) | Web automation benchmark | Benchmark adapter implemented | +| **AppAgent** (Tencent, 2024) | Mobile GUI agent | Mobile support planned | -#### Local Models (Self-Hosted) +--- -| Model | Parameters | Status | Best For | -|-------|------------|--------|----------| -| Qwen2-VL | 2B-72B | Implemented | Fine-tuning, privacy | -| Qwen2.5-VL | 3B-72B | Planned | Next-gen local | -| Molmo | 7B | Research | Efficiency | +## 6. Package Responsibilities -### Adapter Interface +### Package-to-Phase Mapping -```python -class VLMAdapter(Protocol): - """Protocol for VLM model adapters.""" - - def predict( - self, - screenshot: Image, - task: str, - history: list[Action], - context: Optional[list[Demo]] = None, - ) -> Action: - """Predict next action given observation.""" - ... - - def get_grounding( - self, - screenshot: Image, - element_description: str, - ) -> BoundingBox: - """Ground element description to coordinates.""" - ... 
``` - -### Prompt Architecture - -OpenAdapt uses a structured prompting approach combining SOTA patterns: - ++===============================================================================+ +| DEMONSTRATE PHASE | ++===============================================================================+ +| Package | Responsibility | Key Exports | ++-------------------+----------------------------+------------------------------+ +| openadapt-capture | GUI recording, storage | Recorder, CaptureSession | +| | | Action, Screenshot, Trajectory| ++-------------------+----------------------------+------------------------------+ +| openadapt-privacy | PII/PHI scrubbing | Scrubber, Redactor | +| | (integrates at capture) | PrivacyFilter | ++===============================================================================+ + ++===============================================================================+ +| LEARN PHASE | ++===============================================================================+ +| Package | Responsibility | Key Exports | ++---------------------+--------------------------+------------------------------+ +| openadapt-ml | Model training, | Trainer, AgentPolicy | +| | inference, adapters | QwenVLAdapter, ClaudeAdapter | ++---------------------+--------------------------+------------------------------+ +| openadapt-retrieval | Demo embedding, | DemoIndex, Embedder | +| | similarity search | SearchResult | ++===============================================================================+ + ++===============================================================================+ +| EXECUTE PHASE | ++===============================================================================+ +| Package | Responsibility | Key Exports | ++----------------------+-------------------------+------------------------------+ +| openadapt-evals | Benchmark evaluation, | BenchmarkAdapter, ApiAgent | +| | metrics collection | evaluate_agent_on_benchmark | ++----------------------+-------------------------+------------------------------+ +| openadapt-grounding | UI element detection, | ElementDetector, SoMPrompt | +| | coordinate mapping | OmniParser, GeminiGrounder | ++===============================================================================+ + ++===============================================================================+ +| CROSS-CUTTING | ++===============================================================================+ +| Package | Responsibility | Key Exports | ++-------------------+----------------------------+------------------------------+ +| openadapt-viewer | HTML visualization, | PageBuilder, HTMLBuilder | +| | trajectory replay | TrajectoryViewer | ++-------------------+----------------------------+------------------------------+ +| openadapt | Unified CLI, | cli.main, lazy imports | +| (meta-package) | dependency coordination | | ++===============================================================================+ ``` -SYSTEM: {role_definition} - -CONTEXT: -- Retrieved demonstrations (if available) -- Task description -- Success criteria -OBSERVATION: -- Current screenshot (base64 or URL) -- Accessibility tree (structured) -- Element annotations (Set-of-Mark) - -HISTORY: -- Previous N actions and their outcomes -- Current step number +### Package Dependency Matrix -INSTRUCTION: -- Action space definition -- Output format specification +``` + capture ml evals viewer grounding retrieval privacy +openadapt-capture - - - - - - O +openadapt-ml R - - - O O - +openadapt-evals - R - O O O - +openadapt-viewer 
O O O - - - O +openadapt-grounding - - - - - - - +openadapt-retrieval R - - - - - - +openadapt-privacy - - - - - - - -USER: What action should be taken next? +Legend: R = Required, O = Optional, - = None ``` --- -## 7. Implementation Status +## 7. Feedback Loops -### Status Legend - -| Symbol | Meaning | -|--------|---------| -| Solid | Implemented and tested | -| Dashed | In progress or partial | -| Dotted | Planned/Future | - -### Component Status Matrix +### System-Level Feedback Architecture ``` -+----------------------+------------------+------------------+ -| Component | Status | Package | -+----------------------+------------------+------------------+ -| DEMONSTRATE PHASE | -+----------------------+------------------+------------------+ -| Screen capture | Solid | capture | -| Event recording | Solid | capture | -| A11y tree capture | Solid | capture | -| Audio transcription | Dashed | capture | -| Privacy scrubbing | Solid | privacy | -| Demo library storage | Solid | capture | -+----------------------+------------------+------------------+ -| LEARN PHASE | -+----------------------+------------------+------------------+ -| Demo embedding | Solid | retrieval | -| Vector indexing | Solid | retrieval | -| Similarity search | Solid | retrieval | -| API model adapters | Solid | ml | -| Training pipeline | Dashed | ml | -| LoRA fine-tuning | Dashed | ml | -| Process mining | Dotted | ml (future) | -+----------------------+------------------+------------------+ -| EXECUTE PHASE | -+----------------------+------------------+------------------+ -| Action execution | Solid | capture | -| Direct grounding | Solid | grounding | -| SoM grounding | Solid | grounding | -| OmniParser provider | Solid | grounding | -| Florence2 provider | Solid | grounding | -| Gemini grounding | Solid | grounding | -| WAA benchmark | Solid | evals | -| WebArena benchmark | Dashed | evals | -| OSWorld benchmark | Dotted | evals | -| Mock benchmark | Solid | evals | -+----------------------+------------------+------------------+ -| CROSS-CUTTING | -+----------------------+------------------+------------------+ -| Viewer HTML output | Solid | viewer | -| Trajectory replay | Solid | viewer | -| Training dashboard | Dashed | viewer | -| Benchmark viewer | Dashed | viewer | -| Telemetry | Dotted | telemetry (new) | -+----------------------+------------------+------------------+ + DEMONSTRATE + | + | Human demonstrations + v ++-----------------------------> LEARN <----------------------------+ +| | | +| | Trained policies | +| +-----------------------------|---------------------+ | +| | v | | +| | +-----------------> EXECUTE <--------------+ | | +| | | | | | | +| | | Retry on | Success/Failure | | | +| | | recoverable | outcomes | | | +| | | errors v | | | +| | | +-------+-------+ | | | +| | | | | | | | +| | +---------------+ EVALUATE +-----------+ | | +| | (Loop 1: Retry) | | | | +| | +-------+-------+ | | +| | | | | +| | | Execution traces | | +| | v | | +| | Demo library grows | | +| | | | | +| +---------------------------+ | | +| (Loop 2: Library Growth) | | +| | | +| Failure analysis identifies gaps | | +| | | | +| v | | +| Human correction | | +| | | | ++--------------------+ | | +(Loop 3: Human-in-Loop) | | + | | + Self-improvement loop | | + (execution traces -> training) | | + | | | + +------------------------+ | + (Loop 4: Self-Improvement) | + | + Benchmark-driven development | + (eval results -> architecture improvements) | + | | + +-----------------------------------+ + (Loop 5: Benchmark-Driven) ``` -### 
Priority Roadmap - -#### P0 - This Week -- [x] Capture package with Recorder -- [x] Retrieval with embedding and search -- [x] Evals with WAA benchmark + mock -- [x] Grounding providers (OmniParser, Florence, Gemini) -- [x] Viewer component library -- [x] API baselines (Claude, GPT, Gemini) -- [ ] PyPI releases for all packages -- [ ] WAA baseline metrics - -#### P1 - Next 2 Weeks -- [ ] Fine-tuning pipeline validation -- [ ] Demo conditioning integration in evals -- [ ] Multi-track evaluation (Direct, ReAct, SoM) -- [ ] docs.openadapt.ai launch - -#### P2 - This Month -- [ ] Training dashboard in viewer -- [ ] WebArena benchmark integration -- [ ] Cloud GPU training (Lambda Labs) -- [ ] v1.0.0 meta-package release - -#### P3 - Future -- [ ] Process mining / abstraction -- [ ] Self-improvement from execution traces -- [ ] Multi-agent collaboration -- [ ] Active learning with human feedback -- [ ] OSWorld benchmark integration - ---- - -## 8. Architecture Diagrams - -### Master Architecture Diagram (Evolved) - -This diagram synthesizes the three-phase pipeline with all key concepts: demo-conditioned prompting, policy/grounding separation, safety gate, multi-source data ingestion, the abstraction ladder, and evaluation-driven feedback loops. - -```mermaid -flowchart TB - %% ═══════════════════════════════════════════════════════════════════════ - %% USER LAYER - %% ═══════════════════════════════════════════════════════════════════════ - subgraph UserLayer["User Layer"] - CLI["openadapt CLI"] - UI["Desktop/Web GUI"] - end - - %% ═══════════════════════════════════════════════════════════════════════ - %% MULTI-SOURCE DATA INGESTION - %% ═══════════════════════════════════════════════════════════════════════ - subgraph DataSources["Multi-Source Data Ingestion"] - direction LR - HUMAN["Human
Demonstrations"] - SYNTH["Synthetic
Data"]:::future - BENCH_DATA["Benchmark
Tasks"] - EXTERNAL["External
Datasets"]:::future - end - - %% ═══════════════════════════════════════════════════════════════════════ - %% PHASE 1: DEMONSTRATE (Observation Collection) - %% ═══════════════════════════════════════════════════════════════════════ - subgraph Phase1["DEMONSTRATE (Observation Collection)"] - direction TB - - subgraph CaptureLayer["Capture"] - REC["Recorder
openadapt-capture"] - A11Y["A11y Tree"] - SCREENSHOT["Screenshots"] - EVENTS["Input Events"] - - REC --> A11Y - REC --> SCREENSHOT - REC --> EVENTS - end - - subgraph PrivacyLayer["Privacy"] - SCRUB["Scrubber
openadapt-privacy"] - REDACT["PII/PHI Redaction"] - SCRUB --> REDACT - end - - STORE[("Demo Library
(JSON/Parquet)")] - - A11Y --> SCRUB - SCREENSHOT --> SCRUB - EVENTS --> SCRUB - REDACT --> STORE - end - - %% ═══════════════════════════════════════════════════════════════════════ - %% PHASE 2: LEARN (Policy Acquisition) - %% ═══════════════════════════════════════════════════════════════════════ - subgraph Phase2["LEARN (Policy Acquisition)"] - direction TB - - subgraph RetrievalPath["Path A: Retrieval-Augmented Prompting"] - EMB["Embedder
openadapt-retrieval"] - IDX[("Vector Index")] - SEARCH["Similarity Search"] - - EMB --> IDX - IDX --> SEARCH - end - - subgraph TrainingPath["Path B: Fine-Tuning"] - LOADER["Data Loader"] - TRAINER["Model Trainer
openadapt-ml"] - LORA["LoRA Adapters"] - CKPT[("Model Checkpoints")] - - LOADER --> TRAINER - TRAINER --> LORA - LORA --> CKPT - end - - subgraph MiningPath["Path C: Process Mining"]:::futureBlock - ABSTRACT["Abstractor"]:::future - PATTERNS["Pattern Library"]:::future - - ABSTRACT --> PATTERNS - end - end - - %% ═══════════════════════════════════════════════════════════════════════ - %% PHASE 3: EXECUTE (Agent Deployment) - %% ═══════════════════════════════════════════════════════════════════════ - subgraph Phase3["EXECUTE (Agent Deployment)"] - direction TB - - subgraph AgentLoop["Agent Execution Loop"] - OBS["1. OBSERVE
(Screenshot + A11y)"] - GROUND["2. GROUND
openadapt-grounding"] - PLAN["3. PLAN
(Demo-Conditioned Policy)"] - ACT["4. ACT
(Input Synthesis)"] - - OBS --> GROUND - GROUND --> PLAN - PLAN --> ACT - end - - subgraph SafetyGate["Safety Gate (Runtime Layer)"] - VALIDATE["Action Validation"] - RISK["Risk Assessment"] - CONFIRM["Human Confirm"]:::future - - VALIDATE --> RISK - RISK --> CONFIRM - end - - subgraph Evaluation["Evaluation"] - EVALS["Benchmark Runner
openadapt-evals"] - METRICS["Metrics
(Success, Steps, Time)"] - COMPARE["Model Comparison"] - - EVALS --> METRICS - METRICS --> COMPARE - end - - ACT --> VALIDATE - CONFIRM --> EVALS - end +### Feedback Loop Details - %% ═══════════════════════════════════════════════════════════════════════ - %% THE ABSTRACTION LADDER - %% ═══════════════════════════════════════════════════════════════════════ - subgraph AbstractionLadder["The Abstraction Ladder"] - direction TB - L0["Level 0: LITERAL
(Raw Events)
{ press: 'h', press: 'i' }"] - L1["Level 1: SYMBOLIC
(Semantic Actions)
{ type: 'hi bob' }"] - L2["Level 2: TEMPLATE
(Parameterized)
{ type: 'hi <name>' }"] - L3["Level 3: SEMANTIC
(Intent Recognition)
{ greet: user }"]:::future - L4["Level 4: GOAL
(Task Specification)
'Greet customer'"]:::future - - L0 -->|"Reduction"| L1 - L1 -->|"Anonymization"| L2 - L2 -.->|"Process Mining"| L3 - L3 -.->|"Goal Composition"| L4 - end - - %% ═══════════════════════════════════════════════════════════════════════ - %% MODEL LAYER (VLM Adapters) - %% ═══════════════════════════════════════════════════════════════════════ - subgraph Models["Model Layer (VLM Adapters)"] - direction LR - - subgraph CloudModels["Cloud APIs"] - CLAUDE["Claude 3.5"] - GPT["GPT-4o"] - GEMINI["Gemini 2.0"] - end - - subgraph LocalModels["Local Models"] - QWEN["Qwen2-VL"] - CUSTOM["Custom Fine-tuned"] - end - end - - %% ═══════════════════════════════════════════════════════════════════════ - %% VIEWER (Cross-Cutting) - %% ═══════════════════════════════════════════════════════════════════════ - subgraph Viewer["Cross-Cutting: Viewer"] - VIZ["Trajectory
Visualization"] - REPLAY["Demo
Replay"] - DASH["Training
Dashboard"]:::partialImpl - end +| Loop | Name | Trigger | Outcome | Status | +|------|------|---------|---------|--------| +| 1 | **Retry** | Recoverable error | Re-attempt action | **Implemented** | +| 2 | **Library Growth** | Successful execution | New demo added | **Implemented** | +| 3 | **Human-in-Loop** | Unrecoverable failure | Human correction -> demo | **Implemented** | +| 4 | **Self-Improvement** | Execution traces | Fine-tuning | **Research** | +| 5 | **Benchmark-Driven** | Eval metrics | Architecture changes | **Active** | - %% ═══════════════════════════════════════════════════════════════════════ - %% DATA FLOW CONNECTIONS - %% ═══════════════════════════════════════════════════════════════════════ - - %% User interactions - CLI --> REC - UI --> REC - CLI --> TRAINER - CLI --> EVALS - - %% Multi-source ingestion - HUMAN --> REC - SYNTH -.-> LOADER - BENCH_DATA --> EVALS - EXTERNAL -.-> LOADER +--- - %% Demo flow to learning - STORE --> EMB - STORE --> LOADER - STORE -.-> ABSTRACT - - %% ═══════════════════════════════════════════════════════════════════════ - %% DEMO-CONDITIONED PROMPTING (Core Innovation) - %% Retrieval used in BOTH training AND evaluation - %% ═══════════════════════════════════════════════════════════════════════ - SEARCH -->|"demo context
(training)"| PLAN - SEARCH -->|"demo context
(evaluation)"| EVALS - CKPT -->|"trained policy"| PLAN - PATTERNS -.->|"templates"| PLAN - - %% Model connections (Policy/Grounding Separation) - PLAN -->|"action prediction"| Models - GROUND -->|"element localization"| Models - - %% ═══════════════════════════════════════════════════════════════════════ - %% EVALUATION-DRIVEN FEEDBACK LOOPS - %% ═══════════════════════════════════════════════════════════════════════ - METRICS -->|"success traces
(new demos)"| STORE - METRICS -.->|"training signal
(self-improvement)"| TRAINER - COMPARE -->|"failure analysis"| UserLayer +## 8. Implementation Status - %% Viewer connections - STORE -.-> VIZ - STORE -.-> REPLAY - CKPT -.-> DASH - METRICS -.-> DASH +### What's Implemented vs Future Work - %% ═══════════════════════════════════════════════════════════════════════ - %% STYLING - %% ═══════════════════════════════════════════════════════════════════════ - - %% Layer colors - classDef userLayer fill:#E74C3C,stroke:#A93226,color:#fff - classDef dataSource fill:#16A085,stroke:#0E6655,color:#fff - classDef phase1 fill:#3498DB,stroke:#1A5276,color:#fff - classDef phase2 fill:#27AE60,stroke:#1E8449,color:#fff - classDef phase3 fill:#9B59B6,stroke:#6C3483,color:#fff - classDef models fill:#F39C12,stroke:#B7950B,color:#fff - classDef viewer fill:#1ABC9C,stroke:#148F77,color:#fff - classDef safetyGate fill:#E74C3C,stroke:#922B21,color:#fff - - %% Implementation status - classDef implemented fill:#2ECC71,stroke:#1E8449,color:#fff - classDef partialImpl fill:#F4D03F,stroke:#B7950B,color:#000 - classDef future fill:#95A5A6,stroke:#707B7C,color:#fff,stroke-dasharray: 5 5 - classDef futureBlock fill:#EAECEE,stroke:#95A5A6,stroke-dasharray: 5 5 - - %% Apply layer styles - class CLI,UI userLayer - class HUMAN,BENCH_DATA dataSource - class REC,A11Y,SCREENSHOT,EVENTS,SCRUB,REDACT,STORE phase1 - class EMB,IDX,SEARCH,LOADER,TRAINER,LORA,CKPT phase2 - class OBS,GROUND,PLAN,ACT,VALIDATE,RISK,EVALS,METRICS,COMPARE phase3 - class CLAUDE,GPT,GEMINI,QWEN,CUSTOM models - class VIZ,REPLAY viewer - - %% Apply abstraction ladder styles (implemented vs future) - class L0,L1,L2 implemented +``` ++==============================================================================+ +| IMPLEMENTED (Solid) | ++==============================================================================+ +| Component | Package | Notes | ++--------------------------+------------------+--------------------------------+ +| Screen capture | capture | macOS, Windows, Linux | +| Event recording | capture | Mouse, keyboard, touch | +| Event aggregation | capture | Literal -> Symbolic | +| A11y tree capture | capture | macOS, Windows | +| Demo storage | capture | JSON/Parquet/PNG | +| Privacy scrubbing | privacy | Presidio, AWS Comprehend | +| Demo embedding | retrieval | CLIP, SigLIP | +| Vector indexing | retrieval | FAISS, Annoy | +| Similarity search | retrieval | Top-k retrieval | +| API model adapters | ml | Claude, GPT-4V, Gemini | +| Element detection | grounding | OmniParser, Florence2 | +| SoM annotation | grounding | Numbered element labels | +| WAA benchmark | evals | Full integration | +| Mock benchmark | evals | Testing infrastructure | +| HTML visualization | viewer | Trajectory replay | +| Unified CLI | openadapt | capture/train/eval/view | ++==============================================================================+ + ++==============================================================================+ +| IN PROGRESS (Dashed) | ++==============================================================================+ +| Component | Package | Notes | ++--------------------------+------------------+--------------------------------+ +| Training pipeline | ml | Qwen-VL fine-tuning | +| LoRA adapters | ml | Parameter-efficient training | +| Template extraction | capture | Regex-based parameterization | +| WebArena benchmark | evals | Browser automation | +| Training dashboard | viewer | Loss/metrics visualization | +| Audio transcription | capture | Whisper integration | 
++--------------------------+------------------+--------------------------------+ + ++==============================================================================+ +| FUTURE WORK (Dotted) | ++==============================================================================+ +| Component | Package | Notes | ++--------------------------+------------------+--------------------------------+ +| Process mining | ml (future) | Semantic action discovery | +| Goal composition | ml (future) | High-level task planning | +| Self-improvement | ml (future) | Training on execution traces | +| OSWorld benchmark | evals | Cross-platform desktop | +| Multi-agent collaboration| ml (future) | Agent coordination | +| Active learning | ml (future) | Human feedback integration | +| Mobile platform | capture | iOS, Android | +| Safety gates | evals | Action validation layer | ++==============================================================================+ ``` -### Key Architectural Insights - -#### 1. Demo-Conditioned Prompting (Core Innovation) - -The diagram shows how **retrieval** feeds into BOTH: -- **Training path**: Similar demos condition the fine-tuning process -- **Evaluation path**: Retrieved demos provide in-context examples for API agents - -This "show, don't tell" approach improves first-action accuracy from 33% to 100%. - -#### 2. Policy/Grounding Separation +### Abstraction Ladder Implementation Status -The EXECUTE phase clearly separates: -- **Policy** (PLAN): Decides *what* action to take (uses VLM reasoning) -- **Grounding**: Determines *where* to execute (UI element localization via SoM, OmniParser, etc.) +| Level | Name | Status | Implementation | +|-------|------|--------|----------------| +| 0 | Literal | **Implemented** | Raw event recording in `capture` | +| 1 | Symbolic | **Implemented** | Event aggregation in `capture` | +| 2 | Template | **Partial** | Regex extraction in `capture` | +| 3 | Semantic | **Research** | LLM intent recognition | +| 4 | Goal | **Future** | Process mining | -#### 3. Safety Gate as Runtime Layer +--- -Before action execution, the Safety Gate provides: -- Action validation (sanity checks) -- Risk assessment (destructive action detection) -- Human confirmation (future: for high-risk actions) +## 9. Architecture Evolution Diagrams -#### 4. 
The Abstraction Ladder +### Era 1: Alpha Monolith (2023) -Progressive generalization from raw events to goals: -- **Implemented**: Literal -> Symbolic -> Template -- **Future**: Semantic -> Goal (requires process mining) +``` ++=========================================================================+ +| ALPHA ARCHITECTURE (2023) | ++=========================================================================+ +| | +| +------------------------------------------------------------------+ | +| | openadapt (monolithic) | | +| +------------------------------------------------------------------+ | +| | | | +| | +-------------+ +-------------+ +-------------+ | | +| | | record | -> | visualize | -> | replay | | | +| | +-------------+ +-------------+ +-------------+ | | +| | | | | | | +| | v v v | | +| | +-------------+ +-------------+ +------------------+ | | +| | | models | | plotting | | strategies/ | | | +| | | - Recording | | - HTML gen | | - base.py | | | +| | | - ActionEvt | | | | - naive.py | | | +| | | - Screenshot| | | | - vanilla.py | | | +| | | - WindowEvt | | | | - visual.py | | | +| | +-------------+ +-------------+ +------------------+ | | +| | | | | | +| | v v | | +| | +-------------+ +---------------+ | | +| | | db/ | | adapters/ | | | +| | | - SQLite | | - anthropic | | | +| | | - CRUD ops | | - openai | | | +| | +-------------+ | - replicate | | | +| | +---------------+ | | +| +------------------------------------------------------------------+ | +| | ++=========================================================================+ + +Characteristics: +- Single repository, single package +- Tightly coupled components +- Strategy pattern for replay variants +- SQLite + Alembic migrations +- Prompts embedded in code +``` -#### 5. Evaluation-Driven Feedback Loops +### Era 2: Transition (2024) -Three feedback mechanisms: -1. **Demo Library Growth**: Success traces become new training data -2. **Self-Improvement**: Training signal from execution metrics (future) -3. 
**Failure Analysis**: Human review of failed executions +``` ++=========================================================================+ +| TRANSITION ARCHITECTURE (2024) | ++=========================================================================+ +| | +| Legacy codebase frozen -> /legacy/ | +| | +| New modular packages designed: | +| | +| +-------------+ +-------------+ +-------------+ +-------------+ | +| | capture | | ml | | evals | | viewer | | +| +-------------+ +-------------+ +-------------+ +-------------+ | +| | privacy | | retrieval | | grounding | | +| +-------------+ +-------------+ +-------------+ | +| | +| Key changes: | +| - Separate PyPI packages | +| - Lazy imports for optional deps | +| - Unified CLI in meta-package | +| - Policy/grounding separation | +| - Benchmark-first development | +| | ++=========================================================================+ +``` ---- +### Era 3: Modern Meta-Package (2025+) -### Legacy Master Architecture Diagram +``` ++=========================================================================+ +| MODERN ARCHITECTURE (2025+) | ++=========================================================================+ +| | +| +------------------+ | +| | User Layer | | +| | CLI / Web UI | | +| +--------+---------+ | +| | | +| v | +| +------------------+ | +| | openadapt | | +| | (meta-package) | | +| +--------+---------+ | +| | | +| +------------------------+------------------------+ | +| | | | | | | +| v v v v v | +| +---------+ +---------+ +---------+ +---------+ +--------+ | +| | capture | | ml | | evals | | viewer | |optional| | +| +---------+ +---------+ +---------+ +---------+ +--------+ | +| | | | | | | +| v v v v v | +| +---------------------------------------------------------------+ | +| | Shared Interfaces | | +| | - Trajectory format (JSON/Parquet) | | +| | - Action space specification | | +| | - Observation schema | | +| | - Benchmark protocols | | +| +---------------------------------------------------------------+ | +| | | +| v | +| +---------------------------------------------------------------+ | +| | Model Layer | | +| | +----------+ +----------+ +----------+ +----------+ | | +| | | Claude | | GPT-4V | | Gemini | | Qwen-VL | | | +| | +----------+ +----------+ +----------+ +----------+ | | +| +---------------------------------------------------------------+ | +| | ++=========================================================================+ +``` -For reference, the previous architecture diagram: +### Full System Architecture (Mermaid) ```mermaid flowchart TB @@ -824,8 +815,8 @@ flowchart TB subgraph Phase1["DEMONSTRATE"] direction TB - REC[Recorder] - SCRUB[Privacy Scrubber] + REC[Recorder
openadapt-capture] + SCRUB[Privacy Scrubber
openadapt-privacy] STORE[(Demo Library)] REC --> SCRUB @@ -855,11 +846,11 @@ flowchart TB subgraph Phase3["EXECUTE"] direction TB - OBS[Observer] - GROUND[Grounder] - PLAN[Planner] - ACT[Actuator] - EVAL[Evaluator] + OBS[1. OBSERVE] + GROUND[2. GROUND
openadapt-grounding] + PLAN[3. PLAN
Demo-Conditioned] + ACT[4. ACT] + EVAL[5. EVALUATE
openadapt-evals] OBS --> GROUND GROUND --> PLAN @@ -874,7 +865,6 @@ flowchart TB GPT[GPT-4o] GEMINI[Gemini] QWEN[Qwen-VL] - CUSTOM[Custom] end subgraph Viewer["Cross-Cutting: Viewer"] @@ -900,8 +890,8 @@ flowchart TB TRAINER --> CKPT ABSTRACT --> PATTERNS - %% Execution flow - SEARCH -.->|context| PLAN + %% Execution flow (demo-conditioning) + SEARCH -->|demo context| PLAN CKPT -->|policy| PLAN PATTERNS -.->|templates| PLAN @@ -932,10 +922,51 @@ flowchart TB class EMB,IDX,SEARCH,LOADER,TRAINER,CKPT phase2 class ABSTRACT,PATTERNS future class OBS,GROUND,PLAN,ACT,EVAL phase3 - class CLAUDE,GPT,GEMINI,QWEN,CUSTOM models + class CLAUDE,GPT,GEMINI,QWEN models class VIZ,REPLAY,DASH viewer ``` +### Execution Loop Evolution + +``` +ALPHA: Strategy-Based MODERN: Policy/Grounding +================================ ================================ + ++------------------+ +------------------+ +| BaseReplay | | OBSERVE | +| Strategy | | (Screenshot + | +| | | A11y tree) | +| while True: | +--------+---------+ +| screenshot = | | +| take() | v +| action = | +------------------+ +| get_next() | ------> | GROUND | +| play(action) | | (Element detect | +| | | + SoM annotate)| ++------------------+ +--------+---------+ + | + v + +------------------+ + | PLAN | + | (VLM reasoning | + | + demo context)| + +--------+---------+ + | + v + +------------------+ + | ACT | + | (Input synth + | + | safety check) | + +--------+---------+ + | + v + +------------------+ + | EVALUATE | + | (Success check | + | + feedback) | + +------------------+ +``` + ### Package Responsibility Diagram ```mermaid @@ -1001,57 +1032,6 @@ flowchart LR class OA meta ``` -### Execution Loop Diagram - -```mermaid -stateDiagram-v2 - [*] --> Observe - - Observe --> Ground: screenshot + a11y - Ground --> Plan: elements + coordinates - Plan --> Act: action prediction - Act --> Evaluate: action result - - Evaluate --> Observe: continue - Evaluate --> Success: task complete - Evaluate --> Retry: recoverable error - Evaluate --> Escalate: unrecoverable - - Retry --> Observe - Escalate --> [*] - Success --> [*] - - note right of Observe - Capture screenshot - Extract a11y tree - Build observation - end note - - note right of Ground - Detect UI elements - Apply SoM labels - Get coordinates - end note - - note right of Plan - Encode with VLM - Retrieve similar demos - Generate action - end note - - note right of Act - Parse action type - Execute input - Record for history - end note - - note right of Evaluate - Check success - Detect failures - Decide next step - end note -``` - ### Feedback Loop Diagram ```mermaid @@ -1094,151 +1074,42 @@ flowchart TB --- -## 9. Key Design Principles - -### Principle 1: Model Agnostic - -OpenAdapt works with any VLM that can process images and generate text. - -**Implementation**: -- Adapter pattern for model integration -- Unified prompt format across providers -- Switchable at runtime via configuration - -**Rationale**: -- Avoid vendor lock-in -- Enable cost optimization -- Future-proof against model evolution - -### Principle 2: Demo-Conditioned - -Agents learn from human examples, not just prompts. - -**Implementation**: -- Retrieval-augmented prompting -- Fine-tuning on demonstration datasets -- Context windows include similar past examples - -**Rationale**: -- Captures implicit knowledge -- Reduces ambiguity -- Enables transfer learning - -### Principle 3: Abstraction-Aware - -Progress from literal replay to semantic understanding. 
- -**Implementation**: -- Abstraction ladder (literal -> symbolic -> template -> semantic -> goal) -- Incremental abstraction during processing -- Human-readable intermediate representations - -**Rationale**: -- Enables generalization -- Supports explanation and debugging -- Allows cross-application transfer - -### Principle 4: Evaluation-Driven - -Rigorous benchmarking on standard tasks. - -**Implementation**: -- WAA, WebArena, OSWorld benchmark integrations -- Automated regression detection -- Public leaderboard metrics - -**Rationale**: -- Objective progress measurement -- Community comparability -- Quality assurance - -### Principle 5: Privacy-First - -Optional PII/PHI scrubbing at every stage. - -**Implementation**: -- `openadapt-privacy` package -- Configurable scrubbing levels -- Local-only deployment option - -**Rationale**: -- Enterprise compliance (HIPAA, GDPR) -- User trust -- Responsible AI - -### Principle 6: Open Source - -MIT license, community-driven development. - -**Implementation**: -- All packages on GitHub -- Public roadmap and issues -- Contribution guidelines - -**Rationale**: -- Transparency -- Community innovation -- No vendor lock-in - ---- - -## 10. Research Alignment - -OpenAdapt's architecture aligns with and builds upon recent GUI agent research. - -### Key Research Papers - -| Paper | Contribution | OpenAdapt Integration | -|-------|--------------|----------------------| -| Claude Computer Use (Anthropic, 2024) | Production VLM agent API | API adapter in `openadapt-ml` | -| UFO (Microsoft, 2024) | Windows agent architecture | Prompt patterns adopted | -| OSWorld (CMU, 2024) | Cross-platform benchmark | Benchmark adapter planned | -| Set-of-Mark (Microsoft, 2023) | Visual grounding via labels | Core grounding mode | -| OmniParser (Microsoft, 2024) | Pure-vision UI parsing | Provider in `openadapt-grounding` | -| WebArena (CMU, 2023) | Web automation benchmark | Benchmark adapter implemented | -| Mind2Web (OSU, 2023) | Web action prediction | Dataset format compatible | - -### Research Contributions - -OpenAdapt contributes to the research community through: - -1. **Open Benchmark Infrastructure**: Standardized evaluation setup -2. **Demonstration Dataset Format**: Interoperable trajectory format -3. **Retrieval-Augmented Agents**: Demo conditioning research -4. **Grounding Comparison**: Multi-provider benchmarks -5. **Abstraction Research**: Process mining for GUI agents - ---- - -## 11. Future Directions +## 10. 
Future Directions ### Near-Term (Q1 2026) -- Complete fine-tuning pipeline validation -- Achieve competitive WAA benchmark scores -- Launch docs.openadapt.ai -- Release v1.0.0 meta-package +| Priority | Goal | Package | Status | +|----------|------|---------|--------| +| P0 | PyPI releases for all packages | all | In progress | +| P0 | WAA baseline metrics established | evals | Pending | +| P1 | Fine-tuning pipeline validated | ml | In progress | +| P1 | Demo conditioning in evals | evals + retrieval | Pending | +| P2 | docs.openadapt.ai launched | docs | Pending | ### Medium-Term (2026) -- Process mining implementation -- Self-improvement loop activation -- Multi-benchmark evaluation suite -- Enterprise deployment guides +| Goal | Description | +|------|-------------| +| **Process Mining** | Automatic extraction of semantic actions from demos | +| **Self-Improvement** | Training on successful execution traces | +| **Multi-Benchmark** | WebArena + OSWorld integration | +| **Enterprise Deployment** | Production deployment guides | ### Long-Term (2026+) -- Multi-agent collaboration -- Active learning with human feedback -- Mobile platform support -- Cross-platform transfer learning +| Goal | Description | +|------|-------------| +| **Cross-App Transfer** | Demos from Excel help with Google Sheets | +| **Multi-Agent** | Coordinated agents for complex workflows | +| **Active Learning** | Agents request human help strategically | +| **Mobile Platforms** | iOS and Android capture/replay | -### Research Agenda +### Research Questions -1. **Abstraction Hierarchy**: Can we automatically extract semantic actions from demonstrations? -2. **Transfer Learning**: How do demos from one app help in another? -3. **Active Learning**: When should the agent ask for human help? -4. **Explanation**: How do we make agent decisions interpretable? +1. **Abstraction Discovery**: Can we automatically extract semantic actions from literal event sequences? +2. **Transfer Learning**: How much does demo conditioning help across different applications? +3. **Explanation**: How do we make agent decisions interpretable to users? +4. **Safety**: What guardrails prevent harmful autonomous actions? 
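+
+To make the first question concrete, the toy sketch below (illustrative only, not part of any OpenAdapt package; `LiteralEvent`, `SemanticAction`, and `abstract` are hypothetical names) shows one way a run of literal click/type events could be collapsed into a single semantic `fill_field` action:
+
+```python
+from dataclasses import dataclass, field
+
+@dataclass
+class LiteralEvent:
+    kind: str        # "click" or "type"
+    target: str      # element under the cursor
+    text: str = ""   # typed characters, if any
+
+@dataclass
+class SemanticAction:
+    name: str
+    arguments: dict = field(default_factory=dict)
+
+def abstract(events: list[LiteralEvent]) -> list[SemanticAction]:
+    """Collapse each click-then-type run into a single 'fill_field' action."""
+    actions, i = [], 0
+    while i < len(events):
+        ev = events[i]
+        if ev.kind == "click" and i + 1 < len(events) and events[i + 1].kind == "type":
+            j = i + 1
+            typed = []
+            while j < len(events) and events[j].kind == "type":
+                typed.append(events[j].text)
+                j += 1
+            actions.append(SemanticAction("fill_field", {"field": ev.target, "value": "".join(typed)}))
+            i = j
+        else:
+            actions.append(SemanticAction(ev.kind, {"target": ev.target}))
+            i += 1
+    return actions
+```
+
+Abstraction discovery would replace the hand-written rule above with patterns mined automatically from many demonstrations.
+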
--- @@ -1246,12 +1117,13 @@ OpenAdapt contributes to the research community through: | Term | Definition | |------|------------| -| **A11y Tree** | Accessibility tree - structured representation of UI elements | +| **A11y Tree** | Accessibility tree - structured UI element representation | | **Demo** | Recorded human demonstration (trajectory) | -| **Grounding** | Mapping text descriptions to UI coordinates | +| **Grounding** | Mapping text/intent to specific UI coordinates | | **LoRA** | Low-Rank Adaptation - efficient fine-tuning method | -| **SoM** | Set-of-Mark - visual grounding via element labels | -| **Trajectory** | Sequence of observations and actions | +| **Policy** | Decision function mapping observations to actions | +| **SoM** | Set-of-Mark - visual grounding via numbered labels | +| **Trajectory** | Sequence of (observation, action) pairs | | **VLM** | Vision-Language Model | | **WAA** | Windows Agent Arena benchmark | @@ -1259,14 +1131,15 @@ OpenAdapt contributes to the research community through: - [Architecture Overview](./architecture.md) - Package structure and data flow - [Roadmap Priorities](./roadmap-priorities.md) - Current development priorities -- [Telemetry Design](./design/telemetry-design.md) - Telemetry implementation -- [Landing Page Strategy](./design/landing-page-strategy.md) - Messaging and positioning +- [Package Documentation](./packages/index.md) - Individual package guides +- [Legacy Freeze](./legacy/freeze.md) - Migration from monolith ## Appendix C: Version History | Version | Date | Changes | |---------|------|---------| -| 2.0 | Jan 2026 | Comprehensive redesign, SOTA alignment | +| 3.0 | Jan 2026 | Alpha vision synthesis, evolution diagrams, SOTA alignment | +| 2.0 | Jan 2026 | Comprehensive redesign, modular architecture | | 1.0 | Dec 2025 | Initial modular architecture | | 0.x | 2023-2024 | Legacy monolithic design | diff --git a/docs/architecture.md b/docs/architecture.md index 6604fbb5e..664eddcdc 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -79,26 +79,26 @@ flowchart TB ```mermaid flowchart LR - subgraph Record["1. Record"] - A[User Demo] --> B[Capture Session] - B --> C[Screenshots + Events] + subgraph Demonstrate["1. Demonstrate"] + A[Human Trajectory] --> B[Capture Session] + B --> C[Observations + Actions] end subgraph Store["2. Store"] C --> D[JSON/Parquet Files] - D --> E[Demo Library] + D --> E[Demonstration Library] end - subgraph Train["3. Train"] - E --> F[Data Loading] - F --> G[Model Training] + subgraph Learn["3. Learn"] + E --> F[Trajectory Abstraction] + F --> G[Policy Learning] G --> H[Checkpoint] end - subgraph Deploy["4. Deploy"] - H --> I[Agent Policy] + subgraph Execute["4. Execute"] + H --> I[Trained Policy] I --> J[Inference] - J --> K[Action Replay] + J --> K[Agent Deployment] end subgraph Evaluate["5. 
Evaluate"] @@ -164,17 +164,17 @@ graph TD | Package | Responsibility | Key Exports | |---------|---------------|-------------| -| **openadapt-capture** | GUI recording, event capture, storage | `CaptureSession`, `Recorder`, `Action` | -| **openadapt-ml** | Model training, inference, adapters | `QwenVLAdapter`, `Trainer`, `AgentPolicy` | +| **openadapt-capture** | Demonstration collection, observation-action capture, storage | `CaptureSession`, `Recorder`, `Action` | +| **openadapt-ml** | Policy learning, training, inference | `QwenVLAdapter`, `Trainer`, `AgentPolicy` | | **openadapt-evals** | Benchmark evaluation, metrics | `ApiAgent`, `BenchmarkAdapter`, `evaluate_agent_on_benchmark` | -| **openadapt-viewer** | HTML visualization, replay viewer | `PageBuilder`, `HTMLBuilder` | +| **openadapt-viewer** | Trajectory visualization | `PageBuilder`, `HTMLBuilder` | ### Optional Packages | Package | Responsibility | Use Case | |---------|---------------|----------| -| **openadapt-grounding** | UI element localization | Improved click accuracy with element detection | -| **openadapt-retrieval** | Multimodal demo search | Find similar demonstrations for few-shot prompting | +| **openadapt-grounding** | UI element grounding | Improved action accuracy with element detection | +| **openadapt-retrieval** | Multimodal trajectory search | Find similar demonstrations for few-shot policy learning | | **openadapt-privacy** | PII/PHI scrubbing | Redact sensitive data before storage/training | ## Evaluation Loop @@ -275,14 +275,14 @@ graph LR pip install openadapt # Individual packages -pip install openadapt[capture] # GUI capture/recording -pip install openadapt[ml] # ML training and inference +pip install openadapt[capture] # Demonstration collection +pip install openadapt[ml] # Policy learning and inference pip install openadapt[evals] # Benchmark evaluation -pip install openadapt[viewer] # HTML visualization +pip install openadapt[viewer] # Trajectory visualization # Optional packages -pip install openadapt[grounding] # UI element localization -pip install openadapt[retrieval] # Demo search/retrieval +pip install openadapt[grounding] # UI element grounding +pip install openadapt[retrieval] # Trajectory retrieval pip install openadapt[privacy] # PII/PHI scrubbing # Bundles diff --git a/docs/assets/architecture-diagram.png b/docs/assets/architecture-diagram.png index 1232c988d..2a8ebf1d3 100644 Binary files a/docs/assets/architecture-diagram.png and b/docs/assets/architecture-diagram.png differ diff --git a/docs/cli.md b/docs/cli.md index 996cbf21b..02ac8bdc4 100644 --- a/docs/cli.md +++ b/docs/cli.md @@ -42,11 +42,11 @@ This verifies: ## Capture Commands -Commands for recording user demonstrations. +Commands for collecting human demonstrations. ### capture start -Start a new recording session. +Start a new demonstration collection session. ```bash openadapt capture start --name [options] @@ -64,25 +64,25 @@ openadapt capture start --name [options] **Examples:** ```bash -# Basic recording +# Basic demonstration collection openadapt capture start --name login-task -# Recording without screenshots +# Demonstration collection without screenshots openadapt capture start --name audio-task --no-screenshots -# Recording with slower screenshot interval +# Demonstration collection with slower screenshot interval openadapt capture start --name slow-task --interval 1.0 ``` ### capture stop -Stop the current recording. +Stop the current demonstration collection. 
```bash openadapt capture stop ``` -Alternatively, press `Ctrl+C` in the recording terminal. +Alternatively, press `Ctrl+C` in the capture terminal. ### capture list @@ -103,7 +103,7 @@ form-fill 89 5m 42s 2026-01-14 ### capture view -Open the viewer for a capture. +Open the trajectory viewer for a demonstration. ```bash openadapt capture view [options] @@ -113,13 +113,13 @@ openadapt capture view [options] | Argument | Required | Description | |----------|----------|-------------| -| `` | Yes | Name of the capture to view | +| `` | Yes | Name of the demonstration to view | | `--port` | No | Server port (default: 8080) | | `--no-browser` | No | Don't open browser automatically | ### capture delete -Delete a capture. +Delete a demonstration. ```bash openadapt capture delete @@ -129,11 +129,11 @@ openadapt capture delete ## Train Commands -Commands for training ML models. +Commands for policy learning from demonstrations. ### train start -Start training a model on a capture. +Start policy learning from a demonstration. ```bash openadapt train start --capture --model [options] @@ -143,7 +143,7 @@ openadapt train start --capture --model [options] | Argument | Required | Description | |----------|----------|-------------| -| `--capture` | Yes | Name of the capture to train on | +| `--capture` | Yes | Name of the demonstration to train on | | `--model` | Yes | Model architecture | | `--epochs` | No | Number of training epochs (default: 10) | | `--batch-size` | No | Batch size (default: 4) | @@ -159,10 +159,10 @@ openadapt train start --capture --model [options] **Examples:** ```bash -# Basic training +# Basic policy learning openadapt train start --capture login-task --model qwen3vl-2b -# Training with custom parameters +# Policy learning with custom parameters openadapt train start \ --capture login-task \ --model qwen3vl-7b \ @@ -173,7 +173,7 @@ openadapt train start \ ### train status -Check training progress. +Check policy learning progress. ```bash openadapt train status @@ -191,7 +191,7 @@ ETA: 15 minutes ### train stop -Stop the current training. +Stop the current policy learning. ```bash openadapt train stop diff --git a/docs/design/landing-page-strategy.md b/docs/design/landing-page-strategy.md new file mode 100644 index 000000000..ca6e31556 --- /dev/null +++ b/docs/design/landing-page-strategy.md @@ -0,0 +1,712 @@ +# OpenAdapt.ai Landing Page Strategy + +**Document Version**: 1.0 +**Date**: January 2026 +**Author**: Generated with AI assistance +**Status**: Proposal for Review + +--- + +## Table of Contents + +1. [Current State Analysis](#1-current-state-analysis) +2. [Target Audience Definitions](#2-target-audience-definitions) +3. [Core Messaging Strategy](#3-core-messaging-strategy) +4. [Competitive Positioning](#4-competitive-positioning) +5. [Page Section Recommendations](#5-page-section-recommendations) +6. [Copy Suggestions](#6-copy-suggestions) +7. [Wireframe Concepts](#7-wireframe-concepts) +8. [Social Proof Strategy](#8-social-proof-strategy) +9. [Call-to-Action Strategy](#9-call-to-action-strategy) +10. [Implementation Priorities](#10-implementation-priorities) + +--- + +## 1. Current State Analysis + +### 1.1 What OpenAdapt IS Today + +OpenAdapt has evolved from a monolithic application (v0.46.0) to a **modular meta-package architecture** (v1.0+). This is a significant architectural maturation that should be reflected in messaging. 
+ +**Core Value Proposition (Current Reality)**: +- The **open** source software **adapt**er between Large Multimodal Models (LMMs) and desktop/web GUIs +- Record demonstrations, train models, evaluate agents via unified CLI +- Works with any VLM: Claude, GPT-4V, Gemini, Qwen, or custom fine-tuned models + +**Technical Differentiators (Verified)**: +1. **Model Agnostic**: Not locked to one AI provider +2. **Demo-Prompted, Not User-Prompted**: Learn from human demonstration, not complex prompt engineering +3. **Universal GUI Support**: Native apps, web browsers, virtualized environments +4. **Open Source (MIT License)**: Full transparency, no vendor lock-in + +**Key Innovation**: +- **Trajectory-conditioned disambiguation of UI affordances** - validated experiment showing 33% -> 100% first-action accuracy with demo conditioning +- **Set-of-Marks (SoM) mode**: 100% accuracy on synthetic benchmarks using element IDs instead of coordinates + +### 1.2 Current Landing Page Assessment + +**What's Working Well**: +- Clean, professional design with dark theme +- Video demo at hero section +- GitHub star/fork buttons for social proof +- Platform-specific installation instructions (auto-detects OS) +- PyPI download statistics showing traction +- Industry use cases grid (HR, Law, Insurance, etc.) +- Email signup for updates + +**What's Missing or Unclear**: +1. **No clear "what is this?"** - Visitors need to watch a video to understand +2. **Tagline "AI for Desktops" is vague** - Doesn't differentiate from competitors +3. **No comparison to alternatives** - Why choose OpenAdapt over Anthropic Computer Use? +4. **No technical credibility indicators** - No benchmark scores, no research citations +5. **Industry grid is generic** - Same features could apply to any automation tool +6. **No developer/researcher angle** - Focuses only on end-user automation +7. **Architecture transition is hidden** - v1.0+ modular design is a major selling point +8. **No clear "Who is this for?"** - Tries to appeal to everyone, resonates with no one + +**Carousel Messages Analysis**: +- "Show, don't tell." - Good but cryptic +- "Perform, don't prompt." - Best differentiator, should be prominent +- "Record, replay, and share." - Functional but not compelling + +### 1.3 Technical Accuracy Issues + +The current site doesn't reflect: +- The modular package architecture (7 focused sub-packages) +- The evaluation infrastructure (WAA, WebArena benchmarks) +- The ML training capabilities (VLM fine-tuning, LoRA) +- The retrieval-augmented prompting (demo library search) +- The privacy scrubbing capabilities (PII/PHI redaction) + +--- + +## 2. Target Audience Definitions + +### 2.1 Primary Audiences + +#### A. Developers Building Automation Agents + +**Profile**: +- Building AI-powered tools that interact with GUIs +- May be creating internal tools, startup products, or client solutions +- Comfortable with Python, CLI tools, ML concepts +- Want flexibility, not black-box solutions + +**Pain Points**: +- API-only agents (Claude Computer Use) lack customization +- Building from scratch is too slow +- Need to run locally for privacy/security +- Want to fine-tune models on specific workflows + +**What They Need to See**: +- Clear architecture diagrams +- Code examples (pip install, quick start) +- Benchmark scores vs. alternatives +- Extensibility points (adapters, plugins) + +**Key Message**: "The open source SDK for building GUI automation agents" + +#### B. 
Enterprise Process Automation Buyers + +**Profile**: +- Looking to automate repetitive knowledge work +- Concerned about security, privacy, compliance +- Need to justify ROI and integrate with existing systems +- Often have IT/security review requirements + +**Pain Points**: +- Existing RPA is brittle and expensive to maintain +- Cloud-only AI raises data privacy concerns +- Need clear enterprise support options +- Require compliance with industry regulations + +**What They Need to See**: +- Privacy features (PII/PHI scrubbing) +- On-premise deployment options +- Enterprise support/contact information +- Industry-specific use case studies +- Security architecture information + +**Key Message**: "AI-first automation that runs where your data lives" + +#### C. ML Researchers Studying GUI Agents + +**Profile**: +- Academic researchers or industry R&D teams +- Working on VLM capabilities, agent architectures, benchmarks +- Need reproducible baselines and evaluation infrastructure +- Want to contribute to or build upon open research + +**Pain Points**: +- Existing benchmarks are hard to set up +- Need standardized evaluation metrics +- Want to compare models fairly +- Limited open-source alternatives to proprietary agent frameworks + +**What They Need to See**: +- Benchmark integration (WAA, WebArena, OSWorld) +- Published metrics and methodology +- Research paper citations (if any) +- Clear contribution pathways +- Schema/data format documentation + +**Key Message**: "Open infrastructure for GUI agent research and benchmarking" + +#### D. ML Engineers Interested in VLM Fine-Tuning + +**Profile**: +- Want to train custom models for specific GUI tasks +- Familiar with training infrastructure (LoRA, PEFT, etc.) +- Looking for training data and pipelines +- Want efficient local or cloud training options + +**Pain Points**: +- Collecting GUI interaction data is tedious +- Setting up VLM training pipelines is complex +- Need baselines to compare against +- Cloud GPU costs add up quickly + +**What They Need to See**: +- Training pipeline documentation +- Supported models (Qwen3-VL, etc.) +- Training results (before/after fine-tuning) +- Cloud GPU integration (Lambda Labs, Azure) +- Data format specifications + +**Key Message**: "Record demonstrations, train specialized GUI agents" + +### 2.2 Audience Prioritization + +For the landing page, prioritize in this order: +1. **Developers** (highest volume, most likely to convert to users/contributors) +2. **Enterprise buyers** (revenue potential, require dedicated section) +3. **ML engineers** (overlaps with developers, training angle) +4. **Researchers** (smaller audience, but important for credibility) + +--- + +## 3. Core Messaging Strategy + +### 3.1 Primary Tagline Options + +**Option A (Recommended)**: +> **"Teach AI to use any software."** + +Why: Simple, benefit-focused, implies the key differentiator (demonstration-based learning) + +**Option B**: +> **"The open source adapter between AI and any GUI."** + +Why: Explains the technical position, highlights open source + +**Option C**: +> **"Perform, don't prompt."** + +Why: Clever contrast to prompt engineering, memorable + +**Option D**: +> **"Record. Train. Automate."** + +Why: Clear 3-step process, action-oriented + +### 3.2 Supporting Taglines (Subheadlines) + +- "Show AI how to do a task once. Let it handle the rest." +- "From human demonstration to AI automation in minutes." +- "Open source GUI automation with the AI model of your choice." 
+- "Works with Claude, GPT-4V, Gemini, Qwen, or your own fine-tuned models." + +### 3.3 Key Differentiators to Emphasize + +1. **Demonstration-Based Learning** + - Not: "Use natural language to describe tasks" + - But: "Just do the task and OpenAdapt learns from watching" + - Proof: 33% -> 100% first-action accuracy with demo conditioning + +2. **Model Agnostic** + - Not: "Works with [specific AI]" + - But: "Your choice: Claude, GPT-4V, Gemini, Qwen, or custom models" + - Proof: Adapters for multiple VLM backends + +3. **Runs Anywhere** + - Not: "Cloud-powered automation" + - But: "Run locally, in the cloud, or hybrid" + - Proof: CLI-based, works offline + +4. **Open Source** + - Not: "Try our free tier" + - But: "MIT licensed, fully transparent, community-driven" + - Proof: GitHub, PyPI, active Discord + +### 3.4 Messaging Framework + +**For Developers**: +> "Build GUI automation agents with a modular Python SDK. Record demonstrations, train models, evaluate on benchmarks. Works with any VLM." + +**For Enterprise**: +> "AI-first process automation that learns from your team. Privacy-first architecture with PII/PHI scrubbing. Deploy where your data lives." + +**For Researchers**: +> "Open infrastructure for GUI agent research. Standardized benchmarks, reproducible baselines, extensible architecture." + +**For ML Engineers**: +> "Fine-tune VLMs on real GUI workflows. Record data, train with LoRA, evaluate accuracy. Local or cloud training." + +--- + +## 4. Competitive Positioning + +### 4.1 Primary Competitors + +| Competitor | Strengths | Weaknesses | Our Advantage | +|------------|-----------|------------|---------------| +| **Anthropic Computer Use** | First-mover, Claude integration, simple API | Proprietary, cloud-only, no customization | Open source, model-agnostic, trainable | +| **UI-TARS (ByteDance)** | Strong benchmark scores, research backing | Closed source, not productized | Open source, deployable, extensible | +| **Traditional RPA (UiPath, etc.)** | Enterprise-proven, large ecosystems | Brittle selectors, no AI reasoning, expensive | AI-first, learns from demos, affordable | +| **GPT-4V + Custom Code** | Powerful model, flexibility | Requires building everything, no structure | Ready-made SDK, training pipeline, benchmarks | + +### 4.2 Positioning Statement + +> "OpenAdapt is the **open source alternative** to proprietary GUI automation APIs. Unlike cloud-only solutions, OpenAdapt lets you **train custom models** on your workflows and **deploy anywhere**. Unlike traditional RPA, OpenAdapt uses **AI reasoning** and **learns from demonstrations** instead of brittle scripts." + +### 4.3 Comparison Talking Points + +**vs. Anthropic Computer Use**: +- "Model-agnostic - not locked to one provider" +- "Fine-tune on your specific workflows" +- "Run locally for privacy-sensitive data" +- "Open source with community contributions" + +**vs. Traditional RPA**: +- "AI understands intent, not just element selectors" +- "Adapts to UI changes without manual updates" +- "Learn from demonstrations, not scripting" +- "Fraction of the cost, faster to deploy" + +--- + +## 5. Page Section Recommendations + +### 5.1 Proposed Page Structure + +1. **Hero Section** (Above the fold) +2. **How It Works** (3-step process) +3. **Key Differentiators** (3-4 value props) +4. **For Developers** (SDK/CLI features) +5. **For Enterprise** (Security, privacy, support) +6. **Use Cases** (Specific, concrete examples) +7. **Comparison** (Why OpenAdapt) +8. **Social Proof** (Metrics, testimonials, logos) +9. 
**Getting Started** (Install, docs, community) +10. **Footer** (Links, legal, contact) + +### 5.2 Hero Section Redesign + +**Current**: "OpenAdapt.AI - AI for Desktops. Automate your workflows. No coding required." + +**Proposed**: + +``` +[Logo] OpenAdapt.AI + +# Teach AI to use any software. + +Show it once. Let it handle the rest. + +[Video Demo - Keep current] + +[Install in 30 seconds] [View on GitHub] [Join Discord] + +"Works with Claude, GPT-4V, Gemini, Qwen, or your own fine-tuned models" + +{GitHub Stars} {PyPI Downloads} {Discord Members} +``` + +### 5.3 How It Works Section + +**Current**: Carousel with "Show, don't tell" / "Perform, don't prompt" / "Record, replay, share" + +**Proposed**: Clear 3-step process with visuals + +``` +## How OpenAdapt Works + +1. RECORD + [Icon: Screen recording] + Demonstrate the task by doing it yourself. + OpenAdapt captures screenshots, mouse clicks, and keystrokes. + +2. TRAIN + [Icon: Neural network] + Train an AI model on your demonstration. + Fine-tune Qwen-VL, use Claude/GPT-4V, or bring your own model. + +3. DEPLOY + [Icon: Play button] + Run the trained agent to automate the task. + Evaluate with standardized benchmarks. +``` + +### 5.4 Differentiators Section + +``` +## Why OpenAdapt? + +### Demonstration-Based Learning +No prompt engineering required. OpenAdapt learns from how you actually do tasks. +[Stat: 33% -> 100% first-action accuracy with demo conditioning] + +### Model Agnostic +Your choice of AI: Claude, GPT-4V, Gemini, Qwen-VL, or fine-tune your own. +Not locked to any single provider. + +### Run Anywhere +CLI-based, works offline. Deploy locally, in the cloud, or hybrid. +Your data stays where you want it. + +### Fully Open Source +MIT licensed. Transparent, auditable, community-driven. +No vendor lock-in, ever. +``` + +### 5.5 For Developers Section + +``` +## Built for Developers + +### Modular Architecture +Seven focused packages you can install individually: +- openadapt-capture: Recording +- openadapt-ml: Training & inference +- openadapt-evals: Benchmarking +- openadapt-viewer: Visualization +- openadapt-grounding: UI element detection +- openadapt-retrieval: Demo library search +- openadapt-privacy: PII/PHI scrubbing + +### Quick Start +```bash +# Install +pip install openadapt[all] + +# Record a demonstration +openadapt capture start --name my-task + +# Train a model +openadapt train start --capture my-task --model qwen3vl-2b + +# Evaluate +openadapt eval run --checkpoint model.pt --benchmark waa +``` + +### Benchmark Ready +Integrated with Windows Agent Arena (WAA), WebArena, and OSWorld. +Compare your models against published baselines. + +[View Documentation] [GitHub Repository] +``` + +### 5.6 For Enterprise Section + +``` +## Enterprise-Ready Automation + +### Privacy First +Built-in PII/PHI scrubbing with AWS Comprehend, Microsoft Presidio, or Private AI. +Your sensitive data never leaves your infrastructure. + +### Deploy Your Way +Run entirely on-premise, in your cloud, or hybrid. +No data leaves your environment unless you want it to. + +### Compliance Ready +Audit logging, reproducible recordings, explainable AI decisions. +Built for regulated industries. + +### Enterprise Support +Custom development, training, and support packages available. 
+ +[Contact Sales: sales@openadapt.ai] +``` + +### 5.7 Use Cases Section (Refined) + +**Current**: Generic industry grid + +**Proposed**: Specific, concrete use cases with workflows + +``` +## Real-World Automation + +### Data Entry Across Systems +Transfer information between applications that don't integrate. +Example: Copy customer data from CRM to billing system. + +### Report Generation +Compile data from multiple sources into standardized reports. +Example: Monthly sales reports from Salesforce + Excel + internal tools. + +### Legacy System Integration +Automate workflows in applications without APIs. +Example: Mainframe data entry, proprietary healthcare systems. + +### Quality Assurance Testing +Record manual test procedures, replay with validation. +Example: Regression testing across UI updates. + +### Process Documentation +Record workflows to create training materials automatically. +Example: Onboarding guides for complex internal tools. +``` + +--- + +## 6. Copy Suggestions + +### 6.1 Headlines + +| Section | Headline | Subheadline | +|---------|----------|-------------| +| Hero | "Teach AI to use any software." | "Show it once. Let it handle the rest." | +| How It Works | "Three Steps to Automation" | "Record, train, deploy." | +| Differentiators | "Why OpenAdapt?" | "Open source, model-agnostic, demonstration-based." | +| Developers | "Built for Developers" | "A modular SDK for building GUI automation agents." | +| Enterprise | "Enterprise-Ready" | "AI automation that runs where your data lives." | +| Use Cases | "Automate Any Workflow" | "From data entry to testing to legacy integration." | +| Install | "Get Started in 30 Seconds" | "One command installs everything you need." | + +### 6.2 CTAs (Calls to Action) + +| Context | Primary CTA | Secondary CTA | +|---------|-------------|---------------| +| Hero | "Get Started" | "View Demo" | +| Developers | "View Documentation" | "Star on GitHub" | +| Enterprise | "Contact Sales" | "Download Whitepaper" | +| Footer | "Join Discord" | "View on GitHub" | + +### 6.3 Proof Points to Include + +- "33% -> 100% first-action accuracy with demonstration conditioning" +- "[X,XXX] PyPI downloads this month" (dynamic) +- "[XXX] GitHub stars" (dynamic) +- "7 modular packages, 1 unified CLI" +- "Integrated with Windows Agent Arena, WebArena, OSWorld benchmarks" +- "MIT licensed, fully open source" + +--- + +## 7. Wireframe Concepts + +### 7.1 Desktop Layout + +``` ++------------------------------------------------------------------+ +| [Logo] [Docs] [GitHub] [Discord] [Enterprise] | ++------------------------------------------------------------------+ +| | +| # Teach AI to use any software. | +| Show it once. Let it handle the rest. | +| | +| [==================== Video Demo ====================] | +| | +| [Get Started] [View on GitHub] | +| | +| Works with: [Claude] [GPT-4V] [Gemini] [Qwen] [Custom] | +| | +| [GitHub Stars] [PyPI Downloads] [Discord Members] | +| | ++------------------------------------------------------------------+ +| | +| ## How OpenAdapt Works | +| | +| [1. RECORD] [2. TRAIN] [3. DEPLOY] | +| [Screenshot] [Neural Net] [Automation] | +| Demonstrate Train on your Run the agent | +| the task. demonstration. to automate. | +| | ++------------------------------------------------------------------+ +| | +| ## Why OpenAdapt? | +| | +| [Demo-Based] [Model Agnostic] [Run Anywhere] [Open Source] | +| Learn from Your choice of Local, cloud, MIT licensed | +| examples. AI provider. or hybrid. forever. 
| +| | ++------------------------------------------------------------------+ +| | +| [For Developers Tab] [For Enterprise Tab] [For Researchers Tab]| +| | +| Content switches based on selected audience... | +| | ++------------------------------------------------------------------+ +| | +| ## Get Started | +| | +| [macOS] [Windows] [Linux] | +| | +| $ curl -LsSf https://astral.sh/uv/install.sh | sh | +| $ uv tool install openadapt | +| $ openadapt --help | +| | +| [X,XXX installs this month] | +| | ++------------------------------------------------------------------+ +| | +| [Footer: Links, Social, Legal] | +| | ++------------------------------------------------------------------+ +``` + +### 7.2 Mobile Considerations + +- Stack hero elements vertically +- Collapse model logos into scrollable row +- Use accordion for audience tabs +- Keep video demo prominent +- Simplify code blocks (single command with copy button) + +--- + +## 8. Social Proof Strategy + +### 8.1 Metrics to Display + +**Live Metrics** (fetch from APIs): +- GitHub stars (currently showing, keep) +- PyPI downloads per month (currently showing, keep) +- Discord member count (add if available) +- Number of GitHub contributors (add) + +**Static Metrics** (update manually): +- "7 modular packages" +- "100% synthetic benchmark accuracy (SoM mode)" +- "3 benchmark integrations (WAA, WebArena, OSWorld)" + +### 8.2 Testimonials Strategy + +**Priority Order**: +1. Named enterprise user quotes (if available) +2. Named developer testimonials from Discord +3. Anonymous industry testimonials +4. Community member quotes + +**Template for Gathering**: +> "How has OpenAdapt helped you? Reply to be featured on our website." + +### 8.3 Logo Wall + +**Target logos to seek permission for**: +- Companies using OpenAdapt in production +- Universities using for research +- Partner organizations + +**Fallback** (if no logos available): +- Featured in media logos (if covered) +- Integration partner logos (AWS, Azure, etc.) +- "Trusted by teams at Fortune 500 companies" (if true) + +--- + +## 9. Call-to-Action Strategy + +### 9.1 Primary Conversion Goals + +1. **GitHub star** (low friction, high visibility) +2. **PyPI install** (product usage) +3. **Discord join** (community engagement) +4. **Email signup** (for updates) +5. **Enterprise contact** (revenue) + +### 9.2 CTA Placement + +| Location | Primary CTA | Secondary CTA | +|----------|-------------|---------------| +| Hero | "Get Started" -> Install section | "View on GitHub" | +| After video | "Try it yourself" -> Install | "Join Discord" | +| Developers section | "View Docs" | "Star on GitHub" | +| Enterprise section | "Contact Sales" | "Request Demo" | +| Bottom of page | "Join Discord" | "View Documentation" | +| Sticky header (scroll) | "Get Started" | | + +### 9.3 Email Capture Strategy + +**Current**: "Register for updates" + +**Proposed**: More specific value prop +- "Get early access to new features" +- "Join [X,XXX] developers automating with AI" +- "Subscribe to the OpenAdapt newsletter (monthly, no spam)" + +--- + +## 10. Implementation Priorities + +### 10.1 Phase 1: Quick Wins (1-2 weeks) + +1. **Update hero tagline** to "Teach AI to use any software." +2. **Add "How It Works" section** with 3-step process +3. **Update differentiators** to 4-card grid (current features but better copy) +4. **Add Discord member count** to social proof +5. **Add GitHub contributors count** + +### 10.2 Phase 2: Messaging Clarity (2-4 weeks) + +1. 
**Add "For Developers" section** with code examples and architecture +2. **Add "For Enterprise" section** with privacy/security messaging +3. **Replace generic industry grid** with specific use case examples +4. **Add comparison table** vs. alternatives +5. **Update email signup copy** to be more specific + +### 10.3 Phase 3: Credibility Building (4-8 weeks) + +1. **Add benchmark scores** (once published) +2. **Collect and display testimonials** +3. **Create case studies** (1-2 real examples) +4. **Add logo wall** (if logos available) +5. **Add "Research" or "Publications" section** (if applicable) + +### 10.4 Phase 4: Conversion Optimization (Ongoing) + +1. **A/B test hero messaging** +2. **Track install conversion rates** +3. **Optimize CTA placement** +4. **Add video transcripts/captions for SEO** +5. **Create landing page variants** for different audiences (developer vs. enterprise) + +--- + +## Appendix A: Messaging Don'ts + +- **Don't say "AI for Desktops"** - too vague, doesn't differentiate +- **Don't say "No coding required"** - true for end users, but alienates developers +- **Don't list every industry** - pick 3-4 with real stories +- **Don't hide the CLI** - developers want to see it +- **Don't over-promise** - be honest about current capabilities + +## Appendix B: Technical Content to Add + +1. **Architecture diagram** showing package relationships +2. **Mermaid flowchart** of Record -> Train -> Deploy cycle +3. **Comparison table** of model backends (Claude, GPT, Qwen, etc.) +4. **Benchmark table** showing accuracy scores +5. **API reference link** to documentation site + +## Appendix C: SEO Keywords + +Primary: +- "GUI automation AI" +- "desktop automation AI" +- "RPA alternative AI" +- "VLM GUI agent" +- "open source computer use" + +Secondary: +- "train AI on screenshots" +- "demonstration-based automation" +- "model-agnostic automation" +- "Claude computer use alternative" +- "AI workflow automation" + +--- + +*This document is a living strategy guide. Updates should be made as OpenAdapt capabilities evolve and as user feedback is collected.* diff --git a/docs/design/openadapt-tray.md b/docs/design/openadapt-tray.md new file mode 100644 index 000000000..6e347814d --- /dev/null +++ b/docs/design/openadapt-tray.md @@ -0,0 +1,1220 @@ +# openadapt-tray Package Design + +## Overview + +`openadapt-tray` is a cross-platform system tray application that provides a graphical interface for the OpenAdapt ecosystem. It serves as a thin orchestration layer, allowing users to control recording, monitor training, view captures, and access settings without using the command line. 
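+
+For orientation, the intended install-and-launch flow is sketched below. This is a usage sketch, not published instructions: the `openadapt-tray` console script and the `python -m openadapt_tray` entry point are the ones declared later in this document (see `pyproject.toml` and `__main__.py`).
+
+```bash
+# Install the tray package and launch it (assumes the package is published to PyPI)
+pip install openadapt-tray
+openadapt-tray            # console-script entry point (openadapt_tray.app:main)
+# or, equivalently:
+python -m openadapt_tray
+```
+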
+ +## Legacy Implementation Analysis + +### Current Features (Legacy `openadapt/app/tray.py`) + +The legacy implementation uses **PySide6/Qt** for cross-platform system tray functionality: + +**Architecture:** +- `QSystemTrayIcon` for the system tray icon +- `QMenu` for context menu +- `QDialog` for configuration dialogs (replay strategy, delete confirmation) +- `pyqttoast` for toast notifications +- Multiprocessing pipes (`multiprocessing.Pipe`) for IPC with recording process +- `QThread` + `Worker` pattern for async signal handling +- Platform-specific Dock hiding on macOS via `AppKit` + +**Menu Structure:** +- Record / Stop Recording (toggle) +- Visualize submenu (lists all recordings) +- Replay submenu (lists all recordings, opens strategy dialog) +- Delete submenu (lists all recordings, confirms deletion) +- Quit + +**Key Patterns:** +- `TrackedQAction` - wraps `QAction` to send analytics events via PostHog +- Signal-based state updates (`record.starting`, `record.started`, `record.stopping`, `record.stopped`, `replay.*`) +- Toast notifications for status updates (recording started/stopped, etc.) +- Dashboard launched automatically as a background thread +- Recording process runs in a separate `multiprocessing.Process` + +**Stop Sequences:** +- Typing `oa.stop` or pressing `Ctrl` three times stops recording +- Configurable via `STOP_SEQUENCES` in config + +### Limitations of Legacy Implementation + +1. **Heavyweight dependency** - PySide6 is a large dependency (~100MB+) +2. **No global hotkeys** - Recording can only be stopped via stop sequences or tray menu +3. **Tightly coupled** - Direct imports of internal modules (crud, models, etc.) +4. **No status icons** - Same icon regardless of state +5. **No auto-start** - Manual setup required for login startup +6. **Single dashboard** - Only supports the legacy Next.js dashboard + +## New Architecture Design + +### Design Principles + +1. **Thin wrapper** - Minimal business logic; delegate to CLI or sub-packages +2. **Cross-platform first** - Consistent behavior on macOS, Windows, and Linux +3. **Lightweight** - Prefer smaller dependencies (pystray ~50KB vs PySide6 ~100MB) +4. **Event-driven** - Async status updates via IPC +5. 
**Configurable** - User-customizable hotkeys, icons, and behaviors + +### Package Structure + +``` +openadapt-tray/ +├── src/openadapt_tray/ +│ ├── __init__.py # Package exports, version +│ ├── __main__.py # Entry point: python -m openadapt_tray +│ ├── app.py # Main TrayApplication class +│ ├── menu.py # Menu construction and actions +│ ├── icons.py # Icon loading and status icons +│ ├── notifications.py # Cross-platform notifications +│ ├── shortcuts.py # Global hotkey handling +│ ├── config.py # Tray-specific configuration +│ ├── ipc.py # Inter-process communication +│ ├── state.py # Application state machine +│ └── platform/ +│ ├── __init__.py # Platform detection and abstraction +│ ├── base.py # Abstract base class +│ ├── macos.py # macOS-specific (AppKit, rumps optional) +│ ├── windows.py # Windows-specific (win32api) +│ └── linux.py # Linux-specific (AppIndicator) +├── assets/ +│ ├── icons/ +│ │ ├── idle.png # Default state +│ │ ├── idle@2x.png # Retina support +│ │ ├── recording.png # Recording active +│ │ ├── recording@2x.png +│ │ ├── training.png # Training in progress +│ │ ├── training@2x.png +│ │ ├── error.png # Error state +│ │ └── error@2x.png +│ └── logo.ico # Windows icon format +├── pyproject.toml +├── README.md +└── tests/ + ├── test_app.py + ├── test_menu.py + ├── test_shortcuts.py + └── test_platform.py +``` + +### Dependencies + +**Required:** +```toml +[project] +dependencies = [ + "pystray>=0.19.0", # Cross-platform system tray + "Pillow>=9.0.0", # Icon handling + "pynput>=1.7.0", # Global hotkeys + "click>=8.0.0", # CLI integration (consistent with meta-package) +] +``` + +**Optional Platform Enhancements:** +```toml +[project.optional-dependencies] +macos-native = [ + "rumps>=0.4.0", # Native macOS menu bar +] +all = [ + "openadapt-tray[macos-native]", +] +``` + +**Why pystray over PySide6/Qt:** +- Dramatically smaller (~50KB vs ~100MB) +- Pure Python, easier to install +- Sufficient for system tray use case +- Works well with pynput for hotkeys + +### Core Components + +#### 1. State Machine (`state.py`) + +```python +from enum import Enum, auto +from dataclasses import dataclass +from typing import Optional, Callable + +class TrayState(Enum): + """Application states.""" + IDLE = auto() + RECORDING_STARTING = auto() + RECORDING = auto() + RECORDING_STOPPING = auto() + TRAINING = auto() + TRAINING_PAUSED = auto() + ERROR = auto() + +@dataclass +class AppState: + """Current application state.""" + state: TrayState = TrayState.IDLE + current_capture: Optional[str] = None + training_progress: Optional[float] = None + error_message: Optional[str] = None + + def can_start_recording(self) -> bool: + return self.state == TrayState.IDLE + + def can_stop_recording(self) -> bool: + return self.state == TrayState.RECORDING + +class StateManager: + """Manages application state transitions.""" + + def __init__(self): + self._state = AppState() + self._listeners: list[Callable[[AppState], None]] = [] + + def add_listener(self, callback: Callable[[AppState], None]): + self._listeners.append(callback) + + def transition(self, new_state: TrayState, **kwargs): + """Transition to a new state and notify listeners.""" + self._state = AppState(state=new_state, **kwargs) + for listener in self._listeners: + listener(self._state) + + @property + def current(self) -> AppState: + return self._state +``` + +#### 2. 
Main Application (`app.py`) + +```python +import sys +import threading +from typing import Optional + +import pystray +from PIL import Image + +from openadapt_tray.state import StateManager, TrayState +from openadapt_tray.menu import MenuBuilder +from openadapt_tray.icons import IconManager +from openadapt_tray.shortcuts import HotkeyManager +from openadapt_tray.notifications import NotificationManager +from openadapt_tray.ipc import IPCClient +from openadapt_tray.config import TrayConfig +from openadapt_tray.platform import get_platform_handler + +class TrayApplication: + """Main system tray application.""" + + def __init__(self, config: Optional[TrayConfig] = None): + self.config = config or TrayConfig.load() + self.state = StateManager() + self.platform = get_platform_handler() + + # Initialize components + self.icons = IconManager() + self.notifications = NotificationManager() + self.menu_builder = MenuBuilder(self) + self.hotkeys = HotkeyManager(self.config.hotkeys) + self.ipc = IPCClient() + + # Create tray icon + self.icon = pystray.Icon( + name="openadapt", + icon=self.icons.get(TrayState.IDLE), + title="OpenAdapt", + menu=self.menu_builder.build(), + ) + + # Register state change handler + self.state.add_listener(self._on_state_change) + + # Register hotkey handlers + self._setup_hotkeys() + + def _setup_hotkeys(self): + """Configure global hotkeys.""" + self.hotkeys.register( + self.config.hotkeys.toggle_recording, + self._toggle_recording + ) + self.hotkeys.register( + self.config.hotkeys.open_dashboard, + self._open_dashboard + ) + + def _on_state_change(self, state): + """Handle state changes.""" + # Update icon + self.icon.icon = self.icons.get(state.state) + + # Update menu + self.icon.menu = self.menu_builder.build() + + # Show notification if appropriate + self._show_state_notification(state) + + def _show_state_notification(self, state): + """Show notification for state transitions.""" + messages = { + TrayState.RECORDING: ("Recording Started", f"Capturing: {state.current_capture}"), + TrayState.IDLE: ("Recording Stopped", "Capture saved"), + TrayState.TRAINING: ("Training Started", "Model training in progress"), + TrayState.ERROR: ("Error", state.error_message or "An error occurred"), + } + if state.state in messages: + title, body = messages[state.state] + self.notifications.show(title, body) + + def _toggle_recording(self): + """Toggle recording state.""" + if self.state.current.can_start_recording(): + self.start_recording() + elif self.state.current.can_stop_recording(): + self.stop_recording() + + def start_recording(self, name: Optional[str] = None): + """Start a new capture session.""" + if not self.state.current.can_start_recording(): + return + + # Prompt for name if not provided (platform-specific) + if name is None: + name = self.platform.prompt_input( + "New Recording", + "Enter a name for this capture:" + ) + if not name: + return + + self.state.transition(TrayState.RECORDING_STARTING, current_capture=name) + + # Start capture via CLI subprocess or direct API + threading.Thread( + target=self._run_capture, + args=(name,), + daemon=True + ).start() + + def _run_capture(self, name: str): + """Run capture in background thread.""" + try: + # Option 1: Via subprocess (preferred for isolation) + import subprocess + self.capture_process = subprocess.Popen( + ["openadapt", "capture", "start", "--name", name], + stdout=subprocess.PIPE, + stderr=subprocess.PIPE, + ) + self.state.transition(TrayState.RECORDING, current_capture=name) + + except Exception as e: + 
self.state.transition(TrayState.ERROR, error_message=str(e)) + + def stop_recording(self): + """Stop the current capture session.""" + if not self.state.current.can_stop_recording(): + return + + self.state.transition(TrayState.RECORDING_STOPPING) + + # Send stop signal to capture process + if hasattr(self, 'capture_process') and self.capture_process: + self.capture_process.terminate() + + self.state.transition(TrayState.IDLE) + + def _open_dashboard(self): + """Open the web dashboard.""" + import webbrowser + webbrowser.open(f"http://localhost:{self.config.dashboard_port}") + + def run(self): + """Run the application.""" + # Start hotkey listener + self.hotkeys.start() + + # Platform-specific setup + self.platform.setup() + + # Run the tray icon (blocks) + self.icon.run() + + def quit(self): + """Quit the application.""" + self.hotkeys.stop() + self.ipc.close() + self.icon.stop() + +def main(): + """Entry point.""" + app = TrayApplication() + try: + app.run() + except KeyboardInterrupt: + app.quit() + +if __name__ == "__main__": + main() +``` + +#### 3. Menu Builder (`menu.py`) + +```python +from typing import TYPE_CHECKING, Callable, Optional +from functools import partial + +import pystray +from pystray import MenuItem as Item, Menu + +if TYPE_CHECKING: + from openadapt_tray.app import TrayApplication + +from openadapt_tray.state import TrayState + +class MenuBuilder: + """Builds the system tray context menu.""" + + def __init__(self, app: "TrayApplication"): + self.app = app + + def build(self) -> Menu: + """Build the current menu based on application state.""" + state = self.app.state.current + + items = [ + self._build_recording_item(state), + Menu.SEPARATOR, + self._build_captures_submenu(), + self._build_training_item(state), + Menu.SEPARATOR, + Item("Open Dashboard", self._open_dashboard), + Item("Settings...", self._open_settings), + Menu.SEPARATOR, + Item("Quit", self._quit), + ] + + return Menu(*items) + + def _build_recording_item(self, state) -> Item: + """Build record/stop recording menu item.""" + if state.state == TrayState.RECORDING: + return Item( + f"Stop Recording ({state.current_capture})", + self.app.stop_recording, + ) + elif state.state in (TrayState.RECORDING_STARTING, TrayState.RECORDING_STOPPING): + return Item( + "Recording..." 
if state.state == TrayState.RECORDING_STARTING else "Stopping...", + None, + enabled=False, + ) + else: + return Item( + f"Start Recording ({self.app.config.hotkeys.toggle_recording})", + self.app.start_recording, + enabled=state.can_start_recording(), + ) + + def _build_captures_submenu(self) -> Item: + """Build captures submenu.""" + captures = self._get_recent_captures() + + if not captures: + return Item( + "Recent Captures", + Menu(Item("No captures", None, enabled=False)), + ) + + capture_items = [ + Item( + f"{c.name} ({c.timestamp})", + Menu( + Item("View", partial(self._view_capture, c.path)), + Item("Delete", partial(self._delete_capture, c.path)), + ), + ) + for c in captures[:10] # Limit to 10 most recent + ] + + capture_items.append(Menu.SEPARATOR) + capture_items.append(Item("View All...", self._open_captures_list)) + + return Item("Recent Captures", Menu(*capture_items)) + + def _build_training_item(self, state) -> Item: + """Build training status/control item.""" + if state.state == TrayState.TRAINING: + progress = state.training_progress or 0 + return Item( + f"Training: {progress:.0%}", + Menu( + Item("View Progress", self._open_training_dashboard), + Item("Stop Training", self._stop_training), + ), + ) + else: + return Item( + "Training", + Menu( + Item("Start Training...", self._start_training), + Item("View Last Results", self._view_training_results), + ), + ) + + def _get_recent_captures(self): + """Get list of recent captures.""" + try: + from pathlib import Path + from openadapt_tray.config import TrayConfig + + captures_dir = Path(TrayConfig.load().captures_directory) + if not captures_dir.exists(): + return [] + + # Simple capture detection - look for capture directories + captures = [] + for d in sorted(captures_dir.iterdir(), key=lambda x: x.stat().st_mtime, reverse=True): + if d.is_dir() and (d / "metadata.json").exists(): + from dataclasses import dataclass + from datetime import datetime + + @dataclass + class CaptureInfo: + name: str + path: str + timestamp: str + + mtime = datetime.fromtimestamp(d.stat().st_mtime) + captures.append(CaptureInfo( + name=d.name, + path=str(d), + timestamp=mtime.strftime("%Y-%m-%d %H:%M"), + )) + + return captures + except Exception: + return [] + + def _open_dashboard(self): + self.app._open_dashboard() + + def _open_settings(self): + """Open settings dialog.""" + self.app.platform.open_settings_dialog(self.app.config) + + def _quit(self): + self.app.quit() + + def _view_capture(self, path: str): + """View a capture.""" + import subprocess + subprocess.run(["openadapt", "capture", "view", path]) + + def _delete_capture(self, path: str): + """Delete a capture after confirmation.""" + if self.app.platform.confirm_dialog( + "Delete Capture", + f"Are you sure you want to delete this capture?\n{path}" + ): + import shutil + shutil.rmtree(path) + self.app.notifications.show("Capture Deleted", "The capture has been removed.") + + def _open_captures_list(self): + """Open captures list in dashboard.""" + import webbrowser + webbrowser.open(f"http://localhost:{self.app.config.dashboard_port}/captures") + + def _open_training_dashboard(self): + """Open training dashboard.""" + import webbrowser + webbrowser.open(f"http://localhost:{self.app.config.dashboard_port}/training") + + def _start_training(self): + """Open training configuration dialog.""" + # This would open a dialog to select capture and model + self.app.platform.open_training_dialog() + + def _stop_training(self): + """Stop current training.""" + import subprocess + 
subprocess.run(["openadapt", "train", "stop"])
+        self.app.state.transition(TrayState.IDLE)
+
+    def _view_training_results(self):
+        """View last training results."""
+        import subprocess
+        subprocess.run(["openadapt", "train", "status"])
+```
+
+#### 4. Global Hotkeys (`shortcuts.py`)
+
+```python
+from dataclasses import dataclass
+from typing import Callable, Dict, Optional
+import threading
+
+from pynput import keyboard
+
+@dataclass
+class HotkeyConfig:
+    """Hotkey configuration (pynput GlobalHotKeys combo strings)."""
+    toggle_recording: str = "<ctrl>+<shift>+r"
+    open_dashboard: str = "<ctrl>+<shift>+d"
+    stop_recording: str = "<ctrl>+<ctrl>+<ctrl>"  # Triple ctrl (legacy compat)
+
+class HotkeyManager:
+    """Manages global hotkeys."""
+
+    def __init__(self, config: Optional[HotkeyConfig] = None):
+        self.config = config or HotkeyConfig()
+        self._handlers: Dict[str, Callable] = {}
+        self._listener: Optional[keyboard.GlobalHotKeys] = None
+        self._ctrl_count = 0
+        self._ctrl_timer: Optional[threading.Timer] = None
+
+    def register(self, hotkey: str, handler: Callable):
+        """Register a hotkey handler."""
+        self._handlers[hotkey] = handler
+
+    def start(self):
+        """Start listening for hotkeys."""
+        # Build hotkey dict for pynput
+        hotkeys = {}
+        for combo, handler in self._handlers.items():
+            if combo == "<ctrl>+<ctrl>+<ctrl>":
+                # Special handling for triple-ctrl (not a valid GlobalHotKeys combo)
+                continue
+            hotkeys[combo] = handler
+
+        self._listener = keyboard.GlobalHotKeys(hotkeys)
+        self._listener.start()
+
+        # Also listen for triple-ctrl pattern
+        if "<ctrl>+<ctrl>+<ctrl>" in self._handlers:
+            self._start_ctrl_listener()
+
+    def _start_ctrl_listener(self):
+        """Start listener for triple-ctrl pattern."""
+        def on_press(key):
+            if key == keyboard.Key.ctrl_l or key == keyboard.Key.ctrl_r:
+                self._on_ctrl_press()
+
+        def on_release(key):
+            pass
+
+        self._key_listener = keyboard.Listener(
+            on_press=on_press,
+            on_release=on_release,
+        )
+        self._key_listener.start()
+
+    def _on_ctrl_press(self):
+        """Handle ctrl key press for triple-ctrl detection."""
+        self._ctrl_count += 1
+
+        # Reset timer
+        if self._ctrl_timer:
+            self._ctrl_timer.cancel()
+
+        if self._ctrl_count >= 3:
+            self._ctrl_count = 0
+            handler = self._handlers.get("<ctrl>+<ctrl>+<ctrl>")
+            if handler:
+                handler()
+        else:
+            # Reset count after 500ms
+            self._ctrl_timer = threading.Timer(0.5, self._reset_ctrl_count)
+            self._ctrl_timer.start()
+
+    def _reset_ctrl_count(self):
+        self._ctrl_count = 0
+
+    def stop(self):
+        """Stop listening for hotkeys."""
+        if self._listener:
+            self._listener.stop()
+        if hasattr(self, '_key_listener'):
+            self._key_listener.stop()
+        if self._ctrl_timer:
+            self._ctrl_timer.cancel()
+```
+
+#### 5. Platform Abstraction (`platform/`)
+
+**Base class (`platform/base.py`):**
+
+```python
+from abc import ABC, abstractmethod
+from typing import Optional
+
+class PlatformHandler(ABC):
+    """Abstract base class for platform-specific functionality."""
+
+    @abstractmethod
+    def setup(self):
+        """Platform-specific setup."""
+        pass
+
+    @abstractmethod
+    def prompt_input(self, title: str, message: str) -> Optional[str]:
+        """Show input dialog and return user input."""
+        pass
+
+    @abstractmethod
+    def confirm_dialog(self, title: str, message: str) -> bool:
+        """Show confirmation dialog and return result."""
+        pass
+
+    @abstractmethod
+    def open_settings_dialog(self, config):
+        """Open settings dialog."""
+        pass
+
+    @abstractmethod
+    def open_training_dialog(self):
+        """Open training configuration dialog."""
+        pass
+
+    def setup_autostart(self, enabled: bool):
+        """Configure auto-start on login."""
+        pass
+```
+
+**macOS implementation (`platform/macos.py`):**
+
+```python
+import subprocess
+from typing import Optional
+
+from .base import PlatformHandler
+
+class MacOSHandler(PlatformHandler):
+    """macOS-specific functionality."""
+
+    def setup(self):
+        """Hide from Dock, show only in menu bar."""
+        try:
+            from AppKit import NSApplication, NSApplicationActivationPolicyAccessory
+            NSApplication.sharedApplication().setActivationPolicy_(
+                NSApplicationActivationPolicyAccessory
+            )
+        except ImportError:
+            pass  # AppKit not available
+
+    def prompt_input(self, title: str, message: str) -> Optional[str]:
+        """Show native macOS input dialog."""
+        script = f'''
+        tell application "System Events"
+            display dialog "{message}" default answer "" with title "{title}"
+            return text returned of result
+        end tell
+        '''
+        try:
+            result = subprocess.run(
+                ["osascript", "-e", script],
+                capture_output=True,
+                text=True,
+            )
+            if result.returncode == 0:
+                return result.stdout.strip()
+        except Exception:
+            pass
+        return None
+
+    def confirm_dialog(self, title: str, message: str) -> bool:
+        """Show native macOS confirmation dialog."""
+        script = f'''
+        tell application "System Events"
+            display dialog "{message}" with title "{title}" buttons {{"Cancel", "OK"}} default button "OK"
+            return button returned of result
+        end tell
+        '''
+        try:
+            result = subprocess.run(
+                ["osascript", "-e", script],
+                capture_output=True,
+                text=True,
+            )
+            return result.returncode == 0 and "OK" in result.stdout
+        except Exception:
+            return False
+
+    def open_settings_dialog(self, config):
+        """Open settings in default browser."""
+        import webbrowser
+        webbrowser.open(f"http://localhost:{config.dashboard_port}/settings")
+
+    def open_training_dialog(self):
+        """Open training dialog in browser."""
+        import webbrowser
+        webbrowser.open("http://localhost:8080/training/new")
+
+    def setup_autostart(self, enabled: bool):
+        """Configure Launch Agent for auto-start."""
+        from pathlib import Path
+
+        plist_path = Path.home() / "Library/LaunchAgents/ai.openadapt.tray.plist"
+
+        if enabled:
+            plist_content = '''<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+    <key>Label</key>
+    <string>ai.openadapt.tray</string>
+    <key>ProgramArguments</key>
+    <array>
+        <string>/usr/local/bin/openadapt-tray</string>
+    </array>
+    <key>RunAtLoad</key>
+    <true/>
+    <key>KeepAlive</key>
+    <false/>
+</dict>
+</plist>
+'''
+            plist_path.parent.mkdir(parents=True, exist_ok=True)
+            plist_path.write_text(plist_content)
+            subprocess.run(["launchctl", "load", str(plist_path)])
+        else:
+            if plist_path.exists():
+                subprocess.run(["launchctl", "unload", str(plist_path)])
+                plist_path.unlink()
+```
+
+**Windows implementation (`platform/windows.py`):**
+
+```python
+import ctypes
+from typing import Optional
+
+from .base import
PlatformHandler + +class WindowsHandler(PlatformHandler): + """Windows-specific functionality.""" + + def setup(self): + """Windows-specific setup.""" + pass # No special setup needed + + def prompt_input(self, title: str, message: str) -> Optional[str]: + """Show Windows input dialog using ctypes.""" + try: + import tkinter as tk + from tkinter import simpledialog + + root = tk.Tk() + root.withdraw() + result = simpledialog.askstring(title, message) + root.destroy() + return result + except Exception: + return None + + def confirm_dialog(self, title: str, message: str) -> bool: + """Show Windows confirmation dialog.""" + MB_OKCANCEL = 0x01 + MB_ICONQUESTION = 0x20 + IDOK = 1 + + result = ctypes.windll.user32.MessageBoxW( + 0, message, title, MB_OKCANCEL | MB_ICONQUESTION + ) + return result == IDOK + + def open_settings_dialog(self, config): + import webbrowser + webbrowser.open(f"http://localhost:{config.dashboard_port}/settings") + + def open_training_dialog(self): + import webbrowser + webbrowser.open("http://localhost:8080/training/new") + + def setup_autostart(self, enabled: bool): + """Configure Windows Registry for auto-start.""" + import winreg + + key_path = r"Software\Microsoft\Windows\CurrentVersion\Run" + app_name = "OpenAdapt" + + try: + key = winreg.OpenKey(winreg.HKEY_CURRENT_USER, key_path, 0, winreg.KEY_ALL_ACCESS) + + if enabled: + import sys + exe_path = sys.executable.replace("python.exe", "Scripts\\openadapt-tray.exe") + winreg.SetValueEx(key, app_name, 0, winreg.REG_SZ, exe_path) + else: + try: + winreg.DeleteValue(key, app_name) + except FileNotFoundError: + pass + + winreg.CloseKey(key) + except Exception: + pass +``` + +#### 6. Configuration (`config.py`) + +```python +from dataclasses import dataclass, field +from pathlib import Path +from typing import Optional +import json + +from openadapt_tray.shortcuts import HotkeyConfig + +@dataclass +class TrayConfig: + """Tray application configuration.""" + + # Hotkeys + hotkeys: HotkeyConfig = field(default_factory=HotkeyConfig) + + # Paths + captures_directory: str = "~/openadapt/captures" + training_output_directory: str = "~/openadapt/training" + + # Dashboard + dashboard_port: int = 8080 + auto_launch_dashboard: bool = True + + # Behavior + auto_start_on_login: bool = False + minimize_to_tray: bool = True + show_notifications: bool = True + notification_duration_ms: int = 5000 + + # Recording + default_record_audio: bool = True + default_transcribe: bool = True + stop_on_triple_ctrl: bool = True + + # Appearance + use_native_dialogs: bool = True + + @classmethod + def config_path(cls) -> Path: + """Get configuration file path.""" + return Path.home() / ".config" / "openadapt" / "tray.json" + + @classmethod + def load(cls) -> "TrayConfig": + """Load configuration from file.""" + path = cls.config_path() + if path.exists(): + try: + data = json.loads(path.read_text()) + hotkeys_data = data.pop("hotkeys", {}) + return cls( + hotkeys=HotkeyConfig(**hotkeys_data), + **data + ) + except Exception: + pass + return cls() + + def save(self): + """Save configuration to file.""" + path = self.config_path() + path.parent.mkdir(parents=True, exist_ok=True) + + data = { + "hotkeys": { + "toggle_recording": self.hotkeys.toggle_recording, + "open_dashboard": self.hotkeys.open_dashboard, + "stop_recording": self.hotkeys.stop_recording, + }, + "captures_directory": self.captures_directory, + "training_output_directory": self.training_output_directory, + "dashboard_port": self.dashboard_port, + "auto_launch_dashboard": 
self.auto_launch_dashboard, + "auto_start_on_login": self.auto_start_on_login, + "minimize_to_tray": self.minimize_to_tray, + "show_notifications": self.show_notifications, + "notification_duration_ms": self.notification_duration_ms, + "default_record_audio": self.default_record_audio, + "default_transcribe": self.default_transcribe, + "stop_on_triple_ctrl": self.stop_on_triple_ctrl, + "use_native_dialogs": self.use_native_dialogs, + } + + path.write_text(json.dumps(data, indent=2)) +``` + +#### 7. Notifications (`notifications.py`) + +```python +import sys +from typing import Optional + +class NotificationManager: + """Cross-platform notification manager.""" + + def __init__(self): + self._backend = self._detect_backend() + + def _detect_backend(self) -> str: + """Detect best notification backend for platform.""" + if sys.platform == "darwin": + return "macos" + elif sys.platform == "win32": + return "windows" + else: + return "linux" + + def show( + self, + title: str, + body: str, + icon_path: Optional[str] = None, + duration_ms: int = 5000, + ): + """Show a notification.""" + if self._backend == "macos": + self._show_macos(title, body) + elif self._backend == "windows": + self._show_windows(title, body, icon_path, duration_ms) + else: + self._show_linux(title, body, icon_path) + + def _show_macos(self, title: str, body: str): + """Show notification on macOS.""" + import subprocess + script = f''' + display notification "{body}" with title "{title}" + ''' + subprocess.run(["osascript", "-e", script], capture_output=True) + + def _show_windows(self, title: str, body: str, icon_path: Optional[str], duration_ms: int): + """Show notification on Windows using pystray's built-in notify.""" + # pystray handles this via icon.notify() + pass + + def _show_linux(self, title: str, body: str, icon_path: Optional[str]): + """Show notification on Linux.""" + try: + import subprocess + cmd = ["notify-send", title, body] + if icon_path: + cmd.extend(["-i", icon_path]) + subprocess.run(cmd, capture_output=True) + except Exception: + pass +``` + +### pyproject.toml + +```toml +[project] +name = "openadapt-tray" +version = "0.1.0" +description = "System tray application for OpenAdapt" +readme = "README.md" +requires-python = ">=3.10" +license = "MIT" +authors = [ + {name = "MLDSAI Inc.", email = "richard@mldsai.com"} +] +keywords = ["gui", "system-tray", "menu-bar", "openadapt"] +classifiers = [ + "Development Status :: 3 - Alpha", + "Intended Audience :: Developers", + "License :: OSI Approved :: MIT License", + "Operating System :: MacOS", + "Operating System :: Microsoft :: Windows", + "Operating System :: POSIX :: Linux", + "Programming Language :: Python :: 3", + "Programming Language :: Python :: 3.10", + "Programming Language :: Python :: 3.11", + "Programming Language :: Python :: 3.12", +] + +dependencies = [ + "pystray>=0.19.0", + "Pillow>=9.0.0", + "pynput>=1.7.0", + "click>=8.0.0", +] + +[project.optional-dependencies] +macos-native = [ + "rumps>=0.4.0", + "pyobjc-framework-Cocoa>=9.0", +] +dev = [ + "pytest>=8.0.0", + "pytest-mock>=3.10.0", + "ruff>=0.1.0", +] +all = [ + "openadapt-tray[macos-native]", +] + +[project.scripts] +openadapt-tray = "openadapt_tray.app:main" + +[project.gui-scripts] +openadapt-tray-gui = "openadapt_tray.app:main" + +[project.urls] +Homepage = "https://openadapt.ai" +Documentation = "https://docs.openadapt.ai" +Repository = "https://github.com/OpenAdaptAI/openadapt-tray" + +[build-system] +requires = ["hatchling"] +build-backend = "hatchling.build" + 
+[tool.hatch.build.targets.wheel] +packages = ["src/openadapt_tray"] + +[tool.ruff] +line-length = 88 +target-version = "py310" + +[tool.pytest.ini_options] +testpaths = ["tests"] +``` + +## User Experience + +### First-Run Experience + +1. **Installation**: `pip install openadapt-tray` +2. **Launch**: `openadapt-tray` or via Applications menu +3. **First Run Dialog** (if no config exists): + - Welcome message + - Option to configure hotkeys + - Option to enable auto-start + - Link to documentation +4. **Tray Icon**: Appears in system tray/menu bar +5. **Dashboard**: Auto-opens (configurable) + +### Menu Structure + +``` +[OpenAdapt Tray Icon] +├── Start Recording (Ctrl+Shift+R) +│ └── [When recording: "Stop Recording (task-name)"] +├── ───────────── +├── Recent Captures +│ ├── login-flow (2024-01-15 14:30) +│ │ ├── View +│ │ └── Delete +│ ├── checkout (2024-01-15 10:15) +│ │ ├── View +│ │ └── Delete +│ ├── ... (up to 10 items) +│ ├── ───────────── +│ └── View All... +├── Training +│ ├── Start Training... +│ └── View Last Results +│ └── [When training: "Training: 45% | View Progress | Stop"] +├── ───────────── +├── Open Dashboard (Ctrl+Shift+D) +├── Settings... +├── ───────────── +└── Quit +``` + +### Status Icons + +| State | Icon Description | Color | +|-------|------------------|-------| +| Idle | OpenAdapt logo | Blue/Gray | +| Recording | Pulsing red dot overlay | Red | +| Recording Starting | Spinning indicator | Yellow | +| Training | Gear icon | Purple | +| Error | Exclamation mark | Red | + +### Keyboard Shortcuts + +| Action | Default Shortcut | Configurable | +|--------|------------------|--------------| +| Toggle Recording | `Ctrl+Shift+R` | Yes | +| Open Dashboard | `Ctrl+Shift+D` | Yes | +| Stop Recording | `Ctrl Ctrl Ctrl` (triple tap) | Yes | + +### Notifications + +| Event | Title | Body | +|-------|-------|------| +| Recording Started | "Recording Started" | "Capturing: {task-name}" | +| Recording Stopped | "Recording Stopped" | "Capture saved" | +| Training Started | "Training Started" | "Model training in progress" | +| Training Complete | "Training Complete" | "Model saved to {path}" | +| Error | "Error" | "{error message}" | + +## Integration with Ecosystem + +### CLI Integration + +The tray app delegates to the `openadapt` CLI for all operations: + +```python +# Starting a capture +subprocess.Popen(["openadapt", "capture", "start", "--name", name]) + +# Stopping a capture +subprocess.Popen(["openadapt", "capture", "stop"]) + +# Starting training +subprocess.Popen(["openadapt", "train", "start", "--capture", capture_path]) + +# Checking training status +result = subprocess.run(["openadapt", "train", "status"], capture_output=True) +``` + +### Direct API Integration (Alternative) + +For tighter integration, the tray can import sub-packages directly: + +```python +try: + from openadapt_capture import CaptureSession + + session = CaptureSession(name=name, record_audio=True) + session.start() +except ImportError: + # Fall back to CLI + subprocess.Popen(["openadapt", "capture", "start", "--name", name]) +``` + +### Dashboard Integration + +- Auto-launches the dashboard web server on startup (configurable) +- "Open Dashboard" opens browser to `http://localhost:8080` +- Settings page accessible via tray menu + +## Future Enhancements + +1. **Native macOS app** using `rumps` for a more native feel +2. **Electron wrapper** for consistent cross-platform UI +3. **Recording preview** - show recent screenshot in menu +4. **Quick actions** - right-click for immediate actions +5. 
**Status bar text** - show recording duration on macOS +6. **Multi-monitor support** - select which monitor to record +7. **Cloud sync** - sync captures and settings across devices +8. **Plugin system** - allow third-party menu extensions + +## Migration from Legacy + +### Compatibility + +The new tray app maintains backward compatibility with: +- Legacy stop sequences (`oa.stop`, triple-ctrl) +- PostHog analytics events +- Configuration file locations + +### Migration Path + +1. Install `openadapt-tray` alongside legacy +2. Both can coexist (different process names) +3. Legacy can be deprecated when new tray is stable +4. Configuration migration script provided + +--- + +*This design enables a lightweight, cross-platform system tray experience while maintaining integration with the OpenAdapt ecosystem's CLI-first architecture.* diff --git a/docs/design/repo-rename-analysis.md b/docs/design/repo-rename-analysis.md new file mode 100644 index 000000000..e66dac023 --- /dev/null +++ b/docs/design/repo-rename-analysis.md @@ -0,0 +1,286 @@ +# Repository Rename Analysis: OpenAdapt to openadapt + +**Date:** January 2026 +**Status:** Decision Document +**Author:** Engineering Team + +--- + +## Executive Summary + +This document analyzes whether to rename the main OpenAdapt GitHub repository from `OpenAdapt` (mixed case) to `openadapt` (lowercase) to align with Python conventions and existing sub-packages. + +**Recommendation: DO NOT RENAME at this time.** + +The costs and risks of renaming outweigh the benefits. The minor consistency improvement does not justify the potential for broken links, documentation updates, and brand dilution. + +--- + +## Current State + +| Component | Current Name | Case | +|-----------|-------------|------| +| **Main Repository** | `OpenAdaptAI/OpenAdapt` | Mixed | +| **GitHub Organization** | `OpenAdaptAI` | Mixed | +| **Sub-packages** | `openadapt-ml`, `openadapt-capture`, etc. | Lowercase | +| **PyPI Package** | `openadapt` | Lowercase | +| **Python Imports** | `import openadapt` | Lowercase | +| **pyproject.toml Repository URL** | Already points to `openadapt` (lowercase) | Lowercase | + +**Key Observation:** The `pyproject.toml` already uses lowercase in the Repository URL: +```toml +Repository = "https://github.com/OpenAdaptAI/openadapt" +``` + +This suggests the team anticipated or intended lowercase naming, but GitHub currently shows `OpenAdapt`. + +--- + +## Industry Research: How Major Python Projects Handle Repository Naming + +| Project | Organization | Repository | PyPI Package | Notes | +|---------|-------------|------------|--------------|-------| +| **LangChain** | `langchain-ai` | `langchain` | `langchain` | All lowercase | +| **PyTorch** | `pytorch` | `pytorch` | `torch` | All lowercase | +| **TensorFlow** | `tensorflow` | `tensorflow` | `tensorflow` | All lowercase | +| **Hugging Face** | `huggingface` | `transformers` | `transformers` | All lowercase | +| **FastAPI** | `tiangolo` | `fastapi` | `fastapi` | All lowercase | +| **scikit-learn** | `scikit-learn` | `scikit-learn` | `scikit-learn` | All lowercase with hyphen | + +**Conclusion:** The overwhelming convention in Python open-source projects is **all lowercase** for repository names. 
+ +--- + +## GitHub Redirect Behavior + +Based on [GitHub's documentation](https://docs.github.com/en/repositories/creating-and-managing-repositories/renaming-a-repository): + +### What Gets Redirected (Indefinitely) +- Web traffic to the old URL +- `git clone`, `git fetch`, `git push` operations +- Issues, wikis, stars, followers + +### What Breaks Immediately +- **GitHub Actions** referencing the repository by name will fail with "repository not found" +- **GitHub Pages** custom domain URLs are not automatically redirected + +### Redirect Persistence +- Redirects persist **indefinitely** unless: + 1. A new repository is created with the old name + 2. GitHub support is asked to remove them + +### Important Warning +From [GitHub Community discussions](https://github.com/orgs/community/discussions/22669): "If you create a new repository under your account in the future, do not reuse the original name of the renamed repository. If you do, redirects to the renamed repository will no longer work." + +--- + +## Detailed Analysis + +### Arguments FOR Renaming to Lowercase + +| Argument | Weight | Rationale | +|----------|--------|-----------| +| **Consistency with sub-packages** | Medium | All sub-packages use lowercase (`openadapt-ml`, `openadapt-capture`, etc.) | +| **Python convention** | Medium | Standard practice in Python ecosystem (see industry research) | +| **PyPI alignment** | Medium | Package name is `openadapt` (lowercase) | +| **Import alignment** | Low | `import openadapt` works regardless of repo name | +| **URL simplicity** | Low | `github.com/OpenAdaptAI/openadapt` slightly cleaner | +| **Already in pyproject.toml** | High | Repository URL already shows lowercase intent | + +### Arguments AGAINST Renaming + +| Argument | Weight | Rationale | +|----------|--------|-----------| +| **Brand recognition** | High | "OpenAdapt" as two words (Open + Adapt) reinforces brand identity | +| **Breaking changes risk** | High | External links, bookmarks, documentation, blog posts, academic citations | +| **GitHub org inconsistency** | Medium | Organization is `OpenAdaptAI` (mixed case) - renaming repo creates inconsistency | +| **Documentation updates** | Medium | 1,343 occurrences of "OpenAdapt" across 78 files need review | +| **SEO impact** | Medium | Existing search rankings tied to "OpenAdapt" | +| **Minimal actual benefit** | High | GitHub URLs are case-insensitive for access purposes | +| **Legacy code references** | Medium | Legacy directory has extensive "OpenAdapt" references | + +--- + +## Technical Impact Assessment + +### Files Requiring Updates if Renamed + +Based on codebase analysis: + +| Category | File Count | Occurrences | Update Required? | +|----------|------------|-------------|------------------| +| Documentation (*.md) | 37 | ~200+ | Review each | +| GitHub workflows (*.yml) | 10 | ~50+ | Critical review | +| Python source files | 15 | ~50+ | Review imports | +| Configuration files | 5 | ~20+ | Review URLs | +| Legacy code | 20+ | ~900+ | May leave as-is | + +### CI/CD Impact + +Current workflows use relative paths and don't hard-code the repository name, so **minimal CI/CD impact expected**. + +However, any external workflows or actions referencing `OpenAdaptAI/OpenAdapt` would need updates. 
+ +### Impact on Forks and Clones + +- **Existing clones:** Continue working via redirects, but should update with `git remote set-url` +- **Existing forks:** Maintain their existing names and remotes +- **New forks:** Would fork from the new lowercase name + +--- + +## Risk Assessment + +| Risk | Likelihood | Impact | Mitigation | +|------|------------|--------|------------| +| Broken external links | Medium | Medium | GitHub redirects handle most cases | +| Academic citation issues | Low | Medium | Papers cite DOIs or specific versions | +| SEO ranking drop | Low | Low | Temporary if any; redirects preserve link equity | +| User confusion | Medium | Low | Clear communication and documentation | +| GitHub Actions failures | Low | High | Audit and update before rename | +| Brand dilution | Medium | Medium | None - cannot mitigate if lowercase chosen | + +--- + +## Alternative Approaches + +### Option A: Do Nothing (RECOMMENDED) +- Keep repository as `OpenAdapt` +- Accept minor inconsistency with sub-packages +- No risk, no disruption + +### Option B: Rename to Lowercase +- Change repository to `openadapt` +- Update documentation +- Communicate to users +- Accept brand/visual trade-off + +### Option C: Rename Organization and Repository +- Change `OpenAdaptAI` to `openadaptai` +- Change `OpenAdapt` to `openadapt` +- Complete consistency, but much higher disruption +- **NOT RECOMMENDED** - organization rename is significantly more disruptive + +### Option D: Create Alias via Transfer +- Transfer repository to a new `openadapt` repo +- Keep `OpenAdapt` as a redirect-only stub +- **NOT RECOMMENDED** - unnecessarily complex + +--- + +## Recommendation + +**Recommendation: Do Not Rename (Option A)** + +### Rationale + +1. **GitHub URLs are case-insensitive** - Users can access via `github.com/OpenAdaptAI/openadapt` or `github.com/openadaptai/OpenAdapt` interchangeably + +2. **Brand value** - "OpenAdapt" with capitalization clearly shows the "Open" + "Adapt" word composition, which is meaningful for the project's identity + +3. **Risk/benefit ratio** - The benefits are cosmetic while the risks (broken links, confusion, documentation churn) are concrete + +4. **Organization inconsistency** - Renaming only the repo while keeping `OpenAdaptAI` creates a new inconsistency + +5. **Industry examples** - While most Python projects use lowercase, several successful projects (like early versions of major projects) maintained mixed-case names without issue + +6. **pyproject.toml already lowercase** - The `Repository` URL in `pyproject.toml` already shows lowercase, providing implicit consistency for programmatic access + +--- + +## If Renaming is Chosen: Migration Plan + +Should the decision be made to rename despite the recommendation, here is the migration plan: + +### Phase 1: Preparation (1 week before) +1. Audit all GitHub Actions and CI/CD workflows +2. Document all external references (blog posts, papers, etc.) +3. Prepare communication for Discord and mailing lists +4. Create redirect documentation + +### Phase 2: Execution (Day of) +1. Perform the rename via GitHub Settings +2. Update `pyproject.toml` repository URL (if needed) +3. Update README.md badge URLs +4. Push updated documentation + +### Phase 3: Communication (Day of + 1 week) +1. Announce on Discord +2. Post on social media +3. Email contributors +4. Update any linked resources + +### Phase 4: Follow-up (1 month) +1. Monitor for broken links +2. Update external documentation (readthedocs, etc.) +3. 
Check Google Search Console for indexing issues + +--- + +## Timeline + +| Milestone | Date | Notes | +|-----------|------|-------| +| Decision | TBD | Pending team discussion | +| If renaming: Preparation | T+0 to T+7 days | Audit and documentation | +| If renaming: Execution | T+7 days | Actual rename | +| If renaming: Stabilization | T+7 to T+30 days | Monitor and fix issues | + +--- + +## Conclusion + +While lowercase repository naming is the Python convention and would create better consistency with sub-packages, the **costs outweigh the benefits** for the main OpenAdapt repository. The recommendation is to **keep the current `OpenAdapt` naming** for the following key reasons: + +1. Brand recognition and identity +2. Risk of breaking external references +3. GitHub URLs are case-insensitive anyway +4. Organization name would remain inconsistent regardless +5. The `pyproject.toml` already uses lowercase, providing programmatic consistency + +If consistency is deemed critical in the future, consider renaming the organization and all repositories together as a single coordinated effort, rather than piecemeal changes. + +--- + +## References + +- [GitHub: Renaming a repository](https://docs.github.com/en/repositories/creating-and-managing-repositories/renaming-a-repository) +- [GitHub Community: How long does GitHub forward renamed repos?](https://github.com/orgs/community/discussions/22669) +- [GitHub Community: Duration of Web Traffic Redirection](https://github.com/orgs/community/discussions/110367) +- [LangChain GitHub](https://github.com/langchain-ai/langchain) +- [Hugging Face Transformers](https://github.com/huggingface/transformers) + +--- + +## Appendix A: Files Containing "OpenAdapt" References + +Key files with the highest occurrence counts: + +| File | Count | Notes | +|------|-------|-------| +| `legacy/CHANGELOG.md` | 911 | Historical, may leave unchanged | +| `README.md` | 21 | Brand mentions, badges | +| `docs/contributing.md` | 18 | Contribution guidelines | +| `legacy/build.py` | 19 | Build scripts | +| `docs/design/landing-page-strategy.md` | 20 | Strategy document | +| `docs/architecture-evolution.md` | 14 | Architecture docs | + +Total: **1,343 occurrences across 78 files** + +--- + +## Appendix B: Sub-package Repository Naming + +All sub-packages follow lowercase convention: + +| Repository | PyPI Package | +|------------|--------------| +| `openadapt-capture` | `openadapt-capture` | +| `openadapt-ml` | `openadapt-ml` | +| `openadapt-evals` | `openadapt-evals` | +| `openadapt-viewer` | `openadapt-viewer` | +| `openadapt-grounding` | `openadapt-grounding` | +| `openadapt-retrieval` | `openadapt-retrieval` | +| `openadapt-privacy` | `openadapt-privacy` | + +This consistency is desirable but not critical enough to justify renaming the main repository. diff --git a/docs/design/telemetry-design.md b/docs/design/telemetry-design.md new file mode 100644 index 000000000..cd0ecc343 --- /dev/null +++ b/docs/design/telemetry-design.md @@ -0,0 +1,895 @@ +# Telemetry Design for OpenAdapt Packages + +## Overview + +This document outlines the design for adding optional telemetry to all OpenAdapt packages. The system is designed to be: + +- **Opt-in by default** (or easily disabled) +- **Privacy-respecting** (no PII, no screenshots, minimal data) +- **Developer-aware** (internal usage tagged for filtering) +- **Unified** (shared module across all packages) + +## Table of Contents + +1. [Service Recommendation](#service-recommendation) +2. [Architecture](#architecture) +3. 
[Implementation Approach](#implementation-approach) +4. [Configuration Options](#configuration-options) +5. [Privacy Considerations](#privacy-considerations) +6. [Internal Usage Tagging](#internal-usage-tagging) +7. [Code Examples](#code-examples) +8. [Migration Plan](#migration-plan) +9. [References](#references) + +--- + +## Service Recommendation + +### Recommendation: GlitchTip (Self-Hosted) + Sentry SDK + +After evaluating both options, we recommend **continuing with GlitchTip** (already in use in the legacy codebase) with the Sentry Python SDK. + +### Comparison + +| Feature | GlitchTip | Sentry | +|---------|-----------|--------| +| **Pricing** | Free (self-hosted) or $15/mo (100K errors) | Free tier limited, paid plans start higher | +| **Self-Hosting** | Simple (4 components: backend, workers, Redis, PostgreSQL) | Complex (12+ components including Kafka, Zookeeper, ClickHouse) | +| **Resource Requirements** | Minimal (1GB RAM, 1 CPU core) | Heavy (requires significant infrastructure) | +| **SDK Compatibility** | Uses Sentry SDK (drop-in compatible) | Native SDK | +| **Open Source** | Fully open source | Partially open source | +| **Features** | Error tracking, uptime monitoring, basic performance | Full APM, session replay, distributed tracing | +| **Privacy** | Self-hosted = full data control | Cloud = data sent to Sentry servers | + +### Rationale + +1. **Existing Integration**: The legacy OpenAdapt codebase already uses GlitchTip (DSN: `app.glitchtip.com`) +2. **Privacy-First**: Self-hosting ensures complete control over sensitive automation data +3. **Cost-Effective**: Free for self-hosted or very affordable cloud option +4. **SDK Compatibility**: Uses the battle-tested Sentry Python SDK +5. **Simplicity**: Easier to deploy and maintain than self-hosted Sentry +6. **Open Source Alignment**: Matches OpenAdapt's open-source philosophy + +### GlitchTip Cloud vs Self-Hosted + +| Option | Pros | Cons | +|--------|------|------| +| **Cloud (glitchtip.com)** | Zero maintenance, instant setup | Monthly cost, data leaves your infrastructure | +| **Self-Hosted** | Free, full data control, customizable | Requires server, maintenance overhead | + +**Recommendation**: Start with GlitchTip Cloud for simplicity, migrate to self-hosted if needed. + +--- + +## Architecture + +### Shared Telemetry Module + +We propose a new package `openadapt-telemetry` that provides a unified telemetry interface for all OpenAdapt packages. 
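+
+As a sketch of the intended developer-facing surface, a consuming package would make a single opt-out-aware initialization call and then use lightweight helpers; the snippet below is illustrative only, and the concrete client, decorators, and configuration are specified under [Code Examples](#code-examples) later in this document:
+
+```python
+# Illustrative consumer-side usage; names match the proposed API below.
+from openadapt_telemetry import get_telemetry, track_errors
+
+# One-time initialization at package import time (no-op if telemetry is disabled).
+get_telemetry().initialize(
+    package_name="openadapt-capture",
+    package_version="0.1.0",
+)
+
+@track_errors(reraise=True)
+def save_capture(path: str) -> None:
+    """Uncaught exceptions here are reported, then re-raised."""
+    ...
+```
+
+The proposed package layout: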
+ +``` +openadapt-telemetry/ +├── src/openadapt_telemetry/ +│ ├── __init__.py # Public API exports +│ ├── config.py # Configuration management +│ ├── client.py # Telemetry client (Sentry wrapper) +│ ├── events.py # Event types and helpers +│ ├── privacy.py # PII filtering and scrubbing +│ └── decorators.py # Convenience decorators +└── pyproject.toml +``` + +### Package Integration + +```mermaid +graph TD + subgraph Packages["OpenAdapt Packages"] + CAP[openadapt-capture] + ML[openadapt-ml] + EVL[openadapt-evals] + VWR[openadapt-viewer] + GRD[openadapt-grounding] + RET[openadapt-retrieval] + PRV[openadapt-privacy] + end + + subgraph Telemetry["Telemetry Layer"] + TEL[openadapt-telemetry] + CONFIG[Config Manager] + FILTER[Privacy Filter] + end + + subgraph Backend["Backend"] + GT[GlitchTip] + end + + CAP --> TEL + ML --> TEL + EVL --> TEL + VWR --> TEL + GRD --> TEL + RET --> TEL + PRV --> TEL + + TEL --> CONFIG + TEL --> FILTER + TEL --> GT +``` + +--- + +## Implementation Approach + +### Option A: Shared Package (Recommended) + +Create `openadapt-telemetry` as a dependency for all packages. + +**Pros:** +- Single source of truth for telemetry logic +- Consistent behavior across all packages +- Easy to update and maintain +- Centralized privacy controls + +**Cons:** +- Additional dependency +- Version coordination required + +### Option B: Per-Package Implementation + +Each package implements its own telemetry. + +**Pros:** +- Package independence +- No cross-package dependencies + +**Cons:** +- Code duplication +- Inconsistent implementations +- Harder to maintain privacy controls + +### Decision: Option A (Shared Package) + +The shared package approach aligns with the meta-package architecture and ensures consistency. + +--- + +## Configuration Options + +### Environment Variables + +```bash +# Primary opt-out mechanism (industry standard) +OPENADAPT_TELEMETRY_ENABLED=false # Disable all telemetry +DO_NOT_TRACK=1 # Universal opt-out (alternative) + +# Internal/developer mode +OPENADAPT_INTERNAL=true # Tag as internal usage +OPENADAPT_DEV=true # Development mode (alternative) + +# Configuration overrides +OPENADAPT_TELEMETRY_DSN= # Custom DSN +OPENADAPT_TELEMETRY_ENVIRONMENT=dev # Environment name +OPENADAPT_TELEMETRY_SAMPLE_RATE=0.1 # Sampling rate (0.0-1.0) +``` + +### Configuration File + +```json +// ~/.config/openadapt/telemetry.json +{ + "enabled": true, + "internal": false, + "dsn": null, + "environment": "production", + "sample_rate": 1.0, + "error_tracking": true, + "performance_tracking": false, + "feature_usage": true +} +``` + +### Priority Order + +1. Environment variables (highest priority) +2. Configuration file +3. 
Package defaults (lowest priority) + +### Default Configuration + +```python +DEFAULTS = { + "enabled": True, # Enabled by default, easy opt-out + "internal": False, # External user by default + "dsn": "https://xxx@app.glitchtip.com/XXXX", + "environment": "production", + "sample_rate": 1.0, # 100% for errors + "traces_sample_rate": 0.01, # 1% for performance + "error_tracking": True, + "performance_tracking": True, + "feature_usage": True, + "send_default_pii": False, # Never send PII by default +} +``` + +--- + +## Privacy Considerations + +### What We Collect (Ethical Data) + +| Category | Data Collected | Purpose | +|----------|---------------|---------| +| **Error Tracking** | Exception type, stack trace, error message | Bug fixing, stability monitoring | +| **Performance** | Function timing, memory usage | Optimization, bottleneck detection | +| **Feature Usage** | Feature names, operation counts | Prioritize development, understand needs | +| **Environment** | OS, Python version, package versions | Compatibility testing, support | +| **Session** | Anonymous session ID, duration | Usage patterns, engagement | + +### What We Never Collect + +| Category | Data NOT Collected | Reason | +|----------|-------------------|--------| +| **PII** | Names, emails, IP addresses | Privacy violation | +| **Screenshots** | Screen captures, images | Highly sensitive | +| **User Content** | Text typed, file contents | Privacy violation | +| **Credentials** | API keys, passwords, tokens | Security risk | +| **File Paths** | Full paths (especially with usernames) | PII leakage | +| **Network Data** | URLs, request bodies | Sensitive information | +| **Biometrics** | Mouse patterns, typing cadence | Privacy violation | + +### PII Scrubbing + +```python +# Automatically scrubbed from all events +PII_DENYLIST = [ + "password", + "secret", + "token", + "api_key", + "authorization", + "cookie", + "session", + "email", + "phone", + "address", + "ssn", + "credit_card", +] + +# Path sanitization +def sanitize_path(path: str) -> str: + """Remove username from file paths.""" + # /Users/john/code/file.py -> /Users//code/file.py + return re.sub(r'/Users/[^/]+/', '/Users//', path) +``` + +### GDPR Compliance + +1. **Consent**: Telemetry is opt-in or easily disabled +2. **Data Minimization**: Collect only necessary data +3. **Purpose Limitation**: Use only for stated purposes +4. **Transparency**: Document what is collected +5. **Right to Erasure**: Provide way to request data deletion +6. **Data Protection**: Self-hosted option for full control + +--- + +## Internal Usage Tagging + +### Tagging Strategy + +Internal OpenAdapt developers and testers should be tagged so their usage can be filtered out when analyzing real user behavior. 
+ +### Detection Methods + +```python +def is_internal_user() -> bool: + """Determine if current usage is from internal team.""" + + # Method 1: Explicit environment variable + if os.getenv("OPENADAPT_INTERNAL", "").lower() in ("true", "1", "yes"): + return True + + # Method 2: Development environment + if os.getenv("OPENADAPT_DEV", "").lower() in ("true", "1", "yes"): + return True + + # Method 3: Not running from executable (dev mode) + if not is_running_from_executable(): + return True + + # Method 4: Git repository present (development checkout) + if Path(".git").exists(): + return True + + # Method 5: Known internal email domain (if user identified) + # Note: Only if user voluntarily provided email + + # Method 6: CI/CD environment + ci_env_vars = ["CI", "GITHUB_ACTIONS", "GITLAB_CI", "JENKINS_URL"] + if any(os.getenv(var) for var in ci_env_vars): + return True + + return False +``` + +### Tag Application + +```python +def get_telemetry_tags() -> dict: + """Get standard tags for all telemetry events.""" + return { + "internal": is_internal_user(), + "environment": get_environment(), + "package_version": get_version(), + "python_version": platform.python_version(), + "os": platform.system(), + "os_version": platform.release(), + } +``` + +### Filtering in GlitchTip + +``` +# Filter out internal usage +tag:internal IS false + +# View only internal usage +tag:internal IS true + +# Combine with environment +tag:environment IS production AND tag:internal IS false +``` + +--- + +## Code Examples + +### Package Installation + +```toml +# pyproject.toml for any OpenAdapt package +[project] +dependencies = [ + "openadapt-telemetry>=0.1.0", +] + +[project.optional-dependencies] +# Telemetry is optional for those who want zero tracking +minimal = [] # Install without telemetry +``` + +### Telemetry Client Implementation + +```python +# src/openadapt_telemetry/client.py +"""Telemetry client for OpenAdapt packages.""" + +from __future__ import annotations + +import os +import platform +from functools import lru_cache +from pathlib import Path +from typing import Any, Callable, Optional + +import sentry_sdk +from sentry_sdk.types import Event, Hint + + +class TelemetryClient: + """Unified telemetry client for all OpenAdapt packages.""" + + _instance: Optional["TelemetryClient"] = None + + def __init__(self): + self._initialized = False + self._enabled = self._check_enabled() + self._internal = self._check_internal() + + @classmethod + def get_instance(cls) -> "TelemetryClient": + """Get singleton instance.""" + if cls._instance is None: + cls._instance = cls() + return cls._instance + + def _check_enabled(self) -> bool: + """Check if telemetry should be enabled.""" + # Universal opt-out + if os.getenv("DO_NOT_TRACK", "").lower() in ("1", "true"): + return False + + # Package-specific opt-out + if os.getenv("OPENADAPT_TELEMETRY_ENABLED", "").lower() in ("false", "0", "no"): + return False + + return True + + def _check_internal(self) -> bool: + """Check if this is internal usage.""" + # Explicit flag + if os.getenv("OPENADAPT_INTERNAL", "").lower() in ("true", "1", "yes"): + return True + + # Development mode + if os.getenv("OPENADAPT_DEV", "").lower() in ("true", "1", "yes"): + return True + + # Git repo present (development checkout) + if Path(".git").exists(): + return True + + # CI environment + ci_vars = ["CI", "GITHUB_ACTIONS", "GITLAB_CI", "JENKINS_URL", "TRAVIS"] + if any(os.getenv(var) for var in ci_vars): + return True + + return False + + def initialize( + self, + dsn: Optional[str] = 
None, + package_name: str = "openadapt", + package_version: str = "unknown", + **kwargs, + ) -> None: + """Initialize the telemetry client.""" + if not self._enabled: + return + + if self._initialized: + return + + dsn = dsn or os.getenv( + "OPENADAPT_TELEMETRY_DSN", + "https://xxx@app.glitchtip.com/XXXX" # Default DSN + ) + + environment = os.getenv("OPENADAPT_TELEMETRY_ENVIRONMENT", "production") + sample_rate = float(os.getenv("OPENADAPT_TELEMETRY_SAMPLE_RATE", "1.0")) + traces_sample_rate = float(os.getenv("OPENADAPT_TELEMETRY_TRACES_SAMPLE_RATE", "0.01")) + + sentry_sdk.init( + dsn=dsn, + environment=environment, + sample_rate=sample_rate, + traces_sample_rate=traces_sample_rate, + send_default_pii=False, + before_send=self._before_send, + before_send_transaction=self._before_send_transaction, + **kwargs, + ) + + # Set default tags + sentry_sdk.set_tag("internal", self._internal) + sentry_sdk.set_tag("package", package_name) + sentry_sdk.set_tag("package_version", package_version) + sentry_sdk.set_tag("python_version", platform.python_version()) + sentry_sdk.set_tag("os", platform.system()) + sentry_sdk.set_tag("os_version", platform.release()) + + self._initialized = True + + def _before_send(self, event: Event, hint: Hint) -> Optional[Event]: + """Filter and sanitize events before sending.""" + # Scrub PII from stack traces + if "exception" in event: + self._scrub_exception(event["exception"]) + + return event + + def _before_send_transaction(self, event: Event, hint: Hint) -> Optional[Event]: + """Filter performance events.""" + return event + + def _scrub_exception(self, exception_data: dict) -> None: + """Remove PII from exception data.""" + if "values" not in exception_data: + return + + for value in exception_data["values"]: + if "stacktrace" in value and "frames" in value["stacktrace"]: + for frame in value["stacktrace"]["frames"]: + # Sanitize file paths + if "filename" in frame: + frame["filename"] = self._sanitize_path(frame["filename"]) + if "abs_path" in frame: + frame["abs_path"] = self._sanitize_path(frame["abs_path"]) + + @staticmethod + def _sanitize_path(path: str) -> str: + """Remove username from file paths.""" + import re + # macOS/Linux: /Users/username/ or /home/username/ + path = re.sub(r'/Users/[^/]+/', '/Users//', path) + path = re.sub(r'/home/[^/]+/', '/home//', path) + # Windows: C:\Users\username\ + path = re.sub(r'C:\\Users\\[^\\]+\\', 'C:\\Users\\\\', path) + return path + + def capture_exception(self, exception: Optional[Exception] = None, **kwargs) -> None: + """Capture an exception.""" + if not self._enabled: + return + sentry_sdk.capture_exception(exception, **kwargs) + + def capture_message(self, message: str, level: str = "info", **kwargs) -> None: + """Capture a message.""" + if not self._enabled: + return + sentry_sdk.capture_message(message, level=level, **kwargs) + + def capture_event(self, event_name: str, properties: Optional[dict] = None) -> None: + """Capture a custom event (feature usage).""" + if not self._enabled: + return + + properties = properties or {} + properties["event_name"] = event_name + sentry_sdk.capture_message( + f"event:{event_name}", + level="info", + extras=properties, + ) + + def set_user(self, user_id: str, **kwargs) -> None: + """Set user context (anonymous ID only).""" + if not self._enabled: + return + sentry_sdk.set_user({"id": user_id, **kwargs}) + + def set_tag(self, key: str, value: str) -> None: + """Set a custom tag.""" + if not self._enabled: + return + sentry_sdk.set_tag(key, value) + + def 
add_breadcrumb(self, message: str, category: str = "default", **kwargs) -> None: + """Add a breadcrumb for context.""" + if not self._enabled: + return + sentry_sdk.add_breadcrumb(message=message, category=category, **kwargs) + + +# Convenience singleton access +def get_telemetry() -> TelemetryClient: + """Get the telemetry client instance.""" + return TelemetryClient.get_instance() +``` + +### Decorator for Function Tracking + +```python +# src/openadapt_telemetry/decorators.py +"""Convenience decorators for telemetry.""" + +import functools +import time +from typing import Callable, Optional + +import sentry_sdk + +from .client import get_telemetry + + +def track_performance(name: Optional[str] = None): + """Decorator to track function performance.""" + def decorator(func: Callable) -> Callable: + operation_name = name or func.__name__ + + @functools.wraps(func) + def wrapper(*args, **kwargs): + telemetry = get_telemetry() + + with sentry_sdk.start_transaction(op="function", name=operation_name): + start = time.perf_counter() + try: + return func(*args, **kwargs) + finally: + duration = time.perf_counter() - start + sentry_sdk.set_measurement("duration_ms", duration * 1000) + + return wrapper + return decorator + + +def track_errors(reraise: bool = True): + """Decorator to automatically capture exceptions.""" + def decorator(func: Callable) -> Callable: + @functools.wraps(func) + def wrapper(*args, **kwargs): + try: + return func(*args, **kwargs) + except Exception as e: + get_telemetry().capture_exception(e) + if reraise: + raise + return wrapper + return decorator + + +def track_feature(feature_name: str): + """Decorator to track feature usage.""" + def decorator(func: Callable) -> Callable: + @functools.wraps(func) + def wrapper(*args, **kwargs): + get_telemetry().capture_event( + f"feature:{feature_name}", + {"function": func.__name__}, + ) + return func(*args, **kwargs) + return wrapper + return decorator +``` + +### Package Integration Example + +```python +# src/openadapt_retrieval/__init__.py +"""OpenAdapt Retrieval - Multimodal demo retrieval.""" + +from openadapt_retrieval.embeddings import ( + BaseEmbedder, + CLIPEmbedder, + Qwen3VLEmbedder, + get_embedder, +) +from openadapt_retrieval.retriever import ( + DemoMetadata, + MultimodalDemoRetriever, + RetrievalResult, + VectorIndex, +) +from openadapt_retrieval.storage import EmbeddingStorage + +__version__ = "0.1.0" + +# Initialize telemetry on import (lazy, respects opt-out) +try: + from openadapt_telemetry import get_telemetry + get_telemetry().initialize( + package_name="openadapt-retrieval", + package_version=__version__, + ) +except ImportError: + # Telemetry package not installed (minimal install) + pass + +__all__ = [ + "BaseEmbedder", + "Qwen3VLEmbedder", + "CLIPEmbedder", + "get_embedder", + "MultimodalDemoRetriever", + "VectorIndex", + "RetrievalResult", + "DemoMetadata", + "EmbeddingStorage", +] +``` + +### Feature Usage Tracking Example + +```python +# In openadapt-retrieval/retriever/demo_retriever.py + +from openadapt_telemetry import get_telemetry, track_feature, track_performance + + +class MultimodalDemoRetriever: + """Retriever for multimodal demo search.""" + + @track_feature("retrieval.add_demo") + def add_demo( + self, + demo_id: str, + task: str, + screenshot: Optional[Union[str, Path, Image.Image]] = None, + **metadata, + ) -> None: + """Add a demo to the retrieval library.""" + # Implementation... 
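+        # Note: the @track_feature decorator above emits a
+        # "feature:retrieval.add_demo" event via get_telemetry().capture_event()
+        # before this body runs; it is a no-op when telemetry is disabled
+        # (DO_NOT_TRACK=1 or OPENADAPT_TELEMETRY_ENABLED=false).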
+ + @track_performance("retrieval.build_index") + def build_index(self) -> None: + """Build the FAISS index from stored demos.""" + try: + # Implementation... + get_telemetry().capture_event( + "retrieval.index_built", + {"num_demos": len(self._demos)}, + ) + except Exception as e: + get_telemetry().capture_exception(e) + raise + + @track_performance("retrieval.search") + def retrieve( + self, + task: str, + screenshot: Optional[Union[str, Path, Image.Image]] = None, + top_k: int = 5, + ) -> List[RetrievalResult]: + """Find similar demos for a given query.""" + # Implementation... +``` + +### CLI Opt-Out Information + +```python +# In CLI help text + +TELEMETRY_HELP = """ +OpenAdapt collects anonymous usage data to improve the software. + +What we collect: + - Error reports (exception types, stack traces) + - Performance metrics (timing, memory usage) + - Feature usage counts (which features are popular) + +What we NEVER collect: + - Screenshots or images + - Text you type or file contents + - Personal information (names, emails, IPs) + - API keys or passwords + +To disable telemetry: + - Set OPENADAPT_TELEMETRY_ENABLED=false + - Or set DO_NOT_TRACK=1 (universal standard) + +For more info: https://docs.openadapt.ai/telemetry +""" +``` + +--- + +## Migration Plan + +### Phase 1: Create Telemetry Package + +1. Create `openadapt-telemetry` package +2. Implement core client with GlitchTip/Sentry SDK +3. Add privacy filtering and scrubbing +4. Write comprehensive tests +5. Publish to PyPI + +### Phase 2: Update Meta-Package + +1. Add `openadapt-telemetry` as optional dependency +2. Update documentation +3. Add CLI telemetry status command + +### Phase 3: Integrate with Packages + +For each package (`capture`, `ml`, `evals`, `viewer`, `grounding`, `retrieval`, `privacy`): + +1. Add `openadapt-telemetry` dependency +2. Initialize telemetry in `__init__.py` +3. Add tracking to key operations +4. Test with telemetry enabled/disabled + +### Phase 4: Legacy Migration + +1. Update legacy error_reporting.py to use new module +2. Migrate PostHog events to unified system +3. 
Deprecate old telemetry code + +### Timeline + +| Phase | Duration | Milestone | +|-------|----------|-----------| +| Phase 1 | 1 week | Telemetry package published | +| Phase 2 | 2 days | Meta-package updated | +| Phase 3 | 2 weeks | All packages integrated | +| Phase 4 | 1 week | Legacy migration complete | + +--- + +## Testing Strategy + +### Unit Tests + +```python +# tests/test_telemetry.py + +import os +from unittest.mock import patch, MagicMock + +import pytest + +from openadapt_telemetry import TelemetryClient, get_telemetry + + +class TestTelemetryOptOut: + """Test that telemetry respects opt-out settings.""" + + def test_do_not_track_env(self): + """DO_NOT_TRACK=1 should disable telemetry.""" + with patch.dict(os.environ, {"DO_NOT_TRACK": "1"}): + client = TelemetryClient() + assert not client._enabled + + def test_explicit_disable(self): + """OPENADAPT_TELEMETRY_ENABLED=false should disable.""" + with patch.dict(os.environ, {"OPENADAPT_TELEMETRY_ENABLED": "false"}): + client = TelemetryClient() + assert not client._enabled + + def test_internal_detection(self): + """Internal users should be detected.""" + with patch.dict(os.environ, {"OPENADAPT_INTERNAL": "true"}): + client = TelemetryClient() + assert client._internal + + +class TestPrivacyScrubbing: + """Test that PII is properly scrubbed.""" + + def test_path_sanitization(self): + """File paths should have usernames removed.""" + client = TelemetryClient() + + assert client._sanitize_path("/Users/john/code/file.py") == "/Users//code/file.py" + assert client._sanitize_path("/home/alice/app/main.py") == "/home//app/main.py" + assert client._sanitize_path("C:\\Users\\bob\\code\\file.py") == "C:\\Users\\\\code\\file.py" +``` + +--- + +## References + +### GlitchTip + +- [GlitchTip Documentation](https://glitchtip.com/documentation/) +- [GlitchTip Installation Guide](https://glitchtip.com/documentation/install/) +- [Sentry SDK Documentation (GlitchTip compatible)](https://glitchtip.com/sdkdocs/python/) + +### Privacy & Ethics + +- [GDPR Telemetry Data Guidelines](https://www.activemind.legal/guides/telemetry-data/) +- [Linux Foundation Telemetry Policy](https://www.linuxfoundation.org/legal/telemetry-data-policy) +- [OpenTelemetry Handling Sensitive Data](https://opentelemetry.io/docs/security/handling-sensitive-data/) + +### Industry Standards + +- [DO_NOT_TRACK Environment Variable](https://consoledonottrack.com/) +- [Kedro Telemetry Plugin](https://github.com/kedro-org/kedro-plugins/tree/main/kedro-telemetry) + +### Sentry SDK + +- [Sentry Python SDK](https://docs.sentry.io/platforms/python/) +- [Sentry Filtering](https://docs.sentry.io/platforms/python/configuration/filtering/) +- [Sentry Tags](https://docs.sentry.io/platforms/python/enriching-events/tags/) + +--- + +## Appendix: Configuration Reference + +### All Environment Variables + +| Variable | Default | Description | +|----------|---------|-------------| +| `DO_NOT_TRACK` | - | Universal opt-out (1 = disabled) | +| `OPENADAPT_TELEMETRY_ENABLED` | `true` | Enable/disable telemetry | +| `OPENADAPT_INTERNAL` | `false` | Tag as internal usage | +| `OPENADAPT_DEV` | `false` | Development mode | +| `OPENADAPT_TELEMETRY_DSN` | (default) | GlitchTip DSN | +| `OPENADAPT_TELEMETRY_ENVIRONMENT` | `production` | Environment name | +| `OPENADAPT_TELEMETRY_SAMPLE_RATE` | `1.0` | Error sampling rate | +| `OPENADAPT_TELEMETRY_TRACES_SAMPLE_RATE` | `0.01` | Performance sampling rate | + +### DSN Configuration + +The DSN (Data Source Name) should be stored securely and not committed to 
version control: + +```bash +# Development (use separate project) +export OPENADAPT_TELEMETRY_DSN="https://xxx@app.glitchtip.com/dev-project" + +# Production (use production project) +export OPENADAPT_TELEMETRY_DSN="https://xxx@app.glitchtip.com/prod-project" + +# Self-hosted +export OPENADAPT_TELEMETRY_DSN="https://xxx@glitchtip.your-domain.com/project" +``` diff --git a/docs/design/tray-logging.md b/docs/design/tray-logging.md new file mode 100644 index 000000000..b8ae7d937 --- /dev/null +++ b/docs/design/tray-logging.md @@ -0,0 +1,801 @@ +# OpenAdapt Tray: Logging & Action Storage + +This document supplements the main `openadapt-tray` design document with detailed specifications for logging, action history, telemetry integration, and storage considerations. + +## Table of Contents + +1. [Local Logging](#local-logging) +2. [Action History](#action-history) +3. [Telemetry Integration](#telemetry-integration) +4. [Privacy Considerations](#privacy-considerations) +5. [Storage Locations](#storage-locations) +6. [Integration with Existing Packages](#integration-with-existing-packages) + +--- + +## Local Logging + +### Platform-Specific Log Paths + +The tray application stores logs in platform-appropriate locations following OS conventions: + +| Platform | Log Directory | +|----------|---------------| +| macOS | `~/Library/Application Support/OpenAdapt/logs/` | +| Windows | `%APPDATA%/OpenAdapt/logs/` | +| Linux | `~/.local/share/openadapt/logs/` | + +### Log File Naming + +``` +openadapt-tray.log # Current log file +openadapt-tray.log.1 # Previous rotation (newest) +openadapt-tray.log.2 # Older rotation +... +openadapt-tray.log.5 # Oldest rotation +``` + +### Log Rotation Policy + +| Setting | Value | Rationale | +|---------|-------|-----------| +| **Max File Size** | 10 MB | Prevents disk space issues | +| **Max Backup Count** | 5 files | ~50 MB total log storage | +| **Rotation Trigger** | Size-based | Predictable disk usage | +| **Compression** | gzip for backups | Reduces storage footprint | + +### Log Retention Policy + +- **Active logs**: Rotated based on size (10 MB threshold) +- **Rotated logs**: Kept for 30 days or 5 rotations, whichever comes first +- **Crash logs**: Retained for 90 days for debugging +- **Automatic cleanup**: Old logs purged on app startup + +### Log Levels + +| Environment | Level | Description | +|-------------|-------|-------------| +| **Production** | `INFO` | Normal operations, errors, warnings | +| **Debug** | `DEBUG` | Verbose output including state changes | +| **Trace** | `TRACE` | Extremely verbose, including IPC messages | + +### Log Level Configuration + +```python +# Environment variable override +OPENADAPT_TRAY_LOG_LEVEL=DEBUG + +# Or via config.json +{ + "logging": { + "level": "INFO", + "console": false, + "file": true + } +} +``` + +### Log Format + +``` +2024-01-15 10:30:45.123 | INFO | tray.main:start_recording:42 - Recording session started +2024-01-15 10:30:45.456 | DEBUG | tray.menu:update_state:78 - Menu state updated: recording=True +2024-01-15 10:31:12.789 | ERROR | tray.capture:on_error:156 - Capture failed: Permission denied +``` + +Format specification: +``` +{timestamp} | {level:8} | {module}:{function}:{line} - {message} +``` + +### Implementation Example + +```python +import logging +from logging.handlers import RotatingFileHandler +from pathlib import Path +import platform +import sys + +def get_log_directory() -> Path: + """Get platform-appropriate log directory.""" + if platform.system() == "Darwin": + base = Path.home() / 
"Library" / "Application Support" + elif platform.system() == "Windows": + base = Path(os.environ.get("APPDATA", Path.home() / "AppData" / "Roaming")) + else: # Linux and others + base = Path(os.environ.get("XDG_DATA_HOME", Path.home() / ".local" / "share")) + + log_dir = base / "OpenAdapt" / "logs" + log_dir.mkdir(parents=True, exist_ok=True) + return log_dir + +def setup_logging(level: str = "INFO") -> logging.Logger: + """Configure logging for the tray application.""" + logger = logging.getLogger("openadapt.tray") + logger.setLevel(getattr(logging, level.upper())) + + # File handler with rotation + log_file = get_log_directory() / "openadapt-tray.log" + file_handler = RotatingFileHandler( + log_file, + maxBytes=10 * 1024 * 1024, # 10 MB + backupCount=5, + encoding="utf-8", + ) + file_handler.setFormatter(logging.Formatter( + "{asctime} | {levelname:8} | {name}:{funcName}:{lineno} - {message}", + style="{", + datefmt="%Y-%m-%d %H:%M:%S", + )) + logger.addHandler(file_handler) + + return logger +``` + +--- + +## Action History + +### Overview + +The tray app maintains a local history of user interactions for: +- Auditing user actions +- Supporting undo/redo functionality +- Debugging session issues +- Syncing state with other OpenAdapt components + +### Tracked Actions + +| Action Type | Data Captured | Purpose | +|-------------|---------------|---------| +| `recording.start` | timestamp, task_name, settings | Session tracking | +| `recording.stop` | timestamp, duration, frame_count | Session completion | +| `recording.pause` | timestamp | Session state | +| `recording.resume` | timestamp | Session state | +| `training.start` | timestamp, model_type, demo_ids | Training tracking | +| `training.complete` | timestamp, duration, success | Training outcomes | +| `training.cancel` | timestamp, reason | Training interruptions | +| `settings.changed` | key, old_value, new_value | Configuration audit | +| `app.start` | timestamp, version, os_info | Lifecycle tracking | +| `app.stop` | timestamp, exit_reason | Lifecycle tracking | +| `error.occurred` | timestamp, error_type, context | Error tracking | + +### Storage Format + +Action history is stored in a local SQLite database for efficient querying and reliable storage. 
+ +#### Database Schema + +```sql +-- Action history table +CREATE TABLE action_history ( + id INTEGER PRIMARY KEY AUTOINCREMENT, + timestamp TEXT NOT NULL, -- ISO 8601 format + action_type TEXT NOT NULL, -- e.g., 'recording.start' + session_id TEXT, -- Groups related actions + data TEXT, -- JSON blob for action-specific data + synced INTEGER DEFAULT 0, -- Sync status with capture DB + created_at TEXT DEFAULT CURRENT_TIMESTAMP +); + +-- Index for common queries +CREATE INDEX idx_action_timestamp ON action_history(timestamp); +CREATE INDEX idx_action_type ON action_history(action_type); +CREATE INDEX idx_session_id ON action_history(session_id); +CREATE INDEX idx_synced ON action_history(synced); + +-- Session metadata table +CREATE TABLE sessions ( + id TEXT PRIMARY KEY, -- UUID + task_name TEXT, + started_at TEXT NOT NULL, + ended_at TEXT, + status TEXT DEFAULT 'active', -- active, completed, cancelled, error + frame_count INTEGER DEFAULT 0, + duration_seconds REAL, + capture_db_id TEXT -- Reference to openadapt-capture DB +); +``` + +#### Example Records + +```json +{ + "id": 1, + "timestamp": "2024-01-15T10:30:45.123Z", + "action_type": "recording.start", + "session_id": "550e8400-e29b-41d4-a716-446655440000", + "data": { + "task_name": "Fill out expense report", + "settings": { + "capture_screenshots": true, + "capture_audio": false, + "fps": 1 + } + }, + "synced": 0 +} +``` + +### Sync with openadapt-capture Database + +The tray app synchronizes action history with the capture package's database to maintain a unified record: + +```python +from pathlib import Path +import sqlite3 +from typing import Optional +import json + +class ActionHistorySync: + """Sync tray action history with capture database.""" + + def __init__(self, tray_db_path: Path, capture_db_path: Optional[Path] = None): + self.tray_db = tray_db_path + self.capture_db = capture_db_path + + def sync_session(self, session_id: str) -> bool: + """Sync a completed session to capture database.""" + if not self.capture_db or not self.capture_db.exists(): + return False + + with sqlite3.connect(self.tray_db) as tray_conn: + # Get unsynced actions for this session + actions = tray_conn.execute( + """ + SELECT id, timestamp, action_type, data + FROM action_history + WHERE session_id = ? AND synced = 0 + ORDER BY timestamp + """, + (session_id,) + ).fetchall() + + if not actions: + return True + + with sqlite3.connect(self.capture_db) as capture_conn: + # Insert into capture database's session_events table + for action_id, timestamp, action_type, data in actions: + capture_conn.execute( + """ + INSERT INTO session_events (timestamp, event_type, data, source) + VALUES (?, ?, ?, 'tray') + """, + (timestamp, action_type, data) + ) + capture_conn.commit() + + # Mark as synced + with sqlite3.connect(self.tray_db) as tray_conn: + tray_conn.executemany( + "UPDATE action_history SET synced = 1 WHERE id = ?", + [(a[0],) for a in actions] + ) + tray_conn.commit() + + return True +``` + +### Retention Policy + +| Data Type | Retention Period | Rationale | +|-----------|------------------|-----------| +| Action history | 90 days | Debugging and audit trail | +| Session metadata | 1 year | Long-term usage patterns | +| Synced records | 30 days (then delete) | Reduce redundancy | + +--- + +## Telemetry Integration + +### Reference Design + +For detailed telemetry implementation, see the comprehensive telemetry design at [docs/design/telemetry-design.md](./telemetry-design.md). 
+ +### GlitchTip/Sentry Integration + +The tray app uses the shared `openadapt-telemetry` module for crash reporting and error tracking. + +```python +# Initialize telemetry in tray app +from openadapt_telemetry import get_telemetry + +def init_app(): + """Initialize the tray application.""" + telemetry = get_telemetry() + telemetry.initialize( + package_name="openadapt-tray", + package_version=__version__, + ) +``` + +### Error and Crash Reporting + +```python +from openadapt_telemetry import get_telemetry, track_errors + +class TrayApp: + @track_errors(reraise=True) + def start_recording(self, task_name: str) -> None: + """Start a recording session.""" + try: + # Recording logic... + pass + except PermissionError as e: + get_telemetry().capture_exception(e, tags={ + "action": "start_recording", + "platform": platform.system(), + }) + raise +``` + +### Anonymous Usage Analytics (Opt-In) + +Usage analytics are strictly opt-in and collect only aggregate, non-identifying data. + +#### Events Tracked + +| Event | Data Collected | Purpose | +|-------|----------------|---------| +| `tray.app_start` | timestamp, version, os, internal_flag | App lifecycle | +| `tray.app_stop` | timestamp, uptime_seconds, exit_reason | App lifecycle | +| `tray.recording_session` | duration_seconds, success, frame_count | Usage patterns | +| `tray.training_initiated` | model_type, demo_count | Feature usage | +| `tray.error` | error_type (no message), context | Error patterns | + +#### Event Implementation + +```python +from openadapt_telemetry import get_telemetry + +def track_recording_session(duration: float, success: bool, frame_count: int): + """Track recording session metrics (opt-in only).""" + telemetry = get_telemetry() + + if not telemetry.is_analytics_enabled(): + return + + telemetry.capture_event( + "tray.recording_session", + { + "duration_seconds": round(duration, 1), + "success": success, + "frame_count_bucket": bucket_count(frame_count), # 0-10, 10-50, 50-100, 100+ + } + ) + +def bucket_count(count: int) -> str: + """Bucket counts to avoid exact numbers (privacy).""" + if count <= 10: + return "0-10" + elif count <= 50: + return "10-50" + elif count <= 100: + return "50-100" + else: + return "100+" +``` + +--- + +## Privacy Considerations + +### Core Principles + +1. **Local-First**: All data stored locally by default +2. **No PII**: Never collect personally identifiable information +3. **No Content**: Never collect screenshots, recordings, or user input +4. **Explicit Consent**: Cloud sync and analytics require opt-in +5. 
**Transparency**: Users can inspect all stored data + +### What Is Never Collected or Transmitted + +| Data Type | Reason | +|-----------|--------| +| Screenshots | Highly sensitive, potential PII | +| Recorded actions | Contains user behavior data | +| Typed text | PII and sensitive content | +| File paths with usernames | PII leakage | +| IP addresses | Location identification | +| Hardware identifiers | Device fingerprinting | +| Window titles | May contain sensitive info | + +### Opt-In/Opt-Out Settings + +```json +// config.json +{ + "telemetry": { + "crash_reporting": true, // Enabled by default, can disable + "anonymous_analytics": false, // Disabled by default, opt-in + "cloud_sync": false // Disabled by default, opt-in + } +} +``` + +### Settings UI Integration + +The tray app settings menu should include clear telemetry controls: + +``` +Settings > Privacy +├── [x] Send crash reports (helps improve stability) +├── [ ] Share anonymous usage statistics +├── [ ] Sync settings across devices +└── [View collected data...] -> Opens local data directory +``` + +### Data Inspection + +Users can inspect all locally stored data: + +```python +def open_data_directory(): + """Open the OpenAdapt data directory in file explorer.""" + import subprocess + import platform + + data_dir = get_data_directory() + + if platform.system() == "Darwin": + subprocess.run(["open", str(data_dir)]) + elif platform.system() == "Windows": + subprocess.run(["explorer", str(data_dir)]) + else: + subprocess.run(["xdg-open", str(data_dir)]) +``` + +### Data Deletion + +Users can delete all local data: + +```python +def clear_all_data(keep_config: bool = True): + """Delete all OpenAdapt local data.""" + data_dir = get_data_directory() + + for item in data_dir.iterdir(): + if keep_config and item.name == "config.json": + continue + if item.is_dir(): + shutil.rmtree(item) + else: + item.unlink() + + logger.info("All local data cleared") +``` + +--- + +## Storage Locations + +### Directory Structure + +``` +macOS: ~/Library/Application Support/OpenAdapt/ +Windows: %APPDATA%/OpenAdapt/ +Linux: ~/.local/share/openadapt/ + +Contents: +├── logs/ # Application logs +│ ├── openadapt-tray.log # Current tray app log +│ ├── openadapt-tray.log.1 # Rotated logs +│ └── crash/ # Crash dumps +├── config.json # User settings and preferences +├── history.db # Action history (SQLite) +├── cache/ # Temporary files +│ ├── icons/ # Cached tray icons +│ └── temp/ # Temporary processing files +└── state/ # Persistent state + └── session.json # Current session state (for crash recovery) +``` + +### Storage Path Resolution + +```python +import os +import platform +from pathlib import Path +from typing import Dict + +def get_storage_paths() -> Dict[str, Path]: + """Get all storage paths for the current platform.""" + + if platform.system() == "Darwin": + base = Path.home() / "Library" / "Application Support" / "OpenAdapt" + elif platform.system() == "Windows": + appdata = os.environ.get("APPDATA", Path.home() / "AppData" / "Roaming") + base = Path(appdata) / "OpenAdapt" + else: # Linux and others + xdg_data = os.environ.get("XDG_DATA_HOME", Path.home() / ".local" / "share") + base = Path(xdg_data) / "openadapt" + + paths = { + "base": base, + "logs": base / "logs", + "crash_logs": base / "logs" / "crash", + "config": base / "config.json", + "history_db": base / "history.db", + "cache": base / "cache", + "state": base / "state", + } + + # Ensure directories exist + for key, path in paths.items(): + if key not in ("config", "history_db"): # 
Don't create files + path.mkdir(parents=True, exist_ok=True) + + return paths +``` + +### Config File Schema + +```json +{ + "$schema": "https://openadapt.ai/schemas/tray-config-v1.json", + "version": 1, + "logging": { + "level": "INFO", + "console": false, + "file": true, + "max_size_mb": 10, + "backup_count": 5 + }, + "telemetry": { + "crash_reporting": true, + "anonymous_analytics": false, + "cloud_sync": false + }, + "recording": { + "default_fps": 1, + "capture_audio": false, + "capture_screenshots": true, + "auto_pause_on_idle": true, + "idle_threshold_seconds": 30 + }, + "ui": { + "show_notifications": true, + "start_minimized": false, + "start_on_login": false + }, + "advanced": { + "capture_db_path": null, + "ml_model_path": null + } +} +``` + +--- + +## Integration with Existing Packages + +### Shared Telemetry Module + +The tray app uses the shared `openadapt-telemetry` module (see [telemetry-design.md](./telemetry-design.md)) for consistent telemetry across all OpenAdapt packages. + +```python +# pyproject.toml +[project] +dependencies = [ + "openadapt-telemetry>=0.1.0", +] +``` + +### Coordination with openadapt-capture + +The tray app coordinates with `openadapt-capture` for recording functionality: + +```python +from openadapt_capture import RecordingSession, CaptureConfig +from openadapt_tray.history import ActionHistory + +class TrayRecordingController: + """Bridge between tray UI and capture backend.""" + + def __init__(self): + self.history = ActionHistory() + self.current_session: Optional[RecordingSession] = None + + def start_recording(self, task_name: str, config: CaptureConfig) -> str: + """Start a new recording session.""" + import uuid + + session_id = str(uuid.uuid4()) + + # Log to action history + self.history.log_action( + action_type="recording.start", + session_id=session_id, + data={"task_name": task_name, "config": config.to_dict()} + ) + + # Start capture backend + self.current_session = RecordingSession( + session_id=session_id, + task_name=task_name, + config=config, + on_error=self._on_capture_error, + ) + self.current_session.start() + + return session_id + + def stop_recording(self) -> dict: + """Stop the current recording session.""" + if not self.current_session: + return {"error": "No active session"} + + result = self.current_session.stop() + + # Log completion + self.history.log_action( + action_type="recording.stop", + session_id=self.current_session.session_id, + data={ + "duration": result.duration, + "frame_count": result.frame_count, + "success": result.success, + } + ) + + # Sync with capture database + self.history.sync_session(self.current_session.session_id) + + self.current_session = None + return result.to_dict() + + def _on_capture_error(self, error: Exception): + """Handle capture errors.""" + get_telemetry().capture_exception(error) + self.history.log_action( + action_type="error.occurred", + session_id=self.current_session.session_id if self.current_session else None, + data={"error_type": type(error).__name__} + ) +``` + +### Surfacing Training Logs from openadapt-ml + +The tray app can display training progress and logs from the ML package: + +```python +from openadapt_ml import TrainingJob, TrainingStatus +from openadapt_tray.notifications import show_notification + +class TrayTrainingController: + """Bridge between tray UI and ML training backend.""" + + def __init__(self): + self.history = ActionHistory() + self.current_job: Optional[TrainingJob] = None + + def start_training(self, model_type: str, demo_ids: list[str]) -> str: 
+ """Start a training job.""" + job_id = str(uuid.uuid4()) + + self.history.log_action( + action_type="training.start", + data={ + "job_id": job_id, + "model_type": model_type, + "demo_count": len(demo_ids), + } + ) + + self.current_job = TrainingJob( + job_id=job_id, + model_type=model_type, + demo_ids=demo_ids, + on_progress=self._on_training_progress, + on_complete=self._on_training_complete, + on_error=self._on_training_error, + ) + self.current_job.start() + + return job_id + + def _on_training_progress(self, progress: float, message: str): + """Handle training progress updates.""" + # Update tray icon or menu with progress + pass + + def _on_training_complete(self, result: TrainingStatus): + """Handle training completion.""" + self.history.log_action( + action_type="training.complete", + data={ + "job_id": self.current_job.job_id, + "duration": result.duration, + "success": result.success, + } + ) + + show_notification( + title="Training Complete", + message=f"Model trained successfully in {result.duration:.1f}s" + ) + + # Track telemetry (anonymous) + get_telemetry().capture_event( + "tray.training_complete", + {"model_type": self.current_job.model_type, "success": True} + ) + + def _on_training_error(self, error: Exception): + """Handle training errors.""" + get_telemetry().capture_exception(error) + + self.history.log_action( + action_type="training.error", + data={ + "job_id": self.current_job.job_id if self.current_job else None, + "error_type": type(error).__name__, + } + ) + + show_notification( + title="Training Failed", + message="An error occurred during training. Check logs for details." + ) +``` + +### Log Aggregation View + +The tray app can provide a unified view of logs from all OpenAdapt components: + +```python +from pathlib import Path +from typing import Iterator, NamedTuple +from datetime import datetime + +class LogEntry(NamedTuple): + timestamp: datetime + level: str + source: str # tray, capture, ml, etc. + message: str + +def aggregate_logs(max_entries: int = 1000) -> Iterator[LogEntry]: + """Aggregate logs from all OpenAdapt components.""" + + log_sources = { + "tray": get_storage_paths()["logs"] / "openadapt-tray.log", + "capture": get_capture_log_path(), # From openadapt-capture + "ml": get_ml_log_path(), # From openadapt-ml + } + + entries = [] + + for source, log_path in log_sources.items(): + if not log_path.exists(): + continue + + with open(log_path, "r") as f: + for line in f: + try: + entry = parse_log_line(line, source) + if entry: + entries.append(entry) + except Exception: + continue + + # Sort by timestamp and return most recent + entries.sort(key=lambda e: e.timestamp, reverse=True) + return iter(entries[:max_entries]) +``` + +--- + +## Summary + +This document defines the logging and storage architecture for the OpenAdapt tray application: + +1. **Local Logging**: Platform-specific paths with rotation and retention policies +2. **Action History**: SQLite-based storage for user interactions, synced with capture database +3. **Telemetry**: Integration with shared telemetry module for crash reporting and opt-in analytics +4. **Privacy**: Local-first approach with no PII collection and clear opt-in/opt-out controls +5. **Storage**: Organized directory structure following OS conventions +6. **Integration**: Seamless coordination with capture, ML, and telemetry packages + +For telemetry implementation details, refer to the comprehensive [telemetry design document](./telemetry-design.md). 
diff --git a/docs/getting-started/quickstart.md b/docs/getting-started/quickstart.md index 3ac7ccc46..29e139fdf 100644 --- a/docs/getting-started/quickstart.md +++ b/docs/getting-started/quickstart.md @@ -1,15 +1,15 @@ # Quick Start -This guide walks you through recording a demonstration, training a model, and evaluating it. +This guide walks you through collecting a demonstration, learning a policy, and evaluating the agent. ## Prerequisites - OpenAdapt installed with required packages: `pip install openadapt[all]` - macOS users: [Grant required permissions](permissions.md) -## 1. Record a Demonstration +## 1. Collect a Demonstration -Start recording your screen and inputs: +Start capturing your screen and inputs: ```bash openadapt capture start --name my-task @@ -22,14 +22,14 @@ Now perform the task you want to automate: 3. Navigate menus 4. Complete your workflow -When finished, stop recording: +When finished, stop the capture: ```bash # Press Ctrl+C in the terminal, or: openadapt capture stop ``` -## 2. View the Recording +## 2. View the Trajectory Inspect what was captured: @@ -37,15 +37,15 @@ Inspect what was captured: openadapt capture view my-task ``` -This opens an HTML viewer showing: +This opens a trajectory viewer showing: -- Screenshots at each step -- Mouse and keyboard events +- Observations (screenshots) at each step +- Actions (mouse and keyboard events) - Timing information -## 3. List Your Captures +## 3. List Your Demonstrations -See all recorded demonstrations: +See all collected demonstrations: ```bash openadapt capture list @@ -59,25 +59,25 @@ my-task 45 2m 30s 2026-01-16 login-demo 23 1m 15s 2026-01-15 ``` -## 4. Train a Model +## 4. Learn a Policy -Train a model on your recorded demonstration: +Learn an agent policy from your demonstration trajectory: ```bash openadapt train start --capture my-task --model qwen3vl-2b ``` -Monitor training progress: +Monitor policy learning progress: ```bash openadapt train status ``` -Training creates a checkpoint file in `training_output/`. +Policy learning creates a checkpoint file in `training_output/`. -## 5. Evaluate the Model +## 5. Evaluate the Agent -Test your trained model on a benchmark: +Test your trained policy on a benchmark: ```bash openadapt eval run --checkpoint training_output/model.pt --benchmark waa @@ -103,7 +103,7 @@ openadapt eval run --agent api-claude --benchmark waa ## Complete Workflow Example -Here is a complete example from start to finish: +Here is a complete example demonstrating the full pipeline: ```bash # 1. Install OpenAdapt @@ -112,21 +112,21 @@ pip install openadapt[all] # 2. Check system requirements openadapt doctor -# 3. Record a task +# 3. Collect a demonstration openadapt capture start --name email-reply # ... perform the task ... # Press Ctrl+C to stop -# 4. View the recording +# 4. View the trajectory openadapt capture view email-reply -# 5. Train a model +# 5. Learn a policy openadapt train start --capture email-reply --model qwen3vl-2b -# 6. Wait for training to complete +# 6. Wait for policy learning to complete openadapt train status -# 7. Evaluate +# 7. Evaluate the agent openadapt eval run --checkpoint training_output/model.pt --benchmark waa ``` diff --git a/docs/index.md b/docs/index.md index 4aeb4c448..99f8bc665 100644 --- a/docs/index.md +++ b/docs/index.md @@ -4,7 +4,7 @@ OpenAdapt is the **open** source software **adapt**er between Large Multimodal Models (LMMs) and traditional desktop and web GUIs. 
-Record GUI demonstrations, train ML models, and evaluate agents - all from a unified CLI. +Collect human demonstrations, learn agent policies, and evaluate autonomous execution - all from a unified CLI. [Join Discord](https://discord.gg/yF527cQbDG){ .md-button .md-button--primary } [View on GitHub](https://github.com/OpenAdaptAI/OpenAdapt){ .md-button } @@ -15,24 +15,24 @@ Record GUI demonstrations, train ML models, and evaluate agents - all from a uni OpenAdapt bridges the gap between powerful AI models and everyday software automation. Instead of writing complex scripts or learning APIs, you simply: -1. **Record** - Demonstrate a task by doing it yourself -2. **Train** - Let OpenAdapt learn from your demonstration -3. **Deploy** - Run your trained agent to automate the task -4. **Evaluate** - Measure performance on standardized benchmarks +1. **Demonstrate** - Show the agent how to perform a task by doing it yourself +2. **Learn** - Let OpenAdapt learn an agent policy from your demonstration trajectory +3. **Execute** - Deploy your trained agent to autonomously perform the task +4. **Evaluate** - Measure agent performance on standardized benchmarks ```mermaid flowchart LR - subgraph Record["1. Record"] - A[User Demo] --> B[Capture] + subgraph Demonstrate["1. Demonstrate"] + A[Human Trajectory] --> B[Capture] end - subgraph Train["2. Train"] - B --> C[ML Model] + subgraph Learn["2. Learn"] + B --> C[Policy Learning] end - subgraph Deploy["3. Deploy"] - C --> D[Agent Policy] - D --> E[Action Replay] + subgraph Execute["3. Execute"] + C --> D[Trained Policy] + D --> E[Agent Deployment] end subgraph Evaluate["4. Evaluate"] @@ -53,7 +53,7 @@ flowchart LR Works with any Large Multimodal Model - Claude, GPT-4V, Gemini, Qwen-VL, or your own fine-tuned models. ### Learn from Demonstration -No prompting required. OpenAdapt learns directly from how you perform tasks, automatically generating the right prompts. +No manual prompt engineering required. OpenAdapt learns agent policies directly from your demonstration trajectories. ### Universal GUI Support Works with all desktop GUIs including native applications, web browsers, and virtualized environments. @@ -71,14 +71,14 @@ Install OpenAdapt with the features you need: pip install openadapt[all] # Everything ``` -Record a demonstration: +Collect a demonstration: ```bash openadapt capture start --name my-task # Perform your task, then press Ctrl+C ``` -Train a model: +Learn a policy: ```bash openadapt train start --capture my-task --model qwen3vl-2b @@ -100,12 +100,12 @@ OpenAdapt v1.0+ uses a **modular meta-package architecture**. 
The main `openadap | Package | Description | |---------|-------------| -| [openadapt-capture](packages/capture.md) | Event recording and storage | -| [openadapt-ml](packages/ml.md) | ML engine, training, inference | +| [openadapt-capture](packages/capture.md) | Demonstration collection and storage | +| [openadapt-ml](packages/ml.md) | Policy learning, training, inference | | [openadapt-evals](packages/evals.md) | Benchmark evaluation | -| [openadapt-viewer](packages/viewer.md) | HTML visualization | -| [openadapt-grounding](packages/grounding.md) | UI element localization | -| [openadapt-retrieval](packages/retrieval.md) | Multimodal demo retrieval | +| [openadapt-viewer](packages/viewer.md) | Trajectory visualization | +| [openadapt-grounding](packages/grounding.md) | UI element grounding | +| [openadapt-retrieval](packages/retrieval.md) | Trajectory retrieval | | [openadapt-privacy](packages/privacy.md) | PII/PHI scrubbing | See the full [Architecture Documentation](architecture.md) for detailed diagrams. diff --git a/docs/packages/capture.md b/docs/packages/capture.md index 67499f3fa..b27b6d846 100644 --- a/docs/packages/capture.md +++ b/docs/packages/capture.md @@ -1,6 +1,6 @@ # openadapt-capture -GUI recording, event capture, and storage. +Demonstration collection, observation-action capture, and storage. **Repository**: [OpenAdaptAI/openadapt-capture](https://github.com/OpenAdaptAI/openadapt-capture) @@ -14,17 +14,17 @@ pip install openadapt-capture ## Overview -The capture package records user interactions with desktop and web GUIs, including: +The capture package collects human demonstrations from desktop and web GUIs, including: -- Screenshots at configurable intervals -- Mouse events (clicks, movement, scrolling) -- Keyboard events (key presses, text input) +- Observations (screenshots) at configurable intervals +- Actions: mouse events (clicks, movement, scrolling) +- Actions: keyboard events (key presses, text input) - Window and application context -- Timing information +- Timing information for trajectory reconstruction ## CLI Commands -### Start Recording +### Start Demonstration Collection ```bash openadapt capture start --name my-task @@ -37,27 +37,27 @@ Options: - `--no-screenshots` - Disable screenshot capture - `--no-keyboard` - Disable keyboard capture -### Stop Recording +### Stop Demonstration Collection ```bash openadapt capture stop ``` -Or press `Ctrl+C` in the recording terminal. +Or press `Ctrl+C` in the capture terminal. -### List Captures +### List Demonstrations ```bash openadapt capture list ``` -### View a Capture +### View a Demonstration Trajectory ```bash openadapt capture view my-task ``` -### Delete a Capture +### Delete a Demonstration ```bash openadapt capture delete my-task @@ -75,41 +75,41 @@ session = CaptureSession(name="my-task") recorder = Recorder(session) recorder.start() -# ... user performs actions ... +# ... user demonstrates the task ... 
# Stop recording recorder.stop() -# Access captured data -events = session.get_events() -screenshots = session.get_screenshots() +# Access captured trajectory data +actions = session.get_actions() +observations = session.get_observations() # screenshots ``` ## Data Format -Captures are stored as JSON/Parquet files: +Demonstrations are stored as JSON/Parquet files: ``` -captures/ +demonstrations/ my-task/ metadata.json # Session metadata - events.parquet # Event data - screenshots/ # Screenshot images + actions.parquet # Action data (observation-action pairs) + observations/ # Screenshot images (observations) 0001.png 0002.png ... ``` -### Event Schema +### Action Schema ```python { - "timestamp": float, # Unix timestamp - "type": str, # "mouse_click", "key_press", etc. + "timestamp": float, # Unix timestamp + "action_type": str, # "click", "type", "scroll", etc. "data": { - # Event-specific data + # Action-specific data }, - "screenshot_id": int # Reference to screenshot + "observation_id": int # Reference to observation (screenshot) } ``` @@ -117,11 +117,11 @@ captures/ | Export | Description | |--------|-------------| -| `CaptureSession` | Manages a capture session | -| `Recorder` | Records user interactions | +| `CaptureSession` | Manages a demonstration collection session | +| `Recorder` | Captures observation-action pairs | | `Action` | Represents a user action | -| `MouseEvent` | Mouse event data | -| `KeyboardEvent` | Keyboard event data | +| `Observation` | Represents an observation (screenshot) | +| `Trajectory` | Sequence of observation-action pairs | ## Platform Support @@ -133,6 +133,6 @@ captures/ ## Related Packages -- [openadapt-privacy](privacy.md) - Scrub PII/PHI from captures -- [openadapt-viewer](viewer.md) - Visualize capture data -- [openadapt-ml](ml.md) - Train models on captures +- [openadapt-privacy](privacy.md) - Scrub PII/PHI from demonstrations +- [openadapt-viewer](viewer.md) - Visualize trajectories +- [openadapt-ml](ml.md) - Learn policies from demonstrations diff --git a/docs/packages/evals.md b/docs/packages/evals.md index 84f5fa4a7..d861f93a6 100644 --- a/docs/packages/evals.md +++ b/docs/packages/evals.md @@ -26,7 +26,7 @@ The evals package provides: ### Run Evaluation ```bash -# Evaluate a trained model +# Evaluate a trained policy openadapt eval run --checkpoint training_output/model.pt --benchmark waa # Evaluate an API agent @@ -35,7 +35,7 @@ openadapt eval run --agent api-claude --benchmark waa Options: -- `--checkpoint` - Path to model checkpoint +- `--checkpoint` - Path to trained policy checkpoint - `--agent` - Agent type (api-claude, api-gpt4v, custom) - `--benchmark` - Benchmark name (waa, osworld, etc.) 
- `--tasks` - Number of tasks to evaluate (default: all) @@ -88,7 +88,7 @@ from openadapt_evals import ApiAgent, BenchmarkAdapter, evaluate_agent_on_benchm # Create an API agent agent = ApiAgent.claude() -# Or load a trained model +# Or load a trained policy from openadapt_ml import AgentPolicy agent = AgentPolicy.from_checkpoint("model.pt") @@ -157,7 +157,7 @@ flowchart TB | `ApiAgent` | API-based agent (Claude, GPT-4V) | | `BenchmarkAdapter` | Benchmark interface | | `MockAdapter` | Mock benchmark for testing | -| `evaluate_agent_on_benchmark` | Evaluation function | +| `evaluate_agent_on_benchmark` | Agent evaluation function | | `EvalResults` | Evaluation results container | ## Metrics @@ -171,5 +171,5 @@ flowchart TB ## Related Packages -- [openadapt-ml](ml.md) - Train models to evaluate -- [openadapt-capture](capture.md) - Record training data +- [openadapt-ml](ml.md) - Learn policies to evaluate +- [openadapt-capture](capture.md) - Collect demonstrations diff --git a/docs/packages/grounding.md b/docs/packages/grounding.md index 7ef939cc6..0b52b019f 100644 --- a/docs/packages/grounding.md +++ b/docs/packages/grounding.md @@ -1,6 +1,6 @@ # openadapt-grounding -UI element localization for improved action accuracy. +UI element grounding for improved action accuracy. **Repository**: [OpenAdaptAI/openadapt-grounding](https://github.com/OpenAdaptAI/openadapt-grounding) @@ -14,7 +14,7 @@ pip install openadapt-grounding ## Overview -The grounding package provides UI element detection and localization to improve: +The grounding package provides UI element detection and grounding to improve: - Click accuracy by targeting element centers - Robustness to UI changes @@ -59,7 +59,7 @@ marked_image, element_map = som.create() # element_map: {1: "Submit button", 2: "Email field", ...} ``` -## Integration with ML +## Integration with Policy Execution ```python from openadapt_ml import AgentPolicy @@ -71,8 +71,9 @@ policy = AgentPolicy.from_checkpoint( grounding=ElementDetector() ) -# Predictions will use grounded coordinates -action = policy.predict(screenshot) +# Actions will use grounded coordinates +observation = load_screenshot() +action = policy.predict(observation) ``` ## CLI Commands @@ -122,5 +123,5 @@ openadapt ground som screenshot.png --output marked.png ## Related Packages -- [openadapt-ml](ml.md) - Use grounding in training and inference -- [openadapt-capture](capture.md) - Ground recorded captures +- [openadapt-ml](ml.md) - Use grounding in policy learning and execution +- [openadapt-capture](capture.md) - Apply grounding to demonstrations diff --git a/docs/packages/ml.md b/docs/packages/ml.md index c2261a709..479ea3aa0 100644 --- a/docs/packages/ml.md +++ b/docs/packages/ml.md @@ -1,6 +1,6 @@ # openadapt-ml -ML engine, training, and inference for GUI automation agents. +Policy learning, training, and inference for GUI automation agents. **Repository**: [OpenAdaptAI/openadapt-ml](https://github.com/OpenAdaptAI/openadapt-ml) @@ -17,13 +17,13 @@ pip install openadapt-ml The ML package provides: - Model adapters for various LMMs (Qwen-VL, LLaVA, etc.) 
-- Training infrastructure for supervised learning +- Policy learning infrastructure from demonstration trajectories - Inference engine for action prediction -- Agent policies for deployment +- Agent policies for autonomous execution ## CLI Commands -### Start Training +### Start Policy Learning ```bash openadapt train start --capture my-task --model qwen3vl-2b @@ -31,19 +31,19 @@ openadapt train start --capture my-task --model qwen3vl-2b Options: -- `--capture` - Name of the capture to train on (required) +- `--capture` - Name of the demonstration to learn from (required) - `--model` - Model architecture (required) - `--epochs` - Number of training epochs (default: 10) - `--batch-size` - Batch size (default: 4) - `--output` - Output directory (default: training_output/) -### Check Training Status +### Check Policy Learning Status ```bash openadapt train status ``` -### Stop Training +### Stop Policy Learning ```bash openadapt train stop @@ -72,32 +72,32 @@ from openadapt_ml import QwenVLAdapter, Trainer, AgentPolicy # Load a pre-trained model adapter = QwenVLAdapter.from_pretrained("qwen3vl-2b") -# Create trainer +# Create trainer for policy learning trainer = Trainer( model=adapter, - capture_name="my-task", + demonstration="my-task", # demonstration name epochs=10 ) -# Train +# Learn policy from demonstration trajectory checkpoint_path = trainer.train() -# Load for inference +# Load trained policy for execution policy = AgentPolicy.from_checkpoint(checkpoint_path) -# Predict next action -screenshot = load_screenshot() -action = policy.predict(screenshot) +# Predict next action from observation +observation = load_screenshot() +action = policy.predict(observation) ``` -## Training Pipeline +## Policy Learning Pipeline ```mermaid flowchart LR subgraph Input - CAP[Capture Data] - SS[Screenshots] - EV[Events] + DEMO[Demonstration] + OBS[Observations] + ACT[Actions] end subgraph Processing @@ -106,20 +106,20 @@ flowchart LR TOK[Tokenization] end - subgraph Training + subgraph Learning FWD[Forward Pass] LOSS[Loss Calculation] OPT[Optimization] end subgraph Output - CKPT[Checkpoint] + CKPT[Trained Policy] LOG[Training Logs] end - CAP --> DL - SS --> DL - EV --> DL + DEMO --> DL + OBS --> DL + ACT --> DL DL --> AUG AUG --> TOK TOK --> FWD @@ -135,9 +135,9 @@ flowchart LR |--------|-------------| | `QwenVLAdapter` | Qwen-VL model adapter | | `LLaVAAdapter` | LLaVA model adapter | -| `Trainer` | Training infrastructure | -| `AgentPolicy` | Inference policy | -| `train_supervised` | Training function | +| `Trainer` | Policy learning infrastructure | +| `AgentPolicy` | Trained policy for execution | +| `learn_from_demonstrations` | Policy learning function | ## Hardware Requirements @@ -149,6 +149,6 @@ flowchart LR ## Related Packages -- [openadapt-capture](capture.md) - Record training data -- [openadapt-evals](evals.md) - Evaluate trained models -- [openadapt-retrieval](retrieval.md) - Few-shot retrieval for training +- [openadapt-capture](capture.md) - Collect demonstrations +- [openadapt-evals](evals.md) - Evaluate trained policies +- [openadapt-retrieval](retrieval.md) - Trajectory retrieval for few-shot policy learning diff --git a/docs/packages/privacy.md b/docs/packages/privacy.md index 9bbf2be9c..2a5aff056 100644 --- a/docs/packages/privacy.md +++ b/docs/packages/privacy.md @@ -44,7 +44,7 @@ The privacy package provides: ## CLI Commands -### Scrub a Capture +### Scrub a Demonstration ```bash openadapt privacy scrub my-task @@ -78,8 +78,8 @@ from openadapt_privacy import Scrubber, 
PIIDetector # Create a scrubber scrubber = Scrubber(mode="blur") -# Scrub a capture -scrubber.scrub_capture("my-task", output_dir="scrubbed/") +# Scrub a demonstration +scrubber.scrub_demonstration("my-task", output_dir="scrubbed/") # Or scrub individual images scrubbed_image = scrubber.scrub_image(screenshot_path) @@ -106,10 +106,10 @@ session = CaptureSession( recorder = Recorder(session) recorder.start() -# ... recording ... +# ... demonstration collection ... recorder.stop() -# Captures are automatically scrubbed +# Demonstrations are automatically scrubbed ``` ## Redaction Modes @@ -152,5 +152,5 @@ This package helps with compliance for: ## Related Packages -- [openadapt-capture](capture.md) - Record demonstrations to scrub -- [openadapt-viewer](viewer.md) - View scrubbed captures +- [openadapt-capture](capture.md) - Collect demonstrations to scrub +- [openadapt-viewer](viewer.md) - View scrubbed demonstrations diff --git a/docs/packages/retrieval.md b/docs/packages/retrieval.md index 1c85b4e9f..ae167ebf0 100644 --- a/docs/packages/retrieval.md +++ b/docs/packages/retrieval.md @@ -1,6 +1,6 @@ # openadapt-retrieval -Multimodal demonstration retrieval for few-shot prompting. +Multimodal trajectory retrieval for few-shot policy learning. **Repository**: [OpenAdaptAI/openadapt-retrieval](https://github.com/OpenAdaptAI/openadapt-retrieval) @@ -16,47 +16,47 @@ pip install openadapt-retrieval The retrieval package enables: -- Semantic search over captured demonstrations -- Few-shot example selection for prompting +- Semantic search over demonstration trajectories +- Few-shot example selection for policy learning - Multimodal similarity (text + image) - Demonstration library management ## Use Cases -### Few-Shot Prompting +### Few-Shot Policy Learning -Find similar demonstrations to use as examples when prompting an LMM. +Find similar demonstrations to use as examples when learning agent policies. -### Transfer Learning +### Trajectory Transfer -Retrieve relevant demonstrations for new tasks. +Retrieve relevant demonstration trajectories for new tasks. ### Demonstration Discovery -Search your library of captured demonstrations. +Search your library of demonstration trajectories. 
## Python API ```python from openadapt_retrieval import DemoIndex, retrieve_similar -# Build an index over your captures +# Build an index over your demonstrations index = DemoIndex() -index.add_captures(["task-1", "task-2", "task-3"]) +index.add_demonstrations(["task-1", "task-2", "task-3"]) -# Retrieve similar demonstrations -screenshot = load_screenshot() +# Retrieve similar demonstration trajectories +observation = load_screenshot() similar = index.search( - query_image=screenshot, + query_image=observation, query_text="click the submit button", top_k=3 ) for result in similar: - print(f"{result.capture_name}: {result.similarity:.2f}") + print(f"{result.demonstration_name}: {result.similarity:.2f}") ``` -### Integration with ML +### Integration with Policy Learning ```python from openadapt_ml import AgentPolicy @@ -69,8 +69,9 @@ policy = AgentPolicy.from_checkpoint( retrieval_index=index ) -# Predictions include relevant examples -action = policy.predict(screenshot, use_retrieval=True) +# Policy uses similar trajectory examples for few-shot learning +observation = load_screenshot() +action = policy.predict(observation, use_retrieval=True) ``` ## CLI Commands @@ -87,7 +88,7 @@ openadapt retrieval index --captures task-1 task-2 task-3 openadapt retrieval search --image screenshot.png --text "click submit" ``` -### List Indexed Captures +### List Indexed Demonstrations ```bash openadapt retrieval list @@ -97,7 +98,7 @@ openadapt retrieval list | Export | Description | |--------|-------------| -| `DemoIndex` | Demonstration index | +| `DemoIndex` | Demonstration trajectory index | | `retrieve_similar` | Similarity search | | `Embedding` | Vector embedding | | `SearchResult` | Search result data | @@ -118,7 +119,7 @@ Indexes are stored as pickle files: indexes/ demo_index.pkl # Main index embeddings.npy # Vector embeddings - metadata.json # Capture metadata + metadata.json # Demonstration metadata ``` ## Performance @@ -131,5 +132,5 @@ indexes/ ## Related Packages -- [openadapt-capture](capture.md) - Record demonstrations to index -- [openadapt-ml](ml.md) - Use retrieval in training +- [openadapt-capture](capture.md) - Collect demonstrations to index +- [openadapt-ml](ml.md) - Use retrieval in policy learning diff --git a/docs/packages/viewer.md b/docs/packages/viewer.md index 54646ceb8..2314413d3 100644 --- a/docs/packages/viewer.md +++ b/docs/packages/viewer.md @@ -1,6 +1,6 @@ # openadapt-viewer -HTML visualization components for capture data. +Trajectory visualization components for demonstration data. **Repository**: [OpenAdaptAI/openadapt-viewer](https://github.com/OpenAdaptAI/openadapt-viewer) @@ -16,14 +16,14 @@ pip install openadapt-viewer The viewer package provides: -- HTML-based visualization of captures -- Interactive replay viewer -- Event timeline display -- Screenshot galleries +- HTML-based visualization of demonstration trajectories +- Interactive trajectory viewer +- Action timeline display +- Observation galleries ## CLI Commands -### View a Capture +### View a Demonstration Trajectory ```bash openadapt capture view my-task @@ -49,8 +49,8 @@ Access the dashboard at `http://localhost:8080`. 
```python from openadapt_viewer import PageBuilder, HTMLBuilder -# Build a viewer page for a capture -builder = PageBuilder(capture_name="my-task") +# Build a viewer page for a demonstration +builder = PageBuilder(demonstration="my-task") html = builder.build() # Save to file @@ -59,29 +59,29 @@ with open("viewer.html", "w") as f: # Or use HTMLBuilder for custom visualizations html_builder = HTMLBuilder() -html_builder.add_screenshot(screenshot_path, events) -html_builder.add_timeline(events) +html_builder.add_observation(screenshot_path, actions) +html_builder.add_timeline(actions) html = html_builder.render() ``` ## Viewer Features -### Screenshot Gallery +### Observation Gallery -Browse all captured screenshots with navigation controls. +Browse all captured observations (screenshots) with navigation controls. -### Event Timeline +### Action Timeline Interactive timeline showing: -- Mouse events (clicks, movement) -- Keyboard events (key presses) -- Screenshot timestamps -- Event metadata +- Mouse actions (clicks, movement) +- Keyboard actions (key presses) +- Observation timestamps +- Action metadata -### Replay Controls +### Trajectory Playback Controls -- Play/pause replay +- Play/pause trajectory playback - Speed controls (0.5x, 1x, 2x) - Step forward/backward - Jump to specific time @@ -90,16 +90,16 @@ Interactive timeline showing: - Export as HTML (static) - Export as video (MP4) -- Export event log (JSON) +- Export trajectory log (JSON) ## Key Exports | Export | Description | |--------|-------------| -| `PageBuilder` | Builds viewer pages | +| `PageBuilder` | Builds trajectory viewer pages | | `HTMLBuilder` | Low-level HTML construction | -| `TimelineWidget` | Timeline visualization | -| `ScreenshotGallery` | Screenshot browser | +| `TimelineWidget` | Action timeline visualization | +| `ObservationGallery` | Observation browser | ## Customization @@ -109,27 +109,27 @@ Interactive timeline showing: from openadapt_viewer import PageBuilder, Theme builder = PageBuilder( - capture_name="my-task", + demonstration="my-task", theme=Theme.DARK # or Theme.LIGHT ) ``` -### Custom Event Rendering +### Custom Action Rendering ```python -from openadapt_viewer import PageBuilder, EventRenderer +from openadapt_viewer import PageBuilder, ActionRenderer -class CustomRenderer(EventRenderer): - def render_mouse_click(self, event): - return f"
{event}
" +class CustomRenderer(ActionRenderer): + def render_click(self, action): + return f"
{action}
" builder = PageBuilder( - capture_name="my-task", + demonstration="my-task", renderer=CustomRenderer() ) ``` ## Related Packages -- [openadapt-capture](capture.md) - Record data to visualize +- [openadapt-capture](capture.md) - Collect demonstrations to visualize - [openadapt-privacy](privacy.md) - Scrub sensitive data before viewing diff --git a/docs/publication-roadmap.md b/docs/publication-roadmap.md new file mode 100644 index 000000000..8eb076530 --- /dev/null +++ b/docs/publication-roadmap.md @@ -0,0 +1,527 @@ +# OpenAdapt Publication Roadmap + +**Version**: 1.0 +**Date**: January 2026 +**Status**: Active Planning +**Author**: OpenAdapt Research Team + +--- + +## Executive Summary + +This roadmap outlines the publication strategy for OpenAdapt's core research contributions. The primary innovation is **demonstration-conditioned GUI agents**, which achieve dramatic accuracy improvements (33% to 100% first-action accuracy) by conditioning VLM agents on human demonstrations rather than relying solely on natural language instructions. + +--- + +## Table of Contents + +1. [Publishable Contributions](#1-publishable-contributions) +2. [Publication Timeline](#2-publication-timeline) +3. [Required Experiments](#3-required-experiments) +4. [Author Contributions](#4-author-contributions) +5. [Venue Analysis](#5-venue-analysis) +6. [Existing Drafts and Assets](#6-existing-drafts-and-assets) + +--- + +## 1. Publishable Contributions + +### 1.1 Demo-Conditioned GUI Agents (Core Innovation) + +**The Big Result**: Demonstration conditioning improves first-action accuracy from 33% to 100% on macOS tasks, with expected similar improvements (+30-50pp) on Windows Agent Arena (WAA). + +**Key Claims**: +- Demonstrations capture implicit knowledge that natural language prompts cannot convey +- Demo retrieval enables automatic selection of relevant examples from a library +- The "show, don't tell" paradigm reduces prompt engineering burden +- Works with any VLM backend (Claude, GPT, Gemini, Qwen-VL) + +**Research Questions Addressed**: +1. How much does demonstration context improve GUI agent performance? +2. Can we automatically retrieve relevant demonstrations for new tasks? +3. What is the transfer efficiency between similar tasks across platforms? + +**Preliminary Results** (from `/Users/abrichr/oa/src/openadapt-ml/docs/experiments/`): +- Zero-shot (instruction only): 33% first-action accuracy +- Demo-conditioned: 100% first-action accuracy (+67pp improvement) +- Demo persists across ALL steps (critical P0 fix for episode success) + +**WAA Predictions** (from experiment design): +- Zero-shot expected: 10-20% task success (consistent with SOTA ~19.5%) +- Demo-conditioned expected: 40-70% task success (+30-50pp improvement) + +--- + +### 1.2 Modular Open-Source Architecture (Meta-Package Design) + +**Contribution**: A composable, model-agnostic architecture for GUI automation research. 
+ +**Key Components**: +| Package | Responsibility | Key Innovation | +|---------|---------------|----------------| +| `openadapt-capture` | GUI recording | Cross-platform event + a11y tree capture | +| `openadapt-ml` | Training & inference | Model-agnostic VLM adapters | +| `openadapt-evals` | Benchmark evaluation | Unified adapter for WAA, WebArena | +| `openadapt-retrieval` | Demo search | Multimodal (text+image) embedding with Qwen3-VL | +| `openadapt-grounding` | Element localization | Multiple providers (OmniParser, Florence2, Gemini) | +| `openadapt-viewer` | Visualization | Interactive HTML trajectory viewer | +| `openadapt-privacy` | PII scrubbing | Privacy-preserving demonstration storage | + +**Technical Highlights**: +- Abstraction ladder: Literal -> Symbolic -> Template -> Semantic -> Goal +- Process graph representations for temporal context +- Three-phase architecture: DEMONSTRATE -> LEARN -> EXECUTE +- Feedback loops for continuous improvement + +**Prior Art Comparison**: +| System | Open Source | Modular | Demo-Conditioned | Multi-VLM | +|--------|------------|---------|------------------|-----------| +| OpenAdapt | Yes | Yes | **Yes** | Yes | +| Claude Computer Use | No | No | No | No | +| UFO | Partial | No | No | No | +| SeeAct | Yes | No | No | No | + +--- + +### 1.3 Benchmark Evaluation Framework (WAA Integration) + +**Contribution**: Unified evaluation infrastructure for GUI agent benchmarks. + +**Key Features**: +- `BenchmarkAdapter` abstract interface for any benchmark +- `WAALiveAdapter` with HTTP-based `/evaluate` endpoint +- `ApiAgent` supporting Claude, GPT-5.1, Gemini backends +- `RetrievalAugmentedAgent` for automatic demo selection +- Execution trace collection with screenshots per step +- HTML viewer for result analysis + +**Benchmark Coverage**: +| Benchmark | Status | Tasks | Domain | +|-----------|--------|-------|--------| +| Windows Agent Arena (WAA) | Implemented | 154 tasks | Windows desktop | +| Mock Benchmark | Implemented | N tasks | Testing | +| WebArena | Partial | 812 tasks | Web browser | +| OSWorld | Planned | 369 tasks | Cross-platform | + +**WAA Task Selection** (from experiment design): +- 10 carefully selected tasks across 4 enterprise-relevant domains +- Browser/Edge (3 tasks): Privacy settings, bookmarks, font size +- Office/LibreOffice (3 tasks): Fill blanks, charts, alignment +- Settings (2 tasks): Notifications, Night Light scheduling +- File Explorer (2 tasks): Archive creation, view changes + +--- + +### 1.4 Multimodal Retrieval for Demo Conditioning + +**Contribution**: Automatic demonstration retrieval using VLM embeddings. + +**Technical Approach**: +- **Embedder**: Qwen3-VL-Embedding with Matryoshka Representation Learning (MRL) +- **Index**: FAISS vector index with cosine similarity +- **Query**: Multimodal (task text + current screenshot) +- **Reranking**: Cross-encoder for top-k refinement + +**Key Classes** (from `openadapt-retrieval`): +```python +# Core retrieval interface +retriever = MultimodalDemoRetriever(embedding_dim=512) +retriever.add_demo(demo_id, task, screenshot, app_name) +retriever.build_index() +results = retriever.retrieve(task, screenshot, top_k=3) +``` + +**Performance Considerations**: +- Qwen3-VL: ~6-8 GB VRAM, ~50-200ms per embedding +- CLIP fallback: ~2 GB VRAM, ~10-50ms per embedding +- Flexible dimensions via MRL: 256, 512, 1024, 2048 + +--- + +## 2. 
Publication Timeline + +### Phase 1: Short-Term (Q1 2026) + +#### 2.1.1 Blog Post / Technical Report + +**Target**: January-February 2026 +**Venue**: OpenAdapt blog, HuggingFace, towards data science +**Effort**: 1-2 weeks + +**Content**: +- Demo-conditioned GUI agents: The "show, don't tell" paradigm +- Preliminary results (33% -> 100% accuracy) +- Open-source release announcement +- Interactive demo with viewer + +**Deliverables**: +- [ ] Write blog post (~2000 words) +- [ ] Create figures (architecture diagram, accuracy comparison) +- [ ] Record demo video (2-3 minutes) +- [ ] Publish to blog + cross-post to HN, Reddit, Twitter + +--- + +#### 2.1.2 arXiv Preprint + +**Target**: February-March 2026 +**Venue**: arXiv cs.AI, cs.HC +**Effort**: 3-4 weeks + +**Title Options**: +1. "Show, Don't Tell: Demonstration-Conditioned GUI Automation with Vision-Language Models" +2. "OpenAdapt: An Open Framework for Demo-Conditioned GUI Agents" +3. "From Demonstrations to Actions: Retrieval-Augmented GUI Automation" + +**Existing Drafts**: +- `/Users/abrichr/oa/src/omnimcp/paper/omnimcp_whitepaper.tex` - Spatial-temporal framework +- `/Users/abrichr/oa/src/omnimcp/paper/omnimcp_arxiv.tex` - Full arXiv draft (1056 lines) + +**Structure** (based on existing drafts): +1. Abstract +2. Introduction (demo-conditioning motivation) +3. Related Work (GUI automation, VLM agents, PbD) +4. Method + - Architecture overview + - Demo-conditioned prompting + - Retrieval-augmented generation +5. Experiments + - macOS demo experiment + - WAA benchmark evaluation + - Ablation studies +6. Results + - First-action accuracy + - Episode success rate + - Transfer across platforms +7. Discussion & Limitations +8. Conclusion + +**Deliverables**: +- [ ] Complete WAA experiments (10 tasks x 2 conditions) +- [ ] Update existing LaTeX draft with new results +- [ ] Add retrieval system section +- [ ] Create supplementary materials (code, demos) +- [ ] Submit to arXiv + +--- + +### Phase 2: Medium-Term (Q2-Q3 2026) + +#### 2.2.1 Workshop Paper + +**Target**: April-June 2026 +**Venues** (submission deadlines vary): +| Venue | Conference | Deadline | Focus | +|-------|-----------|----------|-------| +| LLM Agents Workshop | ICML 2026 | ~March | Agent architectures | +| Human-AI Workshop | CHI 2026 | ~Dec 2025 | Human-AI collaboration | +| AutoML Workshop | NeurIPS 2026 | ~Sept | Automation | + +**Format**: 4-8 pages + references +**Effort**: 2-3 weeks (building on preprint) + +**Focus**: Demo retrieval and conditioning system +**Novelty**: Multimodal retrieval for GUI automation + +--- + +#### 2.2.2 Demo Paper (CHI/UIST) + +**Target**: CHI 2027 or UIST 2026 +**Venues**: +| Venue | Deadline | Acceptance Rate | +|-------|----------|-----------------| +| CHI Demo Track | Sept 2026 | ~50% | +| UIST Demo Track | April 2026 | ~40% | + +**Format**: 2-4 pages + live demo +**Effort**: 2 weeks for paper, 1 week for demo prep + +**Demo Content**: +1. Record a demonstration (any application) +2. Show retrieval selecting similar demos +3. Execute task with demo conditioning +4. 
Visualize predictions in viewer + +**Deliverables**: +- [ ] Prepare stable demo environment +- [ ] Create video walkthrough +- [ ] Write demo paper +- [ ] Prepare live demo hardware/software + +--- + +### Phase 3: Long-Term (Q4 2026 - 2027) + +#### 2.3.1 Full Conference Paper + +**Target**: NeurIPS 2026, ICML 2027, or ICLR 2027 +**Effort**: 3-6 months + +**Venues**: +| Venue | Deadline | Page Limit | Focus | +|-------|----------|------------|-------| +| NeurIPS | May 2026 | 9+refs | ML methods | +| ICML | Feb 2027 | 8+refs | ML methods | +| ICLR | Oct 2026 | 8+refs | Representations | +| AAAI | Aug 2026 | 7+refs | AI systems | +| ACL | Feb 2027 | 8+refs | NLP/multimodal | + +**Contribution Options**: + +**Option A: Demo-Conditioning Method Paper** (NeurIPS/ICML) +- Focus: Retrieval-augmented demo conditioning +- Experiments: WAA, WebArena, OSWorld comparison +- Ablations: Retrieval methods, embedding models, k values +- Baselines: Zero-shot, few-shot, fine-tuned + +**Option B: Systems Paper** (MLSys) +- Focus: Modular architecture for GUI automation +- Experiments: Latency, throughput, grounding accuracy +- Comparisons: End-to-end vs modular approaches + +**Option C: HCI Paper** (CHI Full) +- Focus: Human-AI collaboration in task automation +- User study: Demo creation time, task success, trust +- Qualitative: User preferences, failure modes + +--- + +## 3. Required Experiments + +### 3.1 Completed Experiments + +| Experiment | Status | Location | Result | +|------------|--------|----------|--------| +| macOS demo-conditioning | Done | `openadapt-ml/docs/experiments/` | 33% -> 100% | +| Demo prompt format | Done | Same | Behavior-only format best | +| API baselines | Done | `openadapt-evals` | Claude, GPT working | + +--- + +### 3.2 Required for arXiv (P0) + +| Experiment | Description | Effort | Status | +|------------|-------------|--------|--------| +| WAA zero-shot baseline | 10 tasks, no demos | 2-3 hours | Pending | +| WAA demo-conditioned | 10 tasks, with demos | 2-3 hours | Pending | +| Demo creation | Write demos for 10 WAA tasks | 4-6 hours | Design complete | +| Statistical analysis | Significance tests, confidence intervals | 1-2 hours | Pending | + +**WAA Task List** (from experiment design): +1. Edge: Do Not Track +2. Edge: Bookmark to bar +3. Edge: Font size +4. LibreOffice Calc: Fill blanks +5. LibreOffice Calc: Chart creation +6. LibreOffice Writer: Center align +7. Settings: Notifications off +8. Settings: Night Light schedule +9. File Explorer: Archive folder +10. 
File Explorer: Details view + +--- + +### 3.3 Required for Workshop/Demo Paper (P1) + +| Experiment | Description | Effort | Status | +|------------|-------------|--------|--------| +| Retrieval accuracy | Measure if correct demo retrieved | 1 day | Pending | +| Retrieval latency | Embedding + search time | 2 hours | Pending | +| Cross-domain transfer | Demo from app A helps app B | 1 week | Pending | +| Demo library size | Performance vs library size | 2-3 days | Pending | + +--- + +### 3.4 Required for Full Conference Paper (P2) + +| Experiment | Description | Effort | Status | +|------------|-------------|--------|--------| +| WebArena evaluation | 100+ web tasks | 1-2 weeks | Pending | +| OSWorld evaluation | Cross-platform tasks | 2-3 weeks | Pending | +| Fine-tuning comparison | Demo prompting vs fine-tuning | 2-4 weeks | Pending | +| Ablation: VLM backend | Claude vs GPT vs Gemini | 1 week | Partial | +| Ablation: Embedding model | Qwen3-VL vs CLIP vs ColPali | 1 week | Pending | +| Ablation: Demo format | Full trace vs behavior-only | 3 days | Partial | +| User study | N=20-30 participants | 2-4 weeks | Pending | + +--- + +## 4. Author Contributions + +### 4.1 Proposed Author Order + +**Lead Authors** (equal contribution): +1. **Richard Abrich** - Architecture, demo-conditioning, experiments +2. **[Contributor 2]** - Retrieval system, embeddings + +**Contributing Authors**: +3. **[Contributor 3]** - WAA benchmark integration +4. **[Contributor 4]** - Grounding module +5. **[Contributor 5]** - Viewer and visualization + +**Acknowledgments**: +- OmniParser team (Microsoft) +- Windows Agent Arena team (Microsoft) +- Open-source contributors + +--- + +### 4.2 Contribution Matrix + +| Contribution | Lead | Contributors | +|--------------|------|--------------| +| Architecture design | RA | - | +| Demo-conditioning method | RA | - | +| Retrieval system | - | - | +| WAA integration | RA | - | +| Grounding providers | RA | - | +| Experiments: macOS | RA | - | +| Experiments: WAA | RA | - | +| Writing: Introduction | RA | - | +| Writing: Method | RA | - | +| Writing: Experiments | RA | - | +| Figures and diagrams | RA | - | +| Code open-sourcing | RA | - | + +--- + +## 5. 
Venue Analysis + +### 5.1 Target Venues by Contribution Type + +#### Systems/Architecture +| Venue | Deadline | Fit | Notes | +|-------|----------|-----|-------| +| MLSys | Jan 2026 | Good | Modular architecture focus | +| OSDI | May 2026 | Medium | More systems-focused | +| SoCC | June 2026 | Medium | Cloud systems angle | + +#### ML Methods +| Venue | Deadline | Fit | Notes | +|-------|----------|-----|-------| +| NeurIPS | May 2026 | Excellent | Demo-conditioning as retrieval | +| ICML | Feb 2027 | Excellent | Method + experiments | +| ICLR | Oct 2026 | Good | Representation learning angle | + +#### HCI/Agents +| Venue | Deadline | Fit | Notes | +|-------|----------|-----|-------| +| CHI | Sept 2026 | Excellent | Human-AI, user study | +| UIST | April 2026 | Excellent | Demo interaction | +| IUI | Oct 2026 | Good | Intelligent interfaces | + +#### NLP/Multimodal +| Venue | Deadline | Fit | Notes | +|-------|----------|-----|-------| +| ACL | Feb 2027 | Good | Multimodal grounding | +| EMNLP | May 2026 | Good | VLM applications | +| NAACL | Dec 2026 | Good | Shorter, regional | + +--- + +### 5.2 Workshop Opportunities + +| Workshop | Conference | Typical Deadline | Focus | +|----------|-----------|------------------|-------| +| LLM Agents | ICML/NeurIPS | 2-3 months before | Agent architectures | +| Human-AI Interaction | CHI/IUI | Variable | Collaboration | +| AutoML | NeurIPS | September | Automation | +| Efficient ML | ICML/NeurIPS | Variable | Efficiency | + +--- + +## 6. Existing Drafts and Assets + +### 6.1 Paper Drafts + +| File | Location | Status | Content | +|------|----------|--------|---------| +| `omnimcp_whitepaper.tex` | `/Users/abrichr/oa/src/omnimcp/paper/` | Complete (whitepaper) | Spatial-temporal framework, 530 lines | +| `omnimcp_arxiv.tex` | `/Users/abrichr/oa/src/omnimcp/paper/` | Complete (arXiv format) | Full paper, 1056 lines, benchmarks pending | +| `omnimcp_whitepaper.pdf` | Same | Compiled | 2.7 MB | +| `omnimcp_arxiv.pdf` | Same | Compiled | 133 KB | + +### 6.2 Figures + +| Figure | Location | Description | +|--------|----------|-------------| +| `spatial-features.png` | `/Users/abrichr/oa/src/omnimcp/paper/` | Spatial feature understanding | +| `temporal-features.png` | Same | Temporal feature understanding | +| `api-generation.png` | Same | Internal API generation | +| `api-publication.png` | Same | External API (MCP) publication | + +### 6.3 Documentation + +| Document | Location | Relevance | +|----------|----------|-----------| +| `architecture-evolution.md` | `/Users/abrichr/oa/src/OpenAdapt/docs/` | Full architecture description | +| `waa_demo_experiment_design.md` | `/Users/abrichr/oa/src/openadapt-ml/docs/experiments/` | WAA experiment details | +| `waa-evaluator-integration.md` | `/Users/abrichr/oa/src/openadapt-evals/docs/research/` | Evaluation methodology | +| `CLAUDE.md` files | Various repos | Implementation details | + +### 6.4 Code Assets + +| Asset | Location | Description | +|-------|----------|-------------| +| openadapt-capture | GitHub | Recording package | +| openadapt-ml | GitHub | Training/inference | +| openadapt-evals | GitHub | Benchmarks | +| openadapt-retrieval | GitHub | Demo retrieval | +| openadapt-grounding | GitHub | UI localization | +| openadapt-viewer | GitHub | Visualization | + +--- + +## 7. 
Action Items + +### Immediate (This Week) + +- [ ] Complete 10 WAA demo documents +- [ ] Run WAA zero-shot baseline +- [ ] Run WAA demo-conditioned evaluation +- [ ] Update omnimcp_arxiv.tex with new results + +### Short-Term (Next 2 Weeks) + +- [ ] Write blog post announcing demo-conditioning results +- [ ] Create comparison figure (zero-shot vs demo-conditioned) +- [ ] Record demo video +- [ ] Finalize arXiv submission + +### Medium-Term (Next Month) + +- [ ] Implement retrieval accuracy metrics +- [ ] Run cross-domain transfer experiments +- [ ] Identify workshop submission targets +- [ ] Begin CHI/UIST demo preparation + +--- + +## 8. Risk Assessment + +| Risk | Likelihood | Impact | Mitigation | +|------|------------|--------|------------| +| WAA results don't match predictions | Medium | High | Focus on subset where demos help most | +| Retrieval accuracy insufficient | Low | Medium | Add reranking, increase demo library | +| Competition publishes first | Medium | Medium | Differentiate with open-source, modularity | +| Reviewer skepticism of accuracy claims | Medium | Medium | Multiple seeds, statistical tests | + +--- + +## 9. References + +### Key Citations for Paper + +1. **Windows Agent Arena** - Bonatti et al., 2024. Microsoft benchmark, SOTA 19.5%. +2. **OmniParser** - Chen et al., 2024. Vision-only UI parsing. +3. **Set-of-Mark** - Yang et al., 2023. Visual grounding via labels. +4. **Claude Computer Use** - Anthropic, 2024. Production VLM agent. +5. **UFO** - Microsoft, 2024. Windows agent architecture. +6. **Qwen-VL** - Alibaba, 2024. Open-source VLM. +7. **WebArena** - Zhou et al., 2023. Web automation benchmark. +8. **OSWorld** - Xie et al., 2024. Cross-platform benchmark. + +--- + +*Last updated: January 2026* diff --git a/docs/roadmap-priorities.md b/docs/roadmap-priorities.md new file mode 100644 index 000000000..26f86492c --- /dev/null +++ b/docs/roadmap-priorities.md @@ -0,0 +1,562 @@ +# OpenAdapt Roadmap - Priorities + +**Last Updated**: January 16, 2026 +**Version**: 1.1.0 +**Status**: Active Development + +--- + +## Executive Summary + +This document outlines the prioritized roadmap for OpenAdapt, focusing on ensuring the modular meta-package architecture is stable, functional, and delivers on the core promise: **Record -> Train -> Evaluate** GUI automation workflows. + +--- + +## Current State Assessment + +### PyPI Packages Published + +| Package | Version | Python | Status | +|---------|---------|--------|--------| +| `openadapt` | 1.0.0 (meta) | >=3.10 | Published | +| `openadapt-capture` | 0.1.0 | >=3.10 | Published | +| `openadapt-ml` | 0.2.0 | >=3.12 | Published | +| `openadapt-evals` | 0.1.0 | >=3.10 | Published | +| `openadapt-viewer` | 0.1.0 | >=3.10 | Published | +| `openadapt-grounding` | 0.1.0 | >=3.10 | Published | +| `openadapt-retrieval` | 0.1.0 | >=3.10 | Published | +| `openadapt-privacy` | 0.1.0 | >=3.10 | Published | + +**Note**: `openadapt-ml` requires Python 3.12+, which may cause compatibility issues with other packages requiring 3.10. 
+ +### CI/Test Status + +- **Main repo**: CI runs on macOS and Ubuntu, Python 3.10/3.11/3.12 +- **Lint check**: `ruff check` and `ruff format --check` - **Currently Passing** +- **Tests verified**: + - `openadapt-grounding`: 53 tests passing + - `openadapt-retrieval`: 28 tests passing +- **Known issues**: PR #969 addresses ruff format, Docker build needs verification + +### Meta-Package Structure + +The `openadapt` meta-package v1.0.0 uses: +- Hatchling build system +- Lazy imports to avoid heavy dependencies +- Optional extras: `[capture]`, `[ml]`, `[evals]`, `[viewer]`, `[grounding]`, `[retrieval]`, `[privacy]`, `[core]`, `[all]` + +--- + +## Priority Definitions + +| Priority | Urgency | Timeframe | Description | +|----------|---------|-----------|-------------| +| **P0** | Critical | This week | Blockers preventing basic functionality | +| **P1** | High | 1-2 weeks | Core feature completion, essential for v1.0 | +| **P2** | Medium | This month | Important enhancements, user experience | +| **P3** | Lower | Backlog | Nice to have, future considerations | + +--- + +## P0 - Critical: Blocking Issues + +### 1. Fix CI - Ruff Format (PR #969) + +| Field | Value | +|-------|-------| +| **Status** | In Progress | +| **Effort** | Small (1-2 hours) | +| **Owner** | TBD | +| **PR** | #969 | +| **Branch** | `fix/ruff-format-config` | + +**Description**: The CI workflow runs `ruff format --check openadapt/` which may fail if code is not formatted. A fix branch exists with formatting applied. + +**Current State**: Local `ruff check` passes. Branch `fix/ruff-format-config` contains formatting fixes. + +**Next Actions**: +- [ ] Review and merge PR #969 +- [ ] Verify CI passes on all Python versions (3.10, 3.11, 3.12) +- [ ] Verify CI passes on all platforms (macOS, Ubuntu) + +**Files**: +- `.github/workflows/main.yml` +- `openadapt/config.py` +- `openadapt/cli.py` + +--- + +### 2. Fix Docker Build + +| Field | Value | +|-------|-------| +| **Status** | Needs Investigation | +| **Effort** | Medium (2-4 hours) | +| **Owner** | TBD | +| **Location** | `legacy/deploy/deploy/models/omniparser/Dockerfile` | + +**Description**: Docker build for OmniParser server may have issues. This is used for the grounding provider integration. + +**Next Actions**: +- [ ] Test `docker build` for OmniParser Dockerfile +- [ ] Verify CUDA/GPU support works correctly +- [ ] Test model download during build (huggingface-cli) +- [ ] Document any missing dependencies or configuration + +**Files**: +- `legacy/deploy/deploy/models/omniparser/Dockerfile` + +--- + +### 3. Verify Meta-Package Installs Correctly + +| Field | Value | +|-------|-------| +| **Status** | Needs Testing | +| **Effort** | Medium (2-4 hours) | +| **Owner** | TBD | + +**Description**: Critical compatibility issue - `openadapt-ml` requires Python 3.12+, but `openadapt-capture` and others require 3.10+. Need to verify `pip install openadapt[all]` works. 
+ +**Test Matrix**: + +| Installation | Python 3.10 | Python 3.11 | Python 3.12 | +|-------------|-------------|-------------|-------------| +| `openadapt` | Test | Test | Test | +| `openadapt[capture]` | Test | Test | Test | +| `openadapt[ml]` | Expected Fail | Expected Fail | Test | +| `openadapt[core]` | Expected Fail | Expected Fail | Test | +| `openadapt[all]` | Expected Fail | Expected Fail | Test | + +**Next Actions**: +- [ ] Test `pip install openadapt[all]` on Python 3.12 +- [ ] Test `pip install openadapt[core]` on Python 3.12 +- [ ] Verify imports work: `python -c "from openadapt.cli import main"` +- [ ] Document minimum Python version clearly (3.12 if ml is needed) +- [ ] Consider downgrading `openadapt-ml` requirements to 3.10+ if feasible + +--- + +### 4. Basic Capture -> Train -> Eval Workflow + +| Field | Value | +|-------|-------| +| **Status** | Needs End-to-End Testing | +| **Effort** | Large (4-8 hours) | +| **Owner** | TBD | + +**Description**: The core value proposition requires this workflow to function: + +```bash +openadapt capture start --name my-task # 1. Record demo +openadapt train start --capture my-task # 2. Train model +openadapt eval run --checkpoint model.pt # 3. Evaluate +``` + +**CLI Commands to Test**: + +| Command | Status | Notes | +|---------|--------|-------| +| `openadapt capture start` | Needs Test | Requires macOS permissions | +| `openadapt capture list` | Needs Test | | +| `openadapt capture view ` | Needs Test | Generates HTML | +| `openadapt capture stop` | TODO | Uses Ctrl+C currently | +| `openadapt train start` | Needs Test | Requires openadapt-ml | +| `openadapt eval run --agent api-claude` | Needs Test | Requires API key | +| `openadapt eval mock --tasks 10` | Needs Test | Quick verification | + +**Next Actions**: +- [ ] Test `openadapt capture start` on macOS (permissions required) +- [ ] Test `openadapt capture list` shows recordings +- [ ] Test `openadapt capture view ` generates HTML +- [ ] Test `openadapt train start` with real capture data +- [ ] Test `openadapt eval run --agent api-claude` with API key +- [ ] Test `openadapt eval mock --tasks 10` for quick verification +- [ ] Document any failures and create issues + +**Known Blockers**: +- `capture stop` is TODO (uses Ctrl+C currently) +- macOS requires Accessibility + Screen Recording permissions + +--- + +## P1 - High: Core Features + +### 5. Complete Baseline Adapters + +| Field | Value | +|-------|-------| +| **Status** | Partially Implemented | +| **Effort** | Medium (4-8 hours) | +| **Owner** | TBD | +| **Package** | `openadapt-ml` | + +**Description**: API baseline adapters (Anthropic, OpenAI, Google) are implemented but need testing and validation. + +**Adapter Status**: + +| Provider | Adapter | Status | Notes | +|----------|---------|--------|-------| +| Anthropic | Claude | Implemented | Claude Computer Use patterns | +| OpenAI | GPT-4V | Implemented | Needs testing | +| Google | Gemini | Implemented | Needs testing | +| Qwen | Qwen3-VL | Implemented | Local model | + +**Next Actions**: +- [ ] Test Anthropic adapter with Claude API +- [ ] Test OpenAI adapter with GPT-4V +- [ ] Test Google adapter with Gemini +- [ ] Verify prompts follow SOTA patterns (Claude CU, UFO, OSWorld) +- [ ] Add error handling for rate limits and API failures +- [ ] Document adapter usage and configuration + +--- + +### 6. 
Demo Conditioning Integration in Evals + +| Field | Value | +|-------|-------| +| **Status** | Designed, Needs Integration | +| **Effort** | Medium (4-8 hours) | +| **Owner** | TBD | +| **Packages** | `openadapt-retrieval`, `openadapt-evals` | + +**Description**: Demo-conditioned prompting shows **33% -> 100% first-action accuracy improvement**. This is a key differentiator. + +**Architecture**: +``` +openadapt-retrieval (demo library) -> openadapt-ml (adapters) -> openadapt-evals (benchmark) +``` + +**Next Actions**: +- [ ] Integrate `openadapt-retrieval` with `openadapt-ml` adapters +- [ ] Add `--demo` flag to `openadapt eval run` +- [ ] Test with real demo library on WAA benchmark +- [ ] Document demo library format (JSON structure, screenshots) +- [ ] Add `--demo-library` option for multi-demo retrieval + +--- + +### 7. WAA Benchmark Validation + +| Field | Value | +|-------|-------| +| **Status** | Blocked on Azure VM Setup | +| **Effort** | Medium (4-8 hours) | +| **Owner** | TBD | +| **Package** | `openadapt-evals` | + +**Description**: Need to validate demo-conditioning claims on full Windows Agent Arena benchmark. This provides credibility for landing page claims. + +**Infrastructure Required**: +- Azure VM with nested virtualization (Windows 10/11) +- WAA server running +- API keys for Claude/GPT-4V + +**Target Metrics**: + +| Metric | Baseline (No Demo) | With Demo | Target | +|--------|-------------------|-----------|--------| +| First-action accuracy | ~33% | ~100% | Validate | +| Episode success rate | TBD | TBD | Measure | +| Average steps | TBD | TBD | Measure | + +**Next Actions**: +- [ ] Start Azure VM with WAA server (nested virtualization) +- [ ] Run `openadapt eval run --agent api-claude --server ` +- [ ] Record metrics: episode success rate, avg steps, failure modes +- [ ] Generate HTML report with `openadapt-viewer` +- [ ] Document results for landing page claims + +--- + +## P2 - Medium: Enhancements + +### 8. Safety Gate Implementation + +| Field | Value | +|-------|-------| +| **Status** | Design Phase | +| **Effort** | Medium (4-8 hours) | +| **Owner** | TBD | +| **Package** | `openadapt-ml` | + +**Description**: Implement safety gates to prevent harmful or unintended actions during agent execution. + +**Safety Categories**: +1. **Pre-action validation**: Check action against allowed patterns +2. **Dangerous action detection**: Block destructive file ops, system commands +3. **Human-in-the-loop confirmation**: Require approval for certain actions +4. **Rollback capability**: Undo recent actions if needed + +**Next Actions**: +- [ ] Design safety gate API interface +- [ ] Implement pre-action validation hooks +- [ ] Add dangerous action detection (rm, format, delete, etc.) +- [ ] Add optional human confirmation prompts +- [ ] Document safety configuration options + +--- + +### 9. Grounding Provider Improvements + +| Field | Value | +|-------|-------| +| **Status** | Package Published (53 tests passing) | +| **Effort** | Medium (4-6 hours) | +| **Owner** | TBD | +| **Package** | `openadapt-grounding` | + +**Description**: `openadapt-grounding` provides UI element localization for improved click accuracy. Needs integration with ML package. 
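One way the ML-package integration could look is sketched below; the provider names match the table that follows, but the `Grounder` protocol, `locate()` signature, `BoundingBox` type, and `ground_click` helper are hypothetical placeholders for illustration, not the published `openadapt-grounding` API.

```python
# Hypothetical integration sketch -- the Grounder protocol, locate() signature,
# and BoundingBox type are illustrative placeholders, not the actual
# openadapt-grounding interface.
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class BoundingBox:
    left: int
    top: int
    width: int
    height: int

    def center(self) -> tuple[int, int]:
        return self.left + self.width // 2, self.top + self.height // 2


class Grounder(Protocol):
    def locate(self, screenshot_png: bytes, description: str) -> Optional[BoundingBox]:
        """Return the on-screen region best matching a natural-language description."""
        ...


def ground_click(
    grounder: Grounder, screenshot_png: bytes, description: str
) -> Optional[tuple[int, int]]:
    """Resolve a described UI element (e.g. "Submit Order button") to click coordinates."""
    box = grounder.locate(screenshot_png, description)
    return box.center() if box else None
```

In this sketch, the replay path in `openadapt-ml` would call something like `ground_click` before emitting a click and fall back to the raw recorded coordinates when no element is found.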
+ +**Available Providers**: + +| Provider | Backend | Status | GPU Required | +|----------|---------|--------|--------------| +| OmniGrounder | OmniParser | Working | Yes (CUDA) | +| GeminiGrounder | Gemini API | Working | No | +| SoMGrounder | Set-of-Marks | Working | Yes | + +**Next Actions**: +- [ ] Integrate with `openadapt-ml` action replay +- [ ] Test OmniGrounder with recorded captures +- [ ] Test GeminiGrounder with API key +- [ ] Add grounding visualization to `openadapt-viewer` +- [ ] Document grounding provider selection +- [ ] Fix Docker build for OmniParser server + +--- + +### 10. Viewer Dashboard Features + +| Field | Value | +|-------|-------| +| **Status** | Basic HTML Generation Works | +| **Effort** | Medium (4-8 hours) | +| **Owner** | TBD | +| **Package** | `openadapt-viewer` | + +**Description**: `openadapt-viewer` generates HTML but could be enhanced for better debugging and analysis. + +**Requested Features**: + +| Feature | Priority | Complexity | +|---------|----------|------------| +| Video playback from screenshots | High | Medium | +| Action timeline with seek | High | Medium | +| Side-by-side comparison view | Medium | Low | +| Filtering by action type | Medium | Low | +| Benchmark result integration | Medium | Medium | +| Failure analysis tools | Medium | High | + +**Next Actions**: +- [ ] Add video playback (from captured screenshots) +- [ ] Add action timeline with seek +- [ ] Add side-by-side comparison view +- [ ] Add filtering by action type +- [ ] Integrate with benchmark results for failure analysis + +--- + +## P3 - Lower: Nice to Have + +### 11. Telemetry (GlitchTip) + +| Field | Value | +|-------|-------| +| **Status** | Design Doc Complete | +| **Effort** | Large (1-2 weeks) | +| **Owner** | TBD | +| **Design Doc** | `docs/design/telemetry-design.md` | + +**Description**: Create `openadapt-telemetry` package for unified error tracking and usage analytics across all packages. + +**Key Features**: +- GlitchTip/Sentry SDK integration +- Privacy filtering (path sanitization, PII scrubbing) +- Internal user tagging (CI detection, dev mode) +- Opt-out mechanisms (DO_NOT_TRACK env var) + +**Next Actions**: +- [ ] Create `openadapt-telemetry` package scaffold +- [ ] Implement Sentry/GlitchTip integration +- [ ] Add privacy filtering (path sanitization, PII scrubbing) +- [ ] Add internal user tagging (CI detection, dev mode) +- [ ] Create opt-out mechanisms (DO_NOT_TRACK env var) +- [ ] Integrate with openadapt-evals as pilot + +--- + +### 12. Additional Benchmarks (WebArena, OSWorld) + +| Field | Value | +|-------|-------| +| **Status** | Future Consideration | +| **Effort** | Large (2-4 weeks) | +| **Owner** | TBD | +| **Package** | `openadapt-evals` | + +**Description**: Expand evaluation infrastructure beyond WAA. + +**Target Benchmarks**: + +| Benchmark | Type | Status | Priority | +|-----------|------|--------|----------| +| Windows Agent Arena (WAA) | Desktop | In Progress | High | +| WebArena | Web Browser | Not Started | Medium | +| OSWorld | Cross-Platform | Not Started | Medium | +| MiniWoB++ | Synthetic | Not Started | Low | + +**Next Actions**: +- [ ] Implement WebArena adapter for browser automation +- [ ] Implement OSWorld adapter for cross-platform desktop +- [ ] Create unified metrics across benchmarks +- [ ] Add benchmark comparison view + +--- + +### 13. 
Documentation Site (docs.openadapt.ai) + +| Field | Value | +|-------|-------| +| **Status** | MkDocs Configured, Needs Deployment | +| **Effort** | Medium (4-6 hours) | +| **Owner** | TBD | +| **Config** | `mkdocs.yml` | + +**Description**: Documentation site using MkDocs with existing markdown files. + +**Existing Documentation**: +- `docs/index.md` - Home page +- `docs/architecture.md` - System architecture +- `docs/cli.md` - CLI reference +- `docs/packages/*.md` - Package documentation +- `docs/getting-started/*.md` - Installation, quickstart, permissions + +**Next Actions**: +- [ ] Verify `mkdocs.yml` configuration +- [ ] Run `mkdocs build` and test locally +- [ ] Set up GitHub Actions for auto-deploy to GitHub Pages +- [ ] Configure CNAME for docs.openadapt.ai +- [ ] Add API reference (auto-generated from docstrings) +- [ ] Write getting-started tutorial (5-minute quickstart) + +--- + +## Dependency Graph + +``` +P0: Fix CI (PR #969) ─────────────────────────────────────────────────┐ +P0: Docker Build ─────────────────────────────────────────────────────┤ +P0: Verify Meta-Package ──────────────────────────────────────────────┤ +P0: Basic Workflow ───────────────────────────────────────────────────┤ + │ + v +P1: Baseline Adapters ────────────────────────────────────────────────┤ +P1: Demo Conditioning ────────────────────────────────────────────────┤ +P1: WAA Benchmark ────────────────────────────────────────────────────┘ + │ + v +P2: Safety Gates ─────────────────────────────────────────────────────┐ +P2: Grounding Improvements ───────────────────────────────────────────┤ +P2: Viewer Dashboard ─────────────────────────────────────────────────┘ + │ + v +P3: Telemetry (GlitchTip) ────────────────────────────────────────────┐ +P3: Additional Benchmarks ────────────────────────────────────────────┤ +P3: Documentation Site ───────────────────────────────────────────────┘ +``` + +--- + +## Technical Debt + +### Known Issues + +| Issue | Severity | Package | Notes | +|-------|----------|---------|-------| +| Python version mismatch | Medium | `openadapt-ml` | Requires 3.12+, others 3.10+ | +| `capture stop` TODO | Low | `openadapt` CLI | Uses Ctrl+C instead of signal/file | +| `release-and-publish.yml` uses hatchling | Low | Main repo | Aligned with meta-package | +| Legacy code | Low | `/legacy/` | Many TODOs, not blocking v1.0 | + +### Code Quality + +| Package | TODOs | Notes | +|---------|-------|-------| +| `openadapt/cli.py` | 1 | Implement stop via signal/file | +| `legacy/` | 100+ | Historical, not blocking v1.0 | + +--- + +## Success Criteria + +### P0 Complete (This Week) + +- [ ] CI passes on all matrix combinations (Python 3.10/3.11/3.12, macOS/Ubuntu) +- [ ] PR #969 merged +- [ ] Docker build succeeds for OmniParser +- [ ] `pip install openadapt[core]` works on Python 3.12 +- [ ] Basic capture/eval workflow demonstrated + +### P1 Complete (1-2 Weeks) + +- [ ] API agents (Claude, GPT-4V) working with demo conditioning +- [ ] WAA baseline established with metrics +- [ ] First-action accuracy validated (33% -> 100% with demo) + +### P2 Complete (This Month) + +- [ ] Safety gates implemented and documented +- [ ] Grounding improving action accuracy +- [ ] Viewer dashboard with video playback + +### P3 Complete (Backlog) + +- [ ] Telemetry package published +- [ ] docs.openadapt.ai live +- [ ] Additional benchmarks integrated + +--- + +## Resources Required + +| Resource | Purpose | Status | +|----------|---------|--------| +| Azure credits | WAA benchmark VM | Needed | +| Anthropic 
API key | Claude testing | Available | +| OpenAI API key | GPT-4V testing | Needed | +| Google API key | Gemini testing | Needed | +| Test machines | Windows 10/11, Ubuntu 22.04/24.04 | Needed | +| DNS access | docs.openadapt.ai CNAME | Needed | + +--- + +## Appendix: Quick Reference + +### PyPI Package URLs + +- https://pypi.org/project/openadapt/ +- https://pypi.org/project/openadapt-capture/ +- https://pypi.org/project/openadapt-ml/ +- https://pypi.org/project/openadapt-evals/ +- https://pypi.org/project/openadapt-viewer/ +- https://pypi.org/project/openadapt-grounding/ +- https://pypi.org/project/openadapt-retrieval/ +- https://pypi.org/project/openadapt-privacy/ + +### GitHub Repositories + +- Main: https://github.com/OpenAdaptAI/openadapt +- Sub-packages: https://github.com/OpenAdaptAI/openadapt-{capture,ml,evals,viewer,grounding,retrieval,privacy} + +### Related Documents + +- Architecture: `/docs/architecture.md` +- Telemetry Design: `/docs/design/telemetry-design.md` +- Landing Page Strategy: `/docs/design/landing-page-strategy.md` +- Legacy Freeze: `/docs/legacy/freeze.md` + +--- + +*This roadmap is a living document. Update as priorities shift based on user feedback and technical discoveries.* diff --git a/mkdocs.yml b/mkdocs.yml index 39e4c9985..c0b353060 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -116,8 +116,23 @@ nav: - openadapt-grounding: packages/grounding.md - openadapt-retrieval: packages/retrieval.md - openadapt-privacy: packages/privacy.md - - Architecture: architecture.md + - Architecture: + - Overview: architecture.md + - Evolution: architecture-evolution.md + - Design: + - Index: design/INDEX.md + - System Tray App: design/openadapt-tray.md + - Tray Logging: design/tray-logging.md + - Telemetry: design/telemetry-design.md + - Landing Page: design/landing-page-strategy.md + - Repo Rename Analysis: design/repo-rename-analysis.md + - Roadmap: + - Priorities: roadmap-priorities.md + - Publications: publication-roadmap.md - CLI Reference: cli.md - Contributing: contributing.md - Legacy: - Legacy Freeze: legacy/freeze.md + - Legacy Freeze (Alt): LEGACY_FREEZE.md + - Reference: + - macOS Permissions: permissions-macos.md diff --git a/pyproject.toml b/pyproject.toml index b737bbb04..e27f1f34a 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -6,7 +6,7 @@ readme = "README.md" requires-python = ">=3.10" license = "MIT" authors = [ - {name = "MLDSAI Inc.", email = "richard@mldsai.com"} + {name = "Richard Abrich", email = "richard@openadapt.ai"} ] keywords = ["gui", "automation", "ml", "rpa", "agent", "vlm", "computer-use"] classifiers = [