Merged
3 changes: 2 additions & 1 deletion .github/workflows/test.yml
@@ -33,7 +33,8 @@ jobs:
run: uv sync --extra dev

- name: Run tests
run: uv run pytest tests/ -v
run: uv run pytest tests/ -v --ignore=tests/test_browser_bridge.py --timeout=120
timeout-minutes: 10

lint:
runs-on: ubuntu-latest
72 changes: 51 additions & 21 deletions CLAUDE.md
@@ -21,8 +21,11 @@ uv add openadapt-capture
# Install with audio support (large download)
uv add "openadapt-capture[audio]"

# Run tests
uv run pytest tests/ -v
# Run tests (excluding the browser bridge tests, which need websockets fixtures)
uv run pytest tests/ -v --ignore=tests/test_browser_bridge.py

# Run slow integration tests (requires accessibility permissions)
uv run pytest tests/ -v -m slow

# Record a GUI capture
uv run python -c "
@@ -44,41 +47,68 @@ for action in capture.actions():

```
openadapt_capture/
recorder.py # Recorder context manager for GUI event capture
capture.py # Capture class for loading and iterating events/actions
platform/ # Platform-specific implementations (Windows, macOS, Linux)
storage/ # Data persistence (SQLite + media files)
media/ # Audio/video capture and synchronization
visualization/ # Demo GIF and HTML viewer generation
recorder.py # Multi-process recorder (legacy OpenAdapt record.py architecture)
capture.py # CaptureSession class for loading and iterating events/actions
events.py # Pydantic event models (MouseMoveEvent, KeyDownEvent, etc.)
processing.py # Event merging pipeline (clicks, drags, typing)
db/ # SQLAlchemy database layer
__init__.py # Engine, session factory, Base
models.py # Recording, ActionEvent, Screenshot, WindowEvent, PerformanceStat, MemoryStat
crud.py # Insert functions, batch writing, post-processing
window/ # Platform-specific active window capture
extensions/ # SynchronizedQueue (multiprocessing.Queue wrapper)
utils.py # Timestamps, screenshots, monitor dims
config.py # Recording config (RECORD_VIDEO, RECORD_AUDIO, etc.)
video.py # Video encoding (av/ffmpeg)
audio.py # Audio recording + transcription
visualize/ # Demo GIF and HTML viewer generation
share.py # Magic Wormhole sharing
browser_bridge.py # Browser extension integration
cli.py # CLI commands (capture record, capture info, capture share)
```

## Key Components

### Recorder
Main interface for capturing GUI interactions:
- `__enter__` / `__exit__` - Context manager lifecycle
- `record_events()` - Main capture loop
- `event_count` - Total captured events
Multi-process recording system (copied from legacy OpenAdapt):
- `Recorder(capture_dir, task_description)` - Context manager
- Internally runs `record()` which spawns reader threads + writer processes
- Action-gated video capture (only encode frames when user acts)
- Stop via context manager exit or stop sequences (default: `llqq`)
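
The stop-sequence idea above can be sketched in a few lines. This is a minimal illustration, not the recorder's actual code: the class name `StopSequenceDetector` and its `feed` method are hypothetical, and it assumes a stop sequence is simply matched against the most recently typed characters.

```python
from collections import deque


class StopSequenceDetector:
    """Hypothetical helper: report when recent key presses spell a stop sequence."""

    def __init__(self, sequence: str = "llqq") -> None:
        if not sequence:
            raise ValueError("sequence must be non-empty")
        self.sequence = sequence
        # Keep only as many recent characters as the sequence is long.
        self._recent = deque(maxlen=len(sequence))

    def feed(self, char: str) -> bool:
        """Record one typed character; return True when the sequence is matched."""
        self._recent.append(char)
        return "".join(self._recent) == self.sequence


detector = StopSequenceDetector("llqq")
results = [detector.feed(c) for c in "hello llqq"]
stopped = results[-1]  # True only after the final "q" completes the sequence
```

A bounded `deque` keeps the check cheap per key press, which matters if the detector runs inside a keyboard listener callback.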

### Capture
### CaptureSession / Capture
Load and query recorded captures:
- `Capture.load(path)` - Load from directory
- `capture.events()` - Iterator over raw events
- `capture.actions()` - Iterator over processed actions
- `Capture.load(path)` - Load from capture directory (reads `recording.db`)
- `capture.raw_events()` - List of Pydantic events from SQLAlchemy DB
- `capture.actions()` - Iterator over processed actions (clicks, drags, typing)
- `action.screenshot` - PIL Image at time of action (extracted from video)
- `action.x`, `action.y`, `action.dx`, `action.dy`, `action.button`, `action.text`

### Storage
SQLAlchemy-based per-capture databases:
- Each capture gets its own `recording.db` in the capture directory
- Models: Recording, ActionEvent, Screenshot, WindowEvent, PerformanceStat, MemoryStat
- Writer processes get their own sessions via `get_session_for_path(db_path)`

### Event Types
- Raw: `mouse.move`, `mouse.down`, `mouse.up`, `key.down`, `key.up`, `screen.frame`, `audio.chunk`
- Processed: `click`, `double_click`, `drag`, `scroll`, `type`
- Raw: `mouse.move`, `mouse.down`, `mouse.up`, `mouse.scroll`, `key.down`, `key.up`
- Processed: `mouse.singleclick`, `mouse.doubleclick`, `mouse.drag`, `mouse.scroll`, `key.type`
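
To make the raw-to-processed distinction concrete, here is a rough sketch of how `mouse.down` / `mouse.up` pairs could be merged into `mouse.singleclick`, `mouse.doubleclick`, and `mouse.drag`. It is not the logic in `processing.py`: the `RawEvent` dataclass, the `merge_mouse_events` function, and both thresholds are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RawEvent:
    type: str         # "mouse.down" or "mouse.up"
    timestamp: float  # seconds
    x: float
    y: float


# Hypothetical thresholds, chosen only for this example.
DRAG_DISTANCE_PX = 5.0
DOUBLE_CLICK_INTERVAL_S = 0.5


def merge_mouse_events(events: list[RawEvent]) -> list[dict]:
    """Pair each mouse.down with the next mouse.up and classify the gesture."""
    actions: list[dict] = []
    down = None
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.type == "mouse.down":
            down = event
        elif event.type == "mouse.up" and down is not None:
            dx, dy = event.x - down.x, event.y - down.y
            if (dx * dx + dy * dy) ** 0.5 > DRAG_DISTANCE_PX:
                actions.append({"type": "mouse.drag", "timestamp": down.timestamp,
                                "x": down.x, "y": down.y, "dx": dx, "dy": dy})
            else:
                previous = actions[-1] if actions else None
                is_second_click = (
                    previous is not None
                    and previous["type"] == "mouse.singleclick"
                    and down.timestamp - previous["timestamp"] <= DOUBLE_CLICK_INTERVAL_S
                )
                if is_second_click:
                    # A second click soon after the first upgrades it to a double click.
                    previous["type"] = "mouse.doubleclick"
                else:
                    actions.append({"type": "mouse.singleclick",
                                    "timestamp": down.timestamp,
                                    "x": down.x, "y": down.y})
            down = None
    return actions


actions = merge_mouse_events([
    RawEvent("mouse.down", 0.0, 10, 10), RawEvent("mouse.up", 0.1, 10, 10),
    RawEvent("mouse.down", 0.3, 10, 10), RawEvent("mouse.up", 0.4, 10, 10),
    RawEvent("mouse.down", 2.0, 50, 50), RawEvent("mouse.up", 2.5, 150, 80),
])
action_types = [a["type"] for a in actions]
```

The sketch ignores mouse buttons and does not check that the two clicks of a double click land near each other; a real pipeline would likely need both.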

## Testing

```bash
uv run pytest tests/ -v
# Fast tests (unit + integration, no recording)
uv run pytest tests/ -v --ignore=tests/test_browser_bridge.py -m "not slow"

# Slow tests (full recording pipeline with pynput synthetic input)
uv run pytest tests/ -v -m slow

# All tests
uv run pytest tests/ -v --ignore=tests/test_browser_bridge.py
```

## Related Projects

- [openadapt-ml](https://github.com/OpenAdaptAI/openadapt-ml) - Train models on captures
- [openadapt-privacy](https://github.com/OpenAdaptAI/openadapt-privacy) - PII scrubbing
- [openadapt-viewer](https://github.com/OpenAdaptAI/openadapt-viewer) - Visualization
- [openadapt-retrieval](https://github.com/OpenAdaptAI/openadapt-retrieval) - Demo retrieval
- [openadapt-evals](https://github.com/OpenAdaptAI/openadapt-evals) - Benchmark evaluation
130 changes: 58 additions & 72 deletions README.md
@@ -11,7 +11,7 @@

Capture platform-agnostic GUI interaction streams with time-aligned screenshots and audio for training ML models or replaying workflows.

> **Status:** Pre-alpha. See [docs/DESIGN.md](docs/DESIGN.md) for architecture discussion.
> **Status:** Pre-alpha.

---

@@ -70,8 +70,6 @@ from openadapt_capture import Recorder
with Recorder("./my_capture", task_description="Demo task") as recorder:
# Captures mouse, keyboard, and screen until context exits
input("Press Enter to stop recording...")

print(f"Captured {recorder.event_count} events")
```

### Replay / Analysis
@@ -91,84 +89,84 @@ for action in capture.actions():
### Low-Level API

```python
from openadapt_capture import (
create_capture, process_events,
MouseDownEvent, MouseButton,
)

# Create storage (platform and screen size auto-detected)
capture, storage = create_capture("./my_capture")

# Write raw events
storage.write_event(MouseDownEvent(timestamp=1.0, x=100, y=200, button=MouseButton.LEFT))

# Query and process
raw_events = storage.get_events()
actions = process_events(raw_events) # Merges clicks, drags, typed text
from openadapt_capture.db import create_db, get_session_for_path
from openadapt_capture.db import crud
from openadapt_capture.db.models import Recording, ActionEvent

# Create a database
engine, Session = create_db("/path/to/recording.db")
session = Session()

# Insert a recording
recording = crud.insert_recording(session, {
"timestamp": 1700000000.0,
"monitor_width": 1920,
"monitor_height": 1080,
"platform": "win32",
"task_description": "My task",
})

# Insert events
crud.insert_action_event(session, recording, 1700000001.0, {
"name": "click",
"mouse_x": 100.0,
"mouse_y": 200.0,
"mouse_button_name": "left",
"mouse_pressed": True,
})

# Query events back
from openadapt_capture.capture import CaptureSession
capture = CaptureSession.load("/path/to/capture_dir")
actions = list(capture.actions())
```

## Event Types

**Raw events** (captured):
- `mouse.move`, `mouse.down`, `mouse.up`, `mouse.scroll`
- `key.down`, `key.up`
- `screen.frame`, `audio.chunk`

**Actions** (processed):
- `mouse.singleclick`, `mouse.doubleclick`, `mouse.drag`
- `key.type` (merged keystrokes text)
- `key.type` (merged keystrokes into text)
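
As a rough picture of what "merged keystrokes into text" could mean, the sketch below groups consecutive `key.down` events into `key.type` actions. The function name `merge_keystrokes`, the `(timestamp, type, key)` tuple layout, and the one-second gap threshold are all assumptions for illustration, not the package's actual behavior.

```python
from __future__ import annotations


def merge_keystrokes(
    events: list[tuple[float, str, str]], max_gap_s: float = 1.0
) -> list[dict]:
    """Merge runs of printable key.down events into key.type actions.

    Each event is (timestamp, event_type, key). A pause longer than
    max_gap_s (a hypothetical threshold) starts a new action.
    """
    actions: list[dict] = []
    text: list[str] = []
    start = last = 0.0
    for timestamp, event_type, key in events:
        if event_type != "key.down" or len(key) != 1:
            continue  # this sketch ignores key.up and non-printable keys
        if text and timestamp - last > max_gap_s:
            actions.append({"type": "key.type", "timestamp": start, "text": "".join(text)})
            text = []
        if not text:
            start = timestamp
        text.append(key)
        last = timestamp
    if text:
        actions.append({"type": "key.type", "timestamp": start, "text": "".join(text)})
    return actions


typed = merge_keystrokes([
    (0.00, "key.down", "h"), (0.05, "key.up", "h"),
    (0.10, "key.down", "i"), (0.15, "key.up", "i"),
    (3.00, "key.down", "o"), (3.10, "key.down", "k"),
])
```

Here the first two presses become one action with text `hi`, and the presses after the pause become a second action with text `ok`.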

## Architecture

The recorder uses a multi-process architecture copied from legacy OpenAdapt:

- **Reader threads**: Capture mouse, keyboard, screen, and window events into a central queue
- **Processor thread**: Routes events to type-specific write queues
- **Writer processes**: Persist events to SQLAlchemy DB (one process per event type)
- **Action-gated video**: Only encodes video frames when user actions occur
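
The reader, processor, and writer roles listed above can be sketched with the standard library. This is a simplified stand-in, not the recorder's implementation: every stage is a thread here (the real writers are described as separate processes), the writers append to lists instead of a database, and all function names are hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel that tells a consumer to shut down


def reader(event_type, payloads, event_q):
    """Stand-in reader: push (type, payload) tuples onto the central queue."""
    for payload in payloads:
        event_q.put((event_type, payload))


def processor(event_q, write_qs):
    """Route each event from the central queue to the write queue for its type."""
    while True:
        item = event_q.get()
        if item is STOP:
            for write_q in write_qs.values():
                write_q.put(STOP)
            return
        event_type, payload = item
        write_qs[event_type].put(payload)


def writer(write_q, sink):
    """Stand-in writer: drain one write queue into a list instead of a database."""
    while True:
        payload = write_q.get()
        if payload is STOP:
            return
        sink.append(payload)


def run_pipeline(sources):
    event_q = queue.Queue()
    write_qs = {name: queue.Queue() for name in sources}
    sinks = {name: [] for name in sources}

    readers = [threading.Thread(target=reader, args=(name, data, event_q))
               for name, data in sources.items()]
    proc = threading.Thread(target=processor, args=(event_q, write_qs))
    writers = [threading.Thread(target=writer, args=(write_qs[name], sinks[name]))
               for name in sources]

    for thread in readers + [proc] + writers:
        thread.start()
    for thread in readers:
        thread.join()
    event_q.put(STOP)  # all readers have finished, so nothing follows the sentinel
    proc.join()
    for thread in writers:
        thread.join()
    return sinks


sinks = run_pipeline({
    "action": ["click", "type"],
    "screen": ["frame-1"],
    "window": ["title-a"],
})
```

Giving each event type its own write queue means a slow writer (for example, screenshots) does not block the others, which is presumably the motivation for the one-writer-per-type layout.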

```
capture_directory/
├── capture.db # SQLite: events, metadata
├── video.mp4 # Screen recording
└── audio.flac # Audio (optional)
├── recording.db # SQLite: events, screenshots, window events, perf stats
├── oa_recording-{ts}.mp4 # Screen recording (action-gated)
└── audio.flac # Audio (optional)
```
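
One way to picture action-gated video is as a filter over an interleaved timeline of frames and actions. The sketch below assumes that each action triggers encoding of the most recent frame captured before it, and that a frame is encoded at most once; the recorder's actual gating rule may differ, and `select_frames` is a hypothetical name.

```python
def select_frames(timeline):
    """Return the frames an action-gated encoder would keep.

    timeline is a time-ordered list of (timestamp, kind, payload) tuples,
    where kind is "frame" or "action". Assumption: an action keeps the
    latest frame seen before it, and each frame is kept at most once.
    """
    kept = []
    latest_frame = None
    latest_kept = False
    for timestamp, kind, payload in timeline:
        if kind == "frame":
            latest_frame = (timestamp, payload)
            latest_kept = False
        elif kind == "action" and latest_frame is not None and not latest_kept:
            kept.append(latest_frame)
            latest_kept = True
    return kept


kept = select_frames([
    (0.00, "frame", "f0"),
    (0.10, "frame", "f1"),
    (0.15, "action", "click"),
    (0.20, "frame", "f2"),
    (0.30, "frame", "f3"),
    (0.35, "action", "type"),
    (0.36, "action", "type"),
    (0.40, "frame", "f4"),
])
```

In this example five frames are captured but only `f1` and `f3` are kept, which is the kind of reduction that makes the resulting video file smaller during idle periods.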

## Performance Statistics
## Performance Testing

Track event write latency and analyze capture performance:
Run a performance test with synthetic input:

```python
from openadapt_capture import Recorder

with Recorder("./my_capture") as recorder:
input("Press Enter to stop...")

# Access performance statistics
summary = recorder.stats.summary()
print(f"Mean latency: {summary['mean_latency_ms']:.1f}ms")

# Generate performance plot
recorder.stats.plot(output_path="performance.png")
```bash
uv run python scripts/perf_test.py
```

![Performance Statistics](docs/images/performance_stats.png)

## Frame Extraction Verification
This records for 10 seconds using pynput Controllers, then reports:
- Wall/CPU time and memory usage
- Event counts and action types
- Output file sizes
- Memory usage plot (saved to capture directory)

Compare extracted video frames against original images to verify lossless capture:
Run integration tests (requires accessibility permissions):

```python
from openadapt_capture import compare_video_to_images, plot_comparison

# Compare frames
report = compare_video_to_images(
"capture/video.mp4",
[(timestamp, image) for timestamp, image in captured_frames],
)

print(f"Mean diff: {report.mean_diff_overall:.2f}")
print(f"Lossless: {report.is_lossless}")

# Visualize comparison
plot_comparison(report, output_path="comparison.png")
```bash
uv run pytest tests/test_performance.py -v -m slow
```

![Frame Comparison](docs/images/frame_comparison.png)

## Visualization

Generate animated demos and interactive viewers from recordings:
@@ -191,21 +189,6 @@ capture = Capture.load("./my_capture")
create_html(capture, output="viewer.html", include_audio=True)
```

The HTML viewer includes:
- Timeline scrubber with event markers
- Frame-by-frame navigation
- Synchronized audio playback
- Event list with details panel
- Keyboard shortcuts (Space, arrows, Home/End)

![Capture Viewer](docs/images/viewer_screenshot.png)

### Generate Demo from Command Line

```bash
uv run python scripts/generate_readme_demo.py --duration 10
```

## Sharing Recordings

Share recordings between machines using [Magic Wormhole](https://magic-wormhole.readthedocs.io/):
@@ -236,7 +219,10 @@ The `share` command compresses the recording, sends it via Magic Wormhole, and e

```bash
uv sync --dev
uv run pytest
uv run pytest tests/ -v --ignore=tests/test_browser_bridge.py

# Run slow integration tests (requires accessibility permissions)
uv run pytest tests/ -v -m slow
```

## Related Projects
40 changes: 24 additions & 16 deletions openadapt_capture/__init__.py
@@ -16,6 +16,18 @@
compare_video_to_images,
plot_comparison,
)
from openadapt_capture.db.models import (
ActionEvent as DBActionEvent,
)

# Database models (low-level)
from openadapt_capture.db.models import (
Recording,
Screenshot,
)
from openadapt_capture.db.models import (
WindowEvent as DBWindowEvent,
)

# Event types
from openadapt_capture.events import (
@@ -54,23 +66,20 @@
remove_invalid_keyboard_events,
remove_redundant_mouse_move_events,
)
from openadapt_capture.recorder import Recorder

# Recorder requires pynput which needs a display server (X11/Wayland/macOS/Windows).
# Make it optional so the package is importable in headless environments (CI, servers).
try:
from openadapt_capture.recorder import Recorder
except ImportError:
Recorder = None # type: ignore[assignment,misc]

# Performance statistics
from openadapt_capture.stats import (
CaptureStats,
PerfStat,
plot_capture_performance,
)
from openadapt_capture.storage import Capture as CaptureMetadata

# Storage (low-level)
from openadapt_capture.storage import (
CaptureStorage,
Stream,
create_capture,
load_capture,
)

# Visualization
from openadapt_capture.visualize import create_demo, create_html
@@ -134,12 +143,11 @@
# Screen/audio events
"ScreenFrameEvent",
"AudioChunkEvent",
# Storage (low-level)
"CaptureMetadata",
"Stream",
"CaptureStorage",
"create_capture",
"load_capture",
# Database models (low-level)
"Recording",
"DBActionEvent",
"Screenshot",
"DBWindowEvent",
# Processing
"process_events",
"remove_invalid_keyboard_events",