
@basnijholt (Owner)

Summary

Adds a new `--long-conversation` mode to `memory-proxy` that maintains a single, continuous conversation with intelligent compression, optimized for 100-200k token context windows.

**Key insight:** user input is precious and hard to summarize without loss; LLM output is verbose and derivable. Compress asymmetrically.

Features

  • Chronological context: Maintains the full conversation history as markdown segments
  • Asymmetric compression: User messages compressed gently (to ~70% of original length), assistant messages aggressively (to ~20%)
  • Code block deduplication: Detects near-duplicate code blocks and stores compact diffs instead of full copies
  • Token budget enforcement: Compresses older turns when the conversation approaches the context limit (see the sketch after this list)
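For the token-budget bullet, here is a minimal sketch of the enforcement loop. The `Segment` type and function names are illustrative, not the actual implementation; the token estimate is the `len(text) // 4` heuristic noted under limitations:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    role: str  # "user" or "assistant"
    text: str  # markdown content of one turn

def estimate_tokens(text: str) -> int:
    # Rough heuristic from the limitations section: ~4 characters per token.
    return len(text) // 4

def build_context(segments: list[Segment], budget: int) -> str:
    """Walk backwards from the newest turn, keeping segments until the budget is spent."""
    kept: list[Segment] = []
    used = 0
    for seg in reversed(segments):
        cost = estimate_tokens(seg.text)
        if used + cost > budget:
            break  # older turns must be compressed or dropped
        kept.append(seg)
        used += cost
    return "\n\n".join(seg.text for seg in reversed(kept))
```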

Usage

agent-cli memory-proxy \
    --long-conversation \
    --context-budget 150000 \
    --compress-threshold 0.8 \
    --raw-recent-tokens 40000
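Assuming `--compress-threshold` is a fraction of `--context-budget`, this configuration would trigger compression once the conversation nears roughly 120k estimated tokens (0.8 × 150,000), while `--raw-recent-tokens` presumably keeps the most recent ~40k tokens uncompressed.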

Implementation

  • Phase 1: Basic storage, segment persistence, context building
  • Phase 2: Asymmetric compression with LLM summarization
  • Phase 3: Code block extraction, similarity detection, diff-based deduplication

Known limitations

  • Streaming requests fall back to non-streaming (with a warning)
  • Token estimation uses the `len(text) // 4` heuristic rather than a real tokenizer

Test plan

  • Unit tests for code block extraction and similarity detection
  • Integration tests for segment persistence
  • Integration tests for compression triggers
  • Integration tests for deduplication
  • Manual testing with real LLM

Implement chronological context with token budget enforcement for
single long-running conversations. This mode maintains conversation
history as segments and builds context by including recent turns
up to the token budget.

New features:
- --long-conversation flag for memory proxy command
- --context-budget, --compress-threshold, --raw-recent-tokens options
- Segment and LongConversation data models
- File-based persistence (markdown with YAML frontmatter; see the example after this list)
- Basic context building with token budget enforcement
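As an illustration of that on-disk format, a persisted segment file might look roughly like this; the field names are guesses, not the actual schema (the raw/summarized/reference states are the ones the tests below exercise):

```markdown
---
segment: 17
role: user
state: raw        # raw | summarized | reference
tokens: 312
---

Here's the traceback I get when running the proxy...
```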

Phase 2+ (not yet implemented):
- Asymmetric compression (user vs assistant)
- Code block deduplication
- Streaming support
…Phase 2)

Implements intelligent compression that targets assistant messages far more
aggressively than user messages:
- User messages: gentle compression to ~70% of original length, preserving code blocks and quotes
- Assistant messages: aggressive compression to ~20%, reduced to bullet points

Adds integration tests covering the full transformation pipeline.
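A minimal sketch of that asymmetry, where `llm_summarize` is a stand-in for whatever summarization call the proxy actually makes and the prompt wording is invented:

```python
def compress_segment(seg, llm_summarize):
    """Compress one segment: gently for users, aggressively for assistants (a sketch)."""
    tokens = len(seg.text) // 4  # same rough heuristic used elsewhere in the proxy
    if seg.role == "user":
        # Gentle: target ~70% of the original, keep code blocks and quotes verbatim.
        prompt = (f"Condense to roughly {int(tokens * 0.7)} tokens. "
                  "Preserve code blocks and quoted text exactly.")
    else:
        # Aggressive: assistant output is derivable, so ~20% as bullets suffices.
        prompt = f"Summarize as terse bullet points, roughly {int(tokens * 0.2)} tokens."
    seg.text = llm_summarize(prompt, seg.text)
    seg.state = "summarized"
    return seg
```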
… (Phase 3)

Add repetition detection that identifies near-duplicate code blocks and
stores compact references with diffs instead of full content.

- Extract fenced code blocks using regex
- Detect similarity using difflib.SequenceMatcher (>85% threshold)
- Store reference + unified diff when savings > 70%
- Integrate deduplication into segment creation flow
- Add 6 integration tests for repetition detection

This saves tokens when users paste the same or similar code multiple
times during a conversation.
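A self-contained sketch of that mechanism using only the standard library; the stored reference format is invented for illustration, and character counts stand in for token counts:

```python
import difflib
import re

# Matches fenced code blocks and captures their bodies.
FENCE_RE = re.compile(r"`{3}[\w+-]*\n(.*?)`{3}", re.DOTALL)

def extract_code_blocks(markdown: str) -> list[str]:
    return FENCE_RE.findall(markdown)

def dedupe_block(new_block: str, seen: list[str]) -> str:
    """Swap a near-duplicate block for a reference plus unified diff when that saves space."""
    for i, old in enumerate(seen):
        if difflib.SequenceMatcher(None, old, new_block).ratio() > 0.85:
            diff = "".join(difflib.unified_diff(
                old.splitlines(keepends=True),
                new_block.splitlines(keepends=True),
                fromfile=f"block-{i}", tofile="current",
            ))
            ref = f"[see block-{i}, with changes]\n{diff}"
            if len(ref) < 0.3 * len(new_block):  # i.e. savings > 70%
                return ref
    seen.append(new_block)
    return new_block
```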
basnijholt and others added 3 commits December 1, 2025 02:11
…plication

Add two integration tests to verify Phase 2 and Phase 3 features work together:

- test_compression_and_deduplication_together: Verifies compression triggers
  mid-conversation and deduplication still works for repeated content
- test_build_context_with_all_segment_states: Verifies build_context correctly
  handles raw, summarized, and reference segments in the same conversation
