Adaptive Summarization System + Memory Pipeline Integration #110
Open: basnijholt wants to merge 42 commits into main from poc/aijournal
Conversation
- Add @agent.output_validator to validate LLM decisions
- Catch invalid UPDATE/DELETE/NONE with non-existent IDs
- Send helpful error messages via ModelRetry for retry
- Graceful fallback to add all facts when retries exhausted
- Add AI journal POC example for testing MemoryClient
- Improve reconciliation prompt with clearer examples
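A minimal sketch of the validator pattern described above, using pydantic-ai's @agent.output_validator and ModelRetry; the output model, model string, and known_ids set are illustrative stand-ins, not the PR's actual code:

```python
from pydantic import BaseModel
from pydantic_ai import Agent, ModelRetry


class ReconcileDecision(BaseModel):
    # Hypothetical output model for one reconciliation decision.
    event: str                    # "ADD" | "UPDATE" | "DELETE" | "NONE"
    memory_id: str | None = None  # required for UPDATE/DELETE


agent = Agent("openai:gpt-4o", output_type=ReconcileDecision)

known_ids = {"mem-1", "mem-2"}  # IDs of memories that actually exist


@agent.output_validator
def check_ids(decision: ReconcileDecision) -> ReconcileDecision:
    # UPDATE/DELETE must reference an existing memory; otherwise send a
    # corrective message back to the model via ModelRetry so it can try again.
    if decision.event in ("UPDATE", "DELETE") and decision.memory_id not in known_ids:
        msg = f"Unknown memory id {decision.memory_id!r}; valid ids: {sorted(known_ids)}"
        raise ModelRetry(msg)
    return decision
```

When retries are exhausted, pydantic-ai raises, which is where the commit's "fallback to add all facts" would take over.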
Switch from tool calls to JSON mode output for the reconciliation agent. This works better with local models (like reasoning models) that put their output in the reasoning_content field instead of content.

PromptedOutput injects the schema into the prompt and enables JSON mode (response_format={"type": "json_object"}), matching mem0's approach.
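As a rough sketch of what that switch looks like in pydantic-ai (the output model and model string are illustrative, and PromptedOutput's exact import location may vary by version):

```python
from pydantic import BaseModel
from pydantic_ai import Agent, PromptedOutput


class Reconciliation(BaseModel):
    # Hypothetical schema for the reconciliation result.
    facts: list[str]


# PromptedOutput embeds the JSON schema in the prompt and requests plain
# JSON output instead of a tool call, which local/reasoning models tend
# to handle more reliably than function calling.
agent = Agent("openai:gpt-4o-mini", output_type=PromptedOutput(Reconciliation))
```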
- Add list_all() method to MemoryClient to retrieve all stored memories
- Add 'show' command to display all stored facts about the user
- Add 'profile' command to generate a structured profile summary using LLM
- Enhance 'chat' command to use profile context for personalized responses

The POC now demonstrates a "self-model" system that:
1. Extracts facts from user input
2. Stores and retrieves them semantically
3. Generates profile summaries on demand
4. Uses the profile to personalize conversations

This validates the core hypothesis: MemoryClient can serve as the foundation for a personal knowledge system that knows who you are.
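A compressed sketch of that four-step loop; only list_all() is confirmed by this commit, while client.add() and the llm callable are hypothetical stand-ins for the POC's actual helpers:

```python
def self_model_turn(client, llm, user_msg: str) -> str:
    # 1. extract a fact from the user's message and store it
    fact = llm(f"Extract one fact about the user from: {user_msg}")
    client.add(fact)
    # 2. retrieve everything stored so far
    facts = client.list_all()
    # 3. condense the facts into a profile summary
    profile = llm(f"Summarize these facts into a user profile:\n{facts}")
    # 4. answer with the profile as context
    return llm(f"User profile:\n{profile}\n\nReply to: {user_msg}")
```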
Analyzes architecture, features, and test results comparing our MemoryClient-based POC (~200 LOC) with the full aijournal project (~15,000+ LOC).

Key findings:
- POC successfully extracts facts and generates accurate profiles
- Main gap is learning over time (strength tracking, decay, feedback)
- Recommends adding a simple strength field to close 80% of the functionality gap with 20% of aijournal's complexity

Includes concrete test results from ingesting 12+ blog posts.
basnijholt commented Nov 27, 2025
agent_cli/memory/_ingest.py (outdated)

            )
            replacement_map[orig] = new_id
        else:
            # UPDATE with unknown ID = treat as ADD (model used wrong event)
can this even happen still?
Implement research-grounded summarization inspired by Letta and Mem0:
- AdaptiveSummarizer with 5 levels (NONE, BRIEF, STANDARD, DETAILED, HIERARCHICAL)
- Hierarchical summary storage (L1 chunks, L2 groups, L3 final) in ChromaDB
- File-based persistence with YAML front matter in markdown files
- Token counting via tiktoken with fallback to cl100k_base
- Level-specific compression ratios (20%, 12%, 7%, capped 2000 tokens)

Structure:
- agent_cli/summarizer/ - standalone reusable summarization module
- summaries/L1/chunk_*.md, L2/group_*.md, L3/final.md file hierarchy
- Soft-delete old summaries to deleted/ folder before replacing
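A sketch of how the five-level selection can work; the token thresholds (100/500/3000/15000) come from a later commit in this PR and are heuristic, not research-derived:

```python
from enum import Enum


class SummaryLevel(Enum):
    NONE = "none"                  # content too short to bother
    BRIEF = "brief"                # single short summary
    STANDARD = "standard"          # standard single-pass summary
    DETAILED = "detailed"          # longer, more faithful summary
    HIERARCHICAL = "hierarchical"  # L1 chunks -> L2 groups -> L3 final


def determine_level(token_count: int) -> SummaryLevel:
    # Pure function: pick a level from content length alone.
    if token_count < 100:
        return SummaryLevel.NONE
    if token_count < 500:
        return SummaryLevel.BRIEF
    if token_count < 3000:
        return SummaryLevel.STANDARD
    if token_count < 15000:
        return SummaryLevel.DETAILED
    return SummaryLevel.HIERARCHICAL
```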
- Fix datetime.utcnow() deprecation, use datetime.now(UTC)
- Extract duplicate chunk summarization to _summarize_single_chunk()
- Add SummarizationError exception for better error handling
- Add retry with exponential backoff (1s, 2s, 4s) for generation failures
- Add middle-truncation fallback for oversized content (Letta-style)
- Export SummarizationError from module __init__
- Remove AdaptiveSummarizer class in favor of standalone functions
- Add SummarizerConfig dataclass for configuration
- Export determine_level() as pure function (no state needed)
- Update summarize(), update_rolling_summary() to take config parameter
- Update _ingest.py to use new functional API
- Update all tests for new API

This matches the functional style used throughout the codebase, reducing state and improving testability.
…ic API
- Rename prompts.py → _prompts.py and utils.py → _utils.py
- Reduce public API to 6 essential exports: SummarizerConfig, summarize, SummaryResult, SummaryLevel, HierarchicalSummary, SummarizationError
- Remove determine_level, update_rolling_summary, count_tokens from public API
- Update imports in adaptive.py and test files
Replace the old rolling summary system with the new hierarchical adaptive summarizer. This simplifies the codebase by removing redundant code paths and using a single, research-backed approach.

Changes:
- Update extract_and_store_facts_and_summaries() to use summarize_content() and store_adaptive_summary() instead of update_summary()/persist_summary()
- Remove old summary functions: update_summary, persist_summary, get_summary_entry
- Remove Summary entity and SummaryOutput model (unused)
- Add summary_level to L3 metadata for consistency
- Update tests to mock new summarizer interface

The new system automatically selects a summarization level (NONE, BRIEF, STANDARD, DETAILED, HIERARCHICAL) based on content complexity, storing summaries in an L1/L2/L3 hierarchical structure.
…maries
- Create docs/architecture/summarizer.md with comprehensive technical specification for the adaptive summarization system
- Update memory.md to reflect new L1/L2/L3 hierarchical summary structure
- Document level thresholds, compression ratios, and research basis
- Add content-type aware prompts documentation
- Document integration with memory system and storage format
Removed unused code:
- update_rolling_summary() - never called anywhere
- _raw_generate() fallback - errors should fail loudly
- retry/backoff logic - same reason
- parent_group from ChunkSummary - stored but never read
- ROLLING_SUMMARY_PROMPT - only used by removed function
Kept middle_truncate() - useful for handling very large inputs
(e.g., conversations with pasted codebases).
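A sketch of what middle truncation looks like; this uses a crude 4-characters-per-token heuristic, and the project's actual middle_truncate() may cut on token boundaries instead:

```python
def middle_truncate(text: str, max_tokens: int) -> str:
    # Letta-style middle truncation: keep the head and tail, drop the
    # middle, since the start and end of a document usually carry the
    # most context.
    max_chars = max_tokens * 4  # rough chars-per-token estimate
    if len(text) <= max_chars:
        return text
    half = max_chars // 2
    return text[:half] + "\n[... middle truncated ...]\n" + text[-half:]
```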
Bugfix:
- Add {prior_context} to CONVERSATION, JOURNAL, DOCUMENT prompts
- Previously prior_summary was silently ignored for non-"general" types
- Python's .format() ignores extra kwargs, hiding the bug
Updates documentation to reflect fail-fast error handling.
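The .format() behavior behind that bug is easy to demonstrate: str.format() raises KeyError for missing keys but silently ignores extra keyword arguments, so a template without a {prior_context} placeholder swallows the value without any error:

```python
# A prompt template that forgot to include a {prior_context} slot:
template = "Summarize:\n{content}"

# No exception raised; prior_context is silently dropped.
template.format(content="...", prior_context="this gets lost")
```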
Expose the full power of the summarizer through a CLI command that:
- Follows existing CLI patterns using shared opts module
- Supports all LLM providers (ollama, openai, gemini)
- Offers content-type prompts (general, conversation, journal, document)
- Provides output formats (text, json, full hierarchical)
- Includes chunking options and rolling summary support
- Reads from file or stdin
…args
- Remove unused parent_group from MemoryMetadata (was never assigned)
- Refactor write_memory_file to accept optional MemoryMetadata object instead of 17 individual parameters
- Simplify upsert_hierarchical_summary to use MemoryMetadata(**dict)
- Rename summary_level to summary_level_name for consistency
- Make tiktoken optional in token counting with fallback heuristic
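A sketch of the optional-tiktoken pattern; the fallback formula here (~4 characters per token) is an assumption, not necessarily the PR's exact heuristic:

```python
def count_tokens(text: str, encoding: str = "cl100k_base") -> int:
    # Prefer tiktoken when available; otherwise fall back to a rough
    # character-based estimate so the summarizer still works without it.
    try:
        import tiktoken

        return len(tiktoken.get_encoding(encoding).encode(text))
    except ImportError:
        return max(1, len(text) // 4)  # ~4 chars/token for English text
```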
Improve CLI startup time from ~0.51s to ~0.16s (69% faster) by deferring heavy imports until they're actually needed:
- pydantic_ai: lazy in memory/_ingest.py, summarizer/adaptive.py, rag/engine.py
- sounddevice: lazy in core/audio.py (moved to TYPE_CHECKING + function imports)
- numpy: lazy in rag/_retriever.py and services/tts.py

Update tests to patch modules directly (e.g., pydantic_ai.Agent) instead of through module attributes that no longer exist at import time.

Add scripts/profile_imports.py for measuring import performance.
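The TYPE_CHECKING + function-import pattern mentioned above looks roughly like this (the function here is a made-up example, not code from the PR):

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only needed for type hints; never imported at runtime.
    import numpy as np


def embedding_norm(vec: np.ndarray) -> float:
    # Deferred import: numpy loads on first call, not at CLI startup.
    import numpy as np

    return float(np.linalg.norm(vec))
```

Tests then have to patch `numpy.linalg.norm` (the real module path) rather than an attribute of this module, since the name no longer exists at import time.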
- Extract upsert_summary_entries() to avoid double to_storage_metadata() call
- Extract _summarize_chunks() helper for async chunk processing pipeline
…ummary
- Replace verbose Args/Returns docstrings with single-line summaries
- Remove upsert_hierarchical_summary (was only used in tests)
- Update tests to use upsert_summary_entries directly

Net: -102 lines
Some models leak control tokens like <|constrain|>, <|end|>, etc. into their output. Add regex cleanup in _generate_summary(). Also rewrites docs/architecture/summarizer.md to focus on research foundations and design rationale rather than code snippets.
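A minimal sketch of that cleanup; the exact pattern in _generate_summary() may differ:

```python
import re

# Matches leaked control tokens of the form <|...|>,
# e.g. <|constrain|> or <|end|>.
CONTROL_TOKEN_RE = re.compile(r"<\|[^>]*?\|>")


def clean_summary(text: str) -> str:
    return CONTROL_TOKEN_RE.sub("", text).strip()
```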
After verifying claims against the actual Letta and Mem0 codebases:

Letta (verified):
- Partial eviction (30%) - `partial_evict_summarizer_percentage`
- Middle truncation - `middle_truncate_text()` function
- Fire-and-forget - `fire_and_forget()` method
- arXiv:2310.08560

Mem0 (corrected):
- Two-phase architecture (verified) - fact extraction then memory ops
- Removed "90%+ compression" claim - refers to token savings vs full context, not summarization compression ratios
- Removed "rolling summaries" attribution - not a Mem0 term
- arXiv:2504.19413

Also removes incorrect "based on Mem0 research" from code docstrings where compression ratios were empirically chosen, not research-derived.
Previously, the summarizer was summarizing the already-compressed extracted facts, which is redundant. Now it summarizes the actual user/assistant messages, which is what makes sense for a conversation summary.
- Document what's actually borrowed from research:
  - Two-phase architecture from Mem0 (arXiv:2504.19413)
  - Hierarchical merging concept from BOOOOKSCORE (arXiv:2310.00785)
- Clarify what Letta does differently (message count, not tokens)
- Acknowledge original/heuristic design choices:
  - Token thresholds (100/500/3000/15000) are not research-backed
  - L1/L2/L3 hierarchy structure is original
  - Chunk size (3000) is larger than BOOOOKSCORE's 2048
- Add future improvements section based on research findings
Remove old hierarchical summarization (STANDARD, DETAILED, HIERARCHICAL) in favor of a simpler 3-level system inspired by LangChain's map-reduce:
- NONE: Skip summarization for very short content (<100 tokens)
- BRIEF: Single-pass summary for short content (100-500 tokens)
- MAP_REDUCE: LangChain-style map-reduce for longer content (500+ tokens)

Key changes:
- Add map_reduce.py with dynamic collapse algorithm
- Remove HierarchicalSummary and ChunkSummary classes
- Rename summary_level_name to summary_level in metadata
- Add collapse_depth field to track map-reduce iterations
- Use research-backed defaults (chunk_size=2048, token_max=3000)
- Update all tests for simplified API
- No backward compatibility - clean break from old implementation
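A sketch of the dynamic collapse algorithm under stated assumptions: `summarize` is an async LLM call and `count_tokens` a token counter passed in by the caller, and the word-window chunker is a naive stand-in for the real chunking:

```python
def chunk_text(text: str, size: int) -> list[str]:
    # Naive word-window chunking, for illustration only.
    words = text.split()
    return [" ".join(words[i : i + size]) for i in range(0, len(words), size)]


async def map_reduce(content: str, *, summarize, count_tokens,
                     chunk_size: int = 2048, token_max: int = 3000) -> str:
    # Map: summarize each chunk independently.
    partials = [await summarize(c) for c in chunk_text(content, chunk_size)]
    # Collapse: while the combined partials are still too big, merge and
    # re-summarize them; each pass would bump the commit's collapse_depth.
    while count_tokens("\n\n".join(partials)) > token_max:
        merged = "\n\n".join(partials)
        partials = [await summarize(c) for c in chunk_text(merged, chunk_size)]
    # Reduce: one final pass over the merged partials.
    return await summarize("\n\n".join(partials))
```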
Address review feedback:
1. DRY: Move SummaryOutput, SummarizationError, SummarizerConfig, and generate_summary to _utils.py - eliminates duplicate code between adaptive.py and map_reduce.py
2. Config consolidation: Remove MapReduceConfig, use SummarizerConfig throughout. map_reduce.py now accepts SummarizerConfig directly.
3. Document redundant check: The token_max check in map_reduce_summarize is kept as a safety guard for direct calls, with clear documentation explaining it's normally handled by adaptive.py.
- Remove _summarize_text function with hardcoded prompt (use centralized prompts in _prompts.py via adaptive.py instead)
- Remove redundant token_max safety guard from map_reduce_summarize
- Update docstring to clarify the function is designed for content exceeding token_max, directing users to adaptive.summarize() for proper routing
MapReduceSummarizationError already inherits from SummarizationError, so catching and re-raising serves no purpose.
- Remove empty content check in map_reduce_summarize (caller validates)
- Remove 'if summary else 0' guards (generate_summary never returns None)
- Remove 'if input_tokens > 0' guards (input is guaranteed non-empty)
- Remove 'if summaries else ""' guard (summaries always has content)
…ck test
Compares old L1-L4 hierarchical vs new adaptive map-reduce approach:
- Shows which level each system would use
- Runs new summarizer and measures fact preservation
- Uses specific 'needle' facts embedded in test content
…tation
- Remove references to old L1-L4/STANDARD/DETAILED/HIERARCHICAL levels
- Remove HierarchicalSummary and ChunkSummary (no longer exist)
- Update storage format to show single summary entry
- Add new section on limitations and trade-offs
- Simplify error handling section
- Add data models section with current code
Remove outdated references to the 5-level hierarchy (STANDARD, DETAILED, HIERARCHICAL) and L1/L2/L3 storage structure. Update to reflect the current 3-level system (NONE, BRIEF, MAP_REDUCE) with a single final summary.

Also fix prompt names to match the actual implementation:
- BRIEF_SUMMARY_PROMPT, STANDARD_SUMMARY_PROMPT
- CHUNK_SUMMARY_PROMPT, META_SUMMARY_PROMPT
- Remove non-existent ROLLING_PROMPT
…RY_PROMPT The prompt name "STANDARD" was a leftover from the old 5-level system which had a STANDARD SummaryLevel. Since that level no longer exists (now just NONE, BRIEF, MAP_REDUCE), rename to GENERAL_SUMMARY_PROMPT to match its actual purpose as the "general" content type prompt.
- Remove unused `middle_truncate()` function and its tests
- Remove unused `MapReduceSummarizationError` exception class
- Move `SummarizerConfig` and `SummarizationError` from _utils.py to models.py

This groups all exported types in models.py and keeps _utils.py focused on actual utility functions (token counting, chunking, LLM calls).

Net: -96 lines
…cases The output_validator already ensures MemoryUpdate and MemoryDelete IDs are valid. Defensive handling of unknown IDs obscures the contract and could hide real bugs. Now uses direct indexing which will raise KeyError if the validator ever fails.
…rameters
Remove SummaryLevel enum and three-level strategy in favor of a simple "fits target? return as-is : map-reduce" approach. This reduces complexity while maintaining full functionality.

Changes:
- Remove SummaryLevel enum (NONE/BRIEF/MAP_REDUCE)
- Add target_tokens parameter for absolute token limit
- Add target_ratio parameter for relative compression (e.g., 0.2 = 20%)
- Simplify estimate_summary_tokens to use ~10% compression ratio
- Update memory integration to use compression_ratio in logging
- Rewrite examples and tests for new API
- Update architecture documentation

Net reduction: ~165 lines of code
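A sketch of the resulting dispatch, reusing the map_reduce sketch above; the parameter semantics (target_tokens as an absolute cap, target_ratio as a fraction of the input, ~10% default) are inferred from this commit message, not copied from the real code:

```python
async def summarize_adaptive(content: str, *, summarize, count_tokens,
                             target_tokens: int | None = None,
                             target_ratio: float | None = None) -> str:
    # "fits target? return as-is : map-reduce"
    n = count_tokens(content)
    limit = target_tokens if target_tokens is not None else int(n * (target_ratio or 0.1))
    if n <= limit:
        return content  # already fits: no summarization needed
    return await map_reduce(content, summarize=summarize,
                            count_tokens=count_tokens, token_max=limit)
```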
- Remove unused BRIEF_SUMMARY_PROMPT (brief level was removed)
- Remove unused timeout field from SummarizerConfig
- Update tests and examples accordingly