Adaptive Summarization System + Memory Pipeline Integration #110
Open: basnijholt wants to merge 42 commits into main from poc/aijournal
Conversation
- Add @agent.output_validator to validate LLM decisions
- Catch invalid UPDATE/DELETE/NONE with non-existent IDs
- Send helpful error messages via ModelRetry for retry
- Graceful fallback to add all facts when retries exhausted
- Add AI journal POC example for testing MemoryClient
- Improve reconciliation prompt with clearer examples
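A minimal sketch of the validator pattern described above, using pydantic-ai's @agent.output_validator and ModelRetry; the output model, model string, and known_ids set are illustrative stand-ins, not the PR's actual code:

```python
from pydantic import BaseModel
from pydantic_ai import Agent, ModelRetry


class ReconcileDecision(BaseModel):
    # Hypothetical output model for one reconciliation decision.
    event: str                    # "ADD" | "UPDATE" | "DELETE" | "NONE"
    memory_id: str | None = None  # required for UPDATE/DELETE


agent = Agent("openai:gpt-4o", output_type=ReconcileDecision)

known_ids = {"mem-1", "mem-2"}  # IDs of memories that actually exist


@agent.output_validator
def check_ids(decision: ReconcileDecision) -> ReconcileDecision:
    # UPDATE/DELETE must reference an existing memory; otherwise send a
    # corrective message back to the model via ModelRetry so it can try again.
    if decision.event in ("UPDATE", "DELETE") and decision.memory_id not in known_ids:
        msg = f"Unknown memory id {decision.memory_id!r}; valid ids: {sorted(known_ids)}"
        raise ModelRetry(msg)
    return decision
```

When retries are exhausted, pydantic-ai raises, which is where the commit's "fallback to add all facts" would take over.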
Switch from tool calls to JSON mode output for the reconciliation agent. This works better with local models (like reasoning models) that put their output in the reasoning_content field instead of content.

PromptedOutput injects the schema into the prompt and enables JSON mode (response_format={"type": "json_object"}), matching mem0's approach.
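As a rough sketch of what that switch looks like in pydantic-ai (the output model and model string are illustrative, and PromptedOutput's exact import location may vary by version):

```python
from pydantic import BaseModel
from pydantic_ai import Agent, PromptedOutput


class Reconciliation(BaseModel):
    # Hypothetical schema for the reconciliation result.
    facts: list[str]


# PromptedOutput embeds the JSON schema in the prompt and requests plain
# JSON output instead of a tool call, which local/reasoning models tend
# to handle more reliably than function calling.
agent = Agent("openai:gpt-4o-mini", output_type=PromptedOutput(Reconciliation))
```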
- Add list_all() method to MemoryClient to retrieve all stored memories
- Add 'show' command to display all stored facts about the user
- Add 'profile' command to generate a structured profile summary using LLM
- Enhance 'chat' command to use profile context for personalized responses

The POC now demonstrates a "self-model" system that:
1. Extracts facts from user input
2. Stores and retrieves them semantically
3. Generates profile summaries on demand
4. Uses the profile to personalize conversations

This validates the core hypothesis: MemoryClient can serve as the foundation for a personal knowledge system that knows who you are.
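A compressed sketch of that four-step loop; only list_all() is confirmed by this commit, while client.add() and the llm callable are hypothetical stand-ins for the POC's actual helpers:

```python
def self_model_turn(client, llm, user_msg: str) -> str:
    # 1. extract a fact from the user's message and store it
    fact = llm(f"Extract one fact about the user from: {user_msg}")
    client.add(fact)
    # 2. retrieve everything stored so far
    facts = client.list_all()
    # 3. condense the facts into a profile summary
    profile = llm(f"Summarize these facts into a user profile:\n{facts}")
    # 4. answer with the profile as context
    return llm(f"User profile:\n{profile}\n\nReply to: {user_msg}")
```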
Analyzes architecture, features, and test results comparing our MemoryClient-based POC (~200 LOC) with the full aijournal project (~15,000+ LOC).

Key findings:
- POC successfully extracts facts and generates accurate profiles
- Main gap is learning over time (strength tracking, decay, feedback)
- Recommends adding a simple strength field to close 80% of the functionality gap with 20% of aijournal's complexity

Includes concrete test results from ingesting 12+ blog posts.
basnijholt commented Nov 27, 2025
agent_cli/memory/_ingest.py (outdated)

            )
            replacement_map[orig] = new_id
        else:
            # UPDATE with unknown ID = treat as ADD (model used wrong event)
can this even happen still?
Implement research-grounded summarization inspired by Letta and Mem0:
- AdaptiveSummarizer with 5 levels (NONE, BRIEF, STANDARD, DETAILED, HIERARCHICAL)
- Hierarchical summary storage (L1 chunks, L2 groups, L3 final) in ChromaDB
- File-based persistence with YAML front matter in markdown files
- Token counting via tiktoken with fallback to cl100k_base
- Level-specific compression ratios (20%, 12%, 7%, capped 2000 tokens)

Structure:
- agent_cli/summarizer/ - standalone reusable summarization module
- summaries/L1/chunk_*.md, L2/group_*.md, L3/final.md file hierarchy
- Soft-delete old summaries to deleted/ folder before replacing
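A sketch of how the five-level selection can work; the token thresholds (100/500/3000/15000) come from a later commit in this PR and are heuristic, not research-derived:

```python
from enum import Enum


class SummaryLevel(Enum):
    NONE = "none"                  # content too short to bother
    BRIEF = "brief"                # single short summary
    STANDARD = "standard"          # standard single-pass summary
    DETAILED = "detailed"          # longer, more faithful summary
    HIERARCHICAL = "hierarchical"  # L1 chunks -> L2 groups -> L3 final


def determine_level(token_count: int) -> SummaryLevel:
    # Pure function: pick a level from content length alone.
    if token_count < 100:
        return SummaryLevel.NONE
    if token_count < 500:
        return SummaryLevel.BRIEF
    if token_count < 3000:
        return SummaryLevel.STANDARD
    if token_count < 15000:
        return SummaryLevel.DETAILED
    return SummaryLevel.HIERARCHICAL
```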
- Fix datetime.utcnow() deprecation, use datetime.now(UTC)
- Extract duplicate chunk summarization to _summarize_single_chunk()
- Add SummarizationError exception for better error handling
- Add retry with exponential backoff (1s, 2s, 4s) for generation failures
- Add middle-truncation fallback for oversized content (Letta-style)
- Export SummarizationError from module __init__
- Remove AdaptiveSummarizer class in favor of standalone functions
- Add SummarizerConfig dataclass for configuration
- Export determine_level() as pure function (no state needed)
- Update summarize(), update_rolling_summary() to take config parameter
- Update _ingest.py to use new functional API
- Update all tests for new API

This matches the functional style used throughout the codebase, reducing state and improving testability.
…ic API
- Rename prompts.py → _prompts.py and utils.py → _utils.py
- Reduce public API to 6 essential exports: SummarizerConfig, summarize, SummaryResult, SummaryLevel, HierarchicalSummary, SummarizationError
- Remove determine_level, update_rolling_summary, count_tokens from public API
- Update imports in adaptive.py and test files
Replace the old rolling summary system with the new hierarchical adaptive summarizer. This simplifies the codebase by removing redundant code paths and using a single, research-backed approach.

Changes:
- Update extract_and_store_facts_and_summaries() to use summarize_content() and store_adaptive_summary() instead of update_summary()/persist_summary()
- Remove old summary functions: update_summary, persist_summary, get_summary_entry
- Remove Summary entity and SummaryOutput model (unused)
- Add summary_level to L3 metadata for consistency
- Update tests to mock new summarizer interface

The new system automatically selects a summarization level (NONE, BRIEF, STANDARD, DETAILED, HIERARCHICAL) based on content complexity, storing summaries in an L1/L2/L3 hierarchical structure.
…maries
- Create docs/architecture/summarizer.md with comprehensive technical specification for the adaptive summarization system
- Update memory.md to reflect new L1/L2/L3 hierarchical summary structure
- Document level thresholds, compression ratios, and research basis
- Add content-type aware prompts documentation
- Document integration with memory system and storage format
Removed unused code:
- update_rolling_summary() - never called anywhere
- _raw_generate() fallback - errors should fail loudly
- retry/backoff logic - same reason
- parent_group from ChunkSummary - stored but never read
- ROLLING_SUMMARY_PROMPT - only used by removed function
Kept middle_truncate() - useful for handling very large inputs
(e.g., conversations with pasted codebases).
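A sketch of what middle truncation looks like; this uses a crude 4-characters-per-token heuristic, and the project's actual middle_truncate() may cut on token boundaries instead:

```python
def middle_truncate(text: str, max_tokens: int) -> str:
    # Letta-style middle truncation: keep the head and tail, drop the
    # middle, since the start and end of a document usually carry the
    # most context.
    max_chars = max_tokens * 4  # rough chars-per-token estimate
    if len(text) <= max_chars:
        return text
    half = max_chars // 2
    return text[:half] + "\n[... middle truncated ...]\n" + text[-half:]
```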
Bugfix:
- Add {prior_context} to CONVERSATION, JOURNAL, DOCUMENT prompts
- Previously prior_summary was silently ignored for non-"general" types
- Python's .format() ignores extra kwargs, hiding the bug
Updates documentation to reflect fail-fast error handling.
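The .format() behavior behind that bug is easy to demonstrate: str.format() raises KeyError for missing keys but silently ignores extra keyword arguments, so a template without a {prior_context} placeholder swallows the value without any error:

```python
# A prompt template that forgot to include a {prior_context} slot:
template = "Summarize:\n{content}"

# No exception raised; prior_context is silently dropped.
template.format(content="...", prior_context="this gets lost")
```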
Expose the full power of the summarizer through a CLI command that:
- Follows existing CLI patterns using shared opts module
- Supports all LLM providers (ollama, openai, gemini)
- Offers content-type prompts (general, conversation, journal, document)
- Provides output formats (text, json, full hierarchical)
- Includes chunking options and rolling summary support
- Reads from file or stdin
…args
- Remove unused parent_group from MemoryMetadata (was never assigned)
- Refactor write_memory_file to accept optional MemoryMetadata object instead of 17 individual parameters
- Simplify upsert_hierarchical_summary to use MemoryMetadata(**dict)
- Rename summary_level to summary_level_name for consistency
- Make tiktoken optional in token counting with fallback heuristic
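A sketch of the optional-tiktoken pattern; the fallback formula here (~4 characters per token) is an assumption, not necessarily the PR's exact heuristic:

```python
def count_tokens(text: str, encoding: str = "cl100k_base") -> int:
    # Prefer tiktoken when available; otherwise fall back to a rough
    # character-based estimate so the summarizer still works without it.
    try:
        import tiktoken

        return len(tiktoken.get_encoding(encoding).encode(text))
    except ImportError:
        return max(1, len(text) // 4)  # ~4 chars/token for English text
```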
Improve CLI startup time from ~0.51s to ~0.16s (69% faster) by deferring heavy imports until they're actually needed:
- pydantic_ai: lazy in memory/_ingest.py, summarizer/adaptive.py, rag/engine.py
- sounddevice: lazy in core/audio.py (moved to TYPE_CHECKING + function imports)
- numpy: lazy in rag/_retriever.py and services/tts.py

Update tests to patch modules directly (e.g., pydantic_ai.Agent) instead of through module attributes that no longer exist at import time.

Add scripts/profile_imports.py for measuring import performance.
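The TYPE_CHECKING + function-import pattern mentioned above looks roughly like this (the function here is a made-up example, not code from the PR):

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only needed for type hints; never imported at runtime.
    import numpy as np


def embedding_norm(vec: np.ndarray) -> float:
    # Deferred import: numpy loads on first call, not at CLI startup.
    import numpy as np

    return float(np.linalg.norm(vec))
```

Tests then have to patch `numpy.linalg.norm` (the real module path) rather than an attribute of this module, since the name no longer exists at import time.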
- Extract upsert_summary_entries() to avoid double to_storage_metadata() call
- Extract _summarize_chunks() helper for async chunk processing pipeline
…ummary
- Replace verbose Args/Returns docstrings with single-line summaries
- Remove upsert_hierarchical_summary (was only used in tests)
- Update tests to use upsert_summary_entries directly

Net: -102 lines
Some models leak control tokens like <|constrain|>, <|end|>, etc. into their output. Add regex cleanup in _generate_summary(). Also rewrites docs/architecture/summarizer.md to focus on research foundations and design rationale rather than code snippets.
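A minimal sketch of that cleanup; the exact pattern in _generate_summary() may differ:

```python
import re

# Matches leaked control tokens of the form <|...|>,
# e.g. <|constrain|> or <|end|>.
CONTROL_TOKEN_RE = re.compile(r"<\|[^>]*?\|>")


def clean_summary(text: str) -> str:
    return CONTROL_TOKEN_RE.sub("", text).strip()
```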
After verifying claims against the actual Letta and Mem0 codebases:

Letta (verified):
- Partial eviction (30%) - `partial_evict_summarizer_percentage`
- Middle truncation - `middle_truncate_text()` function
- Fire-and-forget - `fire_and_forget()` method
- arXiv:2310.08560

Mem0 (corrected):
- Two-phase architecture (verified) - fact extraction then memory ops
- Removed "90%+ compression" claim - refers to token savings vs full context, not summarization compression ratios
- Removed "rolling summaries" attribution - not a Mem0 term
- arXiv:2504.19413

Also removes incorrect "based on Mem0 research" from code docstrings where compression ratios were empirically chosen, not research-derived.
Previously, the summarizer was summarizing the already-compressed extracted facts, which is redundant. Now it summarizes the actual user/assistant messages, which is what makes sense for a conversation summary.
- Document what's actually borrowed from research:
  - Two-phase architecture from Mem0 (arXiv:2504.19413)
  - Hierarchical merging concept from BOOOOKSCORE (arXiv:2310.00785)
- Clarify what Letta does differently (message count, not tokens)
- Acknowledge original/heuristic design choices:
  - Token thresholds (100/500/3000/15000) are not research-backed
  - L1/L2/L3 hierarchy structure is original
  - Chunk size (3000) is larger than BOOOOKSCORE's 2048
- Add future improvements section based on research findings
Remove old hierarchical summarization (STANDARD, DETAILED, HIERARCHICAL) in favor of a simpler 3-level system inspired by LangChain's map-reduce:
- NONE: Skip summarization for very short content (<100 tokens)
- BRIEF: Single-pass summary for short content (100-500 tokens)
- MAP_REDUCE: LangChain-style map-reduce for longer content (500+ tokens)

Key changes:
- Add map_reduce.py with dynamic collapse algorithm
- Remove HierarchicalSummary and ChunkSummary classes
- Rename summary_level_name to summary_level in metadata
- Add collapse_depth field to track map-reduce iterations
- Use research-backed defaults (chunk_size=2048, token_max=3000)
- Update all tests for simplified API
- No backward compatibility - clean break from old implementation
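A sketch of the dynamic collapse algorithm under stated assumptions: `summarize` is an async LLM call and `count_tokens` a token counter passed in by the caller, and the word-window chunker is a naive stand-in for the real chunking:

```python
def chunk_text(text: str, size: int) -> list[str]:
    # Naive word-window chunking, for illustration only.
    words = text.split()
    return [" ".join(words[i : i + size]) for i in range(0, len(words), size)]


async def map_reduce(content: str, *, summarize, count_tokens,
                     chunk_size: int = 2048, token_max: int = 3000) -> str:
    # Map: summarize each chunk independently.
    partials = [await summarize(c) for c in chunk_text(content, chunk_size)]
    # Collapse: while the combined partials are still too big, merge and
    # re-summarize them; each pass would bump the commit's collapse_depth.
    while count_tokens("\n\n".join(partials)) > token_max:
        merged = "\n\n".join(partials)
        partials = [await summarize(c) for c in chunk_text(merged, chunk_size)]
    # Reduce: one final pass over the merged partials.
    return await summarize("\n\n".join(partials))
```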
Address review feedback:
1. DRY: Move SummaryOutput, SummarizationError, SummarizerConfig, and generate_summary to _utils.py - eliminates duplicate code between adaptive.py and map_reduce.py
2. Config consolidation: Remove MapReduceConfig, use SummarizerConfig throughout. map_reduce.py now accepts SummarizerConfig directly.
3. Document redundant check: The token_max check in map_reduce_summarize is kept as a safety guard for direct calls, with clear documentation explaining it's normally handled by adaptive.py.
- Remove _summarize_text function with hardcoded prompt (use centralized prompts in _prompts.py via adaptive.py instead)
- Remove redundant token_max safety guard from map_reduce_summarize
- Update docstring to clarify the function is designed for content exceeding token_max, directing users to adaptive.summarize() for proper routing
MapReduceSummarizationError already inherits from SummarizationError, so catching and re-raising serves no purpose.
- Remove empty content check in map_reduce_summarize (caller validates)
- Remove 'if summary else 0' guards (generate_summary never returns None)
- Remove 'if input_tokens > 0' guards (input is guaranteed non-empty)
- Remove 'if summaries else ""' guard (summaries always has content)
…ck test
Compares old L1-L4 hierarchical vs new adaptive map-reduce approach:
- Shows which level each system would use
- Runs new summarizer and measures fact preservation
- Uses specific 'needle' facts embedded in test content
…tation
- Remove references to old L1-L4/STANDARD/DETAILED/HIERARCHICAL levels
- Remove HierarchicalSummary and ChunkSummary (no longer exist)
- Update storage format to show single summary entry
- Add new section on limitations and trade-offs
- Simplify error handling section
- Add data models section with current code
Remove outdated references to the 5-level hierarchy (STANDARD, DETAILED, HIERARCHICAL) and L1/L2/L3 storage structure. Update to reflect the current 3-level system (NONE, BRIEF, MAP_REDUCE) with a single final summary.

Also fix prompt names to match the actual implementation:
- BRIEF_SUMMARY_PROMPT, STANDARD_SUMMARY_PROMPT
- CHUNK_SUMMARY_PROMPT, META_SUMMARY_PROMPT
- Remove non-existent ROLLING_PROMPT
…RY_PROMPT The prompt name "STANDARD" was a leftover from the old 5-level system which had a STANDARD SummaryLevel. Since that level no longer exists (now just NONE, BRIEF, MAP_REDUCE), rename to GENERAL_SUMMARY_PROMPT to match its actual purpose as the "general" content type prompt.
- Remove unused `middle_truncate()` function and its tests
- Remove unused `MapReduceSummarizationError` exception class
- Move `SummarizerConfig` and `SummarizationError` from _utils.py to models.py

This groups all exported types in models.py and keeps _utils.py focused on actual utility functions (token counting, chunking, LLM calls).

Net: -96 lines
…cases The output_validator already ensures MemoryUpdate and MemoryDelete IDs are valid. Defensive handling of unknown IDs obscures the contract and could hide real bugs. Now uses direct indexing which will raise KeyError if the validator ever fails.
…rameters
Remove SummaryLevel enum and three-level strategy in favor of a simple "fits target? return as-is : map-reduce" approach. This reduces complexity while maintaining full functionality.

Changes:
- Remove SummaryLevel enum (NONE/BRIEF/MAP_REDUCE)
- Add target_tokens parameter for absolute token limit
- Add target_ratio parameter for relative compression (e.g., 0.2 = 20%)
- Simplify estimate_summary_tokens to use ~10% compression ratio
- Update memory integration to use compression_ratio in logging
- Rewrite examples and tests for new API
- Update architecture documentation

Net reduction: ~165 lines of code
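A sketch of the resulting dispatch, reusing the map_reduce sketch above; the parameter semantics (target_tokens as an absolute cap, target_ratio as a fraction of the input, ~10% default) are inferred from this commit message, not copied from the real code:

```python
async def summarize_adaptive(content: str, *, summarize, count_tokens,
                             target_tokens: int | None = None,
                             target_ratio: float | None = None) -> str:
    # "fits target? return as-is : map-reduce"
    n = count_tokens(content)
    limit = target_tokens if target_tokens is not None else int(n * (target_ratio or 0.1))
    if n <= limit:
        return content  # already fits: no summarization needed
    return await map_reduce(content, summarize=summarize,
                            count_tokens=count_tokens, token_max=limit)
```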
- Remove unused BRIEF_SUMMARY_PROMPT (brief level was removed)
- Remove unused timeout field from SummarizerConfig
- Update tests and examples accordingly