Conversation

@basnijholt (Owner)

No description provided.

- Add @agent.output_validator to validate LLM decisions
- Catch invalid UPDATE/DELETE/NONE with non-existent IDs
- Send helpful error messages via ModelRetry for retry
- Graceful fallback to add all facts when retries exhausted
- Add AI journal POC example for testing MemoryClient
- Improve reconciliation prompt with clearer examples
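The validator pattern described above maps onto pydantic-ai roughly as follows. This is a minimal sketch: the `MemoryOp`/`MemoryOps` models, the `known_ids` set, and the model string are illustrative stand-ins, not the PR's actual types.

```python
from pydantic import BaseModel
from pydantic_ai import Agent, ModelRetry

# Hypothetical models standing in for the PR's reconciliation types.
class MemoryOp(BaseModel):
    event: str            # "ADD" | "UPDATE" | "DELETE" | "NONE"
    id: str | None = None
    text: str | None = None

class MemoryOps(BaseModel):
    ops: list[MemoryOp]

known_ids = {"mem-1", "mem-2"}  # IDs of memories that actually exist

agent = Agent("openai:gpt-4o-mini", output_type=MemoryOps)

@agent.output_validator
def check_ids(output: MemoryOps) -> MemoryOps:
    # Reject UPDATE/DELETE ops that reference IDs the store doesn't hold;
    # ModelRetry feeds the error message back to the LLM for another try.
    bad = [
        op.id
        for op in output.ops
        if op.event in ("UPDATE", "DELETE") and op.id not in known_ids
    ]
    if bad:
        raise ModelRetry(f"Unknown memory IDs: {bad}. Use ADD for new facts.")
    return output
```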
@basnijholt changed the title from "WIP" to "Several memory improvements" on Nov 27, 2025
Switch from tool calls to JSON mode output for the reconciliation agent.
This works better with local models (like reasoning models) that put
output in the reasoning_content field instead of content.

PromptedOutput injects the schema into the prompt and enables JSON mode
(response_format={"type": "json_object"}), matching mem0's approach.
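As a sketch of the switch, reusing the hypothetical `MemoryOps` model from above (the model string is again illustrative):

```python
from pydantic_ai import Agent, PromptedOutput

# PromptedOutput injects the output schema into the prompt and requests
# plain JSON (response_format={"type": "json_object"}) rather than a tool
# call, so models that stream their work into reasoning_content still
# leave parseable JSON in content.
agent = Agent(
    "openai:gpt-4o-mini",  # the PR targets local models; name is illustrative
    output_type=PromptedOutput(MemoryOps),
)
```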
- Add list_all() method to MemoryClient to retrieve all stored memories
- Add 'show' command to display all stored facts about the user
- Add 'profile' command to generate a structured profile summary using LLM
- Enhance 'chat' command to use profile context for personalized responses

The POC now demonstrates a "self-model" system that:
1. Extracts facts from user input
2. Stores and retrieves them semantically
3. Generates profile summaries on demand
4. Uses the profile to personalize conversations

This validates the core hypothesis: MemoryClient can serve as the
foundation for a personal knowledge system that knows who you are.
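A rough sketch of the loop the POC demonstrates; only `list_all()` is named by this commit, so the import path, constructor, and everything else here are assumptions:

```python
from agent_cli.memory import MemoryClient  # import path assumed

client = MemoryClient()  # constructor arguments omitted/assumed

# 'show': dump every stored fact about the user
for memory in client.list_all():
    print(memory)
```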
Analyzes architecture, features, and test results comparing our
MemoryClient-based POC (~200 LOC) with the full aijournal project
(~15,000+ LOC).

Key findings:
- POC successfully extracts facts and generates accurate profiles
- Main gap is learning over time (strength tracking, decay, feedback)
- Recommends adding a simple strength field to close 80% of the functionality
  gap with 20% of aijournal's complexity

Includes concrete test results from ingesting 12+ blog posts.
        )
        replacement_map[orig] = new_id
    else:
        # UPDATE with unknown ID = treat as ADD (model used wrong event)
@basnijholt (Owner, Author) commented on the diff:

can this even happen still?

Implement research-grounded summarization inspired by Letta and Mem0:
- AdaptiveSummarizer with 5 levels (NONE, BRIEF, STANDARD, DETAILED, HIERARCHICAL)
- Hierarchical summary storage (L1 chunks, L2 groups, L3 final) in ChromaDB
- File-based persistence with YAML front matter in markdown files
- Token counting via tiktoken with fallback to cl100k_base
- Level-specific compression ratios (20%, 12%, 7%, capped at 2000 tokens)

Structure:
- agent_cli/summarizer/ - standalone reusable summarization module
- summaries/L1/chunk_*.md, L2/group_*.md, L3/final.md file hierarchy
- Soft-delete old summaries to deleted/ folder before replacing
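The tiktoken fallback mentioned above is typically spelled like this (a sketch; the PR's helper names may differ):

```python
import tiktoken

def _get_encoding(model: str) -> tiktoken.Encoding:
    # Use the model's own encoding when tiktoken knows it; otherwise fall
    # back to cl100k_base, which tokenizes most text reasonably well.
    try:
        return tiktoken.encoding_for_model(model)
    except KeyError:
        return tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str, model: str = "gpt-4o-mini") -> int:
    return len(_get_encoding(model).encode(text))
```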
- Fix datetime.utcnow() deprecation, use datetime.now(UTC)
- Extract duplicate chunk summarization to _summarize_single_chunk()
- Add SummarizationError exception for better error handling
- Add retry with exponential backoff (1s, 2s, 4s) for generation failures
- Add middle-truncation fallback for oversized content (Letta-style)
- Export SummarizationError from module __init__
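The datetime fix is the standard replacement for the Python 3.12 deprecation:

```python
from datetime import UTC, datetime

# datetime.utcnow() is deprecated since Python 3.12 and returns a naive
# datetime; datetime.now(UTC) yields the timezone-aware equivalent.
created_at = datetime.now(UTC)
```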
- Remove AdaptiveSummarizer class in favor of standalone functions
- Add SummarizerConfig dataclass for configuration
- Export determine_level() as pure function (no state needed)
- Update summarize(), update_rolling_summary() to take config parameter
- Update _ingest.py to use new functional API
- Update all tests for new API

This matches the functional style used throughout the codebase,
reducing state and improving testability.
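A minimal sketch of the functional shape this commit describes. The level thresholds come from a later commit in this PR (100/500/3000/15000), and any config field beyond the chunk size is an assumption:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SummarizerConfig:
    model: str = "openai:gpt-4o-mini"  # assumed field
    chunk_size: int = 3000             # chunk size noted later in this PR

def determine_level(tokens: int) -> str:
    """Pure function: the level depends only on the token count."""
    if tokens < 100:
        return "NONE"
    if tokens < 500:
        return "BRIEF"
    if tokens < 3000:
        return "STANDARD"
    if tokens < 15000:
        return "DETAILED"
    return "HIERARCHICAL"
```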
…ic API

- Rename prompts.py → _prompts.py and utils.py → _utils.py
- Reduce public API to 6 essential exports: SummarizerConfig, summarize,
  SummaryResult, SummaryLevel, HierarchicalSummary, SummarizationError
- Remove determine_level, update_rolling_summary, count_tokens from public API
- Update imports in adaptive.py and test files
Replace the old rolling summary system with the new hierarchical
adaptive summarizer. This simplifies the codebase by removing
redundant code paths and using a single, research-backed approach.

Changes:
- Update extract_and_store_facts_and_summaries() to use summarize_content()
  and store_adaptive_summary() instead of update_summary()/persist_summary()
- Remove old summary functions: update_summary, persist_summary, get_summary_entry
- Remove Summary entity and SummaryOutput model (unused)
- Add summary_level to L3 metadata for consistency
- Update tests to mock new summarizer interface

The new system automatically selects the summarization level (NONE, BRIEF,
STANDARD, DETAILED, HIERARCHICAL) based on content complexity, storing
summaries in an L1/L2/L3 hierarchical structure.
…maries

- Create docs/architecture/summarizer.md with comprehensive technical
  specification for the adaptive summarization system
- Update memory.md to reflect new L1/L2/L3 hierarchical summary structure
- Document level thresholds, compression ratios, and research basis
- Add content-type aware prompts documentation
- Document integration with memory system and storage format
Removed unused code:
- update_rolling_summary() - never called anywhere
- _raw_generate() fallback - errors should fail loudly
- retry/backoff logic - same reason
- parent_group from ChunkSummary - stored but never read
- ROLLING_SUMMARY_PROMPT - only used by removed function

Kept middle_truncate() - useful for handling very large inputs
(e.g., conversations with pasted codebases).

Bugfix:
- Add {prior_context} to CONVERSATION, JOURNAL, DOCUMENT prompts
- Previously prior_summary was silently ignored for non-"general" types
- Python's .format() ignores extra kwargs, hiding the bug
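The silent-failure mechanics are easy to reproduce: str.format() simply drops keyword arguments whose placeholders are missing from the template.

```python
template = "Summarize the following journal entry:\n{content}"

# No error, even though the template has no {prior_context} placeholder.
# The kwarg is silently discarded - exactly how the missing slot in the
# CONVERSATION/JOURNAL/DOCUMENT prompts went unnoticed.
print(template.format(content="...", prior_context="yesterday's summary"))
```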

Updates documentation to reflect fail-fast error handling.
Expose the full power of the summarizer through a CLI command that:
- Follows existing CLI patterns using shared opts module
- Supports all LLM providers (ollama, openai, gemini)
- Offers content-type prompts (general, conversation, journal, document)
- Provides output formats (text, json, full hierarchical)
- Includes chunking options and rolling summary support
- Reads from file or stdin
…args

- Remove unused parent_group from MemoryMetadata (was never assigned)
- Refactor write_memory_file to accept optional MemoryMetadata object
  instead of 17 individual parameters
- Simplify upsert_hierarchical_summary to use MemoryMetadata(**dict)
- Rename summary_level to summary_level_name for consistency
- Make tiktoken optional in token counting with fallback heuristic
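Making tiktoken optional typically looks like this sketch, with a characters-per-token heuristic as the fallback (the exact heuristic used in the PR is not specified):

```python
def count_tokens(text: str) -> int:
    try:
        import tiktoken
    except ImportError:
        # Rough heuristic: ~4 characters per token for English text.
        return max(1, len(text) // 4)
    return len(tiktoken.get_encoding("cl100k_base").encode(text))
```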
Improve CLI startup time from ~0.51s to ~0.16s (a 69% reduction) by deferring
heavy imports until they're actually needed:

- pydantic_ai: lazy in memory/_ingest.py, summarizer/adaptive.py, rag/engine.py
- sounddevice: lazy in core/audio.py (moved to TYPE_CHECKING + function imports)
- numpy: lazy in rag/_retriever.py and services/tts.py

Update tests to patch modules directly (e.g., pydantic_ai.Agent) instead of
through module attributes that no longer exist at import time.

Add scripts/profile_imports.py for measuring import performance.
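The deferral pattern for sounddevice described above is the usual TYPE_CHECKING split, sketched here:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only for type checkers; never executed at runtime.
    import sounddevice

def open_input_stream(samplerate: int) -> sounddevice.InputStream:
    # Deferred import: the sounddevice startup cost is paid only when
    # audio is actually used, not at CLI import time.
    import sounddevice as sd

    return sd.InputStream(samplerate=samplerate)
```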
- Extract upsert_summary_entries() to avoid double to_storage_metadata() call
- Extract _summarize_chunks() helper for async chunk processing pipeline
…ummary

- Replace verbose Args/Returns docstrings with single-line summaries
- Remove upsert_hierarchical_summary (was only used in tests)
- Update tests to use upsert_summary_entries directly

Net: -102 lines
Some models leak control tokens like <|constrain|>, <|end|>, etc.
into their output. Add regex cleanup in _generate_summary().

Also rewrites docs/architecture/summarizer.md to focus on research
foundations and design rationale rather than code snippets.
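A plausible shape for the cleanup (the PR's exact pattern isn't shown here); the regex targets the `<|...|>` delimiter style the commit names:

```python
import re

# Matches control tokens like <|constrain|>, <|end|>, <|channel|> that
# some models leak into their text output.
_CONTROL_TOKEN_RE = re.compile(r"<\|[^<>|]*\|>")

def _strip_control_tokens(text: str) -> str:
    return _CONTROL_TOKEN_RE.sub("", text).strip()
```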
After verifying claims against actual Letta and Mem0 codebases:

Letta (verified):
- Partial eviction (30%) - `partial_evict_summarizer_percentage`
- Middle truncation - `middle_truncate_text()` function
- Fire-and-forget - `fire_and_forget()` method
- arXiv:2310.08560

Mem0 (corrected):
- Two-phase architecture (verified) - fact extraction then memory ops
- Removed "90%+ compression" claim - refers to token savings vs full
  context, not summarization compression ratios
- Removed "rolling summaries" attribution - not a Mem0 term
- arXiv:2504.19413

Also removes incorrect "based on Mem0 research" from code docstrings
where compression ratios were empirically chosen, not research-derived.
Previously, the summarizer was summarizing the already-compressed
extracted facts, which is redundant. Now it summarizes the actual
user/assistant messages, which is what makes sense for a conversation
summary.
- Document what's actually borrowed from research:
  - Two-phase architecture from Mem0 (arXiv:2504.19413)
  - Hierarchical merging concept from BOOOOKSCORE (arXiv:2310.00785)

- Clarify what Letta does differently (message count, not tokens)

- Acknowledge original/heuristic design choices:
  - Token thresholds (100/500/3000/15000) are not research-backed
  - L1/L2/L3 hierarchy structure is original
  - Chunk size (3000) is larger than BOOOOKSCORE's 2048

- Add future improvements section based on research findings
Remove old hierarchical summarization (STANDARD, DETAILED, HIERARCHICAL)
in favor of a simpler 3-level system inspired by LangChain's map-reduce:

- NONE: Skip summarization for very short content (<100 tokens)
- BRIEF: Single-pass summary for short content (100-500 tokens)
- MAP_REDUCE: LangChain-style map-reduce for longer content (500+ tokens)

Key changes:
- Add map_reduce.py with dynamic collapse algorithm
- Remove HierarchicalSummary and ChunkSummary classes
- Rename summary_level_name to summary_level in metadata
- Add collapse_depth field to track map-reduce iterations
- Use research-backed defaults (chunk_size=2048, token_max=3000)
- Update all tests for simplified API
- No backward compatibility - clean break from old implementation
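A sketch of the dynamic collapse loop, assuming a single-call `summarize_once()` helper and the defaults from this commit (chunk_size=2048, token_max=3000); the real map_reduce.py will differ in detail:

```python
from collections.abc import Callable

def map_reduce_summarize(
    chunks: list[str],
    summarize_once: Callable[[str], str],  # one LLM call: text -> summary
    count_tokens: Callable[[str], int],
    token_max: int = 3000,
) -> str:
    # Map: summarize each chunk independently.
    summaries = [summarize_once(chunk) for chunk in chunks]
    collapse_depth = 0  # tracked in metadata per this commit

    # Collapse: re-batch and re-summarize until everything fits token_max.
    # (Assumes summarize_once actually compresses; a real implementation
    # needs a depth limit to guarantee termination.)
    while count_tokens("\n\n".join(summaries)) > token_max:
        collapse_depth += 1
        merged: list[str] = []
        batch: list[str] = []
        batch_tokens = 0
        for summary in summaries:
            tokens = count_tokens(summary)
            if batch and batch_tokens + tokens > token_max:
                merged.append(summarize_once("\n\n".join(batch)))
                batch, batch_tokens = [], 0
            batch.append(summary)
            batch_tokens += tokens
        merged.append(summarize_once("\n\n".join(batch)))
        summaries = merged

    # Reduce: one final pass over the collapsed summaries.
    return summarize_once("\n\n".join(summaries))
```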
Address review feedback:
1. DRY: Move SummaryOutput, SummarizationError, SummarizerConfig, and
   generate_summary to _utils.py - eliminates duplicate code between
   adaptive.py and map_reduce.py

2. Config consolidation: Remove MapReduceConfig, use SummarizerConfig
   throughout. map_reduce.py now accepts SummarizerConfig directly.

3. Document redundant check: The token_max check in map_reduce_summarize
   is kept as a safety guard for direct calls, with clear documentation
   explaining it's normally handled by adaptive.py.
- Remove _summarize_text function with hardcoded prompt (use centralized
  prompts in _prompts.py via adaptive.py instead)
- Remove redundant token_max safety guard from map_reduce_summarize
- Update docstring to clarify function is designed for content exceeding
  token_max, directing users to adaptive.summarize() for proper routing
MapReduceSummarizationError already inherits from SummarizationError,
so catching and re-raising serves no purpose.
- Remove empty content check in map_reduce_summarize (caller validates)
- Remove 'if summary else 0' guards (generate_summary never returns None)
- Remove 'if input_tokens > 0' guards (input is guaranteed non-empty)
- Remove 'if summaries else ""' guard (summaries always has content)
…ck test

Compares old L1-L4 hierarchical vs new adaptive map-reduce approach:
- Shows which level each system would use
- Runs new summarizer and measures fact preservation
- Uses specific 'needle' facts embedded in test content
…tation

- Remove references to old L1-L4/STANDARD/DETAILED/HIERARCHICAL levels
- Remove HierarchicalSummary and ChunkSummary (no longer exist)
- Update storage format to show single summary entry
- Add new section on limitations and trade-offs
- Simplify error handling section
- Add data models section with current code
Remove outdated references to 5-level hierarchy (STANDARD, DETAILED,
HIERARCHICAL) and L1/L2/L3 storage structure. Update to reflect current
3-level system (NONE, BRIEF, MAP_REDUCE) with single final summary.

Also fix prompt names to match actual implementation:
- BRIEF_SUMMARY_PROMPT, STANDARD_SUMMARY_PROMPT
- CHUNK_SUMMARY_PROMPT, META_SUMMARY_PROMPT
- Remove non-existent ROLLING_PROMPT
…RY_PROMPT

The prompt name "STANDARD" was a leftover from the old 5-level system
which had a STANDARD SummaryLevel. Since that level no longer exists
(now just NONE, BRIEF, MAP_REDUCE), rename to GENERAL_SUMMARY_PROMPT
to match its actual purpose as the "general" content type prompt.
- Remove unused `middle_truncate()` function and its tests
- Remove unused `MapReduceSummarizationError` exception class
- Move `SummarizerConfig` and `SummarizationError` from _utils.py to models.py

This groups all exported types in models.py and keeps _utils.py focused
on actual utility functions (token counting, chunking, LLM calls).

Net: -96 lines
…cases

The output_validator already ensures MemoryUpdate and MemoryDelete IDs
are valid. Defensive handling of unknown IDs obscures the contract and
could hide real bugs. Now uses direct indexing which will raise KeyError
if the validator ever fails.
@basnijholt changed the title from "Several memory improvements" to "Map reduce sumarization" on Dec 4, 2025
@basnijholt changed the title from "Map reduce sumarization" to "Recursive summarization." on Dec 4, 2025
@basnijholt changed the title from "Recursive summarization." to "Recursive summarization" on Dec 4, 2025
@basnijholt changed the title from "Recursive summarization" to "Adaptive Summarization System + Memory Pipeline Integration" on Dec 4, 2025
…rameters

Remove SummaryLevel enum and three-level strategy in favor of a simple
"fits target? return as-is : map-reduce" approach. This reduces complexity
while maintaining full functionality.

Changes:
- Remove SummaryLevel enum (NONE/BRIEF/MAP_REDUCE)
- Add target_tokens parameter for absolute token limit
- Add target_ratio parameter for relative compression (e.g., 0.2 = 20%)
- Simplify estimate_summary_tokens to use ~10% compression ratio
- Update memory integration to use compression_ratio in logging
- Rewrite examples and tests for new API
- Update architecture documentation

Net reduction: ~165 lines of code
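The simplified entry point then reduces to a single branch, sketched here reusing the helpers from the earlier sketches (`split_into_chunks` is a hypothetical name):

```python
def summarize(
    content: str,
    *,
    target_tokens: int | None = None,
    target_ratio: float | None = None,  # e.g. 0.2 = compress to ~20%
) -> str:
    tokens = count_tokens(content)
    # ~10% compression is the default estimate per this commit.
    target = target_tokens or int(tokens * (target_ratio or 0.1))
    if tokens <= target:
        return content  # fits the target: return as-is, no LLM calls
    chunks = split_into_chunks(content, chunk_size=2048)  # hypothetical helper
    return map_reduce_summarize(chunks, summarize_once, count_tokens, token_max=target)
```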
- Remove unused BRIEF_SUMMARY_PROMPT (brief level was removed)
- Remove unused timeout field from SummarizerConfig
- Update tests and examples accordingly