feature: Context Compaction #30

@konsumer

Description

Using claude-code, a handy feature is that when it hits its context limit, it creates a summary of the conversation and replaces the history with it. This gives you an effectively infinite context window, which would be really awesome for limited-memory setups (I use a GeForce RTX 4060 Ti with only 8 GB of VRAM with ollama).

I think a Python wrapper around ollama could do this too, so it doesn't necessarily need to be solved here, but it could help any model/service (as it does with claude-code, for longer-running sessions).

Some options I think would be cool:

  • provider/model used to generate the summary
  • compaction threshold as a percentage of the total context size (e.g. compact at 80%)
  • a good default summary prompt that the user can override
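The options above could look something like this as configuration (a sketch only; all field names and defaults are hypothetical, not an existing API):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CompactionConfig:
    # All fields are hypothetical, for illustration only.
    provider: str = "ollama"            # provider used to generate the summary
    model: str = "llama3.2"             # model used to generate the summary
    context_size: int = 8192            # total context window, in tokens
    compact_at: float = 0.8             # compact when usage reaches 80% of the window
    summary_tokens_max: int = 512       # token budget for the generated summary
    summary_prompt: Optional[str] = None  # None -> use a built-in default prompt
```

A user would only need to override the fields they care about, e.g. `CompactionConfig(compact_at=0.9)`.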

Here is an example compaction prompt:

summary_prompt = f"""
Create a comprehensive but concise summary of this conversation for a chat assistant.
This summary will replace the full conversation history to optimize memory usage.

Requirements:
1. Preserve all critical context and decisions
2. Maintain user preferences and constraints  
3. Include any technical details or requirements
4. Preserve any code snippets or configurations discussed
5. Note the conversation flow and key topics

Keep the summary under {summary_tokens_max} tokens but be thorough.

Conversation History:
{json.dumps(messages_to_summarize, indent=2)}

Summary:
"""
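To show how that prompt could be wired up, here is a minimal sketch of the compaction step (assumptions: `summarize` is any callable that sends a prompt to a model and returns text, e.g. a thin wrapper around the ollama client; the ~4-characters-per-token estimate is a rough stand-in for a real tokenizer):

```python
import json

def estimate_tokens(messages):
    # Very rough heuristic: ~4 characters per token. A real implementation
    # would use the model's own tokenizer; this is an assumption for the sketch.
    return len(json.dumps(messages)) // 4

def maybe_compact(messages, summarize, context_size,
                  threshold=0.8, summary_tokens_max=512, keep_last=2):
    """If the estimated size exceeds `threshold` of `context_size`, summarize
    the older messages and replace them with a single system message, keeping
    the `keep_last` most recent messages verbatim."""
    if estimate_tokens(messages) < context_size * threshold:
        return messages  # still under the threshold, nothing to do

    messages_to_summarize = messages[:-keep_last]
    recent = messages[-keep_last:]

    summary_prompt = f"""
Create a comprehensive but concise summary of this conversation for a chat assistant.
Keep the summary under {summary_tokens_max} tokens but be thorough.

Conversation History:
{json.dumps(messages_to_summarize, indent=2)}

Summary:
"""
    summary = summarize(summary_prompt)
    return [{"role": "system",
             "content": f"Summary of earlier conversation:\n{summary}"}] + recent
```

Each new turn would pass through `maybe_compact` before being sent to the model, so the history is rewritten only when it actually approaches the window size.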

Is this something I should open a PR for, or would it be better as an extension?
