Using claude-code, a handy feature is that when it hits its context limit, it creates a summary of the conversation and replaces the history with it. This gives you an effectively unlimited context window, which would be really awesome for limited-memory setups (I use a GeForce RTX 4060 Ti with only 8 GB of VRAM with ollama).
I think a Python wrapper around ollama could do it too, so it doesn't necessarily need to be solved here, but it could help any model/service (as it does with claude-code for longer-running sessions).
Some options I think would be cool:
- provider/model used for the summary
- context size that triggers compaction, expressed as a percentage of the total context size (like 80%; see the sketch after this list)
- a good default summary prompt, but allow the user to modify it
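To make the threshold option concrete, here is a minimal sketch of a trigger check, assuming a crude 4-characters-per-token estimate (a real implementation would use the model's tokenizer to count tokens):

```python
# Sketch only: decide when the history should be compacted. The default
# threshold of 0.8 and the 4-chars-per-token heuristic are assumptions.
def should_compact(messages: list[dict], num_ctx: int, threshold: float = 0.8) -> bool:
    est_tokens = sum(len(m["content"]) // 4 for m in messages)
    return est_tokens >= threshold * num_ctx
```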
Here is an example compaction prompt, wrapped here as a small helper that builds it:
```python
import json

def build_summary_prompt(messages_to_summarize: list[dict], summary_tokens_max: int) -> str:
    return f"""
Create a comprehensive but concise summary of this conversation for a chat assistant.
This summary will replace the full conversation history to optimize memory usage.

Requirements:
1. Preserve all critical context and decisions
2. Maintain user preferences and constraints
3. Include any technical details or requirements
4. Preserve any code snippets or configurations discussed
5. Note the conversation flow and key topics

Keep the summary under {summary_tokens_max} tokens but be thorough.

Conversation History:
{json.dumps(messages_to_summarize, indent=2)}

Summary:
"""
```
Is this something I should PR for? Would this be better as an extension?