ContextCompactionMiddleware monitors token usage after each LLM call and compacts the conversation history before your provider returns a context-overflow error. Once configured, it runs automatically — no changes needed to your tools or agent logic.
Key concepts
UserRound — one complete cycle from a user message to the final model response. A UserRound contains one or more iterations. Iteration — one round bounded by an ASSISTANT message:Compaction strategies
NexAU ships two strategies. Choose based on your cost and quality requirements.- tool_result_compaction
- llm_summary
Replaces old tool result content with a placeholder string. Fast and costs nothing — no extra LLM call.What it keeps: system prompt, all user messages, all assistant messages, and tool results in the most recent N iterations.What it compacts: older tool results are replaced with
"Tool call result has been compacted".Key parameters
| Parameter | Default | Description |
|---|---|---|
max_context_tokens | — | Maximum tokens allowed in the context window |
threshold | 0.75 | Fraction of max_context_tokens that triggers compaction |
auto_compact | true | Enable automatic monitoring and compaction |
compaction_strategy | — | "tool_result_compaction" or "llm_summary" |
keep_iterations | 1 | Number of recent iterations to leave uncompacted |
summary_llm_config | — | Override the LLM used for summarization (llm_summary only) |
compact_prompt_path | — | Path to a custom summary prompt file (llm_summary only) |
Compaction is automatically skipped if the last assistant message has no tool calls, since there is nothing to compact in that state.
Emergency overflow fallback
Whenemergency_compact_enabled: true and your provider returns a context-overflow error, the middleware falls back to an aggressive emergency compaction:
- Keeps the system message, last iteration, any unresolved tool-use chain, and the last user message unchanged.
- Splits the remaining history into two equal segments by token count.
- Summarizes both segments.
- Merges the two summaries into one compact context.
- Rebuilds messages as
system + merged summary + safety region. - If the result still exceeds the limit, it fails fast instead of retrying.