ContextCompactionMiddleware monitors token usage after each LLM call and compacts the conversation history before your provider returns a context-overflow error. Once configured, it runs automatically — no changes needed to your tools or agent logic.

Key concepts

UserRound — one complete cycle from a user message to the final model response. A UserRound contains one or more iterations.

Iteration — one round bounded by an ASSISTANT message:

[USER or FRAMEWORK] (optional) → [ASSISTANT] → [TOOL results] (optional)

Compaction strategies

NexAU ships two strategies. Choose based on your cost and quality requirements.
tool_result_compaction — replaces old tool result content with a placeholder string. Fast and costs nothing: no extra LLM call.
What it keeps: the system prompt, all user messages, all assistant messages, and tool results in the most recent N iterations.
What it compacts: older tool results are replaced with "Tool call result has been compacted".

llm_summary — uses an LLM call to summarize older history; the summarization model and prompt are configurable via summary_llm_config and compact_prompt_path.
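The tool_result_compaction behavior can be sketched as follows. This is a minimal illustration of the idea, not NexAU's implementation; the message shape and iteration accounting are assumptions.

```python
PLACEHOLDER = "Tool call result has been compacted"

def compact_tool_results(messages, keep_iterations=1):
    """Replace tool results outside the last `keep_iterations` iterations.

    System, user, and assistant messages are always left intact.
    """
    # Each iteration is anchored by an assistant message, so the kept
    # region starts at the keep_iterations-th assistant message from the end.
    assistant_idx = [i for i, m in enumerate(messages) if m["role"] == "assistant"]
    if len(assistant_idx) <= keep_iterations:
        return messages  # nothing old enough to compact
    cutoff = assistant_idx[-keep_iterations]
    return [
        {**m, "content": PLACEHOLDER}
        if m["role"] == "tool" and i < cutoff else m
        for i, m in enumerate(messages)
    ]
```

With `keep_iterations=1`, only tool results in the final iteration survive; everything earlier is collapsed to the placeholder, which is why this strategy is cheap but lossy.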
YAML:
middlewares:
  - import: nexau.archs.main_sub.execution.middleware.context_compaction:ContextCompactionMiddleware
    params:
      max_context_tokens: 100000
      auto_compact: true
      threshold: 0.75
      compaction_strategy: "tool_result_compaction"
      keep_iterations: 1   # iterations to leave uncompacted
Python:
ContextCompactionMiddleware(
    max_context_tokens=100000,
    auto_compact=True,
    threshold=0.75,
    compaction_strategy="tool_result_compaction",
    keep_iterations=1,
)

Key parameters

Parameter             Default   Description
max_context_tokens    —         Maximum tokens allowed in the context window
threshold             0.75      Fraction of max_context_tokens that triggers compaction
auto_compact          true      Enable automatic monitoring and compaction
compaction_strategy   —         "tool_result_compaction" or "llm_summary"
keep_iterations       1         Number of recent iterations to leave uncompacted
summary_llm_config    —         Override the LLM used for summarization (llm_summary only)
compact_prompt_path   —         Path to a custom summary prompt file (llm_summary only)
Compaction is automatically skipped if the last assistant message has no tool calls, since there is nothing to compact in that state.
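The trigger condition implied by these parameters can be written out directly. This is a sketch of the arithmetic, with parameter names taken from the table above; the actual middleware also checks that the last assistant message has tool calls, as noted.

```python
def should_compact(used_tokens, max_context_tokens=100_000,
                   threshold=0.75, auto_compact=True):
    """Compaction fires once usage crosses threshold * max_context_tokens."""
    return auto_compact and used_tokens >= threshold * max_context_tokens
```

With the defaults shown, compaction triggers at 75,000 tokens: `should_compact(80_000)` is true, while `should_compact(70_000)` is false.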

Emergency overflow fallback

When emergency_compact_enabled: true and your provider returns a context-overflow error, the middleware falls back to an aggressive emergency compaction:
  1. Keeps the system message, last iteration, any unresolved tool-use chain, and the last user message unchanged.
  2. Splits the remaining history into two equal segments by token count.
  3. Summarizes both segments.
  4. Merges the two summaries into one compact context.
  5. Rebuilds messages as system + merged summary + safety region.
  6. If the result still exceeds the limit, it fails fast instead of retrying.
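The six steps above can be sketched end to end. This is a hedged illustration, not the NexAU code path: `summarize` stands in for the LLM summarization call, `count_tokens` for the tokenizer, and the message shapes are assumptions.

```python
def emergency_compact(system, history, safety_region, summarize,
                      count_tokens, max_context_tokens):
    """Aggressive fallback: system + merged summary + safety region.

    `safety_region` is the material kept unchanged in step 1 (last
    iteration, unresolved tool-use chain, last user message).
    """
    # Step 2: split the remaining history into two roughly equal
    # segments by token count.
    total = sum(count_tokens(m) for m in history)
    running, split = 0, len(history)
    for i, m in enumerate(history):
        running += count_tokens(m)
        if running >= total / 2:
            split = i + 1
            break
    # Steps 3-4: summarize both segments, then merge the summaries.
    merged = summarize(history[:split]) + "\n\n" + summarize(history[split:])
    # Step 5: rebuild as system + merged summary + safety region.
    rebuilt = [system, {"role": "user", "content": merged}] + safety_region
    # Step 6: fail fast instead of retrying.
    if sum(count_tokens(m) for m in rebuilt) > max_context_tokens:
        raise RuntimeError("context still exceeds limit after emergency compaction")
    return rebuilt
```

The fail-fast in step 6 matters: if even the summarized context overflows, retrying the same summarization would loop without making progress.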
middlewares:
  - import: nexau.archs.main_sub.execution.middleware.context_compaction:ContextCompactionMiddleware
    params:
      max_context_tokens: 100000
      auto_compact: true
      threshold: 0.75
      compaction_strategy: "llm_summary"
      keep_iterations: 2
      emergency_compact_enabled: true