ContextCompactionMiddleware monitors token usage after each LLM call and compacts the conversation history before your provider returns a context-overflow error. Once configured, it runs automatically — no changes needed to your tools or agent logic.

Key concepts

UserRound — one complete cycle from a user message to the final model response. A UserRound contains one or more iterations.

Iteration — one round bounded by an ASSISTANT message:

[USER or FRAMEWORK] (optional) → [ASSISTANT] → [TOOL results] (optional)

Compaction strategies

NexAU ships two strategies. Choose based on your cost and quality requirements.
tool_result_compaction — replaces old tool result content with a placeholder string. Fast and costs nothing: no extra LLM call.
What it keeps: the system prompt, all user messages, all assistant messages, and tool results in the most recent N iterations.
What it compacts: older tool results are replaced with "Tool call result has been compacted".

llm_summary — uses an LLM call to summarize older history; the summarization model and prompt are configurable via summary_llm_config and compact_prompt_path.
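The tool_result_compaction behavior can be sketched as follows. This is a minimal illustration of the idea, not NexAU's implementation; the message shape and iteration accounting are assumptions.

```python
PLACEHOLDER = "Tool call result has been compacted"

def compact_tool_results(messages, keep_iterations=1):
    """Replace tool results outside the last `keep_iterations` iterations.

    System, user, and assistant messages are always left intact.
    """
    # Each iteration is anchored by an assistant message, so the kept
    # region starts at the keep_iterations-th assistant message from the end.
    assistant_idx = [i for i, m in enumerate(messages) if m["role"] == "assistant"]
    if len(assistant_idx) <= keep_iterations:
        return messages  # nothing old enough to compact
    cutoff = assistant_idx[-keep_iterations]
    return [
        {**m, "content": PLACEHOLDER}
        if m["role"] == "tool" and i < cutoff else m
        for i, m in enumerate(messages)
    ]
```

With `keep_iterations=1`, only tool results in the final iteration survive; everything earlier is collapsed to the placeholder, which is why this strategy is cheap but lossy.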
YAML:
middlewares:
  - import: nexau.archs.main_sub.execution.middleware.context_compaction:ContextCompactionMiddleware
    params:
      max_context_tokens: 100000
      auto_compact: true
      threshold: 0.75
      compaction_strategy: "tool_result_compaction"
      keep_iterations: 1   # iterations to leave uncompacted
Python:
ContextCompactionMiddleware(
    max_context_tokens=100000,
    auto_compact=True,
    threshold=0.75,
    compaction_strategy="tool_result_compaction",
    keep_iterations=1,
)

Key parameters

Parameter             Default   Description
max_context_tokens    —         Maximum tokens allowed in the context window
threshold             0.75      Fraction of max_context_tokens that triggers compaction
auto_compact          true      Enable automatic monitoring and compaction
compaction_strategy   —         "tool_result_compaction" or "llm_summary"
keep_iterations       1         Number of recent iterations to leave uncompacted
summary_llm_config    —         Override the LLM used for summarization (llm_summary only)
compact_prompt_path   —         Path to a custom summary prompt file (llm_summary only)
Compaction is automatically skipped if the last assistant message has no tool calls, since there is nothing to compact in that state.
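The trigger condition implied by these parameters can be written out directly. This is a sketch of the arithmetic, with parameter names taken from the table above; the actual middleware also checks that the last assistant message has tool calls, as noted.

```python
def should_compact(used_tokens, max_context_tokens=100_000,
                   threshold=0.75, auto_compact=True):
    """Compaction fires once usage crosses threshold * max_context_tokens."""
    return auto_compact and used_tokens >= threshold * max_context_tokens
```

With the defaults shown, compaction triggers at 75,000 tokens: `should_compact(80_000)` is true, while `should_compact(70_000)` is false.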

Emergency overflow fallback

When emergency_compact_enabled: true and your provider returns a context-overflow error, the middleware falls back to an aggressive emergency compaction:
  1. Keeps the system message, last iteration, any unresolved tool-use chain, and the last user message unchanged.
  2. Splits the remaining history into two equal segments by token count.
  3. Summarizes both segments.
  4. Merges the two summaries into one compact context.
  5. Rebuilds messages as system + merged summary + safety region.
  6. If the result still exceeds the limit, it fails fast instead of retrying.
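The six steps above can be sketched end to end. This is a hedged illustration, not the NexAU code path: `summarize` stands in for the LLM summarization call, `count_tokens` for the tokenizer, and the message shapes are assumptions.

```python
def emergency_compact(system, history, safety_region, summarize,
                      count_tokens, max_context_tokens):
    """Aggressive fallback: system + merged summary + safety region.

    `safety_region` is the material kept unchanged in step 1 (last
    iteration, unresolved tool-use chain, last user message).
    """
    # Step 2: split the remaining history into two roughly equal
    # segments by token count.
    total = sum(count_tokens(m) for m in history)
    running, split = 0, len(history)
    for i, m in enumerate(history):
        running += count_tokens(m)
        if running >= total / 2:
            split = i + 1
            break
    # Steps 3-4: summarize both segments, then merge the summaries.
    merged = summarize(history[:split]) + "\n\n" + summarize(history[split:])
    # Step 5: rebuild as system + merged summary + safety region.
    rebuilt = [system, {"role": "user", "content": merged}] + safety_region
    # Step 6: fail fast instead of retrying.
    if sum(count_tokens(m) for m in rebuilt) > max_context_tokens:
        raise RuntimeError("context still exceeds limit after emergency compaction")
    return rebuilt
```

The fail-fast in step 6 matters: if even the summarized context overflows, retrying the same summarization would loop without making progress.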
middlewares:
  - import: nexau.archs.main_sub.execution.middleware.context_compaction:ContextCompactionMiddleware
    params:
      max_context_tokens: 100000
      auto_compact: true
      threshold: 0.75
      compaction_strategy: "llm_summary"
      keep_iterations: 2
      emergency_compact_enabled: true