Middleware lets you plug into the agent’s execution loop at every step. You can inspect and modify messages before they go to the LLM, rewrite tool inputs, redact tool outputs, swap providers on failure, or collect metrics — all without changing your agent’s core logic.
## Middleware methods
A middleware class can implement any combination of these methods. All are optional.
| Method | When it runs | What you can change |
|---|---|---|
| `before_agent(hook_input)` | Once, before the first LLM call | Initial history, run-scoped state |
| `after_agent(hook_input)` | Once, after execution completes | Final agent response |
| `before_model(hook_input)` | Before each LLM call | Message list sent to the LLM |
| `after_model(hook_input)` | After each LLM call | Parsed response, tool calls |
| `before_tool(hook_input)` | Before each tool execution | Tool input arguments |
| `after_tool(hook_input)` | After each tool execution | Tool output fed back to the LLM |
| `wrap_model_call(params, call_next)` | Around the raw LLM invocation | Provider, retry logic, caching |
| `wrap_tool_call(params, call_next)` | Around each tool execution | Execution environment, timeouts |
Execution order is deterministic:

- `before_*` methods run first → last (in the order middlewares are registered)
- `after_*` methods run last → first (reverse order)
- `wrap_*` methods are nested: the first middleware in your list is the outermost wrapper
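The nesting guarantee can be illustrated with a dependency-free sketch in which plain closures stand in for `wrap_model_call` middlewares (no NexAU imports; the names `A` and `B` are arbitrary):

```python
# Sketch of the ordering guarantee: plain closures stand in for middlewares;
# nothing here uses NexAU itself.
order = []

def make_wrapper(name, call_next):
    def wrapped(params):
        order.append(f"{name}:enter")   # outermost wrapper enters first
        result = call_next(params)
        order.append(f"{name}:exit")    # ...and exits last
        return result
    return wrapped

def model_call(params):
    order.append("model")
    return "response"

# Registered in list order: A first, then B.
call = model_call
for name in reversed(["A", "B"]):   # build the onion inside-out
    call = make_wrapper(name, call)

call({})
print(order)  # ['A:enter', 'B:enter', 'model', 'B:exit', 'A:exit']
```

The first middleware in the list (`A`) enters first and exits last, exactly the outermost-wrapper behavior described above.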
## Writing a custom middleware
Import `Middleware` and `HookResult` from `nexau.archs.main_sub.execution.hooks`, then implement the methods you need.
```python
from nexau.archs.main_sub.execution.hooks import HookResult, Middleware


class AuditMiddleware(Middleware):
    def after_model(self, hook_input):
        print("Model emitted", len(hook_input.parsed_response.tool_calls or []), "tool calls")
        return HookResult.no_changes()

    def after_tool(self, hook_input):
        print("Tool", hook_input.tool_name, "returned", hook_input.tool_output)
        return HookResult.no_changes()

    def wrap_model_call(self, params, call_next):
        print("Calling LLM with", len(params.messages), "messages")
        return call_next(params)
```
Every method must return a `HookResult`.

## HookResult
A `HookResult` makes your intent explicit so later middlewares can build on your changes safely.

- `HookResult.no_changes()`: signals that this middleware made no modifications. The loop continues with the existing state.
- `HookResult.with_modifications(**kwargs)`: passes modified values back to the loop. Only the fields you provide are updated; everything else is left as-is.
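A rough way to picture these semantics, assuming the loop state can be modeled as a plain dict (NexAU's real state objects are richer than this):

```python
# Illustrative sketch of the merge semantics, not NexAU's actual classes:
# the loop state is a dict, and a hook result only overrides the keys it set.
state = {"messages": ["hi"], "tool_input": {"path": "/a"}, "agent_response": None}

def apply_hook_result(state, modifications):
    # Only the fields the middleware provided are updated; the rest pass through.
    return {**state, **modifications}

# no_changes() behaves like providing no modifications at all
assert apply_hook_result(state, {}) == state

# with_modifications(tool_input=...) touches only tool_input
updated = apply_hook_result(state, {"tool_input": {"path": "/a", "timeout": 30}})
assert updated["messages"] == ["hi"]           # untouched
assert updated["tool_input"]["timeout"] == 30  # updated
```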
Common fields you can pass to `with_modifications`:

```python
# Rewrite the messages before the next LLM call
HookResult.with_modifications(messages=updated_messages)

# Add or remove tool calls from the parsed response
HookResult.with_modifications(parsed_response=updated_response)

# Modify tool arguments before the tool runs
HookResult.with_modifications(tool_input={"timeout": 30, **hook_input.tool_input})

# Reshape a tool's output before it goes back into the conversation
HookResult.with_modifications(tool_output={"text": hook_input.tool_output})

# Override the agent's final reply
HookResult.with_modifications(agent_response="[redacted]")
```
Here are a few more examples showing what you can do with each method:
```python
from nexau.archs.main_sub.execution.hooks import HookResult, Middleware


class PrefixMiddleware(Middleware):
    def before_model(self, hook_input):
        updated = hook_input.messages + [{
            "role": "system",
            "content": "Reminder: stay within budget.",
        }]
        return HookResult.with_modifications(messages=updated)


class ToolFilter(Middleware):
    def after_model(self, hook_input):
        parsed = hook_input.parsed_response
        if not parsed:
            return HookResult.no_changes()
        # Guard against tool_calls being None before filtering
        parsed.tool_calls = [
            call for call in (parsed.tool_calls or [])
            if call.tool_name != "system_command"
        ]
        return HookResult.with_modifications(parsed_response=parsed)


class ClampInputMiddleware(Middleware):
    def before_tool(self, hook_input):
        updated = dict(hook_input.tool_input)
        updated.setdefault("timeout", 30)
        return HookResult.with_modifications(tool_input=updated)
```
## Built-in middleware

NexAU ships four middlewares you can drop in without writing any code.
### LoggingMiddleware
Logs model calls and tool executions. It supports both structured after-model/after-tool logging and wrapping the raw model call to trace streaming generators.
```yaml
middlewares:
  - import: nexau.archs.main_sub.execution.hooks:LoggingMiddleware
```

```python
from nexau.archs.main_sub.execution.hooks import LoggingMiddleware

agent = Agent(
    config=AgentConfig(
        ...,
        middlewares=[LoggingMiddleware()],
    )
)
```
### LLMFailoverMiddleware
Automatically retries with backup LLM providers when the primary returns matching errors. Supports multi-level fallback chains and an optional circuit breaker.
```yaml
middlewares:
  - import: nexau.archs.main_sub.execution.middleware.llm_failover:LLMFailoverMiddleware
    params:
      trigger:
        status_codes: [500, 502, 503, 529]
        exception_types: ["RateLimitError", "InternalServerError"]
      fallback_providers:
        - name: "backup-gateway"
          llm_config:
            base_url: "https://backup.example.com/v1"
            api_key: "sk-backup-xxx"
        - name: "emergency"
          llm_config:
            model: "gpt-4o"
            base_url: "https://emergency.example.com/v1"
            api_key: "sk-emergency-xxx"
            api_type: "openai_chat_completion"
      circuit_breaker:
        failure_threshold: 3
        recovery_timeout_seconds: 60
```
```python
from nexau.archs.main_sub.execution.middleware.llm_failover import LLMFailoverMiddleware

failover = LLMFailoverMiddleware(
    trigger={"status_codes": [500, 502, 503], "exception_types": ["RateLimitError"]},
    fallback_providers=[
        {"name": "backup", "llm_config": {"base_url": "https://backup.example.com/v1", "api_key": "sk-xxx"}},
    ],
    circuit_breaker={"failure_threshold": 3, "recovery_timeout_seconds": 60},
)
```
How it works:

- The primary call runs via `call_next(params)`. If it succeeds, the result is returned immediately.
- On failure, the middleware checks whether the exception matches `trigger.status_codes` or `trigger.exception_types`.
- If it matches, fallback providers are tried in order. Each creates a new `ModelCallParams`; the original config is never mutated.
- If all providers fail, the last exception is raised.
- The circuit breaker (optional) skips the primary for `recovery_timeout_seconds` after `failure_threshold` consecutive failures.
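The circuit-breaker behavior in the last step can be sketched as a small state machine. The class below is illustrative, not NexAU's implementation, though `failure_threshold` and `recovery_timeout_seconds` mirror the config fields:

```python
import time

# Illustrative circuit-breaker sketch: after failure_threshold consecutive
# failures the breaker "opens" and the primary is skipped until
# recovery_timeout_seconds have elapsed.
class CircuitBreaker:
    def __init__(self, failure_threshold=3, recovery_timeout_seconds=60):
        self.failure_threshold = failure_threshold
        self.recovery_timeout_seconds = recovery_timeout_seconds
        self.consecutive_failures = 0
        self.opened_at = None

    def primary_allowed(self, now=None):
        if self.opened_at is None:
            return True
        now = time.monotonic() if now is None else now
        if now - self.opened_at >= self.recovery_timeout_seconds:
            self.opened_at = None          # half-open: try the primary again
            self.consecutive_failures = 0
            return True
        return False

    def record_failure(self, now=None):
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.opened_at = time.monotonic() if now is None else now

    def record_success(self):
        self.consecutive_failures = 0
        self.opened_at = None

cb = CircuitBreaker(failure_threshold=3, recovery_timeout_seconds=60)
for _ in range(3):
    cb.record_failure(now=0.0)
print(cb.primary_allowed(now=1.0))   # False: breaker is open
print(cb.primary_allowed(now=61.0))  # True: recovery timeout elapsed
```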
Fallback `llm_config` fields are merged on top of the primary config. Fields you don’t specify in a fallback are inherited from the primary, including the `model`.
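The merge rule can be pictured as a shallow dict merge (an illustrative sketch; the provider URLs and keys are made up, and NexAU's actual merge may be deeper):

```python
# Sketch of the merge rule: fallback fields override, unspecified fields
# (like model here) are inherited from the primary.
primary = {
    "model": "gpt-4o-mini",
    "base_url": "https://primary.example.com/v1",
    "api_key": "sk-primary",
}
fallback = {
    "base_url": "https://backup.example.com/v1",
    "api_key": "sk-backup",
}

merged = {**primary, **fallback}  # a plain shallow merge captures the rule

print(merged["model"])     # gpt-4o-mini: inherited from the primary
print(merged["base_url"])  # https://backup.example.com/v1: overridden
```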
### LongToolOutputMiddleware

When a tool returns output larger than a configurable character threshold, this middleware truncates it, saves the full content to a temporary file, and replaces the tool output with the truncated version plus a hint pointing to the file. The agent can then call a file-reading tool if it needs the full content.
```yaml
middlewares:
  - import: nexau.archs.main_sub.execution.middleware.long_tool_output:LongToolOutputMiddleware
    params:
      max_output_chars: 10000
      head_lines: 50
      tail_lines: 30
      temp_dir: /tmp/nexau_tool_outputs
      bypass_tool_names:
        - execute_bash
```
```python
from nexau.archs.main_sub.execution.middleware.long_tool_output import LongToolOutputMiddleware

middleware = LongToolOutputMiddleware(
    max_output_chars=10000,
    head_lines=50,
    tail_lines=30,
    temp_dir="/tmp/nexau_tool_outputs",
    bypass_tool_names=["execute_bash"],
)
```
| Parameter | Default | Description |
|---|---|---|
| `max_output_chars` | `10000` | Character count that triggers truncation |
| `head_lines` | `50` | Lines to keep from the start of the output |
| `tail_lines` | `30` | Lines to keep from the end of the output |
| `temp_dir` | `"/tmp/nexau_tool_outputs"` | Where full outputs are saved. Set to `null` to truncate without saving a file. |
| `bypass_tool_names` | `None` | Tools whose output is never truncated (e.g. tools that already truncate themselves) |
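The head/tail scheme can be sketched as follows. The function name, the truncation marker, and the hint wording are illustrative; only the head-plus-tail idea and the parameter names come from the middleware:

```python
import os
import tempfile

# Illustrative sketch of head/tail truncation: keep head_lines + tail_lines,
# optionally save the full text to a file, and point the agent at it.
def truncate_output(text, max_output_chars=10000, head_lines=50, tail_lines=30,
                    temp_dir=None):
    if len(text) <= max_output_chars:
        return text  # under the threshold: pass through unchanged
    lines = text.splitlines()
    kept = lines[:head_lines] + ["... [truncated] ..."] + lines[-tail_lines:]
    result = "\n".join(kept)
    if temp_dir is not None:
        fd, path = tempfile.mkstemp(dir=temp_dir, suffix=".txt")
        with os.fdopen(fd, "w") as f:
            f.write(text)  # full content survives on disk
        result += f"\n[full output saved to {path}]"
    return result

long_text = "\n".join(f"line {i}" for i in range(1000))
short = truncate_output(long_text, max_output_chars=100, head_lines=2, tail_lines=2)
print(short.splitlines()[:3])  # ['line 0', 'line 1', '... [truncated] ...']
```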
### ContextCompactionMiddleware
Manages conversation context when token limits are approached. See Context compaction for full configuration options.
## Wiring middlewares to an agent
```yaml
type: agent
name: my_agent
llm_config:
  model: gpt-4o-mini
middlewares:
  - import: my_project.middleware:AuditMiddleware
    params:
      log_file: "/tmp/audit.log"
  - import: nexau.archs.main_sub.execution.middleware.logging:LoggingMiddleware
```
```python
from nexau import Agent, AgentConfig
from my_project.middleware import AuditMiddleware
from nexau.archs.main_sub.execution.hooks import LoggingMiddleware

agent = Agent(
    config=AgentConfig(
        name="my_agent",
        llm_config={"model": "gpt-4o-mini"},
        middlewares=[
            AuditMiddleware(log_file="/tmp/audit.log"),
            LoggingMiddleware(),
        ],
    )
)
```
You can mix built-in and custom middleware freely. The execution order guarantee described above applies regardless of where each middleware comes from.