Middleware lets you plug into the agent’s execution loop at every step. You can inspect and modify messages before they go to the LLM, rewrite tool inputs, redact tool outputs, swap providers on failure, or collect metrics — all without changing your agent’s core logic.
## Middleware methods
A middleware class can implement any combination of these methods. All are optional.
| Method | When it runs | What you can change |
|---|---|---|
| `before_agent(hook_input)` | Once, before the first LLM call | Initial history, run-scoped state |
| `after_agent(hook_input)` | Once, after execution completes | Final agent response |
| `before_model(hook_input)` | Before each LLM call | Message list sent to the LLM |
| `after_model(hook_input)` | After each LLM call | Parsed response, tool calls |
| `before_tool(hook_input)` | Before each tool execution | Tool input arguments |
| `after_tool(hook_input)` | After each tool execution | Tool output fed back to the LLM |
| `wrap_model_call(params, call_next)` | Around the raw LLM invocation | Provider, retry logic, caching |
| `wrap_tool_call(params, call_next)` | Around each tool execution | Execution environment, timeouts |
Execution order is deterministic:

- `before_*` methods run first → last (in the order middlewares are registered)
- `after_*` methods run last → first (reverse order)
- `wrap_*` methods are nested: the first middleware in your list is the outermost wrapper
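The nesting guarantee can be illustrated with a dependency-free sketch in which plain closures stand in for `wrap_model_call` middlewares (no NexAU imports; the names `A` and `B` are arbitrary):

```python
# Sketch of the ordering guarantee: plain closures stand in for middlewares;
# nothing here uses NexAU itself.
order = []

def make_wrapper(name, call_next):
    def wrapped(params):
        order.append(f"{name}:enter")   # outermost wrapper enters first
        result = call_next(params)
        order.append(f"{name}:exit")    # ...and exits last
        return result
    return wrapped

def model_call(params):
    order.append("model")
    return "response"

# Registered in list order: A first, then B.
call = model_call
for name in reversed(["A", "B"]):   # build the onion inside-out
    call = make_wrapper(name, call)

call({})
print(order)  # ['A:enter', 'B:enter', 'model', 'B:exit', 'A:exit']
```

The first middleware in the list (`A`) enters first and exits last, exactly the outermost-wrapper behavior described above.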
## Writing a custom middleware
Import `Middleware` and `HookResult` from `nexau.archs.main_sub.execution.hooks`, then implement the methods you need.
```python
from nexau.archs.main_sub.execution.hooks import HookResult, Middleware


class AuditMiddleware(Middleware):
    def after_model(self, hook_input):
        print("Model emitted", len(hook_input.parsed_response.tool_calls or []), "tool calls")
        return HookResult.no_changes()

    def after_tool(self, hook_input):
        print("Tool", hook_input.tool_name, "returned", hook_input.tool_output)
        return HookResult.no_changes()

    def wrap_model_call(self, params, call_next):
        print("Calling LLM with", len(params.messages), "messages")
        return call_next(params)
```
Every method must return a `HookResult`.

## HookResult
A `HookResult` makes your intent explicit so later middlewares can build on your changes safely.

- `HookResult.no_changes()`: signals that this middleware made no modifications. The loop continues with the existing state.
- `HookResult.with_modifications(**kwargs)`: passes modified values back to the loop. Only the fields you provide are updated; everything else is left as-is.
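A rough way to picture these semantics, assuming the loop state can be modeled as a plain dict (NexAU's real state objects are richer than this):

```python
# Illustrative sketch of the merge semantics, not NexAU's actual classes:
# the loop state is a dict, and a hook result only overrides the keys it set.
state = {"messages": ["hi"], "tool_input": {"path": "/a"}, "agent_response": None}

def apply_hook_result(state, modifications):
    # Only the fields the middleware provided are updated; the rest pass through.
    return {**state, **modifications}

# no_changes() behaves like providing no modifications at all
assert apply_hook_result(state, {}) == state

# with_modifications(tool_input=...) touches only tool_input
updated = apply_hook_result(state, {"tool_input": {"path": "/a", "timeout": 30}})
assert updated["messages"] == ["hi"]           # untouched
assert updated["tool_input"]["timeout"] == 30  # updated
```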
Common fields you can pass to `with_modifications`:

```python
# Rewrite the messages before the next LLM call
HookResult.with_modifications(messages=updated_messages)

# Add or remove tool calls from the parsed response
HookResult.with_modifications(parsed_response=updated_response)

# Modify tool arguments before the tool runs
HookResult.with_modifications(tool_input={"timeout": 30, **hook_input.tool_input})

# Reshape a tool's output before it goes back into the conversation
HookResult.with_modifications(tool_output={"text": hook_input.tool_output})

# Override the agent's final reply
HookResult.with_modifications(agent_response="[redacted]")
```
Here are a few more examples showing what you can do with each method:
```python
from nexau.archs.main_sub.execution.hooks import HookResult, Middleware


class PrefixMiddleware(Middleware):
    def before_model(self, hook_input):
        updated = hook_input.messages + [{
            "role": "system",
            "content": "Reminder: stay within budget.",
        }]
        return HookResult.with_modifications(messages=updated)


class ToolFilter(Middleware):
    def after_model(self, hook_input):
        parsed = hook_input.parsed_response
        if not parsed:
            return HookResult.no_changes()
        # Guard against tool_calls being None before filtering
        parsed.tool_calls = [
            call for call in (parsed.tool_calls or [])
            if call.tool_name != "system_command"
        ]
        return HookResult.with_modifications(parsed_response=parsed)


class ClampInputMiddleware(Middleware):
    def before_tool(self, hook_input):
        updated = dict(hook_input.tool_input)
        updated.setdefault("timeout", 30)
        return HookResult.with_modifications(tool_input=updated)
```
## Built-in middleware

NexAU ships four middlewares you can drop in without writing any code.
### LoggingMiddleware
Logs model calls and tool executions. It supports both structured after-model/after-tool logging and wrapping the raw model call to trace streaming generators.
```yaml
middlewares:
  - import: nexau.archs.main_sub.execution.hooks:LoggingMiddleware
```

```python
from nexau.archs.main_sub.execution.hooks import LoggingMiddleware

agent = Agent(
    config=AgentConfig(
        ...,
        middlewares=[LoggingMiddleware()],
    )
)
```
### LLMFailoverMiddleware
Automatically retries with backup LLM providers when the primary returns matching errors. Supports multi-level fallback chains and an optional circuit breaker.
```yaml
middlewares:
  - import: nexau.archs.main_sub.execution.middleware.llm_failover:LLMFailoverMiddleware
    params:
      trigger:
        status_codes: [500, 502, 503, 529]
        exception_types: ["RateLimitError", "InternalServerError"]
      fallback_providers:
        - name: "backup-gateway"
          llm_config:
            base_url: "https://backup.example.com/v1"
            api_key: "sk-backup-xxx"
        - name: "emergency"
          llm_config:
            model: "gpt-4o"
            base_url: "https://emergency.example.com/v1"
            api_key: "sk-emergency-xxx"
            api_type: "openai_chat_completion"
      circuit_breaker:
        failure_threshold: 3
        recovery_timeout_seconds: 60
```
```python
from nexau.archs.main_sub.execution.middleware.llm_failover import LLMFailoverMiddleware

failover = LLMFailoverMiddleware(
    trigger={"status_codes": [500, 502, 503], "exception_types": ["RateLimitError"]},
    fallback_providers=[
        {"name": "backup", "llm_config": {"base_url": "https://backup.example.com/v1", "api_key": "sk-xxx"}},
    ],
    circuit_breaker={"failure_threshold": 3, "recovery_timeout_seconds": 60},
)
```
How it works:

- The primary call runs via `call_next(params)`. If it succeeds, the result is returned immediately.
- On failure, the middleware checks whether the exception matches `trigger.status_codes` or `trigger.exception_types`.
- If it matches, fallback providers are tried in order. Each creates a new `ModelCallParams`; the original config is never mutated.
- If all providers fail, the last exception is raised.
- The circuit breaker (optional) skips the primary for `recovery_timeout_seconds` after `failure_threshold` consecutive failures.
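The circuit-breaker behavior in the last step can be sketched as a small state machine. The class below is illustrative, not NexAU's implementation, though `failure_threshold` and `recovery_timeout_seconds` mirror the config fields:

```python
import time

# Illustrative circuit-breaker sketch: after failure_threshold consecutive
# failures the breaker "opens" and the primary is skipped until
# recovery_timeout_seconds have elapsed.
class CircuitBreaker:
    def __init__(self, failure_threshold=3, recovery_timeout_seconds=60):
        self.failure_threshold = failure_threshold
        self.recovery_timeout_seconds = recovery_timeout_seconds
        self.consecutive_failures = 0
        self.opened_at = None

    def primary_allowed(self, now=None):
        if self.opened_at is None:
            return True
        now = time.monotonic() if now is None else now
        if now - self.opened_at >= self.recovery_timeout_seconds:
            self.opened_at = None          # half-open: try the primary again
            self.consecutive_failures = 0
            return True
        return False

    def record_failure(self, now=None):
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.opened_at = time.monotonic() if now is None else now

    def record_success(self):
        self.consecutive_failures = 0
        self.opened_at = None

cb = CircuitBreaker(failure_threshold=3, recovery_timeout_seconds=60)
for _ in range(3):
    cb.record_failure(now=0.0)
print(cb.primary_allowed(now=1.0))   # False: breaker is open
print(cb.primary_allowed(now=61.0))  # True: recovery timeout elapsed
```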
Fallback `llm_config` fields are merged on top of the primary config. Fields you don’t specify in a fallback are inherited from the primary, including the `model`.
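The merge rule can be pictured as a shallow dict merge (an illustrative sketch; the provider URLs and keys are made up, and NexAU's actual merge may be deeper):

```python
# Sketch of the merge rule: fallback fields override, unspecified fields
# (like model here) are inherited from the primary.
primary = {
    "model": "gpt-4o-mini",
    "base_url": "https://primary.example.com/v1",
    "api_key": "sk-primary",
}
fallback = {
    "base_url": "https://backup.example.com/v1",
    "api_key": "sk-backup",
}

merged = {**primary, **fallback}  # a plain shallow merge captures the rule

print(merged["model"])     # gpt-4o-mini: inherited from the primary
print(merged["base_url"])  # https://backup.example.com/v1: overridden
```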
### LongToolOutputMiddleware

When a tool returns output larger than a configurable character threshold, this middleware truncates it, saves the full content to a temporary file, and replaces the tool output with the truncated version plus a hint pointing to the file. The agent can then call a file-reading tool if it needs the full content.
```yaml
middlewares:
  - import: nexau.archs.main_sub.execution.middleware.long_tool_output:LongToolOutputMiddleware
    params:
      max_output_chars: 10000
      head_lines: 50
      tail_lines: 30
      temp_dir: /tmp/nexau_tool_outputs
      bypass_tool_names:
        - execute_bash
```
```python
from nexau.archs.main_sub.execution.middleware.long_tool_output import LongToolOutputMiddleware

middleware = LongToolOutputMiddleware(
    max_output_chars=10000,
    head_lines=50,
    tail_lines=30,
    temp_dir="/tmp/nexau_tool_outputs",
    bypass_tool_names=["execute_bash"],
)
```
| Parameter | Default | Description |
|---|---|---|
| `max_output_chars` | `10000` | Character count that triggers truncation |
| `head_lines` | `50` | Lines to keep from the start of the output |
| `tail_lines` | `30` | Lines to keep from the end of the output |
| `temp_dir` | `"/tmp/nexau_tool_outputs"` | Where full outputs are saved. Set to `null` to truncate without saving a file. |
| `bypass_tool_names` | `None` | Tools whose output is never truncated (e.g. tools that already truncate themselves) |
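The head/tail scheme can be sketched as follows. The function name, the truncation marker, and the hint wording are illustrative; only the head-plus-tail idea and the parameter names come from the middleware:

```python
import os
import tempfile

# Illustrative sketch of head/tail truncation: keep head_lines + tail_lines,
# optionally save the full text to a file, and point the agent at it.
def truncate_output(text, max_output_chars=10000, head_lines=50, tail_lines=30,
                    temp_dir=None):
    if len(text) <= max_output_chars:
        return text  # under the threshold: pass through unchanged
    lines = text.splitlines()
    kept = lines[:head_lines] + ["... [truncated] ..."] + lines[-tail_lines:]
    result = "\n".join(kept)
    if temp_dir is not None:
        fd, path = tempfile.mkstemp(dir=temp_dir, suffix=".txt")
        with os.fdopen(fd, "w") as f:
            f.write(text)  # full content survives on disk
        result += f"\n[full output saved to {path}]"
    return result

long_text = "\n".join(f"line {i}" for i in range(1000))
short = truncate_output(long_text, max_output_chars=100, head_lines=2, tail_lines=2)
print(short.splitlines()[:3])  # ['line 0', 'line 1', '... [truncated] ...']
```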
### ContextCompactionMiddleware
Manages conversation context when token limits are approached. See Context compaction for full configuration options.
## Wiring middlewares to an agent
```yaml
type: agent
name: my_agent
llm_config:
  model: gpt-4o-mini
middlewares:
  - import: my_project.middleware:AuditMiddleware
    params:
      log_file: "/tmp/audit.log"
  - import: nexau.archs.main_sub.execution.middleware.logging:LoggingMiddleware
```
```python
from nexau import Agent, AgentConfig
from my_project.middleware import AuditMiddleware
from nexau.archs.main_sub.execution.hooks import LoggingMiddleware

agent = Agent(
    config=AgentConfig(
        name="my_agent",
        llm_config={"model": "gpt-4o-mini"},
        middlewares=[
            AuditMiddleware(log_file="/tmp/audit.log"),
            LoggingMiddleware(),
        ],
    )
)
```
You can mix built-in and custom middleware freely. The execution order guarantee described above applies regardless of where each middleware comes from.