Stop building “multi-agent orchestration”; build workflow contracts
Most “multi-agent orchestration” is just a demo loop with vibes. Real systems treat agents like role-bound workers behind workflow contracts: strict inputs/outputs, deterministic handoffs, and failure semantics you can test.
I keep seeing the same failure mode: you demo a “three-agent” workflow that looks great in a notebook, then it dies in prod because nobody can answer one boring question—what exactly did Agent B receive, and what did it promise back?
The orchestration layer is where systems go to become folklore. People add another agent, another prompt, another “manager” call, and the behavior gets harder to reason about. The number of agents is not the problem. The missing piece is the workflow contract.
A workflow contract is the part you can test without invoking the model. It’s the shape of the job, the handoff semantics, and the failure semantics. Everything else is optional.
The demo problem: agents are cheap, contracts aren’t
Toy multi-agent demos usually share a pattern:
- The “manager” writes a plan in free text.
- Workers read that text and do tool calls.
- The manager stitches results back together.
- When it breaks, you re-run the demo and hope the model behaves.
That’s not a system. It’s a cinematic loop.
In a real system, the orchestration layer has to survive:
- tool failures (timeouts, partial results, malformed tool outputs)
- model variability (different formats, missing fields, refusal paths)
- concurrency (same workflow running twice, out-of-order events)
- cost controls (budget exceeded mid-run)
If your “contract” is “LLM will probably format it right,” you don’t have orchestration. You have wishful execution.
The correct metric is: can you replay a workflow run and get the same state transitions given the same tool outcomes? If the answer is “not really,” you’re building a demo generator.
What a workflow contract actually includes
Treat agents like role-bound workers. The workflow contract is the boundary between the workflow engine and the model.
Concretely, a contract has to define:
1) Inputs and outputs (schemas, not prose)
- Required fields
- Optional fields
- Validation rules
- Canonical formats (dates, IDs, units)
- Output invariants (“must include citations array even if empty”)
If Agent B can return confidence as text sometimes and as a number other times, you’ve already lost.
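To make "schemas, not prose" concrete, here is a minimal sketch of an output contract enforced in plain Python. The field names (order_id, confidence, citations) and the error strings are assumptions chosen for illustration; a real system might use a schema library instead.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WorkerOutput:
    """Hypothetical output contract for a worker agent."""
    order_id: str
    confidence: float                              # always a number in [0, 1]
    citations: list = field(default_factory=list)  # must exist, may be empty


def parse_worker_output(raw: dict) -> WorkerOutput:
    """Validate a raw dict against the contract; raise ValueError on violation."""
    missing = [k for k in ("order_id", "confidence", "citations") if k not in raw]
    if missing:
        raise ValueError(f"missing_field={','.join(missing)}")
    conf = raw["confidence"]
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(conf, bool) or not isinstance(conf, (int, float)):
        raise ValueError("invalid_type=confidence")
    if not 0.0 <= conf <= 1.0:
        raise ValueError("out_of_range=confidence")
    if not isinstance(raw["order_id"], str) or not raw["order_id"]:
        raise ValueError("invalid_type=order_id")
    if not isinstance(raw["citations"], list):
        raise ValueError("invalid_type=citations")
    return WorkerOutput(raw["order_id"], float(conf), list(raw["citations"]))
```

With this in place, a worker that returns confidence as "high" fails at the boundary with invalid_type=confidence instead of leaking into the next stage.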
2) Role constraints
A role is not a prompt. It’s a constraint on allowed actions.
- Agent X may only call tools A/B/C
- Agent Y may only transform data, not fetch external facts
- No hidden “manager” behavior inside workers
You don’t need perfect autonomy. You need predictable capability boundaries.
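One way to turn a role into an enforced boundary is an allowlist that the engine checks before any tool call goes out. This is a sketch under assumptions: the role names, tool names, and RoleViolation exception are hypothetical.

```python
# Hypothetical role -> allowed tools mapping, owned by the engine, not the prompt.
ROLE_TOOLS = {
    "researcher": frozenset({"search", "fetch_url"}),
    "writer": frozenset(),  # transform-only: may not call any tool
}


class RoleViolation(Exception):
    """Raised when an agent requests a tool outside its role."""


def authorize_tool_call(agent: str, tool: str) -> None:
    """Raise RoleViolation unless `agent` is allowed to call `tool`."""
    allowed = ROLE_TOOLS.get(agent)
    if allowed is None:
        raise RoleViolation(f"contract_error: unknown_role agent={agent}")
    if tool not in allowed:
        raise RoleViolation(
            f"contract_error: role_violation agent={agent} tool={tool}"
        )
```

The check lives outside the model, so a worker cannot talk its way into a tool it was never granted.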
3) Handoff semantics (what “done” means)
Define the transitions:
- When does Agent A hand off? (on status=ready, not “when it feels done”)
- How do you represent “needs clarification”? (structured clarification_request)
- What does “success” look like? (required fields present + validation pass)
Handoffs should be deterministic: given the same input state, the next state is the same.
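A deterministic handoff can be as small as a pure function from the worker's structured result to the next step. The sketch below assumes a status field and a clarification_request field as described above; the step names it returns are hypothetical.

```python
from enum import Enum


class Status(str, Enum):
    READY = "ready"
    NEEDS_CLARIFICATION = "needs_clarification"
    FAILED = "failed"


def next_step(result: dict) -> str:
    """Map a worker result to the next workflow step. No model call involved."""
    status = Status(result["status"])  # unknown status raises ValueError
    if status is Status.READY:
        return "handoff_to_b"
    if status is Status.NEEDS_CLARIFICATION:
        # A clarification without a structured request violates the contract.
        if not result.get("clarification_request"):
            return "reject_invalid_output"
        return "ask_user"
    return "dead_letter"
```

Because the function has no hidden inputs, the same result always routes to the same place, and it can be unit-tested without a model in the loop.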
4) State model
A workflow engine needs a state machine, not a narrative.
- pending -> running -> succeeded | failed | needs_input | needs_tool_retry
- Correlation IDs
- Versioning for contract changes
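A rough sketch of that state model as a transition table follows. The forward transitions come from the list above; letting needs_input and needs_tool_retry re-enter running is my assumption, as are the Run field names.

```python
from dataclasses import dataclass

# Allowed transitions. Re-entry from needs_* back to running is an assumption.
ALLOWED = {
    "pending": {"running"},
    "running": {"succeeded", "failed", "needs_input", "needs_tool_retry"},
    "needs_input": {"running"},
    "needs_tool_retry": {"running"},
    "succeeded": set(),  # terminal
    "failed": set(),     # terminal
}


@dataclass(frozen=True)
class Run:
    correlation_id: str    # ties logs, tool calls, and events to one run
    contract_version: str  # which contract version this run was validated against
    state: str = "pending"


def transition(run: Run, new_state: str) -> Run:
    """Return a new Run in `new_state`, or raise if the move is not allowed."""
    if new_state not in ALLOWED.get(run.state, set()):
        raise ValueError(f"illegal_transition {run.state}->{new_state}")
    return Run(run.correlation_id, run.contract_version, new_state)
```

Illegal moves such as succeeded -> running fail loudly instead of quietly rewriting history.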
5) Observability hooks
Contracts should include traceable artifacts:
- tool call logs (inputs/outputs)
- model prompt/version identifiers
- validation errors
- redaction rules
If you can’t see where the contract broke, you’ll “fix the prompt” forever.
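As an illustration, a per-stage trace record with redaction might look like the sketch below. The record fields and the redacted key names are assumptions, not a standard format.

```python
import json

REDACTED_KEYS = frozenset({"api_key", "email"})  # hypothetical redaction rule


def redact(value):
    """Recursively replace values of sensitive keys before anything is logged."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if k in REDACTED_KEYS else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value


def trace_record(correlation_id, stage, model_version, tool_calls, validation_errors):
    """Serialize one stage's artifacts as a single JSON line."""
    record = {
        "correlation_id": correlation_id,
        "stage": stage,
        "model_version": model_version,
        "tool_calls": redact(tool_calls),
        "validation_errors": validation_errors,
    }
    # sort_keys keeps the output stable, which makes diffs between runs readable.
    return json.dumps(record, sort_keys=True)
```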
Determinism where possible (and where you can’t)
I’m opinionated here: orchestration should be deterministic where possible.
That means:
- Routing rules are explicit (no “manager decides” without a rule)
- Tool calls are bounded (timeouts, budgets, max calls)
- Output formats are enforced (schema validation + repair loop)
- Plans are either structured or replayable
But you still have to deal with nondeterminism.
So separate it:
- Deterministic parts: state transitions, tool selection, retry policy, compensation
- Nondeterministic parts: text generation, classification, summarization
Then wrap nondeterminism with contracts.
Example pattern:
- Worker returns structured output with a decision enum.
- Engine validates.
- If invalid: run a repair step that only fixes formatting, not meaning.
- If tool failed: engine follows a deterministic retry/compensation path.
You’re not trying to make the model deterministic. You’re making the system deterministic around the model.
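Here is a minimal sketch of that validate-then-repair pattern. The Decision values and the repair rule (strip a wrapping code fence, nothing else) are assumptions; the point is that the repair step touches formatting only and gets a bounded number of attempts.

```python
import json
from enum import Enum

FENCE = "`" * 3  # markdown code fence that models often wrap JSON in


class Decision(str, Enum):
    APPROVE = "approve"
    REJECT = "reject"
    ESCALATE = "escalate"


def validate(raw: str) -> Decision:
    """Parse and check worker output; raises ValueError or KeyError if invalid."""
    data = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("output must be a JSON object")
    return Decision(data["decision"])


def repair_format(raw: str) -> str:
    """Formatting-only repair: drop a wrapping code fence. Never edits content."""
    text = raw.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]         # drop the closing fence line
        text = "\n".join(lines).strip()
    return text


def accept(raw: str, max_repairs: int = 1) -> Decision:
    """Validate, repairing formatting at most `max_repairs` times."""
    candidate = raw
    for attempt in range(max_repairs + 1):
        try:
            return validate(candidate)
        except (ValueError, KeyError):
            if attempt == max_repairs:
                raise
            candidate = repair_format(candidate)
```

An output with an unknown decision value still fails after repair, which is what you want: the repair step must not be able to invent a valid answer.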
Failure modes are the product
If your workflow contract doesn’t define failure modes, you’re outsourcing reliability to the runtime.
A real contract includes:
- Retry semantics: which errors are retryable, with what backoff
- Timeout semantics: max wall time per stage
- Partial completion: what gets committed, what gets rolled back
- Compensation: how to undo side effects
- Dead-letter paths: where “cannot recover” goes
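Retry semantics in particular are easy to pin down in code. The sketch below assumes a hypothetical ToolError with a code attribute and an illustrative set of retryable codes; backoff is exponential with a cap and deliberately no jitter, so a replay produces the same schedule.

```python
import time

RETRYABLE = {"provider_timeout", "rate_limited"}  # illustrative error codes


class ToolError(Exception):
    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


def backoff_delays(max_attempts: int, base: float = 0.5, cap: float = 8.0) -> list:
    """Delay before each retry: exponential, capped, no jitter."""
    return [min(cap, base * 2 ** i) for i in range(max_attempts - 1)]


def call_with_retry(tool, max_attempts: int = 3, sleep=time.sleep):
    """Call `tool()`; retry only retryable errors, up to `max_attempts` total."""
    delays = backoff_delays(max_attempts)
    for attempt in range(max_attempts):
        try:
            return tool()
        except ToolError as err:
            last_attempt = attempt == max_attempts - 1
            if err.code not in RETRYABLE or last_attempt:
                raise  # caller routes this to compensation or dead-letter
            sleep(delays[attempt])
```

Passing sleep as a parameter is a small design choice that lets tests run the retry path without waiting.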
Also: idempotency.
If the same workflow event arrives twice, you need a rule for deduping. Otherwise your “agent system” becomes a side-effect generator.
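A deduping rule can be sketched as a wrapper keyed on an event ID. The class name and the in-memory dict are assumptions; a production version would need a durable store and an atomic check-and-set.

```python
class IdempotentHandler:
    """Run a handler at most once per event ID and replay the stored result."""

    def __init__(self, handler):
        self._handler = handler
        self._results = {}  # event_id -> result; stand-in for a durable store

    def handle(self, event_id: str, payload: dict):
        if event_id in self._results:
            # Duplicate delivery: return the recorded result, no side effects.
            return self._results[event_id]
        result = self._handler(payload)
        self._results[event_id] = result
        return result
```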
And you need failure to be legible.
Instead of “LLM failed,” you want:
- validation_error: missing_field=order_id
- tool_error: provider_timeout tool=search
- contract_error: role_violation agent=writer tool=filesystem_write
Those are actionable. “It didn’t work” isn’t.
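One possible shape for those errors is a small structured type that renders to the strings above. The WorkflowError name and its fields are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowError:
    kind: str                                    # e.g. "tool_error"
    reason: str = ""                             # e.g. "provider_timeout"
    context: dict = field(default_factory=dict)  # ordered key=value details

    def render(self) -> str:
        """Render as 'kind: reason key=value ...' for logs and alerts."""
        parts = [self.reason] if self.reason else []
        parts += [f"{k}={v}" for k, v in self.context.items()]
        return f"{self.kind}: {' '.join(parts)}"
```

Keeping the error structured means routing and metrics can key on kind and reason, while humans still get the one-line form.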
A practical build pattern: agent harness + relay with memory
The market signal I buy is scaffolding: harnesses and relays that make workflows provider-agnostic.
The key is not “more clever orchestration.” The key is a harness that:
- enforces contract schemas
- normalizes tool outputs across providers
- records run artifacts for replay
- exposes deterministic workflow transitions
Then add a relay layer that learns from past runs, but only through the contract.
So the relay can:
- choose routing based on past success/failure
- adjust repair strategies for specific validation errors
- surface “known bad” prompt/format combinations
Not by letting the model freestyle. By updating policy keyed on contract outcomes.
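To illustrate "policy keyed on contract outcomes", here is a sketch of a relay that picks a route from recorded success and failure counts. The RelayPolicy name, the route labels, and the smoothing choice are all assumptions.

```python
from collections import defaultdict


class RelayPolicy:
    """Choose a route from past contract outcomes, with no model in the loop."""

    def __init__(self):
        self._stats = defaultdict(lambda: {"ok": 0, "fail": 0})

    def record(self, route: str, succeeded: bool) -> None:
        """Record whether a run on `route` passed its contract."""
        self._stats[route]["ok" if succeeded else "fail"] += 1

    def success_rate(self, route: str) -> float:
        s = self._stats[route]
        # Add-one smoothing: a route with no history starts at 0.5.
        return (s["ok"] + 1) / (s["ok"] + s["fail"] + 2)

    def choose(self, routes: list) -> str:
        """Highest smoothed success rate wins; ties break by name."""
        return min(routes, key=lambda r: (-self.success_rate(r), r))
```

The tie-break by name keeps the choice deterministic for a given history, so routing decisions can be replayed along with everything else.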
If you do it right, your multi-agent system starts to look less like a swarm and more like a workflow engine with role-bound workers.
Stop building multi-agent orchestration as a vibe layer.
Build workflow contracts, and your agents become boring—in the best way.
