2026-04-19 · 11 min read · in-progress

Multi-Agent Orchestrator

A LangGraph-powered orchestrator that coordinates specialised research, coding, and review agents with shared state, persistent tool-use, and human-in-the-loop checkpoints.

LangGraph
Claude Agent SDK
Python
Redis
Next.js
Postgres

  • Specialised agents: 5
  • Ship rate:
  • Human checkpoints: configurable
  • State store: Redis + PG

Why multi-agent

For small tasks, single-shot Claude is usually the best choice — simpler, cheaper, faster. But for work that spans research, planning, coding, review, and verification — and that benefits from separation of concerns — a single monolithic prompt rots fast.

This orchestrator treats each concern as a specialised agent with its own prompt, its own tools, and its own success rubric. A top-level planner assigns work, a shared state store keeps everyone honest, and a reviewer agent rejects output that doesn't meet spec.

The five agents

  • Planner — decomposes a user goal into a typed DAG of tasks; chooses which specialist handles each node
  • Researcher — spec fetching, code archaeology, dependency checking; read-only tools
  • Engineer — writes and edits code; has scoped write access and can run tests
  • Reviewer — reads the diff, runs the review checklist, accepts or rejects with specific feedback
  • Integrator — handles the git/CI dance: branch, commit, PR, status checks
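The planner's "typed DAG of tasks" can be sketched in a few lines. This is a hypothetical shape, not the orchestrator's actual schema; the `Task` fields and agent names are illustrative:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a typed task DAG; field names are illustrative,
# not the orchestrator's real schema.
@dataclass
class Task:
    id: str
    agent: str                              # e.g. "researcher" | "engineer" | "reviewer"
    depends_on: list[str] = field(default_factory=list)

def ready_tasks(tasks: list[Task], done: set[str]) -> list[Task]:
    """Tasks that haven't run yet and whose dependencies are all complete."""
    return [t for t in tasks
            if t.id not in done and all(d in done for d in t.depends_on)]

plan = [
    Task("spec", "researcher"),
    Task("impl", "engineer", depends_on=["spec"]),
    Task("review", "reviewer", depends_on=["impl"]),
]
```

With an empty `done` set only `spec` is ready; completing it unblocks `impl`, and so on down the DAG.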

Shared state is persisted in Redis (hot) + Postgres (durable). Every agent message is content-addressed and replayable — the whole session can be re-run deterministically from a checkpoint.
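Content-addressing here amounts to hashing a canonical serialisation of each message. A minimal sketch, assuming messages are JSON-serialisable dicts (the function name is mine, not the project's):

```python
import hashlib
import json

def message_address(message: dict) -> str:
    """Content address: SHA-256 of a canonical JSON encoding, so the same
    logical message hashes to the same id regardless of dict key order."""
    canonical = json.dumps(message, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the address depends only on content, replaying a session from a checkpoint reproduces the same message ids, which is what makes deterministic re-runs checkable.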

Built on LangGraph + Claude Agent SDK

  1. Core

    LangGraph state machine

    Nodes are agents; edges are task handoffs. Conditional edges route based on reviewer verdicts (accept / revise / escalate).

  2. Agents

    Claude Agent SDK runtimes

    Each agent runs as its own ephemeral Claude Agent SDK process with its own tool permissions, so a compromise in one agent can't spread to another.

  3. Tools

    Typed tool contracts

    All tools expose Zod schemas. The orchestrator validates inputs and outputs at the state-machine boundary — malformed inputs never reach a tool, and malformed outputs never reach the next agent.

  4. Humans

    Configurable checkpoints

    Every destructive action can gate on human approval. Approvals are signed and stored in the audit log with the agent's reasoning.
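The conditional-edge routing from step 1 reduces to a pure function of shared state. In LangGraph this is the kind of function you'd register with `add_conditional_edges`; the node names below are illustrative, not the project's actual graph:

```python
def route_on_verdict(state: dict) -> str:
    """Map the reviewer's verdict to the next node in the graph.
    Node names are illustrative; in LangGraph this function would be
    registered via add_conditional_edges on the reviewer node."""
    routes = {
        "accept": "integrator",   # ship it: hand off to the git/CI agent
        "revise": "engineer",     # bounce back with the reviewer's feedback
        "escalate": "human",      # pause at a human-in-the-loop checkpoint
    }
    return routes[state["review"]["verdict"]]
```

Keeping routing as a pure function of state also makes it trivial to unit-test the graph's control flow without spinning up any agents.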

Early numbers

On a real refactor task (cross-file Go change, tests, and a migration), the orchestrator finished in ~28 minutes end-to-end. A baseline single-agent prompt with the same tools finished in ~92 minutes and needed two retry prompts to fix a missed edge case — roughly 3× slower.

The reviewer agent caught 7 defects the engineer would have shipped. The integrator cleanly handled the PR dance without hallucinating branch names.

Open questions

  • How do you evaluate a multi-agent system rigorously? Single-agent evals extend poorly. Working on a harness based on end-state correctness + cost.
  • When does the overhead of multi-agent not pay off? Leaning toward: small tasks (< 3 files, no external research) should stay single-agent.
  • Can the planner learn from past runs? Current version is prompt-only. Exploring a lightweight bandit over task decompositions.
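The bandit over decompositions could start as small as ε-greedy selection on empirical ship rates. Purely a sketch of the direction, not the current code; the stats shape is an assumption:

```python
import random

def pick_decomposition(stats: dict[str, tuple[int, int]], eps: float = 0.2) -> str:
    """stats maps decomposition id -> (successful runs, total runs).
    With probability eps, explore a random decomposition; otherwise
    exploit the one with the best empirical success rate."""
    if random.random() < eps:
        return random.choice(list(stats))
    return max(stats, key=lambda k: stats[k][0] / max(stats[k][1], 1))
```

With `eps=0.0` this always exploits the best-performing decomposition seen so far; raising `eps` trades some runs for exploration of alternatives.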

What's next

  • Add a test-runner and a security-reviewer agent
  • Open-source the orchestrator skeleton with a minimal demo
  • Post a deep-dive on the state-machine design choices