On March 5, 2026, the entire system was one file.
hive_cli.py. 240 lines. A Python script that sent a message to a Discord channel and waited for a reply. It was, by any serious measure, a chatbot wrapper with ambitions.
Thirty-two days later, it was something different. Ten specialized Discord bots. Thirty-nine Python tools organized into eight domain directories. A React dashboard. A dispatch pipeline with cost-aware model routing. A supervisor loop. An autonomy budget with approval gates. A knowledge layer that indexed every session. More than 200 commits.
Nobody planned any of this. It grew from a specific bottleneck, then another, then another. That’s the story worth telling: not the architecture I designed, but the path that forced the architecture on me.
The original premise was simple enough. Run Claude in a loop, point it at Discord, give it context about my work. Let it handle async tasks while I slept. The first commit was a Thursday night proof of concept. It worked in the narrow sense that the agent replied to messages and logged output to a file.
The bottleneck appeared almost immediately: Claude doesn’t know what it did yesterday. Every session started from scratch. The system had no memory, no accumulating context, no way to build on prior work. I’d dispatch a task, get a result, and then the agent forgot it happened. The work was real but the system couldn’t compound.
So on March 15 — ten days in — I added a memory layer. A memory/ directory, JSONL session logs, a state file. The commit message: “Add Phase 2 HTTP server, externalize agent prompts, and scaffold memory system.” Still one agent. Now with a notepad.
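The shape of that memory layer can be sketched as an append-only JSONL log. This is a minimal illustration, not the actual hive_cli.py internals — the directory layout and field names are assumptions:

```python
import json
import time
from pathlib import Path

MEMORY_DIR = Path("memory")  # hypothetical layout mirroring the memory/ directory

def log_session(session_id: str, task: str, result: str) -> None:
    """Append one session record to an immutable, append-only JSONL log."""
    MEMORY_DIR.mkdir(exist_ok=True)
    record = {"session": session_id, "task": task, "result": result, "ts": time.time()}
    with open(MEMORY_DIR / "sessions.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")

def prior_context(limit: int = 5) -> list[dict]:
    """Read the most recent records so a new session can build on prior work."""
    path = MEMORY_DIR / "sessions.jsonl"
    if not path.exists():
        return []
    with open(path) as f:
        records = [json.loads(line) for line in f if line.strip()]
    return records[-limit:]
```

The append-only property is the point: no session can rewrite history, so every future session inherits a trustworthy record of what came before.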
That lasted three days before the next bottleneck emerged: one agent reviewing its own work is mostly confirmation bias. You need someone who doesn’t share your assumptions.
The March 22 commit is the inflection point. The full message:
“Agent review: 37/38 improvements — monolith split, 318 new tests, ops hardening. Full agent review by all 6 roles (Sage, Kai, Nova, Vera, Atlas, Cortex) identified 130+ issues, deduplicated to 38 actionable tasks. 37 completed.”
I had bootstrapped six specialized agents — Planner, Architect, Builder, Ops, Analyst, Knowledge — and pointed them at the codebase with instructions to find problems. Not to be polite. To find problems.
They found 130.
The most damning: main.py had grown to 6,130 lines. One file holding every endpoint, with no separation of concerns. If you touched the session logging code, you could accidentally break the dispatch pipeline. The agents named it plainly: a bottleneck that would compound every future change. It needed to be split.
By end of day, main.py was 239 lines. Twelve router modules. A schemas file. The agents had reviewed the monolith into pieces.
Here’s what made this different from a normal code review. The agents had operating context that no static reviewer could have. They knew which API endpoints were called most often — and were therefore most fragile. They knew which shared state was implicitly coupled. They knew which error paths had no logging. A human reviewer would have found the obvious structural problems. The agents found the operational ones — the things that only break at runtime, under real load, in the middle of the night.
Thirty-seven of thirty-eight tasks completed in a single session. The one that didn’t was a design question that needed my input.
This was the moment the project changed. The agents weren’t just executing tasks anymore. They were reviewing the system they lived inside. The architecture stopped being something I designed and started being something we negotiated — “we” meaning me and the agents I’d built to challenge my assumptions.
Three days later, the product got a name.
The commit: “Rebrand: Hive → OpDek (product), DxDev (company), dxdev.opdek.com (domain).” It sounds minor in a git log. It wasn’t. A rebrand forces you to decide what you’re actually building. Calling it OpDek — an operations engine, not a coding assistant — clarified the product’s scope. The agents weren’t there to write code. They were there to run operations.
Naming clarifies scope. “Hive” implied a swarm of undifferentiated agents. “OpDek” implied an operations desk — something you sit at to manage work. That framing immediately resolved a product question I’d been circling: is this a tool for building agents, or a tool for running them? The answer is running them. You don’t build agents with OpDek. You operate them.
Four commits to stabilize the rebrand, then done. Every subsequent architectural decision became easier after that framing was clear.
By day 32, the count was:
- 10 Discord bots with distinct roles: CEO, CTO, CFO, CMO, COO, CKO, Planner, Analyst, Builder, Ops
- 39 Python tools in 8 domain directories: session management, decomposition, dispatch, agents, supervision, cost, knowledge, reporting
- A React dashboard with portfolio drill-down: business > milestone > task > session
- A dispatch pipeline with hard blocks (capacity, backlog, success rate) and soft warnings
- A supervisor loop running every 5 minutes, health checks every hour, daily sweeps at 3am
- A cost layer routing work to Haiku, Sonnet, or Opus based on task type and autonomy budget
- 321 plans per month autonomously executed, tracked, and summarized
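The cost layer in that list reduces to a routing table with a budget gate. A minimal sketch — the tier names are real Anthropic model families, but the routing rules and numbers are invented for illustration, not OpDek's actual config:

```python
# Hypothetical cost-aware router: map each task type to the cheapest capable tier.
MODEL_TIERS = {
    "haiku":  {"cost_factor": 1},    # batch work, summaries
    "sonnet": {"cost_factor": 5},    # agent sessions
    "opus":   {"cost_factor": 20},   # strategic / architectural decisions
}

ROUTING = {
    "batch": "haiku",
    "summary": "haiku",
    "agent_session": "sonnet",
    "code_review": "sonnet",
    "architecture": "opus",
    "strategy": "opus",
}

def route(task_type: str, budget_remaining: float) -> str:
    """Pick a model tier for a task, downgrading when the autonomy budget runs low."""
    tier = ROUTING.get(task_type, "sonnet")  # default to the middle tier
    if tier == "opus" and budget_remaining < 1.0:
        return "sonnet"  # budget gate: no Opus dispatches on a nearly empty budget
    return tier
```

The ~20x spread between tiers is why this matters: routing 90% of tasks away from the top tier changes the economics of running agents all day.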
The 240-line CLI had become the operations layer for a small AI-native business.
What does it feel like from the inside when an architecture evolves like this? Mostly it feels like firefighting. You don’t sit down and design a 10-agent org. You add a second agent because the first one can’t review itself. You add a supervisor because two agents disagreed and nobody was resolving it. You add cost routing because you notice 90% of tasks don’t need Opus but you’re paying Opus prices. Each addition is a direct response to a real pain.
The architecture that exists at day 32 is correct for the problems encountered in days 1–31. It’s not necessarily correct for days 33–100. That’s the part multi-agent architecture guides tend to skip: the system you need at scale doesn’t exist until you’ve accumulated the specific failures that require it.
You can’t design it in advance. You can only earn it.
Architecture
The system that exists at day 32 has four layers. Each was added in response to a specific failure mode.
Interface layer — Dashboard (React + FastAPI). Added because Discord-only meant mobile-only, and some operations need a real screen. Portfolio view: business > milestone > task > session.
Agent layer — 10 Discord bots with role specialization. The CTO catches architectural problems the Builder misses. The CFO catches cost problems the CTO doesn’t look for. One agent can’t review itself. Ten agents with different priorities can.
Execution layer — Supervisor loop + dispatch pipeline. Added because two agents can do the same work twice if there’s no coordinator. The supervisor runs every 5 minutes. Priority-ordered dispatch with dependency resolution. Stall detection: warn at 15 minutes, escalate at 30.
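The stall thresholds above can be sketched as a check the supervisor runs each cycle. The function and field names here are hypothetical; only the 15- and 30-minute thresholds come from the text:

```python
from dataclasses import dataclass

WARN_AFTER_MIN = 15      # warn at 15 minutes without progress
ESCALATE_AFTER_MIN = 30  # escalate at 30

@dataclass
class Session:
    task_id: str
    minutes_since_heartbeat: float

def stall_status(session: Session) -> str:
    """Classify a running session by how long it has gone without progress."""
    if session.minutes_since_heartbeat >= ESCALATE_AFTER_MIN:
        return "escalate"
    if session.minutes_since_heartbeat >= WARN_AFTER_MIN:
        return "warn"
    return "ok"
```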
Data layer — SQLite (later PostgreSQL), JSONL logs, JSON dual-write. A system that can’t replay its own history can’t learn. Every session is logged. Every learning is tagged. Agents can query what previous sessions found.
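"Agents can query what previous sessions found" reduces to filtering the append-only log by tag. A sketch under assumed field names — the real schema lives in the data layer described above:

```python
import json
from pathlib import Path

def learnings_by_tag(log_path: Path, tag: str) -> list[dict]:
    """Replay the JSONL history and return every learning carrying a given tag."""
    results = []
    for line in log_path.read_text().splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if tag in record.get("tags", []):
            results.append(record)
    return results
```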
Timeline
```mermaid
graph TD
    A[Day 1: hive_cli.py<br/>240 lines, 1 agent] --> B[Day 10: Memory layer<br/>JSONL logs, state file]
    B --> C[Day 17: Dashboard<br/>HTTP server, agent registry]
    C --> D[Day 18: Agent review<br/>6 roles, 130 issues found]
    D --> E[Day 21: Rebrand<br/>Hive → OpDek]
    E --> F[Day 32: Multi-agent org<br/>10 bots, 39 tools, dispatch pipeline]
    style A fill:#666,stroke:#333,color:#fff
    style F fill:#2ecc71,stroke:#333,color:#fff
```

Problems
The story above is the narrative. Here’s the same arc as a problem log — each bottleneck, what it cost, and how it was resolved.
| Bottleneck | Impact | Resolution |
|---|---|---|
| No memory | Every task started from zero; work couldn’t compound | JSONL session logs + memory/ directory. Immutable, append-only. Future sessions read prior context. |
| Single-reviewer blindspot | One agent reviewing itself found nothing wrong | Six specialized agents running adversarial review — found 130 issues |
| Monolith drag | main.py at 6,130 lines; every change touched everything | Agent-identified split: 6,130 → 239 lines across 12 router modules. Zero regressions. |
| Cost blindness | All work routed to Opus regardless of complexity | Tiered model selection: Haiku for batch, Sonnet for agents, Opus for strategic. ~20x cost difference. |
| No governance | Agents could dispatch freely with no approval thresholds | Autonomy budget: per-agent threshold. Above → approval queue. Below → auto-execute. |
| Identity ambiguity | “Hive” / “OpDek” / “DxDev” conflated | Rebrand commit day 21. Clean separation: product / company / domain. Four commits to stabilize. |
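The autonomy-budget row above describes a threshold gate. A minimal sketch — the per-agent thresholds and function names are invented for illustration, not OpDek's real configuration:

```python
# Hypothetical approval gate: each agent has a spend threshold. Dispatches
# above the threshold queue for human approval; below it, they auto-execute.
APPROVAL_THRESHOLDS = {  # illustrative dollar amounts
    "Builder": 2.00,
    "Ops": 1.00,
    "CTO": 5.00,
}

approval_queue: list[dict] = []

def dispatch(agent: str, task: str, estimated_cost: float) -> str:
    """Auto-execute cheap work; queue anything over the agent's threshold."""
    threshold = APPROVAL_THRESHOLDS.get(agent, 0.0)  # unknown agents always queue
    if estimated_cost > threshold:
        approval_queue.append({"agent": agent, "task": task, "cost": estimated_cost})
        return "queued"
    return "executed"
```

The gate is what makes "autonomy" safe to grant: the budget bounds the blast radius of any single agent's judgment.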
Part 4 of The Timeline — the true story of building an AI operations engine, backed by git history and real incidents.