The AI agents we'd build for you
already run our business.

DxDev is an AI operations consultancy. We've been running our own multi-agent engine in production for over a year, and we use it to run our business. All of that architecture, every live dispatch, and the full build log is public on this site. You can read it before you hire us.

Planner → Dispatch → Build → Review → Ship
Stage one · Planner

Unstructured intent becomes scoped work.

The request arrives as whatever a human typed or an integration passed in. The Planner reads it, decides what work actually needs to happen, weighs it against the current backlog, and writes a spec that downstream agents can execute against. If the intent is ambiguous, it routes back to a human instead of guessing. That's the step most AI systems skip, which is why so many demos never ship real work.
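The route-back-on-ambiguity rule can be sketched as a small Python function. The `Spec` shape and the ambiguity heuristic below are illustrative placeholders, not OpDek's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Spec:
    """A scoped unit of work downstream agents can execute against."""
    goal: str
    steps: list = field(default_factory=list)
    needs_human: bool = False
    reason: str = ""

def plan(intent: str) -> Spec:
    """Turn raw intent into a spec, or route back to a human on ambiguity."""
    text = intent.strip()
    # Too little signal to scope safely: escalate instead of guessing.
    if len(text.split()) < 4 or text.endswith("?"):
        return Spec(goal=text, needs_human=True, reason="ambiguous intent")
    return Spec(goal=text, steps=["draft", "implement", "verify"])
```

The point is the shape of the decision, not the heuristic: anything the Planner cannot scope with confidence goes back to a person.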

Stage two · Dispatch

The right agent, for the right cost, right now.

Dispatch picks the executor and the model tier. It looks at who's idle, who already has the relevant context loaded, and which tasks qualify for Haiku's speed versus Opus's reasoning. Routing alone cuts compute spend to roughly one-twentieth of what running everything on the flagship model would cost. When there is no good fit, Dispatch holds the task and escalates rather than faking a handoff.
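A minimal sketch of that routing decision. The agent fields, the 0-to-1 complexity score, and the tier cutoffs are all hypothetical, not OpDek's real interface:

```python
# Hypothetical complexity cutoffs for each model tier.
TIER_CUTOFFS = [(0.3, "haiku"), (0.7, "sonnet"), (1.01, "opus")]

def dispatch(task, agents):
    """Pick an idle agent that already holds the task's context, plus a
    model tier matched to the task's complexity score."""
    candidates = [a for a in agents if a["idle"] and task["context"] in a["loaded"]]
    if not candidates:
        return None  # hold the task and escalate rather than faking a handoff
    tier = next(name for cutoff, name in TIER_CUTOFFS if task["complexity"] < cutoff)
    return {"agent": candidates[0]["name"], "tier": tier}
```

Returning `None` instead of a forced match is the load-bearing choice: a held task is recoverable, a faked handoff is not.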

Stage three · Build

Bounded scope. Autonomy where it's earned.

The Builder writes the code, edits the files, runs the tests. It works autonomously within the spec, but it stops at any threshold configured for the engagement: a destructive migration, a protected file, a budget limit. Three failed build attempts escalate to a human. Otherwise Build ships without intervention. We default to cautious and tune from there.

Stage four · Review

Someone who didn't write it reviews it.

Review is a different agent from Build, with a different prompt and different priorities. Adversarial by design. It catches the blind spots a single-reviewer system misses: the implicit coupling, the untested error path, the regression nobody thought to look for. A full adversarial pass on our own codebase surfaced a hundred and thirty issues the first time we ran it. This is the step that turns AI-written code from a draft into something you can actually merge.

Stage five · Ship

Packages, deploys, and leaves a runbook.

When Review passes, Ship builds the release artifact, runs the deployment, confirms the smoke tests, and writes a runbook summarizing what changed. For client engagements this is also the handoff moment. Ship gives your team everything they need to own the agent going forward. No vendor lock-in, because the last thing Ship does is document how to run it without us.
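A runbook generator in this spirit might look like the following. The release fields are illustrative, not OpDek's actual schema:

```python
def write_runbook(release):
    """Summarize a release as a runbook the client's team can own."""
    lines = [f"# Runbook · {release['version']}",
             f"Deployed: {release['deployed_at']}",
             "## Changes"]
    lines += [f"- {c}" for c in release["changes"]]
    lines += ["## Rollback", f"Redeploy {release['previous_version']}."]
    return "\n".join(lines)
```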

Built on
Anthropic · Cloudflare · Postgres · Astro · Python · Docker
Section one · OpDek telemetry, public endpoint
OpDek, right now. Synced 4m ago; auto-refresh every 60 seconds.
[Live counters: agents active (+2 this week) · dispatches / week (+18%) · tasks shipped lifetime · autonomy vs. last month]
last_deploy = 12h · last_dispatch = 7m · engagements_open = 3

Most consulting sites show screenshots of systems they have built. These numbers are from the system we actually run today. If you hire us, this is the stack you plug into.

Section two · Working together

If you'd like to work with us.

Every engagement is scoped against the specific problem you are trying to solve, so we don't publish fixed packages. Multi-agent work cuts too differently across teams for one-size-fits-all pricing to be honest.

We take on a small number of engagements each quarter, and we only commit to work we are confident we can deliver. If you think there might be a fit, email us. You will hear back within a day, we will set up a conversation, and we will tell you straight whether we are the right studio for what you need.

Section three · The loop we run on ourselves

Diagnose, ship, operate.

One.

Diagnose

We start by mapping how work actually flows through your team today. Usually there is one specific bottleneck that is burning most of the time, and not every bottleneck is something AI agents can solve. The output is a written document with our diagnosis and recommendation.

Two.

Ship

We build the smallest thing that measurably moves the bottleneck. Approval gates get added where the stakes matter, and the agent runs autonomously everywhere else. By the end of the engagement the system is running in production, not waiting in a staging environment for someone to flip a switch.

Three.

Operate

Once it is running, we add instrumentation, review it regularly, and tune as we learn. Agents get more autonomy as they prove they are trustworthy. Your team gets the runbook and all the architecture docs, so nothing about this arrangement is vendor lock-in.

Section four · Self case study, public data

The case study we lead with is our own.

A lot of AI consultancies sell strategy decks while running their own back office on spreadsheets. DxDev has been running on OpDek for twelve months. The system you see on this page is the system our business actually uses.

When a bottleneck appeared, we usually wrote a new agent instead of hiring. Once a pattern showed up twice, we documented it so the system could run it the next time.

OpDek today did not exist a year ago. The team, the dispatch pipeline, the cost layer, the approval gates. All of it arrived one bottleneck at a time. Everything we sell is something we have already shipped for ourselves first.

Read the 32-day evolution
Section five · The engine in four layers

OpDek in four layers.

OpDek was not designed up front. Each of these layers got added because a specific problem in the previous version of the system forced it.

L1 Interface

Dashboard and Discord

Human in the loop approvals, portfolio drill-down, and a live feed of what the system is doing. Mobile access through Discord. Desktop access through the dashboard.

L2 Agents

Ten specialized roles

CEO, CTO, CFO, CMO, COO, CKO, Planner, Analyst, Builder, Ops. One agent reviewing its own work is not really a review. Ten agents with different priorities can catch what any one of them would miss.

L3 Execution

Supervisor and dispatch

Priority-ordered dispatch with dependency resolution. Stall detection warns at 15 minutes and escalates at 30. Cost-aware routing across the Haiku, Sonnet, and Opus tiers cuts spend to roughly one-twentieth of running everything on Opus.
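The stall thresholds above reduce to a tiny classifier. A sketch, with the two cutoffs taken from the text and everything else hypothetical:

```python
WARN_AFTER = 15 * 60      # seconds: stall warning at 15 minutes
ESCALATE_AFTER = 30 * 60  # seconds: escalation at 30 minutes

def stall_state(seconds_since_progress):
    """Classify a running task by how long it has gone without progress."""
    if seconds_since_progress >= ESCALATE_AFTER:
        return "escalate"
    if seconds_since_progress >= WARN_AFTER:
        return "warn"
    return "ok"
```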

L4 Data

Postgres and JSONL logs

A system that cannot replay its own history cannot learn. Every session is logged. Every learning is tagged. Agents can query what previous sessions found, which means today's work starts from a real baseline rather than a blank prompt.
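An append-and-replay JSONL log in this spirit can be very small. The record fields below are illustrative, not OpDek's actual log schema:

```python
import json

def log_learning(path, session_id, tag, note):
    """Append one tagged learning as a JSON line."""
    with open(path, "a") as f:
        f.write(json.dumps({"session": session_id, "tag": tag, "note": note}) + "\n")

def recall(path, tag):
    """Replay the log: everything previous sessions learned under a tag."""
    with open(path) as f:
        records = [json.loads(line) for line in f]
    return [r["note"] for r in records if r["tag"] == tag]
```

Append-only JSONL keeps writes cheap and makes the full history trivially replayable; querying is just a scan until the log outgrows it.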

An interactive version of this diagram is queued for the next release. In the meantime, the full story lives in The Repo.

Section six · Published works

We write everything down.

Architecture decisions, failures, and patterns, all documented as they happen. Three streams. All free.

The Timeline · The system's evolution, episode by episode.
The Repo · Architecture patterns you can use.
Signals · Short-form observations on AI tooling.

Ready to ship something that runs?

We take on a small number of engagements each quarter. If you think we might be useful to your team, the fastest way to find out is a twenty-minute conversation. Use the paid discovery intake below if you have a specific problem in mind, or the general contact form for anything else. You will hear back within a day.