The Integration Push
April 7 was the day we wired up Jira integration into OpDek — the autonomous operations system that powers our dispatch pipeline. On paper, this was straightforward: sync plans and workstreams bidirectionally with Jira, so agents can read and update tickets without context-switching into a browser.
It made sense. When your job is automating operations decisions, the last thing you want is agents creating orphaned work that doesn’t exist in the source-of-truth. Jira integration wasn’t a feature request — it was infrastructure.
We merged the Jira integration code. Tests ran locally. Then the CI pipeline hung.
The Hangs
Not a failure. Not a timeout with a clear error message. A hang — the test suite sitting there, consuming CPU, waiting for something that would never come.
The root causes stacked on each other:
- Network calls in tests. The Jira integration touched real APIs. Tests weren’t mocking them. Socket calls were blocking indefinitely.
- Import-time side effects. Code tried to load the agent configuration (opdek_config) during test discovery. When the config wasn't available, imports would hang.
- Recursive mock dependencies. The supervisor tests had complex fixture chains that called network-dependent code, triggering real network requests during fixture setup.
- No timeout floor. The CI runner had no global timeout — tests could hang forever until the runner itself was killed.
No single issue was lethal on its own. Together, they created a cascade where you couldn't even reach the failing test — the suite would hang on the way there.
The Systematic Fix
We couldn’t just add a timeout and call it done. That’s treating the symptom. We needed the tests to actually work.
Here’s what we did, in order:
Add a global timeout floor (2 seconds for socket operations). This was the safety net — anything that looked like a network call would fail fast instead of hanging forever.
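In Python, one way to implement such a floor is a process-wide default socket timeout set once in a shared test file — here sketched as a conftest.py, though the exact filename and mechanism are assumptions, not details from the original:

```python
# conftest.py (sketch) -- pytest loads this before any tests run.
import socket

# Any blocking socket operation (connect, recv, ...) behind an un-mocked
# network call now raises socket.timeout after 2 seconds instead of
# hanging the suite indefinitely.
socket.setdefaulttimeout(2)
```

This doesn't fix the tests; it converts an infinite hang into a fast, visible failure.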
Guard import-time config loading. We made opdek_config optional at import time. Tests could import the tagger module without the full system being initialized. The config loads only if explicitly requested.
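A minimal sketch of that guard, with illustrative names — the real module layout and loader API in OpDek may differ:

```python
# tagger.py (sketch) -- module and function names are illustrative,
# not taken verbatim from the OpDek codebase.
_config = None  # loaded lazily, never at import time


def get_config():
    """Load opdek_config on first use so importing this module stays cheap."""
    global _config
    if _config is None:
        try:
            import opdek_config  # absent in a minimal CI environment
        except ImportError:
            _config = {}  # safe fallback: tests run without the full system
        else:
            _config = opdek_config.load()  # hypothetical loader API
    return _config
```

Importing the module never touches the config; only an explicit `get_config()` call does, and even then a missing config degrades to an empty one instead of a hang.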
Skip the integration tests that need a running server. The run_cycle integration tests actually require the API server to be up. CI doesn’t have that. We skip them and rely on manual integration testing instead.
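The skip can be made explicit with a pytest marker; the environment-variable name below is a hypothetical stand-in for however the server location is actually configured:

```python
import os

import pytest

# Tests tagged with this marker run only when a live API server is
# reachable; in CI (no OPDEK_API_URL set) they are skipped, not hung.
requires_server = pytest.mark.skipif(
    "OPDEK_API_URL" not in os.environ,
    reason="needs a running API server; covered by manual integration testing",
)


@requires_server
def test_run_cycle_round_trip():
    ...  # would exercise the real run_cycle path against the server
```

A skip with a reason shows up in the test report, so the coverage gap stays visible instead of silently disappearing.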
Mock network calls comprehensively. For tests that do need to run in CI, we added explicit mocks for every network operation: Jira API calls, URL opens, stall checks. No surprises, no hanging.
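With unittest.mock, each network touchpoint gets an explicit patch. The `fetch_issue` helper below is a hypothetical stand-in for a real Jira client call, not code from the original:

```python
from unittest import mock


def fetch_issue(key):
    """Hypothetical stand-in for a Jira API call that opens a socket."""
    raise RuntimeError("would hit the network -- must be mocked in CI")


def test_sync_reads_issue():
    # Patch the network call for this test only; no socket is ever opened.
    with mock.patch(f"{__name__}.fetch_issue",
                    return_value={"key": "OPS-1", "status": "Done"}) as fake:
        assert fetch_issue("OPS-1")["status"] == "Done"
    fake.assert_called_once_with("OPS-1")
```

Because the real helper raises instead of silently calling out, any unmocked path fails loudly rather than hanging.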
Lock down fixture scope. Each mock was scoped to the test that needed it, not applied globally. This prevents a mock intended for one test from masking real issues in another.
The final change: add a per-test timeout in pytest. A test that hangs now times out after 30 seconds instead of stalling the whole runner.
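One common way to get that behavior is the pytest-timeout plugin — an assumption here, since the original doesn't name the mechanism:

```python
import pytest


# Assumes the pytest-timeout plugin is installed; it enforces the marker
# at runtime and fails the test after 30 seconds instead of hanging.
@pytest.mark.timeout(30)
def test_supervisor_cycle():
    ...  # long-running supervisor logic would go here
```

The same plugin also accepts a suite-wide `timeout = 30` setting in pytest.ini, which avoids decorating every test individually.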
The Outcome
CI now passes. Tests run. Jira sync works. We can create and update tasks through agents, then see them reflected in Jira without manual intervention.
But the real win was what this surfaced: our agent system had a lot of implicit assumptions baked into imports and fixture chains. Things that worked fine when you ran tests locally (with the full environment) broke hard in CI (with a minimal environment). The integration work didn’t introduce those problems — it exposed them.
The Pattern: Integration as Infrastructure Stress Test
When you add an integration — especially one that touches external systems — you’re not just adding a feature. You’re adding a dependency surface. Every place that code runs (local machine, CI, production) now has to handle that integration correctly, or the whole system shudders.
For autonomous agent systems specifically, this matters more. If your agents orchestrate work through Jira, and Jira integration is flaky, your entire dispatch pipeline fails silently. A hanging test in CI today becomes an agent that hangs in production tomorrow.
The fix wasn’t clever. It was:
- Make dependencies explicit. Guard imports that require the full system. Don’t hide dependencies in fixture setup.
- Scope mocks carefully. A mock is a contract — be specific about where it applies.
- Test multiple environments. Local ≠ CI ≠ production. A test that passes locally with the full environment can still hang in CI with a minimal one.
- Use timeouts as a floor, not a ceiling. A global timeout prevents hangs. Individual test timeouts catch slow tests that should fail, not hang.
We also added a strategic goals widget while we were in there, and wired up CI monitoring for GitHub Actions failures — but those were secondary. The real work was making the integration reliable.
The lesson: integrations are infrastructure tests in disguise. They force you to be explicit about assumptions, manage dependencies well, and think about failure modes that never surface in isolation. Build them carefully, and you’ll learn a lot about your system.
Part 6 of The Timeline — the true story of building an AI operations engine, backed by git history and real incidents.
Previous: The Zombie That Blocked Everything