Continuous Integration And Agents

The dream of infinite throughput

It is easy to dream in the age of agents. Here's one of mine:

I want to just spin up a swarm of AI coders, point them at a repo, and have them solve our tickets and clear our backlog while we sleep. These guys will work in parallel branches, opening and closing PRs at unprecedented speeds. We wake up the next morning and voilà: all we need to do is validate and check. If an idea fails, we move on to something more important.

Ralph Pattern

A popular manifestation of this dream on X is the Ralph pattern. I quote directly from Ryan Carson's post on X, "Step-by-step guide to get Ralph working and shipping code":

1. Pipe a prompt into your AI agent
2. Agent picks the next story from prd.json
3. Agent implements it
4. Agent runs typecheck + tests
5. Agent commits if passing
6. Agent marks story done
7. Agent logs learnings
8. Loop repeats until done

That's pretty cool. It seems like I can make the agent do stuff in a loop while I sleep now. Obviously, the taste, skill, specificity, and context-awareness of the prompter remain essential. But the stability of the application is the real bottleneck. Let's turn our attention to point 4: Agent runs typecheck + tests.
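To make the role of point 4 concrete, here is a minimal sketch of the quoted loop in Python. It is not the implementation from the original post: the story shape (a dict with "id" and "done" keys), the `implement` and `checks_pass` callables, and the `max_iterations` cap are all assumptions made for illustration.

```python
from typing import Callable


def ralph_loop(
    stories: list[dict],
    implement: Callable[[dict], None],
    checks_pass: Callable[[], bool],
    max_iterations: int = 20,
) -> list[str]:
    """Run the quoted loop and return the learnings logged along the way.

    Assumption: each story is a dict like {"id": "a", "done": False}.
    """
    learnings: list[str] = []
    for _ in range(max_iterations):
        # Pick the next unfinished story (point 2 in the quoted list).
        story = next((s for s in stories if not s["done"]), None)
        if story is None:
            break  # Nothing left to do, so the loop ends (point 8).
        implement(story)  # The agent edits the code (point 3).
        if checks_pass():  # Typecheck + tests, i.e. the CI gate (point 4).
            story["done"] = True  # Stand-in for commit + mark done (points 5-6).
            learnings.append(f"{story['id']}: passed")
        else:
            # A failed gate leaves the story open for another attempt.
            learnings.append(f"{story['id']}: failed checks, retrying")
    return learnings
```

Everything the agent produces flows through `checks_pass`. If that gate is weak, the loop will happily mark broken stories as done, which is why the quality of the checks matters more than the speed of the loop.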

Traffic lights first, traffic second

Typechecks and tests are... surprise surprise... CI. Continuous Integration (CI) is no longer just "best practice". It is absolutely essential. I am going to take it one step further:

We need to set up our Ops FIRST, before we can even think about prompting an agent.

I suppose CI is like a traffic light, and our agents are the traffic piling up behind it. One bad move causes an accident, and then everything grinds to a halt. We need to rigorously gate changes with CI and force our agents to respect the traffic rules.
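As a rough sketch of what "gating" could look like in practice, the function below runs a list of checks in order and goes red at the first one that exits non-zero. The name `run_gate` and the demo checks are hypothetical; in a real project the commands would be whatever linters, type checkers, and test runners the repo uses.

```python
import subprocess
import sys


def run_gate(checks: list[tuple[str, list[str]]]) -> tuple[bool, str]:
    """Run each check in order; stop at the first non-zero exit code.

    Returns (True, "") when everything is green, otherwise
    (False, name_of_failed_check).
    """
    for name, command in checks:
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            return False, name
    return True, ""


# Hypothetical stand-ins for real checks: each one is a tiny Python
# process that exits with a fixed status code.
DEMO_CHECKS = [
    ("always-green", [sys.executable, "-c", "raise SystemExit(0)"]),
    ("always-red", [sys.executable, "-c", "raise SystemExit(1)"]),
    ("never-reached", [sys.executable, "-c", "raise SystemExit(0)"]),
]

if __name__ == "__main__":
    ok, failed = run_gate(DEMO_CHECKS)
    print("green" if ok else f"red at {failed}")
```

Stopping at the first failure is a design choice: it keeps feedback fast for an agent that will retry anyway. Collecting every failure before reporting is a reasonable alternative when the checks are cheap.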

My experience in data pipeline development

I've been working mostly on Python data pipelines recently, and I've found that generic CI isn't enough. When an AI is writing the code, the checks need to be stricter. If I’m letting an agent go ham on a data project, here is the non-negotiable CI checklist I set up before I write a single line of prompt:

Aggressive linting and formatting (ruff): fast and keeps everything nice and tidy
Static type checking (ty): fast and keeps everything properly typed
Data integrity (Pydantic/dataclasses): data shape, data shape, data shape
Security scanning: gone shall be the days of committing API keys
Smoke test: super critical. A minimal e2e run on a subset of data. Agents are great at writing unit tests that pass... for broken code. A smoke test ensures the pipeline actually runs end to end
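The data integrity and smoke test items can be sketched together. This is a toy example under stated assumptions: the `Reading` record, its fields, and the averaging `run_pipeline` are hypothetical stand-ins for a real pipeline. It uses plain dataclasses with manual checks to stay within the standard library; a Pydantic model would perform this kind of validation on construction instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Hypothetical record shape for one pipeline row."""

    sensor_id: str
    value: float

    def __post_init__(self) -> None:
        # Dataclasses do not enforce annotations, so check the shape by hand.
        if not isinstance(self.sensor_id, str) or not self.sensor_id:
            raise ValueError("sensor_id must be a non-empty string")
        if isinstance(self.value, bool) or not isinstance(self.value, (int, float)):
            raise ValueError("value must be a number")


def run_pipeline(rows: list[dict]) -> dict[str, float]:
    """Toy pipeline: validate each row, then average values per sensor."""
    grouped: dict[str, list[float]] = {}
    for row in rows:
        reading = Reading(**row)  # Raises on missing, extra, or bad fields.
        grouped.setdefault(reading.sensor_id, []).append(float(reading.value))
    return {sensor: sum(vals) / len(vals) for sensor, vals in grouped.items()}


def smoke_test(rows: list[dict], sample_size: int = 3) -> bool:
    """Run the whole pipeline on a small subset; any shape error fails it."""
    try:
        result = run_pipeline(rows[:sample_size])
    except (TypeError, ValueError):
        return False
    # An empty result means the pipeline ran but produced nothing useful.
    return len(result) > 0
```

The smoke test does not assert on specific numbers. Its only job is to prove that real-shaped data can travel from input to output without blowing up, which is exactly the failure mode that passing unit tests can hide.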

Hello Ops

It kinda feels counterintuitive. We have these magical AI tools, and yet we are spending more time writing YAML configuration files for GitHub Actions, Azure DevOps pipelines, or Jenkins. But that’s the paradox. To get the infinite throughput of AI, we need the infinite patience of a machine verifying that work. In this line of reasoning, Ops isn't a chore anymore. It's the only way to scale.