CI and agents
The dream of infinite throughput
It is easy to dream in the age of agents. Here's one of mine:
I want to spin up a swarm of AI coders, point them at a repo, and have them solve our tickets and clear our backlog while we sleep. These guys will work in parallel branches, opening and closing PRs at unprecedented speeds. We wake up the next morning and voilà - all that's left is to validate and check. If an idea fails, we move on to something more important.
Ralph Pattern
A popular manifestation of this dream on X is the Ralph pattern. I quote directly from Ryan Carson's original post on X, "Step-by-step guide to get Ralph working and shipping code":
- Pipe a prompt into your AI agent
- Agent picks the next story from prd.json
- Agent implements it
- Agent runs typecheck + tests
- Agent commits if passing
- Agent marks story done
- Agent logs learnings
- Loop repeats until done
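To make the loop concrete, here is a minimal Python sketch of it. This is not the code from the original post: the shape of prd.json (a `stories` list with `id` and `done` fields) is my assumption, and `implement`, `checks_pass`, and `commit` are hypothetical callbacks standing in for the agent call, the typecheck + tests step, and the git commit.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def next_story(prd: dict) -> Optional[dict]:
    """Return the first story not yet marked done, or None if all are done."""
    for story in prd["stories"]:  # assumed schema: {"stories": [{"id": ..., "done": ...}]}
        if not story.get("done", False):
            return story
    return None


def ralph_loop(
    prd_path: Path,
    implement: Callable[[dict], None],   # hypothetical: hands the story to the agent
    checks_pass: Callable[[], bool],     # hypothetical: runs typecheck + tests
    commit: Callable[[dict], None],      # hypothetical: commits the change
    max_iterations: int = 20,
) -> list[str]:
    """Run the loop until every story is done or the iteration budget is spent."""
    learnings: list[str] = []
    for _ in range(max_iterations):
        prd = json.loads(prd_path.read_text())
        story = next_story(prd)
        if story is None:
            break  # backlog is clear
        implement(story)
        if checks_pass():
            commit(story)
            story["done"] = True
            prd_path.write_text(json.dumps(prd, indent=2))
            learnings.append(f"done: {story['id']}")
        else:
            # Nothing is committed or marked done, so the story is retried.
            learnings.append(f"failed checks: {story['id']}")
    return learnings
```

Notice that the only thing between a bad change and a commit is `checks_pass`. The `max_iterations` cap is my addition, so that a story that never passes cannot spin forever.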
That's pretty cool. It seems like I can make the agent do stuff in a loop while I sleep now. Obviously, the taste, skill, specificity, and context-awareness of the prompter remain essential. But the stability of the application is the real bottleneck. Let's turn our attention to step 4: Agent runs typecheck + tests.
Traffic lights first, traffic second
Typechecks and tests are... surprise surprise... CI. Continuous Integration (CI) is no longer just "best practice". It is absolutely essential. I am going to take it one step further:
We need to set up our Ops FIRST, before we can even think about prompting an agent.
I suppose CI is like the traffic light, and our agents are the swarms of traffic piling up behind it. One bad move and accidents happen, then everything breaks. We need to rigorously gate changes with CI and force our agents to respect the traffic rules.
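A gate can be as simple as a script that runs each check in order and stops at the first red light. Here is a rough sketch. The `run_gate` helper and the `CHECKS` list are illustrative names of mine, and the commands are placeholders, assuming ruff and pytest are installed in the project.

```python
import subprocess
import sys


def run_gate(checks: list[tuple[str, list[str]]]) -> int:
    """Run each check in order; return the first non-zero exit code, else 0."""
    for name, command in checks:
        print(f"[gate] {name}: {' '.join(command)}")
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"[gate] {name} failed with exit code {result.returncode}")
            return result.returncode
    print("[gate] all checks passed")
    return 0


# Assumed commands; swap in whatever the project actually uses.
CHECKS = [
    ("lint", ["ruff", "check", "."]),
    ("format", ["ruff", "format", "--check", "."]),
    ("tests", [sys.executable, "-m", "pytest", "-q"]),
]
```

In a CI job, or in the agent's own loop, this would be called as `sys.exit(run_gate(CHECKS))` so that any failing check blocks the change.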
My experience in data pipeline development
I've been working mostly on Python data pipelines recently, and I've found that generic CI isn't enough. When an AI is writing the code, the checks need to be stricter. If I’m letting an agent go ham on a data project, here is the non-negotiable CI checklist I set up before I write a single line of prompt:
- Aggressive linting and formatting (ruff): fast and keeps everything nice and tidy
- Static type checking (ty): fast and keeps everything properly typed
- Data integrity (Pydantic/dataclasses): data shape, data shape, data shape
- Security scanning: gone shall be the days of committing API keys
- Smoke test: Super Critical. Minimal e2e run on a subset of data. Agents are great at writing unit tests that pass... for broken code. A smoke test ensures the pipeline actually runs
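To show what the last two checks might look like together, here is a toy sketch using a stdlib dataclass to pin the data shape and a smoke test that pushes a tiny sample through the whole thing. The `Reading` record, the `run_pipeline` function, and the sample rows are all made up for illustration; a real pipeline would have its own schema and would probably lean on Pydantic for richer validation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Hypothetical record shape for one input row."""
    sensor_id: str
    value: float


def parse_row(row: dict) -> Reading:
    """Validate one raw row; raises on a missing key or a non-numeric value."""
    sensor_id = str(row["sensor_id"])
    if not sensor_id:
        raise ValueError("sensor_id must be non-empty")
    return Reading(sensor_id=sensor_id, value=float(row["value"]))


def run_pipeline(rows: list[dict]) -> dict[str, float]:
    """Toy pipeline: validate every row, then average the values per sensor."""
    values: dict[str, list[float]] = {}
    for row in rows:
        reading = parse_row(row)
        values.setdefault(reading.sensor_id, []).append(reading.value)
    return {sid: sum(vals) / len(vals) for sid, vals in values.items()}


def smoke_test(sample: list[dict]) -> None:
    """End-to-end run on a small sample; fails if the pipeline yields nothing."""
    result = run_pipeline(sample)
    assert result, "pipeline produced no output"
    assert all(isinstance(v, float) for v in result.values())


# A tiny, made-up subset of data for the smoke run.
SAMPLE = [
    {"sensor_id": "a", "value": "1.0"},
    {"sensor_id": "a", "value": 3},
    {"sensor_id": "b", "value": 2.5},
]

if __name__ == "__main__":
    smoke_test(SAMPLE)
    print("smoke test passed")
```

The point of the smoke test is that it exercises parsing, validation, and aggregation in one go, so a change that breaks the wiring between steps fails here even if every unit test stays green.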
Hello Ops
It kinda feels counterintuitive. We have these magical AI tools, and yet we are spending more time writing YAML configuration files for GitHub Actions, Azure DevOps pipelines, or Jenkins. But that’s the paradox. To get the infinite throughput of AI, we need the infinite patience of a machine verifying that work. In this line of reasoning, Ops isn't a chore anymore. It's the only way to scale.