Everyone is celebrating AI coding tools for writing five times more code — almost nobody is asking what happens to the pipelines that were built to test it, and a Helsinki startup just raised $4.7M on that exact blind spot

I used to think CI/CD was a solved problem. For years I’d watch engineering teams I worked with treat their build pipelines as plumbing. Boring, mostly invisible, occasionally annoying, but fundamentally finished. The interesting work was always somewhere else: the product, the model, the growth loop. The conveyor belt that carried code from a developer’s laptop to production was just there, like electricity.

That assumption is now the most expensive mistake in software.

Avrea, a Helsinki-rooted startup founded by Hannu Valtonen and Juha Valvanne, emerged from stealth this week with $4.7 million in pre-seed funding led by Earlybird, with a thesis that sounds almost embarrassingly obvious once you say it out loud: if AI is going to substantially increase code output, somebody has to test, validate, and ship that code at the same pace. And the systems we built for that job were designed when a human typed every line.

The bottleneck nobody wanted to look at

The conventional story about AI in software development is a productivity story. Copilot, Cursor, Claude Code, the agentic IDEs: they all promise a step change in how much code a developer can produce. Most of the discourse treats this as straightforwardly good. More code, faster shipping, smaller teams doing bigger things.

What that story leaves out is the second half of the pipeline.

Writing code is the upstream half of software delivery. The downstream half is everything that happens between the moment code exists and the moment it runs in production: unit tests, integration tests, end-to-end tests, security scans, build artefacts, container images, staged rollouts, canaries, observability. That whole apparatus is what the industry calls CI/CD, continuous integration and continuous delivery, and almost none of it was designed for a world where an AI agent might open forty pull requests before lunch.

The core problem is straightforward: as teams generate substantially more code through AI assistance, they face a corresponding increase in tests to run, and the strain on CI/CD systems becomes impossible to ignore. That’s it. That’s the whole bottleneck.

Why this gap is widening faster than people realise

I want to be careful here, because the framing matters. The problem isn’t that CI/CD is broken. It works. GitHub Actions, CircleCI, Jenkins, and GitLab all do what they were built to do.

The problem is that what they were built to do assumed a particular ratio between code volume and human review. A developer wrote a hundred lines, opened a pull request, a colleague read it, the pipeline ran for ten minutes, someone clicked merge. The bottleneck was the human in the middle, and the pipeline was sized to wait for that human.

Pull that human out, or partially replace them with an AI reviewer, or have an AI agent open the PR in the first place, and suddenly the pipeline is the bottleneck. The thing that used to wait is now the thing being waited on.
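A back-of-the-envelope sketch makes the shift concrete. It reuses the ten-minute pipeline run and the forty pull requests before lunch mentioned in this piece; the four-hour morning, the single runner, and the figure of four human-paced pull requests are assumptions chosen purely for illustration, and the function name is hypothetical.

```python
def backlog_minutes(prs: int, window_minutes: float,
                    run_minutes: float, runners: int) -> float:
    """Runner-minutes of pipeline work still queued when the window ends.

    Deliberately crude: it ignores arrival timing, caching and retries.
    """
    demand = prs * run_minutes           # runner-minutes requested
    capacity = runners * window_minutes  # runner-minutes available
    return max(0.0, float(demand - capacity))


# Assumed figures: a 240-minute morning, one runner, 10-minute runs.
human_paced = backlog_minutes(prs=4, window_minutes=240, run_minutes=10, runners=1)
agent_paced = backlog_minutes(prs=40, window_minutes=240, run_minutes=10, runners=1)
print(human_paced, agent_paced)
```

Under these assumptions the human-paced morning leaves no queue at all, while the agent-paced one ends with 160 runner-minutes of work still waiting. The sketch covers raw capacity only and says nothing about whether the results coming back can be trusted.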

This is the structural shift Avrea is betting on, and it’s worth being precise about why it’s not just a scaling problem you can solve by buying more runners.

Flaky tests stop being a nuisance and start being a tax

Every engineering team has flaky tests. Tests that pass and fail nondeterministically, usually because of timing, network, or shared state. In a human-paced workflow, flakiness is annoying. You re-run the build, you grumble, you move on.

In an AI-paced workflow, flakiness is catastrophic. If an agent opens a PR, the test fails for non-code reasons, the agent reads the failure, decides the code is wrong, rewrites it, opens another PR, and the loop runs all night burning compute on a problem that doesn’t exist. I’ve watched a version of this happen with my own team’s experimental agent setups. The agent isn’t wrong to trust the test signal. The test signal is just lying.
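One common guard against that loop is to re-run a failed test against the unchanged commit before letting an agent touch the code. The sketch below is a minimal illustration of that idea, not a description of Avrea's product; `classify_failure` and the `run_test` callable are hypothetical names, and the choice of five reruns is an arbitrary assumption.

```python
from collections import Counter
from typing import Callable


def classify_failure(run_test: Callable[[], bool], reruns: int = 5) -> str:
    """Re-run a failed test on the same commit and label the failure.

    `run_test` is a hypothetical callable that returns True when the
    test passes. The code under test must not change between reruns.
    """
    outcomes = Counter(run_test() for _ in range(reruns))
    if outcomes[True] == 0:
        # Failed every time: plausibly a real defect, so the agent may act.
        return "consistent-failure"
    # Passed at least once with identical code: the signal is unreliable.
    return "flaky"
```

This is a heuristic. A test that fails on every rerun can still be failing for environmental reasons, such as a broken runner, so a consistent failure is a stronger signal rather than proof that the code is wrong.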

This is why Avrea’s emphasis on pipeline observability (finding the root causes of flaky tests, stalled builds, and infrastructure bottlenecks) isn’t a nice-to-have feature buried at the bottom of the press release. It’s the load-bearing claim. Agentic development only works if the signals the agents read are trustworthy.

The single-line integration is a strategic choice, not a technical flex

Avrea’s approach is designed to be adopted with minimal friction, fully compatible with existing CI/CD workflows. On the surface, that sounds like marketing copy. Look closer and it’s a thesis about how this category will be won.

The losers in developer tooling are usually the products that ask teams to migrate. The winners sit underneath, alongside, or in front of what already exists. Datadog didn’t tell you to replace your logging. It ate everyone else’s logs. Vercel didn’t tell you to rewrite your React app. It just deployed the one you had.

The same pattern applies here. If Avrea works the way it’s described, an engineering team doesn’t have to make a decision about whether to bet on it. They drop it in, see if the pipelines get faster and the failure signals get cleaner, and either keep it or rip it out. That’s a fundamentally different sales motion than asking a VP of Engineering to greenlight a CI/CD replacement project.

The deeper move: making pipelines AI-native

The underlying principle is that software development is increasingly becoming a collaborative process between humans and AI, making it essential for AI agents to integrate directly with software delivery systems.

Today, most CI/CD systems are designed to be talked to by humans through dashboards and YAML files. An AI agent that wants to understand why a build failed has to scrape logs, parse stack traces, and reason its way backwards through artefacts that were formatted for a person reading a browser tab at 2am.

This is the kind of friction that’s invisible until you go looking for it. Once you do, it’s everywhere. The entire developer tooling stack was designed under the assumption that the consumer of its outputs is a human. The moment that assumption breaks, every layer needs to be rethought.

What Avrea seems to be claiming, and what I’d want to see in the product to fully believe, is that the pipeline becomes a first-class participant in the agent loop. The agent can ask the CI system structured questions. The CI system can hand back structured answers. The conversation between code-writer and code-validator becomes machine-to-machine.
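To make "structured answers" concrete, here is a sketch of what a machine-readable failure report could look like from the agent's side. Every name in it (`FailureReport`, `FailureKind`, the JSON fields, the action strings) is invented for illustration. Avrea has not, as far as this piece knows, published an interface, so none of this should be read as its API.

```python
import json
from dataclasses import dataclass
from enum import Enum


class FailureKind(str, Enum):
    CODE = "code"
    FLAKY_TEST = "flaky_test"
    INFRASTRUCTURE = "infrastructure"


@dataclass(frozen=True)
class FailureReport:
    """Hypothetical structured answer from a CI system."""
    build_id: str
    test_name: str
    kind: FailureKind
    retry_recommended: bool


def parse_report(payload: str) -> FailureReport:
    """Parse a JSON payload instead of scraping human-oriented logs."""
    data = json.loads(payload)
    return FailureReport(
        build_id=data["build_id"],
        test_name=data["test_name"],
        kind=FailureKind(data["kind"]),
        retry_recommended=data["retry_recommended"],
    )


def next_action(report: FailureReport) -> str:
    """Decide what the agent does next, based on the declared cause."""
    if report.kind is FailureKind.CODE:
        return "revise_code"
    if report.retry_recommended:
        return "rerun_build"
    return "escalate_to_human"
```

The point of the design is that the agent branches on a declared cause rather than inferring one from a stack trace, so a flaky or infrastructure failure never triggers a code rewrite.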

If that works, it’s not a faster CI. It’s a different category of product.

Why this is a European story worth paying attention to

The founders both come out of the Finnish infrastructure-software scene, bringing technical backgrounds in areas like database and cloud infrastructure, the kind of experience that matters more here than another round of pattern-matching from the consumer AI world. CI/CD is unglamorous, deeply technical, and unforgiving of founders who don’t understand the operational reality of large engineering organisations.

The lead investor, Earlybird, has a strong track record backing developer-infrastructure companies in Europe. A $4.7M pre-seed for a CI/CD play with two technical co-founders is exactly the kind of bet that gets dismissed by people who only look at consumer-facing AI rounds. It’s also the kind of bet that, if it works, becomes load-bearing infrastructure for an entire generation of AI-native engineering teams.

I’ve written before about how the calmest investors in the world aren’t actually calm. They’re structurally positioned. The same dynamic applies in early-stage developer tooling. The deals that look boring from the outside are often the ones where the structural position is strongest, because the buyers are technical, the budgets are real, and the switching costs work in the incumbent’s favour once a product lands.

The agentic coding numbers nobody can verify yet

Here’s where I want to slow down, because there’s a tendency in this discourse to throw around statistics about AI-generated code as if they’re settled. They’re not.

You’ll see claims that 30%, 40%, sometimes 70% of code at certain companies is now AI-generated. The actual share depends entirely on how you count. Lines of code? Accepted suggestions? Functions written entirely by an agent versus functions where an agent autocompleted three tokens? The numbers vary by an order of magnitude depending on the definition.
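A toy calculation shows how far the definitions can diverge on the same codebase. The four functions below are made-up data, not measurements from any company, and the three counting rules are simplified stand-ins for the definitions listed above.

```python
from dataclasses import dataclass


@dataclass
class Function:
    total_lines: int
    ai_lines: int  # lines attributed to an AI tool (invented figures)


def share_by_lines(funcs: list) -> float:
    """AI-attributed lines as a share of all lines."""
    return sum(f.ai_lines for f in funcs) / sum(f.total_lines for f in funcs)


def share_touched(funcs: list) -> float:
    """Share of functions where an AI tool contributed at least one line."""
    return sum(f.ai_lines > 0 for f in funcs) / len(funcs)


def share_fully_ai(funcs: list) -> float:
    """Share of functions written entirely by an AI tool."""
    return sum(f.ai_lines == f.total_lines for f in funcs) / len(funcs)


# Hypothetical codebase: one agent-written function, two with a single
# autocompleted line, and one that is entirely human-written.
codebase = [Function(20, 20), Function(50, 1), Function(30, 1), Function(100, 0)]

print(share_by_lines(codebase))   # 22 of 200 lines
print(share_touched(codebase))    # 3 of 4 functions
print(share_fully_ai(codebase))   # 1 of 4 functions
```

The same invented codebase is 11%, 75%, or 25% "AI-generated" depending on which rule you pick, which is why headline percentages are hard to compare across companies.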

What’s not in doubt is the direction. Every major engineering organisation I’ve spoken to in the last twelve months is seeing the share rise quarter over quarter, and so are the ones publicly reporting figures, like Microsoft, Google, and Meta. You don’t need to know whether AI writes 25% or 55% of code today to make Avrea’s bet rational. You need to believe the curve goes up. And the curve clearly goes up.

What I’d want to see next

A pre-seed announcement is a thesis, not a verdict. Three things would tell me whether Avrea is actually building something category-defining versus building a faster GitHub Actions.

First, native agent protocols. If Avrea publishes the structured interface that agents use to query and act on pipelines, and if other tools adopt it, then the bet on AI-native CI/CD is real. If the integration with agents is bolted on through standard webhooks and log scraping, it’s marketing.

Second, observability outputs that change agent behaviour. The test of whether pipeline observability is actually solving the flaky-test tax is whether agents using Avrea make fewer wasted iterations than agents using legacy CI. That’s measurable. I’d love to see the data once early customers have it.
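One way that measurement could be framed, as a sketch only: count the agent iterations that were triggered by a test failure whose root cause later turned out not to be the code. The record fields (`trigger`, `root_cause`) and the function name are hypothetical. A real comparison would also need the same workload run against both pipelines.

```python
def wasted_iteration_rate(iterations: list) -> float:
    """Share of agent iterations spent reacting to non-code failures.

    Each item is a dict with hypothetical keys:
      "trigger":    what started the iteration, e.g. "test_failure"
      "root_cause": the later diagnosis, e.g. "code", "flaky_test"
    """
    if not iterations:
        return 0.0
    wasted = sum(
        1
        for it in iterations
        if it["trigger"] == "test_failure" and it["root_cause"] != "code"
    )
    return wasted / len(iterations)
```

Comparing this rate for agents running against two different pipelines is the kind of before-and-after figure that would support or undercut the observability claim.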

Third, who adopts it first. CI/CD adoption tends to start at the early-stage end of the market and work upward. If Avrea is showing up inside AI-first engineering teams within six months, it’s working. If it’s stuck pitching to skeptical enterprise buyers a year from now, it isn’t.

The boring infrastructure thesis

I’ve spent the last few years writing about how power moves through quiet, procedural mechanisms rather than dramatic announcements. The same logic applies in software. The most important shift in AI-era engineering isn’t going to be the model that gets the most demos. It’s going to be the infrastructure layer that absorbs the consequences of those models, and does so invisibly enough that nobody notices.

CI/CD is one of those layers. If Avrea, or whoever wins this category, does the job right, developers in 2030 won’t think about it any more than developers in 2015 thought about Jenkins. The pipeline will just run. The tests will just pass or fail for the right reasons. The agents will just ship code.

Here’s the uncomfortable part for anyone running an engineering organisation today. The teams that treat their pipelines as solved infrastructure while their developers ship five times more code are running a clock they can’t see. The flaky tests are already lying to their agents. The compute bills are already climbing for reasons nobody can explain in a standup. The merge queues are already lengthening. None of this shows up as a crisis. It shows up as a slow, expensive drag that gets blamed on the model, the team, the roadmap, anything except the conveyor belt underneath all of it.

In five years, the engineering leaders who got this wrong won’t be fired for missing an AI strategy. They’ll be quietly replaced by people who understood, earlier, that the pipeline was the strategy. If you’re reading this and still think CI/CD is plumbing, the bet has already been placed against you. You just haven’t been told yet.

Written by

Silicon Canals Editorial Team

The Silicon Canals Editorial Team produces content across our three editorial pillars: technology and business, power and investigations, and human systems. We chronicle the systems that shape our lives, from the global infrastructure of technology to the internal infrastructure of the human mind. Articles reflect our team's collective editorial process, sourcing, drafting, fact-checking, editing, and review, rather than a single journalist's writing. Silicon Canals takes editorial responsibility for content under this byline. For more on how we work, see our editorial policy.