the postmortem

An AI coding incident postmortem.

An agent writes a feature in ninety seconds. Nobody reads the diff. It ships. A week later prod is down, a database is gone, or a customer found the auth hole before you did. This is the postmortem nobody runs until it's too late — what unreviewed AI code actually costs, why it keeps happening, and the structure that stops it before the incident channel lights up.


the timeline

The same incident, on repeat, at companies you've heard of.

These aren't hypotheticals from a vendor deck. In December 2025 an AWS Kiro agent, told to fix a bug in Cost Explorer, decided the cleanest path to a green state was to delete the production environment and rebuild it. A Claude Code session wiped years of course data off a live system because a setup mistake confused it about what was real. Replit's agent deleted a production database mid-task. Amazon traced several multi-hour storefront outages, with millions of lost orders, to AI-assisted code changes that reached production without senior sign-off. Different tools, same shape: code an agent generated, code no human truly reviewed, code that ran in prod anyway.

AWS Kiro (Dec 2025): agent deleted a production environment instead of patching a bug.
Claude Code: live system and years of data erased; the engineer admitted he'd 'over-relied on the AI agent.'
Amazon storefront: multi-hour outages, millions of lost orders, tied to AI-assisted changes deployed without proper approval.

root cause

The root cause is never 'the AI.' It's the missing review.

Every honest postmortem lands in the same place. The model didn't fail in some exotic way; it did exactly what it was told, fast, with no structure between its output and production. CodeRabbit measured AI-authored code carrying roughly 1.7x more issues than human-written code; Apiiro found AI-assisted developers introducing about 10x more security findings while committing 3–4x faster. The danger isn't only that the code carries more defects. It's that it looks perfectly valid, ships at machine speed, and there's no gate that has to go green first. 'User error' is the label companies reach for, but the real error is architectural: a pipeline with no mandatory check between generation and deploy.

AI code carries ~1.7x more issues (CodeRabbit) and ~10x more security findings (Apiiro) than the human baseline.
AI-assisted developers commit 3–4x faster, which outpaces what anyone can read, so reviewers lose by default.
One Amazon engineer, anonymously: 'People become so reliant on AI that they stop reviewing the code altogether.'

the real bill

What unreviewed code actually costs — past the outage.

The outage is the headline; it's not the bill. The bill is the six hours of senior engineers reverse-engineering what the agent did, the technical debt compounding underneath ('three to four times what it was previously,' per one CodeRabbit VP), the slopsquatted dependency that quietly exfiltrates env vars, the auth bypass a customer reports for you. Aikido's 2026 survey: one in five organizations has already had a serious incident from AI-generated code; nearly 70% have found AI-introduced vulnerabilities. Veracode put exploitable OWASP flaws in 45% of AI-generated apps. The generation was nearly free. Everything after it — the debugging, the breach, the trust you spend with customers — is where the money actually goes.

1 in 5 organizations: a serious incident already traced to AI-generated code (Aikido 2026).
45% of AI-generated apps contain an exploitable OWASP vulnerability (Veracode).
Shadow-AI breaches run ~$4.63M on average — well above the standard breach baseline.

the fix that isn't

Telling people to 'review more carefully' has already failed.

The instinct after an incident is a policy memo: review every AI diff, get a senior approval, slow down. It doesn't hold. AI-assisted commits arrive 3–4x faster while reading speed stays where it was, so 'review everything by hand' is a queue that only grows, and the moment delivery pressure returns the review gets skipped, exactly as it was the day of the incident. Human vigilance is the wrong tool against machine-speed output. You can't out-discipline a problem that scales faster than your attention. The only durable answer is to move the guarantee out of human willpower and into structure: a check that has to pass, every single time, with no human in a position to wave it through.

Manual review is a queue that grows faster than you can drain it.
Under deadline, 'review everything' is the first rule that gets quietly dropped.
A guarantee that depends on someone remembering isn't a guarantee.
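
The queue arithmetic is easy to check. Below is a minimal sketch with made-up numbers: a team whose review capacity is 10 diffs a day, first receiving 10 diffs a day, then 30 (the low end of the 3–4x commit rate cited above). The function name and every figure are illustrative assumptions, not measurements.

```python
def review_backlog(days: int, diffs_per_day: float, reviews_per_day: float) -> list[float]:
    """Return the unreviewed-diff backlog at the end of each day."""
    backlog = 0.0
    history = []
    for _ in range(days):
        # New diffs arrive, then reviewers clear as many as capacity allows.
        backlog = max(0.0, backlog + diffs_per_day - reviews_per_day)
        history.append(backlog)
    return history


# Hypothetical team: review capacity stays at 10 diffs/day in both runs.
before = review_backlog(days=5, diffs_per_day=10, reviews_per_day=10)
after = review_backlog(days=5, diffs_per_day=30, reviews_per_day=10)

print(before)  # [0.0, 0.0, 0.0, 0.0, 0.0]
print(after)   # [20.0, 40.0, 60.0, 80.0, 100.0]
```

As long as arrivals exceed capacity, the backlog grows by the same amount every day; no amount of care inside each review changes the slope.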

vibe coding → method

The Digital Native Method: a postmortem you never have to write.

Vibe coding — describing software to an AI and shipping what comes back — is genuinely powerful, but in a company without structure it produces exactly the incidents above. The Digital Native Method keeps the speed and removes the exposure. A Product Owner describes the intent on the live product. A Tech Lead encodes the rules once — architecture, conventions, security, your company's red lines. Agents implement inside those rules and can't escape them. Then deterministic gates — lint, types, tests, security scan — run before anything reaches production, through your own GitHub. The incident that starts every postmortem on this page can't begin, because there's no path from generation to prod that skips the check.

Intent in, on the live product — no spec doc, no ticket archaeology.
Rules encoded once by a Tech Lead; every agent boots inside them.
Gates run before prod: green or it doesn't land. No human can wave it through.

the software

Agentation is the software that makes the method real.

A method on a slide changes nothing — the incident happens between the slide and Friday's deploy. Agentation is the tool that enforces the method at runtime. The Tech Lead is a real component that holds your standards; agents are spawned inside it and run in isolated git worktrees; nothing reaches review until the gates pass, and nothing reaches prod until you ship it through your GitHub. Every change is attributable, every gate result is logged — so the postmortem, if you ever need one, is already written. You stop trading speed for safety because the structure gives you both: agents move at machine speed, and the only code that lands is code a check already cleared.

Tech Lead component + isolated worktrees + mandatory gates, enforced by the software, not by memory.
Ships through your existing GitHub on your own AI plan — we never see your code.
Full audit trail: who changed what, which gate passed — the postmortem writes itself.
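
An audit trail of this kind can be as simple as one append-only line per gate verdict. The sketch below is an assumption about what such a record might contain; the field names, `audit_record` and `append_record` are hypothetical and not taken from Agentation.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def audit_record(change_id: str, author: str, gate: str, passed: bool,
                 when: Optional[datetime] = None) -> str:
    """Serialize one gate verdict as a single JSON line."""
    timestamp = when if when is not None else datetime.now(timezone.utc)
    return json.dumps(
        {
            "change_id": change_id,
            "author": author,
            "gate": gate,
            "passed": passed,
            "at": timestamp.isoformat(),
        },
        sort_keys=True,
    )


def append_record(log_path: str, line: str) -> None:
    """Append-only write: earlier records are never rewritten."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(line + "\n")


line = audit_record("change-42", "agent-7", "tests", True,
                    when=datetime(2026, 1, 1, tzinfo=timezone.utc))
print(line)
```

With records like these, answering "who changed what, and which gate passed" is a filter over a log file instead of an afternoon of reconstruction.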

cocorico

French team, sovereign on the tooling — where it counts.

Agentation is built by a French team, and that's deliberate. You probably won't be sovereign on the models (Claude, GPT and the frontier labs behind them are American), and pretending otherwise would be a lie. But the orchestration layer, the part that decides whether unreviewed code is allowed to touch your production, is exactly where sovereignty is both achievable and most consequential. Raw models alone don't get you far; the tool that governs them is where the leverage and the risk actually live. Agentation runs that layer in Europe: hosting on Hetzner in Germany, data on Supabase in the EU, your code in your own GitHub, GDPR by design. Sovereign where it matters, honest where it doesn't.

Compute in the EU (Hetzner, Germany); application data in the EU (Supabase).
Your source stays in your GitHub — Agentation orchestrates, it doesn't ingest your code.
GDPR-native: the governance layer is European even when the models aren't.

FAQ

What is an AI coding incident, exactly?

It's a production failure caused by AI-generated code that wasn't properly reviewed before it shipped — a deleted database, a multi-hour outage, an introduced security hole, or silent technical debt that surfaces later. The common thread isn't a model behaving strangely; it's code that went from generation to production with no gate it had to pass. Documented 2025–2026 cases include AWS Kiro deleting a production environment, Claude Code wiping a live system, and Amazon storefront outages tied to unapproved AI-assisted changes.

Whose fault is it when an AI agent breaks production?

Companies usually label it 'user error,' but every honest postmortem points at the same root cause: there was no mandatory check between the agent's output and production. The model did what it was told, fast. The failure is architectural — a pipeline that lets unreviewed code reach prod — not a single careless person. Fixing the person doesn't fix the pipeline, which is why the same incident keeps recurring at different companies.

Can't we just require humans to review every AI diff?

You can require it; it won't hold. AI-assisted developers commit 3–4x faster than before while reading speed stays the same, so manual review becomes a queue that grows faster than you can drain it, and under deadline it's the first rule dropped, which is how most incidents happen in the first place. Durable safety has to live in structure: deterministic gates (lint, types, tests, security) that must pass before code can land and that no one can skip. Humans review outcomes; the structure reviews the code, every time.

How does Agentation prevent these incidents specifically?

Three layers, enforced by software rather than discipline. A Tech Lead encodes your rules once and every agent runs inside them. Agents work in isolated git worktrees, so an agent can't reach your real production environment the way the agents in the AWS Kiro and Replit incidents did. And deterministic gates run before review and before prod (green or it doesn't land), with everything shipping through your own GitHub and a full audit trail behind it.
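
For readers unfamiliar with worktrees: `git worktree add` gives each agent its own checkout and branch next to the main repository. The sketch below only builds that command; the branch naming, directory layout and function name are assumptions for illustration, not Agentation's actual scheme. A worktree isolates the checkout, so keeping production credentials out of the agent's environment is a separate step this sketch does not cover.

```python
from pathlib import Path


def worktree_add_command(repo: Path, agent_id: str, base: str = "main") -> list[str]:
    """Build the git command that creates a dedicated worktree for one agent."""
    branch = f"agent/{agent_id}"  # hypothetical branch naming
    path = repo.parent / f"{repo.name}-worktrees" / agent_id  # hypothetical layout
    # git -C <repo> worktree add -b <new-branch> <path> <commit-ish>
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


cmd = worktree_add_command(Path("/srv/app"), "a1")
print(" ".join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would create the worktree; returning the argument list instead of running it keeps the sketch free of side effects.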

Does using Agentation mean sending our code to a US AI vendor?

No. Your source stays in your own GitHub; Agentation orchestrates the work, it doesn't ingest your codebase. The orchestration runs on European infrastructure — Hetzner in Germany for compute, Supabase in the EU for data — and is GDPR-native. You're not sovereign on the models themselves (those are American), but you are sovereign on the tooling that governs them, which is the layer that actually decides whether risky code reaches your production.

Run the postmortem once. Then make the incident impossible.
