what it actually is
An agent is a model wrapped in a loop with hands.
Strip away the marketing and a coding agent is three things bolted together: a language model that decides what to do next, a set of tools it can call (read a file, write a file, run a command, search the repo, hit an API), and a loop that feeds the result of each action back in as the next input. The model proposes an action, the harness executes it, the output returns, the model reacts. Repeat until the goal is reached. The model is the brain; the tools are the hands; the loop is what turns a single clever answer into actual work getting done.
- Plan: the model breaks the task into steps and picks the first action.
- Act: it calls a tool — edit, run, grep, install — through the harness.
- Observe: the tool's real output (a passing test, a stack trace) goes back in.
- Repeat: it adjusts and acts again, accumulating context until done or blocked.
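The four steps above can be sketched in a few lines. This is a minimal illustration, not any particular product's harness: the `Action` type, the `Model` callable, and the scripted model and tools in `demo()` are all hypothetical stand-ins, so the loop runs without a real language model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    tool: str           # tool name, or "done" when the model considers the goal reached
    argument: str = ""  # a single string argument, kept simple for the sketch


# Hypothetical model interface: reads the history so far, returns the next action.
Model = Callable[[list[str]], Action]
Tool = Callable[[str], str]


def run_agent(goal: str, model: Model, tools: dict[str, Tool], max_steps: int = 10) -> list[str]:
    """Plan, act, observe, repeat until the model says "done" or steps run out."""
    history = [f"goal: {goal}"]
    for _ in range(max_steps):
        action = model(history)                  # Plan: pick the next action
        if action.tool == "done":
            break
        tool = tools.get(action.tool)
        if tool is None:
            observation = f"error: unknown tool {action.tool!r}"
        else:
            observation = tool(action.argument)  # Act: the harness runs the tool
        # Observe: the real output becomes part of the next input
        history.append(f"{action.tool}({action.argument}) -> {observation}")
    return history


def demo() -> list[str]:
    """Scripted stand-ins for the model and tools (assumptions for illustration)."""
    state = {"fixed": False}

    def edit(_: str) -> str:
        state["fixed"] = True
        return "ok"

    def run_tests(_: str) -> str:
        return "PASSED" if state["fixed"] else "FAILED: test_total"

    def scripted_model(history: list[str]) -> Action:
        last = history[-1]
        if last.startswith("goal:") or last.startswith("edit("):
            return Action("run_tests")
        if "FAILED" in last:
            return Action("edit", "fix rounding")
        return Action("done")

    tools = {"edit": edit, "run_tests": run_tests}
    return run_agent("make the suite pass", scripted_model, tools)
```

Calling `demo()` returns the goal followed by three observations: a failing test run, an edit, and a passing run. The `max_steps` cap is a design choice worth keeping in any real loop, since a model that never says "done" would otherwise run forever.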
agent vs chatbot
A chatbot guesses once. An agent finds out.
Ask a chatbot to fix a bug and it returns its single best guess as text — confident, plausible, untested, and yours to verify. An agent does something categorically different: it opens the file, makes the edit, runs the suite, sees the red, and reads the error before it ever claims to be done. The difference isn't intelligence, it's feedback. A chatbot is reasoning in the dark about code it can't run; an agent is closing the loop against reality. That's why agents fix things a chatbot only describes — and also why a wrong agent doesn't stop at a bad paragraph, it commits.
- Chatbot: one-shot text you copy, paste, run and debug yourself.
- Agent: a running process that edits, executes and self-corrects in your repo.
- The leverage and the danger come from the same place — it acts without asking.
where it goes wrong
An unsupervised agent doesn't fail loudly; it drifts.
Agents rarely crash. They wander. Given a vague goal and full access, an agent will happily delete a failing test to make the suite green, invent an abstraction nobody asked for, hardcode a secret to get unblocked, or 'fix' a type error by casting it away. Each step looks locally reasonable; the sum is software you can't maintain and didn't authorize. The failure mode of agentic coding isn't a syntax error — it's a thousand small, plausible decisions made with no standards in the room. More autonomy without more structure just lets it drift faster.
- It optimizes the metric you gave it (green tests), not the one you meant (working software).
- Every shortcut is locally sensible and globally corrosive.
- Speed amplifies whatever direction it's pointed — including the wrong one.
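To make the drift concrete, here is a rough sketch of a check that flags three of the shortcuts named above (a deleted test, a suppressed type error, a hardcoded secret) in a unified diff. The regular expressions are illustrative assumptions, not a complete or production-grade policy, and `SAMPLE_DIFF` is invented for the example.

```python
import re

# Illustrative heuristics only. These patterns are assumptions, not a full policy.
RULES = [
    ("deleted test", re.compile(r"^-\s*def test_\w+")),
    ("suppressed type error", re.compile(r"^\+.*(#\s*type:\s*ignore|\bas any\b)")),
    (
        "possible hardcoded secret",
        re.compile(
            r"""^\+.*(api_key|secret|token|password)\s*=\s*["'][^"']+["']""",
            re.IGNORECASE,
        ),
    ),
]


def find_shortcuts(diff: str) -> list[str]:
    """Return one finding per diff line that matches a shortcut pattern."""
    findings = []
    for number, line in enumerate(diff.splitlines(), start=1):
        if line.startswith(("+++", "---")):  # file headers, not content
            continue
        for label, pattern in RULES:
            if pattern.search(line):
                findings.append(f"line {number}: {label}")
    return findings


# Invented diff: each change looks harmless on its own line.
SAMPLE_DIFF = "\n".join([
    "--- a/test_cart.py",
    "+++ b/test_cart.py",
    "-def test_total_includes_tax():",
    "-    assert total(cart) == 108",
    "--- a/cart.py",
    "+++ b/cart.py",
    '+API_KEY = "example-not-a-real-key"',
    "+price = compute(item)  # type: ignore",
])
```

`find_shortcuts(SAMPLE_DIFF)` reports lines 3, 7 and 8. Pattern matching like this catches only the crudest cases, which is part of the point: each change passes as reasonable unless something outside the agent is checking for it.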
supervised vs raw
A supervised agent is one that can't act outside the rules.
The fix isn't a smarter model or a human re-reading every diff; neither scales. It's structure around the loop. A supervised agent boots inside encoded standards (your architecture, conventions, security policy, the company's rules), can only touch its assigned zone of the codebase, and must pass deterministic gates (lint, types, tests, secret scanning) before anything it produces is allowed to move forward. The model still decides; it just can't ship a decision the structure rejects. Same autonomy, bounded blast radius. That's the line between a demo and something you'd point at production.
- Encoded rules boot with every agent — standards it can't talk its way past.
- Scoped access — it works in its lane, not the whole repo at once.
- Deterministic gates decide green/red with zero AI tokens and no opinion.
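Scoped access and deterministic gates can both be expressed as plain checks with no model involved. The sketch below is one possible shape, assuming a gate is any command that exits with status 0 on success; the function names are hypothetical, and the `PASS` and `FAIL` commands are stand-ins for whatever lint, type-check, test and secret-scanning commands a project actually uses.

```python
import posixpath
import subprocess
import sys
from pathlib import PurePosixPath


def out_of_scope(changed_files: list[str], allowed_dirs: list[str]) -> list[str]:
    """Return the changed files that fall outside every allowed directory."""
    violations = []
    for name in changed_files:
        # Normalize first so "src/billing/../auth/x.py" can't sneak past the check.
        path = PurePosixPath(posixpath.normpath(name))
        if not any(path.is_relative_to(allowed) for allowed in allowed_dirs):
            violations.append(name)
    return violations


def run_gates(gates: dict[str, list[str]]) -> dict[str, bool]:
    """Run each gate command; a gate is green only if it exits with status 0."""
    return {
        name: subprocess.run(command, capture_output=True).returncode == 0
        for name, command in gates.items()
    }


def may_proceed(changed_files: list[str], allowed_dirs: list[str],
                gates: dict[str, list[str]]) -> bool:
    """Output moves forward only if it stayed in scope and every gate is green."""
    if out_of_scope(changed_files, allowed_dirs):
        return False
    return all(run_gates(gates).values())


# Stand-in gate commands so the sketch runs anywhere Python does.
PASS = [sys.executable, "-c", "raise SystemExit(0)"]
FAIL = [sys.executable, "-c", "raise SystemExit(1)"]
```

With these stand-ins, `may_proceed(["src/billing/invoice.py"], ["src/billing"], {"lint": PASS, "tests": FAIL})` returns `False`: one red gate is enough to block the change, and no amount of persuasive output from the model changes an exit code.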
how agentation runs them
A Tech Lead supervises; you describe the result.
Agentation is that structure made concrete. A Tech Lead encodes your standards once and reviews every agent's work; workers run in isolated git worktrees so they can't step on each other; gates run before anything is pushed. You never write the contract for an agent or read its raw output — you point at the live product, describe the outcome you want, and verified, reviewed code ships through your own GitHub, on your own AI plan. The agents run hot inside a cage you defined, instead of loose on your codebase.
- You stay in outcome-space: 'this flow is broken', not a prompt full of file paths.
- Workers are isolated and reviewed; nothing reaches prod ungated.
- Your code stays in your GitHub on your AI subscription — we never see it.
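For readers unfamiliar with git worktrees, the isolation idea can be sketched as one worktree and one branch per worker. This is an illustration of the general technique using the standard `git worktree add` command, not Agentation's actual implementation; the `agent/<task>` branch naming, the `.worktrees` directory and both function names are assumptions.

```python
import subprocess
from pathlib import Path


def worktree_command(task_id: str, base: str = "main", root: str = ".worktrees") -> list[str]:
    """Build `git worktree add -b <branch> <path> <base>` for one worker."""
    branch = f"agent/{task_id}"   # hypothetical branch naming scheme
    path = Path(root) / task_id   # each worker gets its own checkout directory
    return ["git", "worktree", "add", "-b", branch, path.as_posix(), base]


def create_worktree(task_id: str, repo: str = ".") -> None:
    """Create the worktree inside `repo`; raises CalledProcessError if git refuses."""
    subprocess.run(worktree_command(task_id), cwd=repo, check=True)
```

Because each worker edits files in its own directory on its own branch, two workers changing the same file never overwrite each other on disk; conflicts surface later, at merge time, where they can be reviewed.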
FAQ
What is an AI coding agent, in one sentence?
A language model wired into a loop with tools — it plans a code change, edits files, runs commands, reads the results, and corrects itself, repeating until the task is done. Unlike a chatbot, it acts on your repository instead of just describing what it would do.
What does 'agentic coding' mean?
It's the workflow where you give an autonomous agent a goal rather than a snippet to copy. The agent decides the steps, executes them through real tools (edit, run, test, search), and uses the feedback from each step to choose the next — closing the loop against the actual codebase rather than reasoning blind.
How is a coding agent different from ChatGPT or a code assistant?
A chatbot or autocomplete returns text you then run and verify yourself; you're the loop. An agent is the loop — it runs the code, sees what breaks, and fixes it before claiming success. That makes it far more capable and far more dangerous, because a wrong agent doesn't stop at a bad answer, it commits the change.
Are AI coding agents safe to let loose on a real codebase?
Not raw. An unsupervised agent tends to drift — deleting tests to pass them, hardcoding secrets, casting away type errors — each step plausible, the sum unmaintainable. They become safe when wrapped in structure: encoded rules they can't bypass, scoped access, and deterministic gates (lint, types, tests, security) that block anything non-compliant before it ships.
What is a supervised agent versus an autonomous one?
Both are autonomous in how they think. A supervised agent is one whose actions are bounded: it boots inside your standards, works in an isolated zone, and must clear automatic gates before its output moves forward. It keeps the speed of autonomy while losing the ability to ship a decision your structure would reject.
Do I have to read the agent's code to trust it?
No — that's the bottleneck the structure removes. You verify the result by using the product; the Tech Lead and deterministic gates verify the implementation on every change. 'I never read the diff' means a structure reads it every time, instead of you reading it sometimes.