Ten of Eleven AI Coding Agents Can Be Fooled by Bash Tricks Older Than Their Users. The One That Held Won by Reading the Command the Way the Shell Will.

Patrick Duggan
2 hours ago

Researchers at Adversa AI took eleven popular open-source AI coding agents — the kind that read a repository, reason about it, and then run shell commands to build, test, and fix code — and tried to slip malicious instructions past their safety guardrails using nothing exotic. Just Bash tricks. The obfuscation techniques they used to hide a dangerous command inside an innocent-looking one are older than most of the developers running these agents; Bash itself dates to 1989. Ten of the eleven agents fell for at least one of them. Only one closed the gap completely. Adversa calls the structural weakness GuardFall, and while they are clear that this is lab research with no reported real-world exploitation yet, the finding matters far beyond these eleven tools, because it exposes the exact wrong way — and the exact right way — to decide whether a command is safe to run.

What GuardFall actually is

Start with what an AI coding agent does, because the risk lives in the job description. You point one of these agents at a codebase and it becomes an active participant: it reads the files, forms a plan, and executes real commands on your machine to carry it out — installing dependencies, running builds, editing files, invoking the shell. Many of them ask for your approval before running a command. That approval step, and the safety filter behind it, is the entire security boundary. It is the thing standing between "the agent helpfully ran a build" and "the agent helpfully ran something a malicious repository told it to."

GuardFall is Adversa's name for a structural flaw in how that boundary is enforced. It is not one specific bug in one specific product. It is a class of failure. The agent's guardrail looks at a command as a string of text and tries to judge whether it is dangerous, and Bash offers countless ways to write a dangerous command so that it does not look dangerous as text. Variable expansion, string concatenation, encoding, indirection: the same shell features that make Bash powerful make it trivial to disguise intent. A malicious repository can carry instructions that, once the agent ingests them and passes them toward the shell, reassemble into something destructive only at the moment the shell actually runs them. The guardrail reads the disguise. The shell runs the truth.
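Here is a minimal Python sketch of that failure. It assumes a hypothetical substring blocklist: `BLOCKED_SUBSTRINGS` and `naive_is_allowed` are our own illustrative names, not code from any of the tested agents. The standard library's `shlex.split` serves as a rough stand-in for Bash's quote removal. That covers only the simplest tricks, because variable expansion and encoding go further than `shlex` can follow.

```python
import shlex

# Hypothetical string-matching guardrail of the kind GuardFall defeats.
BLOCKED_SUBSTRINGS = ("rm -rf", "mkfs", "dd if=")


def naive_is_allowed(cmd: str) -> bool:
    """Allow the command unless a blocked phrase appears in the raw text."""
    return not any(bad in cmd for bad in BLOCKED_SUBSTRINGS)


# Each string below becomes the same command once the shell parses it.
variants = [
    "rm -rf /tmp/demo",      # plain text: caught
    "r''m -rf /tmp/demo",    # empty quotes disappear in quote removal
    '"r""m" -rf /tmp/demo',  # adjacent quoted strings concatenate
    "r\\m -rf /tmp/demo",    # the backslash escape is removed
    "rm  -rf /tmp/demo",     # a second space breaks the substring match
]

for cmd in variants:
    verdict = "allowed" if naive_is_allowed(cmd) else "blocked"
    print(f"{verdict:8} {cmd!r:24} -> shell words {shlex.split(cmd)}")
```

Every variant after the first slips past the text filter, yet the shell receives the same three words each time.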

The numbers are the uncomfortable part. Adversa tested eleven popular open-source agents, including Hermes, OpenCode, and Roo-code, and ten of them left the gap open in at least one of four ways. Only one blocked every trick.

Why the one that held is the whole lesson

The agent that passed — Continue — did not win by having a longer list of bad strings to look for. It won by refusing to judge the command as a string at all. Instead, it reads the command the way Bash will read it: it breaks the command into the same tokens the shell would, works out what will actually execute, and checks that against a hard list of destructive operations that are blocked outright. It evaluates the real behavior, not the surface text. That protection held against every payload Adversa threw at it in the agent's default editor mode.
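Below is a minimal sketch of the tokenize-then-check approach, and it rests on explicit assumptions. It is not Continue's code. The blocklists are hypothetical. Python's `shlex` with `punctuation_chars=True` only approximates Bash's grammar. Wherever the sketch cannot be sure what will run, it refuses rather than guesses. That covers any `$` or backtick expansion, process substitution, globs or braces in command position, wrappers such as `sudo` or `bash` that re-parse their arguments, and any operator it does not recognize.

```python
import os
import re
import shlex

# Hypothetical policy for illustration only; not Continue's actual rule set.
BLOCKED = {"rm", "dd", "mkfs", "shred"}
# Wrappers that run another program or re-parse text as code. Their real
# target is invisible to a static check, so this sketch refuses them too.
OPAQUE = {"sh", "bash", "zsh", "eval", "exec", "source", ".", "sudo", "env",
          "xargs", "command", "builtin", "nohup", "nice", "timeout"}
# Reserved words that may precede a command, e.g. "if true; then rm ...".
RESERVED = {"if", "then", "elif", "else", "fi", "while", "until", "do",
            "done", "for", "case", "esac", "select", "time", "function",
            "coproc", "!", "{", "}", "[[", "]]"}
SEPARATORS = {";", ";;", "&", "&&", "|", "||", "|&", "(", ")"}
REDIRECTS = {"<", ">", ">>", "<<", "<<<", "<>", ">|", ">&", "<&", "&>", "&>>"}
PUNCTUATION = set("();<>|&")
ASSIGNMENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\+?=")


def tokenize(cmd: str) -> list[str]:
    """Approximate Bash word splitting and quote removal with shlex."""
    cmd = cmd.replace("\\\n", "")    # backslash-newline joins lines
    lexer = shlex.shlex(cmd.replace("\n", " ; "), posix=True,
                        punctuation_chars=True)
    lexer.whitespace_split = True
    lexer.commenters = ""             # never let '#' hide the rest of a line
    return list(lexer)


def command_words(tokens: list[str]):
    """Yield the word in command position of every simple command."""
    expect_command, skip_next = True, False
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if skip_next:                  # file name after a redirection
            skip_next = False
        elif tok and set(tok) <= PUNCTUATION:
            if tok in SEPARATORS:
                expect_command = True
            elif tok in REDIRECTS:
                skip_next = True
            else:
                raise ValueError(f"unrecognized operator {tok!r}")
        elif expect_command and (tok in RESERVED or ASSIGNMENT.match(tok)
                                 or (tok.isdigit() and nxt in REDIRECTS)):
            continue                   # the command word is still ahead
        elif expect_command:
            expect_command = False
            yield tok


def is_allowed(cmd: str) -> bool:
    """Fail closed: refuse anything this sketch cannot resolve statically."""
    # $VAR, $(...), $'\x72m', backticks and <(...) resolve only at run time.
    if any(marker in cmd for marker in ("$", "`", "<(", ">(")):
        return False
    try:
        words = list(command_words(tokenize(cmd)))
    except ValueError:                 # unbalanced quotes, unknown operator
        return False
    for word in words:
        if any(ch in word for ch in "*?[{"):   # globs and braces expand
            return False
        if os.path.basename(word) in BLOCKED | OPAQUE:
            return False
    return True


for cmd in ["npm run build", "r''m -rf /", "2>/dev/null \\rm -rf /",
            "a=r; b=m; $a$b -rf /", "echo done && /bin/rm -rf /tmp/x"]:
    print(f"{'allow' if is_allowed(cmd) else 'refuse'}: {cmd}")
```

Failing closed costs some false positives. For example, it refuses `echo $HOME`, and that is the right trade for a guardrail to make. A blocklist is still only as complete as its entries, so a check like this belongs in front of isolation, not in place of it.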

This is the entire security lesson compressed into one design choice, and it is not new — it is the oldest principle in input handling, rediscovered in an AI context. You cannot secure something by pattern-matching the string a human or a model sees. You have to evaluate it the way the system that consumes it will evaluate it. A guardrail that inspects the text and a shell that executes the tokens are looking at two different things, and the gap between them is exactly where the attack lives.

We say this with some feeling, because it is the same principle our own pre-flight work is built on. Our Dredd pre-flight checks for the Model Context Protocol exist precisely because deciding whether an action is safe requires understanding what the action will actually do — the server it really contacts, the dependency it really pulls, the command it really runs — not what its description claims. Ten of eleven agents failed GuardFall for the same reason a lot of security fails: they trusted the label instead of parsing the contents. The one that passed did the harder, correct thing.

What to actually do

If you run an open-source AI coding agent — or any agent that executes shell commands on your behalf — treat its command-safety guardrail as a speed bump, not a wall, until you have evidence otherwise. The convenience of "the agent handles the terminal for you" is real, and so is the fact that most of these guardrails can be walked around by a shell trick that predates the tool by decades.

Do not point an agent with shell execution at a repository you do not trust. The GuardFall path runs through ingested content — a malicious repo, a poisoned dependency, a crafted file the agent reads and acts on. The moment an agent with terminal access is reasoning over untrusted input, the untrusted input has a route to your shell, and the guardrail is the only thing in the way. Run untrusted code exploration in a disposable, isolated environment — a container or VM with nothing you care about mounted — not on the laptop that holds your keys.
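Here is a hedged illustration of that isolation. The sketch assembles a `docker run` command for a throwaway container that sees only a disposable copy of the repository, with no home directory, SSH keys, or cloud credentials. The image name and paths are hypothetical, and the image must tolerate a read-only root filesystem. Networking is off by default, which will break agents that call a hosted model. For those agents, pass `allow_network=True` and accept that a compromised run can then reach the internet.

```python
import os
import shlex


def sandbox_argv(repo_copy: str, image: str, command: list[str],
                 allow_network: bool = False) -> list[str]:
    """Build a `docker run` argv for a disposable, locked-down container."""
    argv = [
        "docker", "run",
        "--rm",                                 # delete the container on exit
        "--read-only",                          # immutable root filesystem
        "--tmpfs", "/tmp",                      # scratch space that dies with it
        "--cap-drop", "ALL",                    # no extra Linux capabilities
        "--security-opt", "no-new-privileges",  # block setuid escalation
        # Mount only the throwaway repo copy: no $HOME, ~/.ssh or cloud creds.
        "--mount", f"type=bind,source={os.path.abspath(repo_copy)},target=/work",
        "--workdir", "/work",
    ]
    if not allow_network:
        argv += ["--network", "none"]           # no outbound network at all
    return argv + [image, *command]


# Hypothetical image and path; pass the list to subprocess.run to execute it.
print(shlex.join(sandbox_argv("./untrusted-repo-copy", "agent-sandbox:latest",
                              ["make", "test"])))
```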

And when you choose an agent, ask the one question that separated the eleven: does its safety check evaluate the command the way the shell will, or does it pattern-match the text? The former is defensible engineering. The latter is a filter waiting for a trick it has not seen. Prefer keys over passwords, isolation over trust, and semantic command analysis over string blocklists.

Why we are flagging lab research before the boom

There is no reported in-the-wild exploitation of GuardFall yet, and we want to be honest about that. This is Adversa's research, demonstrated in a lab, not an incident report. It would be easy to wave it off on those grounds. We are doing the opposite, because the entire value of watching left of boom is acting on a structural weakness while it is still theoretical, rather than after it has become a supply-chain crisis. AI coding agents are being adopted at exactly the speed you would expect for a tool that saves developers real time. That means the number of machines with an agent holding a live terminal is climbing fast, and the repositories those agents read are not all friendly.

As always, we cap our confidence at ninety-five percent. The research may not generalize to every agent or every configuration, and defenses are already improving in response. But the shape is clear and the fix is known: an agent is only as safe as the parser behind its guardrail, and a guardrail that reads the disguise instead of the command is not a guardrail. Ten of eleven read the disguise. Make sure the one holding your terminal is the eleventh.

Her name was Renee Nicole Good.

His name was Alex Jeffery Pretti.
