In June, the developer Peter Steinberger posted two sentences that landed because a lot of people building with AI had already felt the shift. The post drew more than 8 million views.

Peter Steinberger's post on X, June 7, 2026. @steipete

He is describing something real and genuinely useful. Instead of typing a new instruction for every step, you set a goal and let the agent run: act, check the result, adjust, repeat, until the work is done. Boris Cherny, who built Claude Code at Anthropic, put it the same way: “I don’t prompt Claude anymore. I have loops running. They’re the ones prompting Claude… My job is to write loops.” Andrej Karpathy, who popularized “vibe coding” a year ago, told the No Priors podcast he has not typed a line of code by hand since December.

We have been running agents in loops in production for a while, and the excitement is earned. A loop can carry real work overnight. The teams getting the most out of it have also learned the one habit that turns a loop from a neat trick into something you can trust, and that habit is most of what follows.

A loop is an old idea, newly within reach

None of this needs a new model or a new capability. Anthropic described the mechanism back in December 2024: an agent is “just” an LLM “using tools based on environmental feedback in a loop.” Act, look at what happened, decide the next move, repeat, until the goal is met or a stop condition trips. The loop is the runtime. The model is one step inside it.
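To make the mechanism concrete, here is a minimal Python sketch of that loop. The names (run_loop, decide, act, goal_met) are ours, chosen for illustration, and the "model" and "tool" are plain functions so the example runs without any API. It is a sketch of the shape, not anyone's production runtime.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoopResult:
    state: int
    steps: int
    reason: str  # "goal_met" or "max_steps"


def run_loop(
    state: int,
    decide: Callable[[int], int],     # stands in for the model: picks the next action
    act: Callable[[int, int], int],   # stands in for a tool call: applies the action
    goal_met: Callable[[int], bool],  # reads the environment and checks the goal
    max_steps: int = 20,              # stop condition so the loop always ends
) -> LoopResult:
    for step in range(max_steps):
        if goal_met(state):
            return LoopResult(state, step, "goal_met")
        action = decide(state)        # decide the next move
        state = act(state, action)    # act, then loop back and look at the result
    reason = "goal_met" if goal_met(state) else "max_steps"
    return LoopResult(state, max_steps, reason)


# Toy task: reach 10 from 0, taking big steps while there is room for them.
result = run_loop(
    0,
    decide=lambda s: 3 if 10 - s >= 3 else 1,
    act=lambda s, a: s + a,
    goal_met=lambda s: s >= 10,
)
print(result)
```

The model call is a single line inside the loop body, which is the point: everything around it is ordinary control flow.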

Builders have been running that loop by hand for more than a year. In July 2025, Geoffrey Huntley published what he called Ralph, which in its plainest form is one line of shell: a Bash loop that pipes the same prompt file into a coding agent over and over (while :; do cat PROMPT.md | claude-code ; done). Each pass starts with a fresh context window and reloads its specs and plans from files, so state lives on disk instead of in the agent’s memory. Huntley used it to build a working programming language, compiler and all. One line of shell, run patiently, produced something most teams would scope as a quarter of work.
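The same idea can be sketched in Python with two additions that are ours, not Huntley's: a cap on passes and a sentinel file the agent can create to signal it is finished. The function name ralph_loop, the done_file convention, and the max_passes parameter are all hypothetical. The agent command is passed in as a list, and nothing here assumes any flags on a real CLI.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def ralph_loop(
    prompt_file: Path,
    agent_cmd: Sequence[str],
    done_file: Path,
    max_passes: int = 50,
) -> int:
    """Pipe the same prompt file into a fresh agent process on every pass.

    Each pass is a new process, so nothing carries over in memory.
    Whatever the agent needs to remember, it has to read from and
    write to files on disk.
    """
    passes = 0
    while passes < max_passes and not done_file.exists():
        # Re-read the prompt each pass so edits to the file take effect.
        prompt = prompt_file.read_text()
        # check=False: a failed pass is not fatal, the next pass simply retries.
        subprocess.run(list(agent_cmd), input=prompt, text=True, check=False)
        passes += 1
    return passes
```

A call would look like ralph_loop(Path("PROMPT.md"), ["claude-code"], Path("DONE")), using the command name from the one-liner above. Unlike the original, this version stops on its own.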

So the jargon is friendlier than it sounds. Loop engineering, the name Addy Osmani gave the pattern in June, is just writing down three things and letting the agent run between them: a goal, a way to check the goal is met, and the limits that stop the loop. The prompts did not disappear. They moved inside the loop, where a program issues them instead of a person typing each one. The skill moved up a level too, which is why searches for “become a prompt engineer” courses are fading even as the work itself grows.

We built one, and the gate is what makes it work

Our agent workbench, Packwolf, runs this way. Each agent wakes on its own tasks and works the loop without anyone watching: it researches, drafts, writes, and keeps going until it has a spec or a plan worth showing. At that point it does not ship. It pushes the work into an approval gate, which triggers a review. The reviewer, another agent on routine work or a person on the higher-stakes calls, does one of three things: sends it back into the loop with notes, promotes it to the next agent in the chain, or escalates it to a human for sign-off. Nothing leaves the loop without clearing that gate.
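The three outcomes are easy to picture as code. What follows is an illustrative Python sketch, not Packwolf's implementation: the names Verdict, Submission, and gate are hypothetical, and the routing rule (failed checks go back with notes, passing high-stakes work goes to a human, everything else is promoted) is our simplifying assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Verdict(Enum):
    REVISE = "send back into the loop with notes"
    PROMOTE = "promote to the next agent in the chain"
    ESCALATE = "escalate to a human for sign-off"


@dataclass
class Submission:
    content: str
    high_stakes: bool = False


def gate(
    sub: Submission,
    checks: Dict[str, Callable[[str], bool]],
) -> Tuple[Verdict, List[str]]:
    """Return a verdict plus notes. Nothing is promoted with a failing check."""
    notes = [f"failed check: {name}" for name, check in checks.items()
             if not check(sub.content)]
    if notes:
        return Verdict.REVISE, notes
    if sub.high_stakes:
        return Verdict.ESCALATE, ["checks passed; needs human sign-off"]
    return Verdict.PROMOTE, []


# Two toy checks on a drafted spec.
checks = {
    "states_goal": lambda text: "Goal:" in text,
    "not_empty": lambda text: bool(text.strip()),
}
print(gate(Submission("rough notes"), checks))
print(gate(Submission("Goal: migrate billing"), checks))
print(gate(Submission("Goal: delete old records", high_stakes=True), checks))
```

Running the checks before the stakes test means a human only ever sees work that has already cleared the routine bar.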

The loop is the cheap part, and that is good news: starting an agent and letting it churn is a few lines of code. The gate is what makes the whole thing worth running. It is what lets the team leave the loop going overnight and trust what comes back in the morning, because every result has been checked before it counts. The gate is not a brake on the loop. It is the thing that lets you take your hands off the wheel.

A loop is only as autonomous as its check is trustworthy. Strengthen the check and you can let it run further.

Strengthen the check and the loop runs further

This is the rule that has held across every loop we have run, and it is an encouraging one once you see it. The amount of a loop’s output you can trust is capped by the strength of the check that gates each pass. Call it the verifier ceiling. Raise the ceiling by making the check stronger, and you can safely let the loop run further and lean on it harder. The check is the lever, and it is squarely in your control.
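A back-of-the-envelope model shows why the check is the lever. Everything in it is an assumption made for illustration, not a measurement: each pass is wrong with some fixed probability, the check catches a fixed share of wrong outputs, and it never rejects a correct one. The function name wrong_share_of_accepted is ours.

```python
def wrong_share_of_accepted(error_rate: float, catch_rate: float) -> float:
    """Share of outputs that clear the check but are wrong anyway.

    error_rate: probability a pass produces a wrong output (assumed).
    catch_rate: probability the check rejects a wrong output (assumed).
    Correct outputs are assumed to always pass.
    """
    wrong_and_accepted = error_rate * (1.0 - catch_rate)
    right_and_accepted = 1.0 - error_rate
    return wrong_and_accepted / (wrong_and_accepted + right_and_accepted)


# Same loop (30% of passes wrong), two different checks.
weak = wrong_share_of_accepted(error_rate=0.30, catch_rate=0.50)
strong = wrong_share_of_accepted(error_rate=0.30, catch_rate=0.95)
print(f"weak check:   {weak:.1%} of accepted output is wrong")
print(f"strong check: {strong:.1%} of accepted output is wrong")
```

Under these made-up numbers, the weak check lets through output that is wrong about 18% of the time and the strong check about 2%, with no change to the agent at all.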

Weak check versus strong check, on the same loop output. ideius

You can watch what happens when teams skip the check. In 2025, METR ran a careful study of 16 experienced open-source developers working on their own large repositories. With AI tools allowed, they took 19% longer even though they felt about 20% faster, because the time went into reviewing and correcting output by hand. Read it the builder’s way and it is a roadmap: the speed is real, and a check you trust is what converts it into shipped work instead of a second pass of manual review. Osmani, who named loop engineering, lands in the same place. Your job, he writes, is to ship code you confirmed works, and to build the loop “like someone who intends to stay the engineer.” The check earns its keep on the quiet wins and the quiet misses alike: a confident wrong answer that sails through unchecked costs far more to fix later. It is the same failure tax that makes a cheap coding model expensive once you count the cost of fixing its mistakes.

What a loop needs before you let it run

Name three things and you can turn one loose with confidence. A goal the agent can pursue without you re-steering it at every step. A check that proves the goal is met: tests that pass, a schema that validates, a reviewer that signs off. And limits that end the run: a cap on iterations, a stop when the changes stop changing, and a ceiling on tokens or spend. If you cannot yet name the check, that is useful information: the thing to fix is the goal, which is not yet specific enough to verify, not the agent.
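Those three limits fit in a few lines. The sketch below is a minimal Python illustration with hypothetical names (Limits, run_bounded) and a toy step function standing in for the agent. Each step reports its own token cost, which is an assumption about how spend would be tracked.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Limits:
    max_iterations: int  # cap on iterations
    max_tokens: int      # ceiling on tokens or spend


def run_bounded(
    draft: str,
    step: Callable[[str], Tuple[str, int]],  # returns (new draft, tokens spent)
    check: Callable[[str], bool],            # proves the goal is met
    limits: Limits,
) -> Tuple[str, str]:
    """Run until the check passes or one of the limits ends the run."""
    tokens_used = 0
    for _ in range(limits.max_iterations):
        if check(draft):
            return draft, "goal_met"
        if tokens_used >= limits.max_tokens:
            return draft, "budget_exhausted"
        new_draft, cost = step(draft)
        tokens_used += cost
        if new_draft == draft:
            # The changes have stopped changing; more passes will not help.
            return draft, "no_progress"
        draft = new_draft
    return draft, ("goal_met" if check(draft) else "iteration_cap")


# Toy agent: appends one character per pass and reports 10 tokens spent.
grow = lambda d: (d + "x", 10)
long_enough = lambda d: len(d) >= 3

print(run_bounded("", grow, long_enough, Limits(max_iterations=10, max_tokens=1000)))
print(run_bounded("", grow, long_enough, Limits(max_iterations=2, max_tokens=1000)))
print(run_bounded("", grow, long_enough, Limits(max_iterations=10, max_tokens=15)))
```

The returned reason matters as much as the draft: a run that ended on a limit should be treated as unfinished work, not as a result.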

The good news from running these at scale is that loops rarely stumble on reasoning. In our own field study of an autonomous agent workforce, about 70% of 942 runs succeeded, and almost every failure was operational: a bad configuration, a timeout, a tool call abandoned halfway. Those are the engineerable kind. Build the recovery, the checks, and the stop conditions well, and the loop gets reliable fast. That plumbing is most of what we build when a client wants agents they can rely on.
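Recovery of that kind is mostly about telling failures apart. As a hedged Python sketch, with hypothetical names (ConfigError, call_with_recovery) and our own assumption about which failures are worth retrying: a timeout is retried with backoff, while a bad configuration fails immediately because repeating the call will not fix it.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ConfigError(Exception):
    """Hypothetical error type for a bad configuration."""


def call_with_recovery(
    call: Callable[[], T],
    retries: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Retry transient timeouts with exponential backoff; fail fast on config."""
    last_error: Optional[Exception] = None
    for attempt in range(retries + 1):
        try:
            return call()
        except ConfigError:
            raise  # retrying cannot fix a bad configuration
        except TimeoutError as exc:
            last_error = exc
            if attempt < retries:
                time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"gave up after {retries + 1} attempts") from last_error


# Toy tool call that times out twice, then succeeds.
attempts = {"count": 0}

def flaky_tool() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("tool did not answer in time")
    return "ok"

print(call_with_recovery(flaky_tool, retries=3, base_delay=0.0))
```

Raising a distinct error once retries run out gives the surrounding loop something explicit to stop on, instead of a tool call silently abandoned halfway.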

What this means if you run a business

This is one of the better shifts in how software gets built in years, and it is available to you now. The teams that win with it are not the ones chasing the headline or handing a loop their codebase on faith. They put agents in loops behind checks they trust, starting on the work where a check is cheap and a mistake is catchable. Pick a reversible task. Keep the risky and irreversible steps behind a named human approval. Build the evaluation that tells you when a loop has drifted. And make sure an unattended loop with broad access cannot move client data somewhere it should not go, the same care you would apply to shadow AI. It also helps to know which model belongs inside the loop, which is its own comparison worth running on your own work.

We have a stake to name here. ideius builds these systems, so a team that reads this may later hire us to build one. The advice does not change either way: the loop is the easy part to buy, and the check is the part that deserves your judgment. We are vendor-neutral and earn the same whichever model you end up running, which is why we would rather see you spend on the verifier than on a bigger model.

The work did not disappear. It moved up a level, from typing the next instruction to designing the loop and the check that decides when the work is done. That is a better job, and the teams building it now are the ones who will trust their agents while everyone else is still typing.