Lever 05: the loop must know how to end.
The cheapest reliability win in agent engineering is refusing to let it run forever.
The sixth move of every turn was “check the stop condition.” It’s the move people under-engineer — and it’s where toy loops become incidents. A loop with no firm stop condition doesn’t just hang; it spends real money, hits real APIs, and compounds one mistake into fifty before anyone notices.
TradingAgents-style system) firing real side effects. A weak or missing stop condition is a top cause of runaway agents.[3]
Robust loops carry two kinds of termination, and you need both:
| Layer | Fires when | Owned by |
|---|---|---|
| Goal-stop | The task is actually done — success criteria met, or the model emits a terminal “finish” signal. | The happy path |
| Guard-stop | Max turns hit · token/cost budget exhausted · the same error repeats · wall-clock time limit reached. | Your code, unconditionally |
The goal-stop is what you want; the guard-stop is what saves you when the goal-stop never arrives. Crucially, the guard-stop must live in code you own. Never trust the model to decide it has run too long. This is Lever 01 again: own your control flow, including the exit.[1]
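```python
# MAX_TURNS, TOKEN_BUDGET and the helpers (model, build_context, run,
# append, handoff) are placeholders for your own implementations.
def run_agent(state):
    turns, spent, last_err, repeats = 0, 0, None, 0
    while True:
        if turns >= MAX_TURNS or spent >= TOKEN_BUDGET:  # guard-stop: budget
            return handoff(state, reason="budget")
        reply = model(build_context(state))
        spent += reply.tokens
        turns += 1
        if reply.is_final:                                # goal-stop: task done
            return reply.answer
        result = run(reply.tool_call)
        if result.error:
            # count consecutive repeats of the *same* error
            repeats = repeats + 1 if result.error == last_err else 0
            last_err = result.error
            if repeats >= 2:                              # same error 3x → stop, don't spin
                return handoff(state, reason="stuck")
        else:
            last_err, repeats = None, 0                   # a success breaks the streak
        state = append(state, result)
```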
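The loop above checks only turns and tokens inline. As a minimal sketch, assuming Python 3.8+ and entirely hypothetical names (`Guard`, `Reply`, `run_loop`, and the default limits), all four guard-stop conditions from the table can be gathered into one object that the loop consults before every model call. The model and the tool runner are passed in as plain callables so the sketch runs with fakes; a real loop would plug in its own client.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical names and default limits, chosen for illustration only.


@dataclass
class Guard:
    """Guard-stop that lives in our code: every limit is checked here, never by the model."""
    max_turns: int = 12
    token_budget: int = 50_000
    max_same_error: int = 3                      # stop on the 3rd identical error in a row
    time_limit_s: float = 300.0
    clock: Callable[[], float] = time.monotonic  # injectable so tests need not sleep
    turns: int = field(default=0, init=False)
    spent: int = field(default=0, init=False)
    last_err: Optional[str] = field(default=None, init=False)
    same_err: int = field(default=0, init=False)
    started: float = field(default=0.0, init=False)

    def __post_init__(self) -> None:
        self.started = self.clock()

    def record_turn(self, tokens: int) -> None:
        self.turns += 1
        self.spent += tokens

    def record_result(self, error: Optional[str]) -> None:
        if error is None:
            self.last_err, self.same_err = None, 0   # a success breaks the streak
        elif error == self.last_err:
            self.same_err += 1
        else:
            self.last_err, self.same_err = error, 1

    def tripped(self) -> Optional[str]:
        """Return why the guard fired, or None to keep looping."""
        if self.turns >= self.max_turns:
            return "max_turns"
        if self.spent >= self.token_budget:
            return "budget"
        if self.same_err >= self.max_same_error:
            return "stuck"
        if self.clock() - self.started >= self.time_limit_s:
            return "timeout"
        return None


@dataclass
class Reply:
    tokens: int
    is_final: bool = False
    answer: str = ""
    tool_call: str = ""


def run_loop(model: Callable[[list], Reply],
             run_tool: Callable[[str], Optional[str]],
             guard: Guard) -> Tuple[str, str, list]:
    history: List[Tuple[str, Optional[str]]] = []
    while True:
        reason = guard.tripped()
        if reason:                           # guard-stop: checked before every model call
            return ("handoff", reason, history)
        reply = model(history)
        guard.record_turn(reply.tokens)
        if reply.is_final:                   # goal-stop: the model says it is done
            return ("done", reply.answer, history)
        error = run_tool(reply.tool_call)    # an error string, or None on success
        guard.record_result(error)
        history.append((reply.tool_call, error))
```

Because `tripped()` runs before each model call, a fake model that never emits a finish signal is still stopped after `max_turns`, and injecting `clock` lets a test exercise the timeout without sleeping.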
How big should the guard-stop be? 12-Factor Agents gives a concrete heuristic that doubles as a design philosophy:
Small, focused agents… handle 3–20 discrete steps, maintaining focus rather than attempting comprehensive problem-solving. — HumanLayer, 12-Factor Agents, Factor 10
The point isn’t the exact number; it’s the posture. A step budget in the low tens keeps each agent small enough to stay reliable. When you hit the cap, you don’t fail silently. You do one of three deliberate things:[1]

1. Surface the partial state for judgement or approval, ideally through the same tool-call channel the agent already uses (human-in-the-loop, HITL).[1]
2. Compact what’s done into a clean brief and start a new small agent on the remainder, rather than letting one agent sprawl.
3. Return a clear “could not complete within budget,” roll back side effects where you can, and log the run as an eval case.
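The three responses above can be made explicit in code rather than left to chance. A minimal sketch, assuming hypothetical `RunState`, `Outcome` and `on_guard_trip` names, a caller-chosen `policy` string, and that each reversible side effect registered a compensating “undo” callable when it ran:

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical types and policy names, for illustration only.


@dataclass
class Outcome:
    kind: str                      # "escalate" | "continue" | "failed"
    message: str
    payload: dict = field(default_factory=dict)


@dataclass
class RunState:
    goal: str
    done_steps: List[str]
    # compensating actions for reversible side effects, newest last
    undo: List[Callable[[], None]] = field(default_factory=list)


def on_guard_trip(state: RunState, reason: str, policy: str,
                  eval_log: List[dict]) -> Outcome:
    """Never fail silently: turn a tripped guard into one of three explicit outcomes."""
    if policy == "escalate":
        # 1. Surface the partial state for a human to judge or approve (HITL).
        return Outcome("escalate", f"Needs review after {reason}",
                       {"goal": state.goal, "done": list(state.done_steps)})
    if policy == "continue":
        # 2. Compact finished work into a short brief for a fresh, small agent.
        brief = f"Goal: {state.goal}\nAlready done: " + "; ".join(state.done_steps)
        return Outcome("continue", "Start a new agent on the remainder", {"brief": brief})
    if policy == "fail":
        # 3. Clean failure: undo side effects newest-first, then record an eval case.
        while state.undo:
            state.undo.pop()()
        eval_log.append({"goal": state.goal, "reason": reason,
                         "steps": len(state.done_steps)})
        return Outcome("failed", f"Could not complete within budget ({reason})")
    raise ValueError(f"unknown policy: {policy!r}")
```

The policy is passed in rather than chosen by the model, so the exit stays in code you own; side effects with no registered undo simply cannot be rolled back, which is why the sketch only promises rollback “where you can.”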
“Long-running agent” is usually the wrong goal. The reliable pattern is a short agent that knows when to stop and pass the baton — to a human, to another agent, or to a clean failure.
You can now specify a loop’s termination as two layers (a goal-stop on the happy path and an unconditional guard-stop in your code), set a step budget in the low tens alongside a token budget, and name the three things to do when it trips. That’s the difference between a demo and something you’d leave running.
HumanLayer — 12-Factor Agents. Read Factor 10 (small, focused agents) and Factor 8 (own your control flow) together — they’re the backbone of this lever. The conference talk (≈30 min) is the best single-sitting version.