Levers 06 and 07: failures are normal. The question is whether your loop reads them, learns from them, and resumes, or drowns in them.
By now you can shape the loop (Lever 01), feed it well (Lever 02), and stop it safely (Lever 05). This lesson is about what happens when a tool call fails, which in any real agent happens constantly. Error recovery is the skill that separates an agent that corrects itself from one that spirals.
Move 4 of the turn captured results "and errors". That "and" is the whole game. A failed tool call is ground truth, just like a successful one: it tells the model what did not work so the next turn can adjust. The reliability move is to append the error to the context as an observation and let the loop take another swing.[1] Swallow the error and the model goes blind; the loop falls back to guessing.
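A minimal sketch of "append the error as an observation", assuming the context is a plain list of dicts. The names `run_tool_step` and `flaky_pricing_call`, and the `role`/`ok`/`content` keys, are hypothetical and chosen only for illustration.

```python
from typing import Callable


def run_tool_step(context: list[dict], tool: Callable[[], str]) -> list[dict]:
    """Run one tool call and return a new context with the outcome appended."""
    try:
        observation = {"role": "tool", "ok": True, "content": tool()}
    except Exception as exc:
        # A failure is still an observation: record it instead of swallowing it.
        observation = {
            "role": "tool",
            "ok": False,
            "content": f"{type(exc).__name__}: {exc}",
        }
    return context + [observation]


def flaky_pricing_call() -> str:
    # Hypothetical tool that always fails, standing in for a real API call.
    raise ConnectionError("timeout after 30s calling pricing API")


context = run_tool_step([], flaky_pricing_call)
print(context[-1]["content"])
```

The next turn sees the failed call in its context and can adjust, rather than guessing why nothing came back.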
Here is the trap. The naive fix, pasting the whole stack trace back in, poisons the very lever you just learned. Raw errors are enormous and low-signal, and they accumulate: each retry piles on another wall of text until the window rots (Lesson 3).
The discipline is error compaction: distill the failure down to its essential, actionable detail before it re-enters the window.[1] "ConnectionError: timeout after 30s calling pricing API; retry 2/3" carries the signal; the 80-line traceback does not.
Rather than passing full stack traces, errors are distilled into essential details, keeping context window usage efficient while preserving debugging capability. (HumanLayer, 12-Factor Agents, Factor 9)
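# ✗ the dump: 80 lines of traceback, re-pasted on every retry
#   Traceback (most recent call last): File "agent.py", line 412 ...
# ✓ the compaction: signal only
def compact_error(e, attempt, max_attempts=3, limit=80):
    # The caller passes the attempt number; exceptions carry no retry counter.
    # Keep only the first line of the message, truncated to `limit` characters.
    first_line = (str(e).splitlines() or [""])[0][:limit]
    return f"{type(e).__name__}: {first_line} | retry {attempt}/{max_attempts}"
# -> "ConnectionError: pricing API timeout 30s | retry 2/3"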
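To show how compaction and a bounded retry fit together, here is a small sketch. It assumes the context is a list of strings and that three attempts is the guard-stop; `compact`, `call_with_retries`, and `always_times_out` are hypothetical names, not part of any library.

```python
def compact(exc: Exception, attempt: int, max_attempts: int, limit: int = 80) -> str:
    """One-line summary: exception type, first message line, retry counter."""
    first_line = (str(exc).splitlines() or [""])[0][:limit]
    return f"{type(exc).__name__}: {first_line} | retry {attempt}/{max_attempts}"


def call_with_retries(tool, context: list[str], max_attempts: int = 3):
    """Try `tool` up to max_attempts times, appending one compact line per failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return tool(), context
        except Exception as exc:
            # One short line enters the window per failure, never the full traceback.
            context = context + [compact(exc, attempt, max_attempts)]
    return None, context  # guard-stop: give up once the attempts are used up


def always_times_out():
    # Hypothetical tool that never succeeds.
    raise ConnectionError("pricing API timeout 30s")


result, context = call_with_retries(always_times_out, [])
for line in context:
    print(line)
```

After three failures the window has grown by three short lines, and the loop stops instead of retrying forever.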
Recovery does not only happen in-loop. Sometimes the right move is to pause (for a human, an approval, or a fix) and resume later without losing the thread. That requires owning your state. The 12-Factor recipe:
Treat a turn as a pure function: (state, input) → new state. No hidden side state means retries and replays are deterministic and observable.[1]
If the state lives in a serialisable object, you can checkpoint, pause for a human or a failure, and resume from that exact point, with no data loss.[1]
Keep the agent's step history and the real-world outcome in one place so they cannot drift apart; this is the basis of trustworthy recovery.[1]
def turn(state, inp):  # (state, input) -> new state; no hidden side state
    ...
    return new_state   # serialisable -> checkpoint, pause, resume, replay

# resume after human approval: nothing lost
state = load(checkpoint_id)
state = turn(state, human_decision)
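One way the pause-and-resume pattern could look in full, as a sketch only: the `State` fields, the `needs_approval` input, and the JSON checkpoint format are all assumptions made for illustration, not the 12-Factor reference implementation.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class State:
    # Step history and current status live together, so they cannot drift apart.
    step: int = 0
    history: tuple[str, ...] = ()
    status: str = "running"  # "running" or "awaiting_human"


def turn(state: State, inp: str) -> State:
    """Pure reducer: builds a new State and never mutates the old one."""
    status = "awaiting_human" if inp == "needs_approval" else "running"
    return State(step=state.step + 1, history=state.history + (inp,), status=status)


def save(state: State) -> str:
    """Serialise the whole state to a JSON string (the checkpoint)."""
    return json.dumps(asdict(state))


def load(blob: str) -> State:
    """Rebuild a State from a checkpoint; JSON lists become tuples again."""
    data = json.loads(blob)
    return State(step=data["step"], history=tuple(data["history"]), status=data["status"])


state = turn(State(), "fetch_price")
state = turn(state, "needs_approval")
checkpoint = save(state)  # pause here; the process may exit

resumed = load(checkpoint)  # later, possibly in another process
resumed = turn(resumed, "human:approved")
print(resumed.step, resumed.status)
```

Because the checkpoint holds everything `turn` needs, the resumed run continues from step 2 exactly as if it had never stopped.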
Put together: because every turn is a pure reduction over owned, serialisable state, a failure is never fatal. You can retry the turn, hand off and resume, or replay the whole run in an eval. This is what "the loop corrects itself" actually rests on.
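Replay can be sketched as a fold over recorded inputs. This assumes every model and tool output was logged as an input, since a live model call would not reproduce the same result; the `turn` reducer here is a deliberately trivial, hypothetical stand-in.

```python
from functools import reduce


def turn(state: tuple, inp: str) -> tuple:
    """Hypothetical pure reducer: the state is just the tuple of inputs seen so far."""
    return state + (inp,)


def replay(inputs: list[str], initial: tuple = ()) -> tuple:
    """Fold the recorded inputs through turn() to rebuild the final state."""
    return reduce(turn, inputs, initial)


recorded = [
    "fetch_price",
    "ConnectionError: pricing API timeout 30s | retry 1/3",
    "fetch_price",
]

full_run = replay(recorded)

# Resuming from a mid-run checkpoint reaches the same end state as a full replay.
midpoint = replay(recorded[:2])
resumed_run = replay(recorded[2:], midpoint)
print(full_run == resumed_run)
```

The same property is what lets a failed production run be fed back through the reducer inside an eval.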
You can now turn failures into fuel: append errors as ground truth, but compacted, not dumped; bound retries with a guard-stop; and own your state as a stateless reducer so any run can be paused, resumed, or replayed. Levers 06 and 07, working together.
Retrieve from memory. (The questions interleave Lessons 3 and 4.)
HumanLayer, 12-Factor Agents. Read Factor 9 (compact errors), then Factors 5, 6 and 12 on state. This is the clearest, production-grade account of recovery and ownership. Pair it with the context-engineering piece to see how compaction protects the window.
Can the TradingAgents loop pause and resume cleanly, or can its execution state and business state drift apart? Ask in the chat.