Reflexion (Self-Reflection Loop)
Reflexion, introduced by Shinn et al. (2023), is a framework for iterative self-improvement without parameter updates. In each trial, an 'actor' model attempts the task, an 'evaluator' signals success or failure (from tests, heuristics, or another model), and a 'self-reflection' model writes a verbal post-mortem that is appended to the actor's context for the next trial. Shinn et al. report gains over baseline agents on ALFWorld, HotpotQA, and HumanEval coding, all without the cost of RL fine-tuning, which makes Reflexion a cheap way to get more performance out of a fixed model.
Quick reference
- Proficiency: Intermediate
- Also known as: verbal reinforcement learning, self-reflection loop
- Prerequisites: chain-of-thought, agent basics
Frequently asked questions
What is the Reflexion pattern?
Reflexion is an agent loop where the model attempts a task, an evaluator judges success or failure, and a self-reflection step writes a natural-language critique that's added to the agent's memory before the next attempt — improving through reflection rather than weight updates.
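The attempt, evaluate, reflect cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: `reflexion_loop` and the three `toy_*` functions are hypothetical names, and the toy functions stand in for what would normally be language-model calls.

```python
from typing import Callable, List, Tuple


def reflexion_loop(
    actor: Callable[[str, List[str]], str],
    evaluator: Callable[[str], bool],
    reflector: Callable[[str, str], str],
    task: str,
    max_trials: int = 3,
) -> Tuple[str, bool, List[str]]:
    """Run up to max_trials attempts, feeding reflections back to the actor."""
    reflections: List[str] = []  # verbal memory carried across trials
    attempt = ""
    for _ in range(max_trials):
        attempt = actor(task, reflections)  # the actor sees past reflections
        if evaluator(attempt):  # a success signal ends the loop
            return attempt, True, reflections
        # On failure, store a natural-language post-mortem for the next trial.
        reflections.append(reflector(task, attempt))
    return attempt, False, reflections


# Toy stand-ins (hypothetical; real ones would call a language model).
def toy_actor(task: str, reflections: List[str]) -> str:
    # Answers wrongly until at least one reflection is available.
    return "4" if reflections else "5"


def toy_evaluator(attempt: str) -> bool:
    return attempt == "4"


def toy_reflector(task: str, attempt: str) -> str:
    return f"Answer {attempt!r} to {task!r} was wrong; recompute the sum."


answer, solved, notes = reflexion_loop(
    toy_actor, toy_evaluator, toy_reflector, "2 + 2"
)
```

Note that no weights change anywhere in the loop: the only thing that differs between trials is the list of reflections passed to the actor.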
What does it need to work?
A reliable success signal (unit tests, exact-match answer key, external grader) and enough context window to hold accumulated reflections. Without a trustworthy evaluator the self-critiques just reinforce whatever the model believed before.
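As an illustration of what a trustworthy success signal might look like, the sketch below shows two simple evaluators: an exact-match check for question answering and a unit-test runner for generated code. The function names and the `(args, expected)` test-case format are assumptions made for this example, and `exec` is used only for brevity; untrusted model output should be run in a sandbox.

```python
from typing import List, Tuple


def exact_match(prediction: str, gold: str) -> bool:
    """Success signal for QA-style tasks: case-insensitive string equality."""
    return prediction.strip().lower() == gold.strip().lower()


def run_unit_tests(
    source: str, func_name: str, cases: List[Tuple[tuple, object]]
) -> Tuple[bool, str]:
    """Execute candidate source and check it against (args, expected) cases.

    Returns (passed, feedback); the feedback string can be handed to the
    self-reflection step as grounding for its critique.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)  # NOTE: unsafe for untrusted code; sandbox in practice
        func = namespace[func_name]
        for args, expected in cases:
            got = func(*args)
            if got != expected:
                return False, f"{func_name}{args} returned {got!r}, expected {expected!r}"
    except Exception as exc:  # syntax errors, missing function, runtime errors
        return False, f"{type(exc).__name__}: {exc}"
    return True, "all tests passed"


buggy = "def add(a, b):\n    return a - b\n"
fixed = "def add(a, b):\n    return a + b\n"
cases = [((1, 2), 3), ((0, 0), 0)]

buggy_result = run_unit_tests(buggy, "add", cases)
fixed_result = run_unit_tests(fixed, "add", cases)
```

Returning concrete failure feedback, rather than a bare pass/fail flag, gives the reflection step something factual to reason from, which helps against the failure mode where critiques only restate the model's prior beliefs.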
How is Reflexion different from ReAct?
ReAct interleaves reasoning and tool actions within one trajectory. Reflexion runs across trajectories — after an episode ends, it reflects on the whole thing and uses that reflection to steer the next episode. They compose well: ReAct inside each trial, Reflexion between trials.
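The composition can be pictured as two nested loops: an inner ReAct-style loop of thought, action, and observation within one episode, and an outer Reflexion loop that carries reflections between episodes. The sketch below uses a toy "find the key in a drawer" environment; every name in it (`react_episode`, `reflexion_over_react`, the `toy_*` functions) is hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, str, str]  # (thought, action, observation)
DRAWERS = ["a", "b", "c"]  # toy world: the key is in drawer "c"


def react_episode(policy, env_step, reflections, max_steps) -> Tuple[List[Step], bool]:
    """Inner loop: interleave thought/action/observation within one trajectory."""
    trajectory: List[Step] = []
    for _ in range(max_steps):
        thought, action = policy(trajectory, reflections)
        observation, done = env_step(action)
        trajectory.append((thought, action, observation))
        if done:
            return trajectory, True
    return trajectory, False


def reflexion_over_react(
    policy: Callable, env_step: Callable, reflect: Callable,
    max_trials: int = 3, max_steps: int = 2,
) -> Tuple[Optional[int], List[str]]:
    """Outer loop: reflect on each failed episode and carry the note forward."""
    reflections: List[str] = []
    for trial in range(1, max_trials + 1):
        trajectory, success = react_episode(policy, env_step, reflections, max_steps)
        if success:
            return trial, reflections
        reflections.append(reflect(trajectory))
    return None, reflections


def toy_env(action: str) -> Tuple[str, bool]:
    return ("found key", True) if action == "open c" else ("empty", False)


def toy_policy(trajectory: List[Step], reflections: List[str]) -> Tuple[str, str]:
    # Skip drawers opened this episode and drawers past reflections rule out.
    skip = {action.split()[-1] for _, action, _ in trajectory}
    skip |= {d for d in DRAWERS for note in reflections if f"drawer {d} " in note}
    target = next((d for d in DRAWERS if d not in skip), DRAWERS[-1])
    return f"drawer {target} is untested", f"open {target}"


def toy_reflect(trajectory: List[Step]) -> str:
    return "; ".join(
        f"drawer {action.split()[-1]} was empty"
        for _, action, obs in trajectory if obs == "empty"
    )


solved_on_trial, notes = reflexion_over_react(toy_policy, toy_env, toy_reflect)
```

With a budget of two steps per episode, the first trial opens drawers a and b and fails; the reflection records both as empty, so the second trial goes straight to drawer c.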
Where does it fall short?
Hallucinated critiques (the model invents a plausible-sounding wrong diagnosis), memory bloat, and tasks with no clear success signal. It also can't fix limits baked into the base model's knowledge.
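One common way to limit memory bloat is to keep only the most recent few reflections and to drop exact repeats. The sketch below shows that idea with a sliding window; the class name `ReflectionMemory` and the default window size are assumptions for illustration, not values taken from the sources.

```python
from collections import deque


class ReflectionMemory:
    """Sliding window over reflections: oldest entries fall off automatically."""

    def __init__(self, max_items: int = 3) -> None:
        self._items = deque(maxlen=max_items)

    def add(self, reflection: str) -> None:
        text = reflection.strip()
        # Skip blank notes and exact duplicates of notes still in the window.
        if text and text not in self._items:
            self._items.append(text)

    def as_prompt(self) -> str:
        """Render the stored reflections as a bullet list for the actor's context."""
        return "\n".join(f"- {item}" for item in self._items)


memory = ReflectionMemory(max_items=2)
for note in ["Check edge cases.", "Handle empty input.", "Handle empty input.", "Return a list."]:
    memory.add(note)

prompt_block = memory.as_prompt()
```

A fixed window trades recall for a predictable prompt size. It does nothing about hallucinated critiques, which still depend on the quality of the evaluator's feedback.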
Sources
- Shinn et al. — Reflexion: Language Agents with Verbal Reinforcement Learning — accessed 2026-04-20
- LangGraph — Reflection tutorial — accessed 2026-04-20