Generative self-retrieval

A reasoning model surfacing facts from its own parameters by generating them as text, where those generated facts then condition and improve its final answer.

How do reasoning language models pull deep facts out of their own memory? A concept called generative self-retrieval explains this process.

When a model works through a tough question, it writes down related facts as part of its chain of thought. By generating these facts, it brings them into its immediate workspace, which helps it recall the correct answer.

This resembles a process in human memory called spreading activation: thinking of one concept makes related concepts easier to retrieve. For a language model, recalling neighboring facts builds a contextual bridge to an answer that would otherwise be out of reach.

For example, to name the tenth king of Nepal, a model might first list the nine kings who came before him. That list acts as scaffolding to help it reach the tenth. Researchers tested this by taking those intermediate facts and feeding them directly to a model with its reasoning turned off. The model recovered most of the accuracy gains, which indicates that the written facts themselves carry the benefit.

But this mechanism is as fragile as it is powerful. Because the model generates these facts from its own memory, it can make them up. When a reasoning trace contains a hallucinated stepping-stone fact, the final answer is substantially more likely to be hallucinated as well.

Generative self-retrieval is a mechanism by which a reasoning language model accesses knowledge stored in its own parameters by writing it out as text. During its chain of thought, the model generates facts that are topically related to the question, and those generated facts serve as context that conditions the final answer. The retrieval happens through generation itself: producing the related facts is what brings them into the model's working context, where they raise the probability of recalling the correct answer.

The term was introduced by Gekhman et al. (2026) in "Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs." It draws on the cognitive idea of spreading activation, where processing one concept lowers the retrieval threshold for related concepts in a semantic network. A reasoning model appears to do something analogous: by recalling neighboring facts, it builds a contextual bridge toward an answer that would otherwise stay out of reach. The paper supports this by extracting the facts a reasoning trace surfaced, feeding them back to the model with reasoning switched off, and recovering most of the accuracy gains, which indicates the surfaced facts themselves carry the benefit. A case study makes it concrete: to name the 10th King of Nepal, the model first lists the preceding nine, and that scaffolding is what lets it reach the tenth.
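The facts-only check described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the prompt wording, the function names, and the `generate` callable (standing in for any model queried with reasoning disabled) are all assumptions made for this example.

```python
from typing import Callable, List


def build_facts_only_prompt(question: str, facts: List[str]) -> str:
    """Put previously surfaced facts in context and ask for a direct answer.

    The prompt wording is a hypothetical choice for this sketch.
    """
    fact_lines = "\n".join(f"- {fact}" for fact in facts)
    return (
        "Relevant facts:\n"
        f"{fact_lines}\n\n"
        f"Question: {question}\n"
        "Answer directly, without step-by-step reasoning."
    )


def answer_with_facts(
    question: str,
    facts: List[str],
    generate: Callable[[str], str],
) -> str:
    """Query the model once, with the facts supplied as context.

    `generate` is a hypothetical stand-in for a model call made with
    reasoning switched off; it maps a prompt string to output text.
    """
    return generate(build_facts_only_prompt(question, facts)).strip()


if __name__ == "__main__":
    # Stub model so the sketch runs without any external service.
    def stub_model(prompt: str) -> str:
        return f"(stub reply to a {len(prompt.splitlines())}-line prompt)"

    surfaced = ["<fact 1 from the reasoning trace>", "<fact 2 from the reasoning trace>"]
    print(answer_with_facts("Who was the 10th King of Nepal?", surfaced, stub_model))
```

Comparing accuracy from `answer_with_facts` against accuracy with no facts in the prompt would isolate how much of the gain the surfaced facts account for on their own.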

The mechanism is powerful and fragile at once. Because the model generates the intermediate facts from its own parameters, those facts can be hallucinated, and traces containing hallucinated facts are substantially more likely to produce hallucinated final answers.
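One way to measure this fragility is to compare how often the final answer is hallucinated when the trace contains a hallucinated fact versus when it does not. The sketch below assumes each trace has already been labeled with two booleans; the labeling step, the function name, and the sample records are hypothetical placeholders, not results from the paper.

```python
from typing import Iterable, Tuple


def hallucination_rates(records: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (rate given a hallucinated trace, rate given a clean trace).

    Each record is (trace_has_hallucinated_fact, final_answer_hallucinated).
    """
    with_bad_fact = []
    with_clean_trace = []
    for trace_hallucinated, answer_hallucinated in records:
        bucket = with_bad_fact if trace_hallucinated else with_clean_trace
        bucket.append(answer_hallucinated)
    if not with_bad_fact or not with_clean_trace:
        raise ValueError("need at least one trace in each group")
    # True counts as 1, so sum() gives the number of hallucinated answers.
    return (
        sum(with_bad_fact) / len(with_bad_fact),
        sum(with_clean_trace) / len(with_clean_trace),
    )


if __name__ == "__main__":
    # Made-up placeholder labels, for illustration only.
    toy_records = [
        (True, True), (True, True), (True, False),
        (False, False), (False, False), (False, True), (False, False),
    ]
    bad_rate, clean_rate = hallucination_rates(toy_records)
    print(f"hallucinated trace: {bad_rate:.2f}, clean trace: {clean_rate:.2f}")
```

A large gap between the two rates on real labeled traces would be consistent with the claim above; the toy numbers here say nothing about its actual size.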
