Context Rot is the gradual degradation of a large language model’s ability to correctly use earlier context as a conversation or prompt becomes longer. In simple terms, it means that even though relevant information is still technically present in the context window, the model starts to misunderstand, ignore, or inconsistently apply it. This often shows up as the model contradicting earlier statements, forgetting constraints, or responding as if certain instructions were never given.
From a technical perspective, Context Rot is not about the model “forgetting” in a human sense. All tokens are still visible to the model within its context window. Instead, the issue comes from how attention is distributed across many tokens. As the prompt grows, earlier instructions and facts compete with newer ones. Important details that were clear and dominant early on can lose influence as more text is added, especially if later content is noisy, repetitive, or only loosely related.
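To see why influence can dilute, consider a toy sketch of softmax attention (a deliberate simplification: real models have many heads, layers, and learned positional behavior, and these scores are made up). The point is only that a fixed softmax budget spread over more tokens leaves less weight for any single early token:

```python
import math

def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Index 0 is an early instruction with a somewhat higher relevance
# score (2.0 vs. 1.0); the rest are loosely related distractor tokens.
for n_noise in (10, 100, 1_000, 10_000):
    scores = [2.0] + [1.0] * n_noise
    weight = softmax(scores)[0]
    print(f"{n_noise:>6} distractors -> attention on instruction: {weight:.4f}")
```

With 10 distractors the instruction holds about 21% of the attention mass; with 10,000 it holds under 0.03%. The instruction is still "visible," but its share of influence has collapsed.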
Context Rot is especially noticeable in long-running chats, agent workflows, and retrieval-augmented generation (RAG) systems. For example, a system may retrieve multiple documents over time and append them all to the prompt. Even when the correct answer is present, the model may prioritize recent or more verbose content over the most relevant content. This is why many production systems use techniques like summarization, structured prompts, or external memory backed by a vector database such as Milvus or Zilliz Cloud, instead of endlessly growing the prompt.
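As a minimal sketch of the external-memory pattern, the code below stores conversation turns in Milvus and retrieves only the few most relevant ones for the next prompt instead of appending everything. Assumptions: the `embed()` function is a placeholder (use a real embedding model in practice), the collection name and file path are illustrative, and the vector dimension is kept artificially small to match the placeholder.

```python
import hashlib
from pymilvus import MilvusClient

DIM = 32  # illustratively small; match your real embedding model's dimension

def embed(text: str) -> list[float]:
    # Placeholder embedding: deterministic but semantically meaningless.
    # Swap in a real embedding model for actual retrieval quality.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:DIM]]

# Milvus Lite keeps data in a local file; a production system would
# point MilvusClient at a Milvus server or Zilliz Cloud URI instead.
client = MilvusClient("chat_memory.db")
client.create_collection(collection_name="chat_memory", dimension=DIM)

def remember(turn_id: int, text: str) -> None:
    """Store one conversation turn outside the prompt."""
    client.insert(
        collection_name="chat_memory",
        data=[{"id": turn_id, "vector": embed(text), "text": text}],
    )

def recall(query: str, k: int = 3) -> list[str]:
    """Fetch only the k most relevant past turns for the next prompt."""
    hits = client.search(
        collection_name="chat_memory",
        data=[embed(query)],
        limit=k,
        output_fields=["text"],
    )
    return [hit["entity"]["text"] for hit in hits[0]]
```

Before each model call, the output of `recall()` can be placed in a compact, clearly labeled section of the prompt, so the context stays short and relevant rather than growing without bound.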
For more on context engineering strategies that prevent Context Rot with Milvus, see: https://milvus.io/blog/keeping-ai-agents-grounded-context-engineering-strategies-that-prevent-context-rot-using-milvus.md