Claude Opus 4.5 itself does not expose an internal counter for remaining tokens; token accounting is ultimately your application’s responsibility. What Opus 4.5 does provide is a large context window (up to about 200,000 tokens of combined input + output) and internal mechanisms that make long-context use more practical. Anthropic’s context window docs describe a standard sliding model: each message and response adds tokens, and once the window is full, older content must be summarized or dropped.
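Here is a minimal sketch of that sliding behavior; the 150k budget and the four-characters-per-token heuristic are illustrative assumptions, not Anthropic's actual accounting:

```python
# Minimal sketch of a sliding conversation window.
# The budget and the 4-chars-per-token heuristic are illustrative
# assumptions; use real token counts (see below) in production.

BUDGET = 150_000  # self-imposed ceiling, below the ~200k window

def rough_tokens(text: str) -> int:
    # Crude heuristic, not the model's real tokenizer.
    return max(1, len(text) // 4)

def trim_to_budget(history: list[dict]) -> list[dict]:
    """Drop the oldest turns until the buffer fits the budget."""
    while len(history) > 1 and sum(rough_tokens(m["content"]) for m in history) > BUDGET:
        history.pop(0)  # in practice, summarize evicted turns instead of discarding them
    return history
```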
From the API side, you control usage with parameters like max_tokens (a cap on output tokens) and by deciding how much past conversation you send each turn. The platform does not trim history for you; you typically maintain a conversation buffer in your own code. When that buffer approaches a self-imposed budget (for example, 150k tokens in a 200k window, leaving headroom for output), you summarize older turns or move them into external memory. Anthropic's Messages API also provides a token-counting endpoint (exposed as messages.count_tokens in the official SDKs), so you can count a prompt exactly before sending it rather than relying on rough client-side estimates. Anthropic's docs emphasize that the full 200k window is available, but cost and latency grow with every token you send.
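For example, with the official anthropic Python SDK, you can count a request's input tokens before committing to it. This is a sketch: the model ID string and the 150k threshold are assumptions to verify against the current docs.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

MODEL = "claude-opus-4-5"  # assumed model ID; check the models list in the docs
INPUT_BUDGET = 150_000     # self-imposed input ceiling, leaving output headroom

def send_within_budget(messages: list[dict]) -> str:
    # Count the prompt's input tokens server-side before committing to a call.
    count = client.messages.count_tokens(model=MODEL, messages=messages)
    if count.input_tokens > INPUT_BUDGET:
        # Over budget: summarize or evict older turns first (see the sketch above).
        raise RuntimeError(f"{count.input_tokens} input tokens exceeds the budget; trim history.")
    response = client.messages.create(
        model=MODEL,
        max_tokens=4096,  # cap on output tokens for this call
        messages=messages,
    )
    return response.content[0].text
```

In a real agent loop you would route the over-budget case into your summarization step rather than raising an error.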
A common pattern for long-running agents is to use hierarchical memory: keep only the last few steps plus a condensed summary in the active context, and store detailed history (documents, previous plans, logs) in an external system. A vector database such as Milvus or Zilliz Cloud works well here: you embed past turns, decisions, or documents and retrieve only what’s relevant for each new call. That way, Opus 4.5 sees a focused slice of history each time, but the agent still has access to a much larger “lifetime memory” when needed. Practically, this gives you effective multi-session continuity without constantly burning 200k tokens per request.
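Here is a compact sketch of that pattern with pymilvus' MilvusClient; the collection name, the 768-dimension setting, and the hash-based embed() stand-in are placeholders, and in practice you would call a real embedding model:

```python
import hashlib
from pymilvus import MilvusClient

# Milvus Lite local file; point at a server or Zilliz Cloud URI in production.
client = MilvusClient(uri="./agent_memory.db")
COLLECTION = "agent_memory"  # placeholder name
DIM = 768                    # assumed size; match your embedding model

if not client.has_collection(COLLECTION):
    client.create_collection(COLLECTION, dimension=DIM)

def embed(text: str) -> list[float]:
    # Deterministic stand-in embedding for illustration only;
    # replace with a real embedding model in practice.
    digest = hashlib.sha256(text.encode()).digest()
    return [digest[i % len(digest)] / 255.0 for i in range(DIM)]

def remember(turn_id: int, text: str) -> None:
    # Store a past turn, decision, or document alongside its embedding.
    client.insert(COLLECTION, data=[{"id": turn_id, "vector": embed(text), "text": text}])

def recall(query: str, k: int = 5) -> list[str]:
    # Fetch only the k most relevant memories to splice into the next prompt.
    hits = client.search(COLLECTION, data=[embed(query)], limit=k, output_fields=["text"])
    return [hit["entity"]["text"] for hit in hits[0]]
```

Each new call then assembles the system prompt, the recalled snippets, and the last few turns, so the active context stays small while the full history lives in Milvus.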