How do I implement conversation history in OpenAI’s GPT models?

To implement conversation history in OpenAI’s GPT models, you need to structure your API requests to include prior interactions as part of the input. The key is to maintain a list of messages, where each message has a role (like “user” or “assistant”) and content (the actual text). By appending each exchange to this list and sending the full list with every new request, you give the model the context it needs to generate coherent, context-aware responses. For example, if a user asks, “What’s the capital of France?” and the assistant replies, “Paris,” the next user query (“What’s its population?”) requires the model to know “its” refers to Paris. Including the prior messages ensures the model understands the context.
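
As a minimal sketch of this idea using the OpenAI Python client (the model name and the hard-coded turns are placeholders, not a recommendation), the prior exchange is simply replayed in the messages list so the model can resolve “its” to Paris:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Earlier turns are included verbatim so the model has the context.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "Paris."},
    {"role": "user", "content": "What's its population?"},
]

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # example model name
    messages=messages,
)
print(response.choices[0].message.content)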

Managing token limits is critical. Each GPT model has a maximum context window (e.g., 4,096 tokens for the original gpt-3.5-turbo or 128,000 for gpt-4-0125-preview). Tokens are small chunks of text, roughly word fragments, and everything in both the input messages and the generated output counts toward the limit. If the conversation exceeds the limit, you’ll need to truncate or omit older messages. A common approach is a sliding window: keep the most recent messages and discard the oldest ones. For example, you might track token counts using OpenAI’s tiktoken library and remove messages from the beginning of the list until the total fits within the limit. Alternatively, you could summarize older parts of the conversation to preserve key details without spending as many tokens, as sketched below.
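
One possible sketch of the summarization option is to ask the model itself to compress the older turns into a single message. The helper name summarize_older_turns and the keep_last parameter are hypothetical, and the snippet assumes a configured client like the one above; treat it as a starting point rather than a fixed recipe:

def summarize_older_turns(history, keep_last=4):
    # Keep the system prompt (index 0) and the last few turns; summarize the rest.
    old_turns = history[1:-keep_last]
    if not old_turns:
        return history
    summary_request = [
        {"role": "system", "content": "Summarize this conversation in a few sentences."},
        {"role": "user", "content": str(old_turns)},
    ]
    summary = client.chat.completions.create(
        model="gpt-3.5-turbo",  # example model name
        messages=summary_request,
    ).choices[0].message.content
    return (
        [history[0]]
        + [{"role": "assistant", "content": f"Summary of earlier conversation: {summary}"}]
        + history[-keep_last:]
    )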

Here’s a practical example in Python. Initialize a list, conversation_history, and append each user and assistant message as dictionaries with role and content keys. For each API call, pass this list to the messages parameter. After each response, add the assistant’s reply to the history. To handle token limits, calculate the token count after each interaction and trim the list when nearing the limit. For instance:

from openai import OpenAI
import tiktoken

client = OpenAI()
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

conversation_history = [
    {"role": "system", "content": "You are a helpful assistant."}
]

def add_message(role, content):
    conversation_history.append({"role": role, "content": content})

def count_tokens():
    # Rough estimate: encode the serialized history; close enough for trimming decisions.
    return len(encoding.encode(str(conversation_history)))

def trim_history(max_tokens=3000):  # example threshold below the model's limit
    # Drop the oldest non-system messages until the history fits.
    while count_tokens() > max_tokens and len(conversation_history) > 1:
        conversation_history.pop(1)  # index 0 is the system message; keep it
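
To tie the pieces together, a full request/response cycle might look like the following sketch. The chat helper is a hypothetical name and the model name is only an example; it simply appends the user turn, trims, calls the API with the accumulated history, and stores the assistant's reply for future turns:

def chat(user_input):
    add_message("user", user_input)
    trim_history()  # stay under the token threshold before calling the API
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=conversation_history,
    )
    reply = response.choices[0].message.content
    add_message("assistant", reply)  # keep the answer so later turns have context
    return reply

print(chat("What's the capital of France?"))
print(chat("What's its population?"))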

This ensures the model always receives enough context while staying within token constraints. Adjust thresholds and trimming logic based on your specific use case.
