To implement conversation history in OpenAI’s GPT models, you need to structure your API requests to include prior interactions as part of the input. The key is to maintain a list of messages, where each message has a role (like “user” or “assistant”) and content (the actual text). By appending each exchange to this list and sending it with every new request, the model can reference the context to generate coherent, context-aware responses. For example, if a user asks, “What’s the capital of France?” and the assistant replies, “Paris,” the next user query (“What’s its population?”) requires the model to know “its” refers to Paris. Including the prior messages ensures the model understands the context.
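Concretely, the request for that follow-up question would carry the earlier turns in its messages list. A minimal sketch of the payload (the exact wording of each message is illustrative):

    messages = [
        {"role": "user", "content": "What's the capital of France?"},
        {"role": "assistant", "content": "Paris."},
        {"role": "user", "content": "What's its population?"},  # "its" resolves via the prior turns
    ]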
Managing token limits is critical. Each GPT model has a maximum context window (e.g., 4,096 tokens for gpt-3.5-turbo or 128,000 for gpt-4-0125-preview). Tokens are small chunks of text: a word is often one or more tokens, punctuation and whitespace count too, and both the input messages and the generated output draw from the same limit. If the conversation exceeds the limit, you'll need to truncate or omit older messages. A common approach is a sliding window: keep the most recent messages and discard the oldest ones. For example, you might track token counts with OpenAI's tiktoken library and remove messages from the beginning until the total fits within the limit. Alternatively, you can summarize older parts of the conversation to preserve key details without spending excessive tokens.
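The sliding-window approach is implemented in the full example below; as a rough sketch of the summarization alternative, you can ask the model itself to condense the older turns (the prompt wording, model choice, and keep_recent value here are illustrative assumptions):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def summarize_older_turns(history, keep_recent=6):
        # Condense everything between the system prompt and the most recent turns
        older = history[1:-keep_recent]
        if not older:
            return history  # nothing old enough to summarize yet
        summary = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{
                "role": "user",
                "content": "Briefly summarize this conversation:\n" + str(older),
            }],
        ).choices[0].message.content
        return [history[0],
                {"role": "system", "content": "Summary of earlier turns: " + summary},
                *history[-keep_recent:]]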
Here's a practical example in Python. Initialize a list, conversation_history, and append each user and assistant message as a dictionary with role and content keys. For each API call, pass this list to the messages parameter. After each response, add the assistant's reply to the history. To handle token limits, calculate the token count after each interaction and trim the list when nearing the limit. For instance:
    from openai import OpenAI
    import tiktoken

    client = OpenAI()
    encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

    conversation_history = [
        {"role": "system", "content": "You are a helpful assistant."}
    ]

    def add_message(role, content):
        conversation_history.append({"role": role, "content": content})

    def trim_history(max_tokens=3000):  # example threshold below the model's limit
        # Rough count: encode the serialized history (a per-message count is more precise)
        total_tokens = len(encoding.encode(str(conversation_history)))
        while total_tokens > max_tokens and len(conversation_history) > 1:
            conversation_history.pop(1)  # drop the oldest turn, keep the system message
            total_tokens = len(encoding.encode(str(conversation_history)))
This keeps each request within the token limit while preserving the most recent context. Adjust the threshold and trimming logic based on your specific use case.
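Putting it together, a single request/response cycle might look like this (a minimal sketch using the functions defined above):

    add_message("user", "What's its population?")
    trim_history()  # make room before sending

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=conversation_history,
    )
    reply = response.choices[0].message.content
    add_message("assistant", reply)  # store the reply so the next turn has context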
Zilliz Cloud is a managed vector database built on Milvus, well suited for building GenAI applications.