AWS Bedrock provides access to foundation models, but input and output limits vary by model. The input limit governs how much text you can send to the model in a single request, while the output limit determines how much text the model can generate. These limits are typically defined in tokens (chunks of text, roughly four characters each). For example, Anthropic's Claude 2 accepts up to 100,000 input tokens, while Amazon Titan Text has an 8,192-token input limit. Output length is often adjustable via parameters like max_tokens, but defaults to a model-specific cap (e.g., Claude's default output limit is 4,096 tokens). These constraints keep performance and cost predictable, as longer sequences require more computational resources.
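Since limits are expressed in tokens rather than characters, it helps to estimate token counts before sending a request. A minimal sketch using the rough 4-characters-per-token heuristic mentioned above (the function name and heuristic are illustrative; a real tokenizer will give different counts):

```python
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Rough token estimate using the ~4-characters-per-token heuristic.

    This is only a ballpark figure; actual tokenizers split text
    differently depending on the model.
    """
    return max(1, len(text) // chars_per_token)

prompt = "Explain vector databases in one short paragraph."
print(estimate_tokens(prompt))  # rough estimate only
```

For production use, prefer the model vendor's own tokenizer, since the heuristic can be off by a wide margin for code, non-English text, or unusual formatting.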
To find specific limits, start with the AWS Bedrock documentation. Each model's detail page lists its token limits and configuration options. For instance, in the AWS Console, navigate to Bedrock > Model access, select a model, and check the "Model details" section. The API reference also includes this information: the modelId parameter in an InvokeModel request corresponds to a model with documented constraints. Some models, like AI21 Labs' Jurassic-2, accept shorter inputs (e.g., 8,192 tokens) but let you adjust output length via maxTokens in the API body. Always verify whether your use case requires splitting prompts or truncating outputs to stay within bounds. For example, to process a 120k-token document with a 100k-token Claude model, you would need to chunk it into segments of 100k tokens or fewer.
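The chunking step above can be sketched with a simple character-based splitter (a naive sketch: it uses the 4-characters-per-token heuristic and cuts at fixed offsets, whereas a production splitter would break on paragraph or sentence boundaries):

```python
def chunk_text(text: str, max_tokens: int = 100_000, chars_per_token: int = 4) -> list[str]:
    """Split text into chunks that stay under max_tokens, using a
    character-count heuristic (~4 chars per token).

    Note: slicing at fixed character offsets can split words mid-way;
    real pipelines usually cut at paragraph or sentence boundaries.
    """
    max_chars = max_tokens * chars_per_token
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

document = "x" * 480_000  # ~120k tokens at 4 chars/token
chunks = chunk_text(document)
print(len(chunks))  # 2 chunks: one full ~100k-token segment plus the remainder
```

Each chunk can then be sent as a separate request, with the per-chunk results stitched back together afterward.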
Developers should test limits programmatically. For example, using the AWS SDK for Python (boto3), calling invoke_model with an oversized prompt returns a ValidationException, and the error message often specifies the allowed token range. To avoid surprises, pre-process inputs with a token-counting library (e.g., Anthropic's tokenizer tooling). Output length can be managed by setting parameters like max_tokens in the request, though requesting more than the model's maximum will also trigger an error. If the default limits are too restrictive, check AWS support options; some models allow quota increases. Always review the latest docs, as updates (such as Claude 3's 200k-token input window) can change these thresholds.