To set parameters like maximum tokens, temperature, or top-p when using a model via AWS Bedrock, you configure these options in the API request body when invoking the model. Bedrock provides access to various foundation models, and each model provider (e.g., Anthropic, AI21, Cohere) may use slightly different parameter names or value ranges. For example, when using Anthropic's Claude model, you'd include parameters like `max_tokens_to_sample`, `temperature`, and `top_p` in the JSON payload sent to the `InvokeModel` API. These settings go in the model-specific section of the request body, allowing you to control output length, randomness, and token selection.
Each parameter serves a distinct purpose. `max_tokens` (or variants like `max_tokens_to_sample`) limits the length of the generated response; for instance, setting `max_tokens_to_sample: 200` stops the model after it produces 200 tokens. `temperature` adjusts randomness: lower values (e.g., 0.2) make outputs more predictable, while higher values (e.g., 0.8) encourage creativity. `top_p`, or nucleus sampling, restricts token selection to a cumulative probability threshold; for example, `top_p: 0.9` means the model samples only from the tokens that make up the top 90% of probability mass. These parameters often work together: pairing a lower `temperature` with a moderate `top_p` (e.g., 0.7) can balance focus and variety. Note that some providers recommend adjusting either `temperature` or `top_p`, not both, to avoid conflicting behaviors.
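To see what `top_p` does mechanically, here is a minimal illustrative sketch of nucleus sampling. This is not Bedrock code (providers apply this filtering inside the model server); the toy probability distribution is an invented example:

```python
def nucleus_filter(token_probs, top_p=0.9):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p."""
    # Rank tokens from most to least likely.
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append(token)
        cumulative += prob
        if cumulative >= top_p:
            break  # stop once the probability-mass threshold is covered
    return kept

# Example: the rare token is excluded because the top three already cover 95%.
probs = {"the": 0.5, "a": 0.3, "an": 0.15, "xylophone": 0.05}
print(nucleus_filter(probs, top_p=0.9))  # → ['the', 'a', 'an']
```

With `top_p: 0.9`, the long tail of unlikely tokens is cut off entirely, which is why a lower `top_p` makes output more focused even at the same `temperature`.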
When implementing these settings, consult the specific model's documentation for exact parameter names and ranges. For example, AI21 Jurassic-2 uses `maxTokens` and `temperature`, while Cohere Command uses `max_tokens` and a `temperature` with a 0–5 range. Testing small adjustments is key: a `temperature` of 0.5 might work for factual Q&A, while 1.0 could suit creative storytelling. Avoid setting `max_tokens` too low (e.g., 50), which might truncate responses, or too high (e.g., 1000), which risks unnecessary cost. Always validate outputs under different configurations to align with your use case, such as using `temperature: 0.3` for code generation to prioritize accuracy over novelty.
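Because parameter names differ per provider, a small translation helper can keep application code uniform. The mappings below are a sketch based on the names mentioned above; verify each against the model's own Bedrock documentation before relying on them:

```python
import json

# Generic names -> provider-specific request-body keys (verify per model docs).
PARAM_NAMES = {
    "anthropic": {"max_tokens": "max_tokens_to_sample",
                  "temperature": "temperature", "top_p": "top_p"},
    "ai21":      {"max_tokens": "maxTokens", "temperature": "temperature"},
    "cohere":    {"max_tokens": "max_tokens", "temperature": "temperature"},
}

def build_body(provider, prompt, **params):
    """Translate generic parameter names into a provider-specific JSON body."""
    names = PARAM_NAMES[provider]
    body = {"prompt": prompt}
    for generic, value in params.items():
        # A KeyError here flags a parameter the provider does not accept.
        body[names[generic]] = value
    return json.dumps(body)

print(build_body("ai21", "Hello", max_tokens=300, temperature=0.5))
```

Centralizing the name mapping makes it easy to A/B test the same prompt across providers without scattering provider-specific keys through the codebase.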