Temperature in large language models (LLMs) is a parameter that controls the randomness of the model’s output during text generation. It influences how the model selects the next token (word or subword) in a sequence. A lower temperature (closer to 0) makes the model more deterministic, favoring high-probability tokens, while a higher temperature (above 1) increases randomness, allowing less likely tokens to be chosen. This parameter does not alter the model’s underlying knowledge but adjusts the balance between predictability and creativity in responses.
Technically, temperature works by dividing the logits (raw output scores) by the temperature value before applying the softmax function, which converts those scores into probabilities. With a temperature of 1, the logits are effectively used as-is. A lower temperature (e.g., 0.5) amplifies the differences between scores, making the highest-scoring token more dominant. Conversely, a higher temperature (e.g., 2) shrinks those differences, flattening the probability distribution. For instance, if the model’s top tokens for completing “The sky is” are “blue” (80%) and “green” (20%), a low temperature might output “blue” over 99% of the time, while a sufficiently high temperature could push the chance of “green” toward 40%, introducing more variability.
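The scaling described above can be sketched in a few lines of Python. The logits below are chosen (hypothetically) so that a temperature of 1 reproduces the 80%/20% “blue” vs. “green” split from the example; real models produce logits over a vocabulary of tens of thousands of tokens, but the math is the same.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply softmax."""
    scaled = [logit / temperature for logit in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - max_scaled) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Two-token toy vocabulary: "blue" and "green".
# log(0.8) and log(0.2) give exactly [0.8, 0.2] at temperature 1.
logits = [math.log(0.8), math.log(0.2)]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
# T=0.5 sharpens the distribution (≈0.941 / 0.059),
# T=2.0 flattens it (≈0.667 / 0.333).
```

Note that temperature only rescales the relative gaps between logits; the ranking of tokens never changes, which is why low temperatures converge on the same top token rather than a different one.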
Developers adjust temperature based on the task. For factual or code-generation tasks (e.g., answering technical questions), a low temperature (0.2–0.5) ensures precise, reliable outputs. In creative writing or brainstorming, a higher temperature (0.7–1.2) produces diverse ideas. However, overly high temperatures risk incoherence, while overly low ones lead to repetitive or generic text. For example, a chatbot might use temperature 0.3 for technical support (to stay factual) but switch to 0.9 for casual conversation (to sound engaging). Experimentation is key: temperature 0 forces greedy sampling (always picking the top token), but even small values like 0.3 can balance creativity and focus effectively.