
Why might two different runs of the same Sentence Transformer model give slightly different embedding results (is there randomness involved, and how can I control it)?

Two different runs of the same Sentence Transformer model can produce slightly different embeddings due to inherent randomness in the model or its computational environment. While the model weights remain fixed after training, certain operations—like dropout layers or GPU-based matrix multiplications—may introduce variability during inference. Additionally, hardware or software optimizations (e.g., non-deterministic GPU kernels) can cause minor numerical differences even when processing identical inputs. To control this, you must identify and address the sources of randomness, enforce deterministic settings, and standardize the runtime environment.
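An easy way to see whether your environment is affected is to encode the same input twice and compare the outputs. The sketch below is illustrative and assumes the sentence-transformers package with the public all-MiniLM-L6-v2 checkpoint; any Sentence Transformer model behaves the same way.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Load any Sentence Transformer checkpoint; this one is just an example.
model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = ["Reproducibility matters for embeddings."]

# Encode the identical input twice in the same process.
emb1 = model.encode(sentences)
emb2 = model.encode(sentences)

# On non-deterministic GPU kernels, exact equality can fail even though
# the numerical difference is tiny.
print("bit-identical:", np.array_equal(emb1, emb2))
print("max abs diff:", np.max(np.abs(emb1 - emb2)))
```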

One primary source of randomness is the use of non-deterministic GPU operations. Frameworks like PyTorch and TensorFlow often prioritize computational speed over bit-for-bit reproducibility. For example, parallelized matrix multiplications on GPUs may accumulate floating-point sums in a different order on each run, and because floating-point addition is not associative, the rounding differs marginally between runs. Another source is dropout layers, which are designed to randomly deactivate neurons during training. If a model is accidentally left in training mode (e.g., model.train() instead of model.eval()), dropout remains active during inference and injects randomness. Similarly, operations like beam search or sampling in generative models can introduce variability, though Sentence Transformers typically avoid these during embedding generation.
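The dropout pitfall is easy to reproduce in isolation. This minimal PyTorch sketch shows that a Dropout layer returns varying outputs for the same input in training mode, but becomes a deterministic pass-through in evaluation mode:

```python
import torch

dropout = torch.nn.Dropout(p=0.5)
x = torch.ones(1, 8)

dropout.train()        # training mode: randomly zeroes elements and rescales
print(dropout(x))      # differs between calls
print(dropout(x))

dropout.eval()         # evaluation mode: dropout is a no-op
print(dropout(x))      # always equal to x
```

The sentence-transformers library's encode() method typically switches the model to evaluation mode for you, but custom inference loops that call the underlying modules directly must set it explicitly.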

To control randomness, start by setting deterministic configurations. In PyTorch, call torch.manual_seed(42) to seed the random number generators on all devices, disable non-deterministic cuDNN kernels with torch.backends.cudnn.deterministic = True, and set torch.backends.cudnn.benchmark = False to prevent auto-tuning from selecting different kernels across runs. Ensure the model is in evaluation mode (model.eval()) to deactivate dropout and other training-specific layers. For TensorFlow, call tf.config.experimental.enable_op_determinism() and use fixed seeds for stateful operations. Additionally, avoid mixed-precision inference (e.g., FP16) if numerical stability is critical, as reduced precision can amplify rounding differences. Finally, run the model on the same hardware with the same software versions, as driver updates or library changes can alter computational outcomes. These steps trade slight performance overhead for reproducibility, ensuring embeddings remain consistent across runs.
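Putting those settings together, a reproducible PyTorch inference setup might look like the following sketch. The model name is illustrative, and the exact flags you need can vary with your framework version.

```python
import torch
from sentence_transformers import SentenceTransformer

torch.manual_seed(42)                        # seeds CPU and CUDA RNGs
torch.cuda.manual_seed_all(42)               # redundant but explicit for multi-GPU
torch.backends.cudnn.deterministic = True    # force deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False       # disable kernel auto-tuning
# torch.use_deterministic_algorithms(True)   # optional: raise an error on any
                                             # op without a deterministic variant

# Illustrative checkpoint; substitute your own model.
model = SentenceTransformer("all-MiniLM-L6-v2")
model.eval()                                 # disable dropout and other
                                             # training-specific layers

with torch.no_grad():
    embeddings = model.encode(
        ["Same input, same output."],
        convert_to_numpy=True,
    )
```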

