What are position embeddings in LLMs?

Position embeddings are a mechanism used in large language models (LLMs) to encode the order of tokens in a sequence. Unlike recurrent neural networks (RNNs) or convolutional architectures, Transformers—the backbone of most LLMs—process all tokens in parallel, which means they lack inherent awareness of token positions. Position embeddings solve this by injecting information about where each token is located in the sequence. This allows the model to distinguish between sentences like “The cat sat on the mat” and “On the mat, the cat sat,” where word order changes the meaning.

There are two common types of absolute position embeddings: fixed and learned. Fixed embeddings assign a predetermined vector to each position in the sequence; for example, the original Transformer used sinusoidal functions to generate a unique positional vector for every position index. Learned embeddings, used in models like BERT, treat position information as trainable parameters that the model adjusts during training to better capture relationships between positions. For instance, in a sentence like “She didn’t go to the park because it was raining,” position information helps the model relate “it” to “raining” even though several words separate them. Some newer models instead use relative position embeddings, which encode the distance between tokens rather than their absolute positions; this handles longer sequences more effectively, as seen in architectures like T5.
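To make the fixed (sinusoidal) scheme concrete, here is a minimal sketch of how the original Transformer computes its positional vectors. The function name and sizes (sinusoidal_position_embeddings, max_len=128, dim=512) are illustrative, not from any particular library.

```python
import math
import torch

def sinusoidal_position_embeddings(max_len: int, dim: int) -> torch.Tensor:
    # Fixed positional vectors as in the original Transformer:
    # even dimensions use sine, odd dimensions use cosine, with
    # wavelengths that grow geometrically across the embedding.
    positions = torch.arange(max_len).unsqueeze(1)                          # (max_len, 1)
    div_terms = torch.exp(torch.arange(0, dim, 2) * (-math.log(10000.0) / dim))
    pe = torch.zeros(max_len, dim)
    pe[:, 0::2] = torch.sin(positions * div_terms)                          # even indices
    pe[:, 1::2] = torch.cos(positions * div_terms)                          # odd indices
    return pe

# Each of the 128 positions gets a unique, deterministic 512-dimensional vector.
pe = sinusoidal_position_embeddings(max_len=128, dim=512)
print(pe.shape)  # torch.Size([128, 512])
```

Because these vectors are computed from a formula rather than trained, the same pattern can in principle be generated for any position, which is one reason the original Transformer chose it.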

Implementing position embeddings typically involves adding positional vectors to the token embeddings before feeding them into the model’s layers. For example, in code, a learned position embedding layer might look like nn.Embedding(max_length, hidden_dim), where each position index (0, 1, 2, etc.) maps to a unique vector. Challenges include handling sequences longer than the maximum position the model was trained on, which can lead to out-of-distribution errors. Some models address this by extrapolating or using techniques like Rotary Position Embeddings (RoPE), which encode positions through rotation operations. Without position embeddings, LLMs would struggle with tasks requiring syntactic structure (e.g., parsing) or context-dependent meaning (e.g., coreference resolution), making them critical for accurate language understanding.
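Building on the nn.Embedding(max_length, hidden_dim) idea above, here is a minimal sketch of how learned position embeddings are typically summed with token embeddings before the first Transformer layer. The module name and sizes (EmbeddingWithPositions, vocab_size=30522, max_length=512, hidden_dim=768) are illustrative assumptions, not a specific model's implementation.

```python
import torch
import torch.nn as nn

class EmbeddingWithPositions(nn.Module):
    # Illustrative module: token embeddings plus learned (trainable)
    # absolute position embeddings, added together before the model's layers.
    def __init__(self, vocab_size: int, max_length: int, hidden_dim: int):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, hidden_dim)
        self.pos_emb = nn.Embedding(max_length, hidden_dim)  # one vector per position 0..max_length-1

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) integer token indices
        seq_len = token_ids.size(1)
        positions = torch.arange(seq_len, device=token_ids.device)   # 0, 1, 2, ...
        return self.token_emb(token_ids) + self.pos_emb(positions)   # broadcast over the batch

# Example with made-up sizes: a batch of 2 sequences of 16 tokens.
emb = EmbeddingWithPositions(vocab_size=30522, max_length=512, hidden_dim=768)
out = emb(torch.randint(0, 30522, (2, 16)))
print(out.shape)  # torch.Size([2, 16, 768])
```

Note that max_length caps the positions the layer can represent, which is exactly the out-of-distribution limitation mentioned above; techniques like RoPE sidestep it by computing positions on the fly instead of looking them up from a fixed table.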
