Can Sentence Transformers be applied to detect changes in meaning over time, for example by comparing how similar documents from different time periods are to each other?

Yes, Sentence Transformers can be used to detect changes in meaning over time by comparing the semantic similarity of documents from different periods. These models map each sentence or document to a dense vector (an embedding) that captures its semantic content. By measuring the similarity between embeddings of documents written in different eras, you can quantify shifts in language use, context, or conceptual associations. Because the embeddings describe whole passages rather than individual words, this approach tracks changes in how a topic is discussed more directly than changes in a single word's meaning. For example, a document from the 1990s discussing “artificial intelligence” might embed differently from a 2020s document on the same topic, reflecting changes in technical scope or societal perceptions.
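As a minimal sketch of the comparison described above, the snippet below computes cosine similarity between two embedding vectors with NumPy. The `embed` helper is an illustrative wrapper that assumes the sentence-transformers package is installed and can download the all-mpnet-base-v2 checkpoint. It imports the library lazily, so `cosine_similarity` can be tried on any vectors without the model.

```python
import numpy as np


def embed(texts, model_name="all-mpnet-base-v2"):
    """Encode a list of texts into embeddings.

    Assumes the sentence-transformers package is installed and that the
    named model can be downloaded; the import is deferred to call time.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(texts, convert_to_numpy=True)


def cosine_similarity(u, v):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```

Usage, with hypothetical texts standing in for documents from two eras: `old, new = embed(["AI research on expert systems...", "AI built on large neural networks..."])`, then `cosine_similarity(old, new)`. Values near 1 indicate similar content, and lower values indicate greater divergence.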

To implement this, you’d first generate embeddings for documents grouped by time period (e.g., decade). Using a pre-trained model like all-mpnet-base-v2, you could encode texts into vectors and compute cosine similarity between document pairs across periods. A decline in cross-period similarity over time could indicate semantic drift. This is most informative when measured against the similarity of documents within the same period, because differences in topic mix alone can also lower the scores. For instance, comparing medical articles from the 1980s and 2020s might reveal shifts in terminology (e.g., “AIDS” vs. “HIV/AIDS”) or changes in recommended treatments. However, this approach assumes the model generalizes to older language and contexts, which may not hold if its training data is skewed toward modern text.
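The period-level comparison can be sketched as follows, assuming embeddings have already been computed for each period (for example with `SentenceTransformer("all-mpnet-base-v2").encode(texts)`). The function names and the specific drift score are illustrative choices, not a standard method. The score for each pair of consecutive periods is the average within-period similarity minus the average cross-period similarity. The within-period term acts as a baseline, so ordinary variation inside a period is not mistaken for drift.

```python
import numpy as np


def normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def mean_cross_similarity(a, b):
    """Average cosine similarity over all pairs (one document from a, one from b)."""
    return float((normalize(a) @ normalize(b).T).mean())


def mean_within_similarity(a):
    """Average cosine similarity over distinct document pairs in one period.

    Requires at least two documents in the period.
    """
    a = normalize(a)
    sims = a @ a.T
    n = len(a)
    # Exclude the diagonal, where each document is compared with itself.
    return float((sims.sum() - np.trace(sims)) / (n * (n - 1)))


def drift_scores(embeddings_by_period):
    """Score drift between consecutive periods (hypothetical scoring scheme).

    embeddings_by_period: dict mapping a sortable period label (e.g. "1980s")
    to an (n_docs, dim) array of embeddings.
    Returns {(earlier, later): score}. Larger scores mean documents are less
    similar across the two periods than they are within each period.
    """
    labels = sorted(embeddings_by_period)
    scores = {}
    for earlier, later in zip(labels, labels[1:]):
        a, b = embeddings_by_period[earlier], embeddings_by_period[later]
        within = (mean_within_similarity(a) + mean_within_similarity(b)) / 2
        scores[(earlier, later)] = within - mean_cross_similarity(a, b)
    return scores
```

Decade labels such as "1980s" sort correctly as strings. For other labeling schemes, pass labels that sort chronologically. A score near zero suggests that the two periods are about as similar to each other as their own documents are. A clearly positive score is a signal worth inspecting by hand.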

Practical considerations include model selection and preprocessing. A model that saw historical text during pre-training or fine-tuning is more likely to handle archaic spelling and vocabulary than one trained mostly on modern corpora. General-purpose checkpoints such as bert-base-cased, which was pre-trained on BookCorpus and English Wikipedia, are not specialized for historical language. Fine-tuning on period-specific data could improve accuracy. Domain also matters: analyzing legal texts might require a different approach than analyzing social media posts. The sentence-transformers library simplifies embedding generation, while dimensionality reduction (e.g., UMAP) can help visualize how clusters of documents shift over time. However, this method does not explain why shifts occur. It quantifies similarity, and interpreting the causes (e.g., technological advances or cultural changes) requires domain expertise.
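For visualization, the sketch below projects all documents into two dimensions and averages the projected points per period, so the movement of period centroids can be plotted. It deliberately swaps in PCA (computed with NumPy's SVD) for UMAP to keep the example dependency-free. With the umap-learn package installed, `umap.UMAP(n_components=2).fit_transform(stacked)` could be used in place of `project_2d`. The function names are hypothetical.

```python
import numpy as np


def project_2d(embeddings):
    """Project embeddings to 2-D with PCA via SVD (a simple stand-in for UMAP)."""
    x = np.asarray(embeddings, dtype=float)
    centered = x - x.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def period_centroids_2d(embeddings_by_period):
    """Project all periods jointly, then average the 2-D points of each period.

    Joint projection keeps every period in the same coordinate system.
    Returns (coords, centroids): coords has one row per document, in sorted
    period order, and centroids maps each period label to its mean 2-D point.
    """
    labels = sorted(embeddings_by_period)
    stacked = np.vstack([embeddings_by_period[p] for p in labels])
    coords = project_2d(stacked)
    centroids, start = {}, 0
    for p in labels:
        n = len(embeddings_by_period[p])
        centroids[p] = coords[start:start + n].mean(axis=0)
        start += n
    return coords, centroids
```

Plotting `coords` colored by period, with arrows between consecutive centroids, gives a rough picture of drift. Any 2-D projection distorts distances, so similarity scores in the original embedding space remain the quantitative measure.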
