
How do zero-shot learning models leverage semantic knowledge?

Zero-shot learning (ZSL) models leverage semantic knowledge by connecting features learned from training data to descriptions of unseen classes through shared attributes, textual embeddings, or hierarchical relationships. These models bridge the gap between seen and unseen classes using semantic information that describes both. For example, if a model is trained to recognize animals like “horse” and “zebra,” it might learn that stripes are a key attribute of zebras. When asked to classify a “tiger” (an unseen class), the model can match the stripes it detects against the attribute description of “tiger,” which lists stripes alongside feline traits. This approach relies on semantic representations—such as word vectors, attribute lists, or knowledge graphs—to generalize beyond the training data.
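The attribute-matching idea above can be sketched in a few lines. This is a minimal, hypothetical example: the attribute vocabulary (striped, four-legged, feline, hoofed) and the per-class attribute vectors are made up for illustration, and the “attribute detector” output is hard-coded where a real system would predict it from an image.

```python
import numpy as np

# Hypothetical binary attribute descriptions per class
# (columns: striped, four_legged, feline, hoofed).
CLASS_ATTRIBUTES = {
    "horse": np.array([0, 1, 0, 1], dtype=float),
    "zebra": np.array([1, 1, 0, 1], dtype=float),
    "tiger": np.array([1, 1, 1, 0], dtype=float),  # unseen at training time
}

def classify_by_attributes(predicted_attributes, class_attributes):
    """Return the class whose attribute vector is most similar
    (by cosine similarity) to the attributes predicted from the input."""
    best_class, best_score = None, -np.inf
    for name, attrs in class_attributes.items():
        score = predicted_attributes @ attrs / (
            np.linalg.norm(predicted_attributes) * np.linalg.norm(attrs)
        )
        if score > best_score:
            best_class, best_score = name, score
    return best_class

# An attribute detector trained only on seen classes predicts
# "striped, four-legged, feline, not hoofed" for a tiger image.
predicted = np.array([0.9, 0.8, 0.7, 0.1])
print(classify_by_attributes(predicted, CLASS_ATTRIBUTES))  # → tiger
```

Even though “tiger” contributed no training images, its attribute description alone is enough to win the similarity comparison.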

A common technical implementation involves embedding both input data (e.g., images or text) and class descriptions into a shared semantic space. For instance, in image classification, a ZSL model might project visual features (extracted by a convolutional neural network) into the same space as word vectors (like Word2Vec or GloVe) representing class names. During training, the model learns to align visual patterns with semantic vectors of seen classes. At inference time, it compares the projected features of an unseen input to the semantic vectors of all possible classes, even those never encountered. For example, a model trained on “dog” and “cat” could recognize “wolf” by matching its visual features to the semantic vector for “wolf,” which is closer to “dog” in the embedding space due to shared traits like “canine” or “pack animal.”
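The shared-space pipeline described here can be illustrated with an idealized sketch. The three-dimensional “word vectors” and the synthetic mapping to five-dimensional “CNN features” are stand-ins invented for this example (real systems would use pretrained embeddings and a CNN backbone); what the code does show is the core mechanism: learn a projection from visual features to semantic space on seen classes only, then classify an unseen class by nearest semantic vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in word vectors for class names (real systems would use
# Word2Vec/GloVe). "wolf" is unseen in training; its vector is
# close to "dog" in this toy semantic space.
word_vecs = {
    "dog":  np.array([1.0, 0.9, 0.1]),
    "cat":  np.array([0.2, 1.0, 0.9]),
    "wolf": np.array([0.84, 0.92, 0.26]),
}

# Hypothetical fixed map used only to synthesize idealized, noise-free
# 5-d "visual features" for this sketch (a CNN would produce these).
M = rng.normal(size=(3, 5))
def visual_features(cls):
    return word_vecs[cls] @ M

# Training: fit a linear projection W from visual features back into
# the semantic space, using ONLY the seen classes "dog" and "cat".
X = np.vstack([visual_features("dog"), visual_features("cat")])
Y = np.vstack([word_vecs["dog"], word_vecs["cat"]])
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def classify(features, candidates):
    """Project features into the semantic space, then return the
    nearest candidate class by cosine similarity — including
    classes never encountered during training."""
    z = features @ W
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(candidates, key=lambda c: cos(z, word_vecs[c]))

print(classify(visual_features("wolf"), ["dog", "cat", "wolf"]))  # → wolf
```

Because the projection is linear and “wolf” lies near the seen classes in semantic space, the projected features land closest to the “wolf” vector even though no wolf image was used to fit `W`.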

However, ZSL performance heavily depends on the quality and relevance of the semantic data. If the semantic embeddings lack meaningful relationships between seen and unseen classes (e.g., if the attribute vocabulary omits “flightless,” a key trait distinguishing “penguin” from “sparrow”), the model will struggle. To address this, some frameworks combine multiple semantic sources, such as using both WordNet hierarchies and manually defined attributes. In practice, developers might use pre-trained language models like BERT to generate richer class descriptions or fine-tune attribute detectors for specific domains. For example, a medical ZSL system could use ontologies like SNOMED CT to classify rare diseases by linking symptoms (observed features) to disease descriptions (unseen classes) through shared semantic relationships.
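One simple way to combine semantic sources, as suggested above, is to normalize each source's vector and concatenate them into a single class embedding. The numbers below are invented for illustration: two bird classes that look nearly identical under word vectors alone become clearly separable once a hand-defined attribute source (flightless, aquatic, small) is fused in.

```python
import numpy as np

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def combined_embedding(word_vec, attr_vec, w_word=0.5, w_attr=0.5):
    """Fuse two semantic sources into one class embedding by
    L2-normalizing each source and concatenating with weights."""
    wv = word_vec / np.linalg.norm(word_vec)
    av = attr_vec / np.linalg.norm(attr_vec)
    return np.concatenate([w_word * wv, w_attr * av])

# Hypothetical sources: a distributional word vector plus a manually
# defined attribute vector (flightless, aquatic, small).
penguin_word, penguin_attr = np.array([0.1, 0.9]), np.array([1.0, 1.0, 0.0])
sparrow_word, sparrow_attr = np.array([0.2, 0.8]), np.array([0.0, 0.0, 1.0])

word_sim = cos(penguin_word, sparrow_word)
combined_sim = cos(combined_embedding(penguin_word, penguin_attr),
                   combined_embedding(sparrow_word, sparrow_attr))

# Word vectors alone rate the two classes as near-duplicates; adding
# the attribute source pulls them apart.
print(f"word-only similarity: {word_sim:.3f}")
print(f"combined similarity:  {combined_sim:.3f}")
```

The same fusion applies to richer sources, e.g., BERT-generated class descriptions alongside WordNet-derived features, with the weights tuned per domain.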
