How does a zero-shot learning model predict outputs for unseen classes?

Zero-shot learning (ZSL) enables models to predict outputs for classes they were not explicitly trained on by leveraging semantic relationships between seen and unseen categories. Instead of relying solely on labeled examples for every possible class, ZSL models use auxiliary information—such as textual descriptions, attributes, or embeddings—to generalize to new categories. This approach works by aligning input data (e.g., images or text) with semantic representations of both known and unknown classes, allowing the model to infer connections even when direct training data is absent.

During training, a ZSL model learns a mapping between input features (like image pixels or word tokens) and a semantic space that describes classes. For example, a model trained to recognize animals might learn that “stripes” and “four legs” are attributes linked to tigers. These attributes are often encoded as vectors, such as word embeddings from tools like Word2Vec or manually defined attribute lists. The model is optimized to align input features with these semantic vectors. Crucially, the semantic space also includes descriptors for unseen classes (e.g., “zebra”), even though their specific input examples are unavailable. This setup forces the model to generalize by understanding how features correlate with semantic properties rather than memorizing class-specific patterns.

At inference time, the model projects input data into the same semantic space and compares it to the vectors of all possible classes, including unseen ones. For instance, if the model encounters an image of a zebra (an unseen class), it might detect “stripes” and “four legs” in the input features. By matching these features to the semantic vector for “zebra”—which shares attributes with “tiger” but differs in color or habitat—the model correctly classifies the input. Similarly, in text classification, a ZSL model trained on news topics like “sports” and “politics” could categorize an article about “climate change” by relating its content to embeddings for “environment” or “science.” This semantic alignment, combined with a well-structured feature space, allows ZSL models to handle unseen classes effectively without retraining.
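The inference step can be illustrated with a nearest-neighbor lookup in the semantic space. In this toy sketch (class names, attribute vectors, and the projected feature values are all made up for illustration), cosine similarity picks the class whose semantic vector best matches the projected input, even though “zebra” contributed no training examples:

```python
import numpy as np

# Hypothetical semantic vectors over attributes
# [has_stripes, has_four_legs, is_orange]. "tiger" and "horse" were
# seen during training; "zebra" was not, but its attribute vector
# is known from auxiliary information.
class_vectors = {
    "tiger": np.array([1.0, 1.0, 1.0]),
    "horse": np.array([0.0, 1.0, 0.0]),
    "zebra": np.array([1.0, 1.0, 0.0]),  # unseen class
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(projected):
    """Return the class whose semantic vector is most similar to the
    input features projected into the semantic space."""
    return max(class_vectors, key=lambda c: cosine(projected, class_vectors[c]))

# Suppose the trained model projects a zebra image to features that
# strongly indicate stripes and four legs but not orange coloring.
projected = np.array([0.9, 0.95, 0.05])
print(classify(projected))  # → zebra
```

Because the decision reduces to similarity in the shared semantic space, adding a new class only requires supplying its semantic vector, not retraining the model.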
