The most common approaches to few-shot learning include metric-based methods, optimization-based techniques, and model-based architectures. These strategies address the challenge of training models with limited labeled data by focusing on how to generalize effectively from only a handful of examples per class. Each approach offers a distinct mechanism for adapting or leveraging prior knowledge on new tasks with minimal supervision.
Metric-based methods rely on comparing new examples to a small support set of labeled data. These approaches learn an embedding space where similar examples are clustered, and dissimilar ones are separated. For instance, Prototypical Networks compute a prototype (average embedding) for each class in the support set and classify new queries based on their distance to these prototypes using metrics like Euclidean or cosine distance. Similarly, Siamese Networks use pairs of examples to learn similarity scores, enabling classification by matching new inputs to the closest support examples. These methods are efficient because they avoid retraining the model for each new task, instead relying on precomputed embeddings and simple distance calculations.
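The prototype-and-distance idea behind Prototypical Networks can be sketched in a few lines of NumPy. This is an illustrative toy, not a full implementation: it assumes the embeddings have already been produced by some trained encoder, and it uses squared Euclidean distance as the metric.

```python
import numpy as np

def prototypes(support_emb, support_labels, n_classes):
    # One prototype per class: the mean embedding of that class's support examples
    return np.stack([support_emb[support_labels == c].mean(axis=0)
                     for c in range(n_classes)])

def classify(query_emb, protos):
    # Squared Euclidean distance from each query to each prototype
    d = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)  # nearest prototype wins

# Toy 2-way, 2-shot episode with hand-made 2-D "embeddings"
support = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
labels = np.array([0, 0, 1, 1])
protos = prototypes(support, labels, n_classes=2)
pred = classify(np.array([[0.1, 0.1], [4.9, 5.1]]), protos)  # → [0, 1]
```

Note that nothing here is trained at test time: classifying a new episode only requires embedding its support set and running the distance computation.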
Optimization-based approaches aim to train models that can adapt quickly to new tasks with minimal updates. Model-Agnostic Meta-Learning (MAML) is a prominent example: it pre-trains a model on diverse tasks, optimizing its initial parameters so that a few gradient steps on a new task’s small dataset yield strong performance. This “learning to learn” strategy involves two loops: an inner loop for task-specific fine-tuning and an outer loop for updating the initial parameters across tasks. Another variant, Reptile, simplifies this by repeatedly fine-tuning on batches of tasks and moving the initial parameters toward the fine-tuned versions. These methods are flexible but computationally intensive due to the meta-training phase.
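The Reptile update described above is simple enough to sketch directly. The toy below is a hypothetical one-parameter setup: each "task" is minimizing a quadratic loss with a different optimum, the inner loop is plain gradient descent, and the outer loop nudges the shared initialization toward each task's fine-tuned parameters.

```python
import numpy as np

def sgd_steps(theta, grad_fn, lr=0.1, steps=5):
    # Inner loop: a few gradient-descent steps on one task's loss
    for _ in range(steps):
        theta = theta - lr * grad_fn(theta)
    return theta

def reptile(theta, task_grads, meta_lr=0.5, epochs=20):
    # Outer loop: move the initialization toward the fine-tuned parameters
    for _ in range(epochs):
        for grad_fn in task_grads:
            adapted = sgd_steps(theta, grad_fn)
            theta = theta + meta_lr * (adapted - theta)
    return theta

# Toy tasks: minimize (theta - target)^2, so the gradient is 2 * (theta - target)
task_grads = [lambda t: 2 * (t - 1.0), lambda t: 2 * (t - 3.0)]
theta0 = reptile(np.array(0.0), task_grads)
# theta0 settles between the two task optima (1.0 and 3.0), giving a
# starting point from which either task is reachable in a few steps
```

MAML differs in that its outer loop differentiates through the inner-loop updates, which is what makes it more expensive than Reptile's simple parameter interpolation.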
Model-based techniques modify neural network architectures to handle few-shot scenarios dynamically. For example, memory-augmented networks like MANN incorporate external memory to store and retrieve information from past examples, enabling rapid adaptation. Transformers, with their self-attention mechanisms, can also be adapted for few-shot learning by processing support examples as part of the input sequence, allowing the model to condition predictions on context. Hypernetworks take this further by generating task-specific model parameters directly from the support set, eliminating the need for gradient-based updates. These architectures prioritize flexibility, often at the cost of increased complexity, but they excel in scenarios requiring immediate adaptation without retraining.
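The hypernetwork idea can also be sketched with NumPy. In this hypothetical toy, the "hypernetwork" is a single matrix `H` (which in practice would be learned during meta-training) that maps per-class mean support embeddings to the weights of a linear classifier; no gradient steps are taken at test time.

```python
import numpy as np

def generate_classifier(support_emb, support_labels, n_classes, H):
    # Hypernetwork step: produce one classifier weight vector per class
    # from that class's mean support embedding (H is assumed pre-trained)
    class_means = np.stack([support_emb[support_labels == c].mean(axis=0)
                            for c in range(n_classes)])
    return class_means @ H  # shape (n_classes, dim)

def predict(query_emb, W):
    # Linear classifier with the generated, task-specific weights
    return (query_emb @ W.T).argmax(axis=1)

# Toy 2-way, 2-shot episode; H = identity reduces this to a dot-product
# comparison against class means, purely for illustration
support = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
W = generate_classifier(support, labels, n_classes=2, H=np.eye(2))
pred = predict(np.array([[1.0, 0.1], [0.2, 1.0]]), W)  # → [0, 1]
```

The appeal is that adaptation is a single forward pass through the hypernetwork, which is what "immediate adaptation without retraining" means in practice.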
Zilliz Cloud is a managed vector database built on Milvus, well suited to building GenAI applications.