transformers-community/contrastive-search
Text Generation · Concurrency Cost: 1 · Model Size: 0.8B · Quant: BF16 · Ctx Length: 32k · Published: Aug 25, 2025 · Architecture: Transformer

The transformers-community/contrastive-search model implements the Contrastive Search decoding strategy, which optimizes text generation by balancing model confidence against a degeneration penalty. The method applies to both decoder-only and encoder-decoder transformer models and aims to produce fluent, coherent text with little repetition. It is particularly effective at improving output quality in causal language models, reducing the repetitive loops commonly produced by greedy or beam search.


What is Contrastive Search?

Contrastive Search is a decoding strategy designed to improve the quality of text generated by transformer models. Unlike traditional greedy or beam search, it jointly optimizes for two key factors:

  • Model Confidence: Ensuring the generated tokens are highly probable according to the model.
  • Degeneration Penalty: Actively reducing repetition and promoting diversity by penalizing tokens that are too similar to the existing context.

This is achieved by considering the top_k candidate tokens at each step and selecting the one that maximizes a score combining the model confidence p(v | context) with a degeneration penalty, defined as the maximum cosine similarity between the candidate's representation h_v and the context representations H_context; the two terms are weighted by penalty_alpha.
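As a concrete sketch of that selection rule (pure Python with made-up probabilities and two-dimensional embeddings, not the library's internal implementation), one decoding step over a given top_k candidate set looks like:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def contrastive_score(prob, cand_emb, context_embs, penalty_alpha):
    # Degeneration penalty: similarity to the *most similar* context token.
    penalty = max(cosine(cand_emb, h) for h in context_embs)
    # Balance model confidence against the degeneration penalty.
    return (1 - penalty_alpha) * prob - penalty_alpha * penalty

def select_token(candidates, context_embs, penalty_alpha):
    # candidates: list of (token, p(token | context), embedding) for the top_k set
    best = max(
        candidates,
        key=lambda c: contrastive_score(c[1], c[2], context_embs, penalty_alpha),
    )
    return best[0]

context_embs = [[1.0, 0.0], [0.0, 1.0]]
candidates = [
    ("the", 0.6, [1.0, 0.0]),      # likely, but echoes the context
    ("forest", 0.4, [0.6, -0.8]),  # less likely, but adds new content
]
print(select_token(candidates, context_embs, 0.6))  # "forest": penalty outweighs confidence
print(select_token(candidates, context_embs, 0.0))  # "the": reverts to greedy
```

With a nonzero penalty_alpha the less probable but more novel token wins; with penalty_alpha at zero the score collapses to the raw probability.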

Key Capabilities

  • Reduces Repetition: Significantly minimizes repetitive phrases and tokens compared to standard decoding methods.
  • Preserves Semantic Coherence: Maintains the overall meaning and flow of the generated text.
  • Flexible Integration: Compatible with both decoder-only and encoder-decoder transformer models.

How to Use

To use Contrastive Search, you can specify custom_generate="contrastive_search" during the model.generate() call, along with two main parameters:

  • top_k (int): Defines the number of candidate tokens to evaluate at each step (e.g., 4).
  • penalty_alpha (float): Controls the weight of the degeneration penalty, typically ranging from 0.3 to 0.8. Setting it to 0.0 effectively reverts to greedy search.

Larger top_k values explore more options but increase computational cost. This strategy is particularly useful for generating more natural and less repetitive long-form text.