transformers-community/contrastive-search
Text Generation · Concurrency Cost: 1 · Model Size: 0.8B · Quant: BF16 · Ctx Length: 32k · Published: Aug 25, 2025 · Architecture: Transformer

The transformers-community/contrastive-search model implements the Contrastive Search decoding strategy, which optimizes text generation by balancing model confidence against a degeneration penalty. The method applies to both decoder-only and encoder-decoder transformer models and aims to produce fluent, coherent text with little repetition. It is particularly effective at improving output quality in causal language models, reducing the repetitive loops commonly produced by greedy or beam search.


What is Contrastive Search?

Contrastive Search is a decoding strategy designed to improve the quality of text generated by transformer models. Unlike traditional greedy or beam search, it jointly optimizes for two key factors:

  • Model Confidence: Ensuring the generated tokens are highly probable according to the model.
  • Degeneration Penalty: Actively reducing repetition and promoting diversity by penalizing tokens that are too similar to the existing context.

This is achieved by considering the top_k candidate tokens at each step and selecting the one that maximizes a score combining the model confidence p(v | context) with a degeneration penalty, defined as the maximum cosine similarity between the candidate's representation h_v and the context representations H_context; the two terms are weighted by penalty_alpha.
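As a concrete sketch of that selection rule (pure Python with made-up probabilities and two-dimensional embeddings, not the library's internal implementation), one decoding step over a given top_k candidate set looks like:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def contrastive_score(prob, cand_emb, context_embs, penalty_alpha):
    # Degeneration penalty: similarity to the *most similar* context token.
    penalty = max(cosine(cand_emb, h) for h in context_embs)
    # Balance model confidence against the degeneration penalty.
    return (1 - penalty_alpha) * prob - penalty_alpha * penalty

def select_token(candidates, context_embs, penalty_alpha):
    # candidates: list of (token, p(token | context), embedding) for the top_k set
    best = max(
        candidates,
        key=lambda c: contrastive_score(c[1], c[2], context_embs, penalty_alpha),
    )
    return best[0]

context_embs = [[1.0, 0.0], [0.0, 1.0]]
candidates = [
    ("the", 0.6, [1.0, 0.0]),      # likely, but echoes the context
    ("forest", 0.4, [0.6, -0.8]),  # less likely, but adds new content
]
print(select_token(candidates, context_embs, 0.6))  # "forest": penalty outweighs confidence
print(select_token(candidates, context_embs, 0.0))  # "the": reverts to greedy
```

With a nonzero penalty_alpha the less probable but more novel token wins; with penalty_alpha at zero the score collapses to the raw probability.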

Key Capabilities

  • Reduces Repetition: Significantly minimizes repetitive phrases and tokens compared to standard decoding methods.
  • Preserves Semantic Coherence: Maintains the overall meaning and flow of the generated text.
  • Flexible Integration: Compatible with both decoder-only and encoder-decoder transformer models.

How to Use

To use Contrastive Search, you can specify custom_generate="contrastive_search" during the model.generate() call, along with two main parameters:

  • top_k (int): Defines the number of candidate tokens to evaluate at each step (e.g., 4).
  • penalty_alpha (float): Controls the weight of the degeneration penalty, typically ranging from 0.3 to 0.8. Setting it to 0.0 effectively reverts to greedy search.

Larger top_k values explore more options but increase computational cost. This strategy is particularly useful for generating more natural and less repetitive long-form text.