transformers-community/dola

Text Generation · Model size: 0.8B · Quantization: BF16 · Context length: 32k · Published: Aug 21, 2025 · Architecture: Transformer

transformers-community/dola is an implementation of Decoding by Contrasting Layers (DoLa), a contrastive decoding strategy, applied here to the Qwen/Qwen3-0.6B base model (0.8 billion parameters, 32,768-token context length). DoLa enhances factuality and reduces hallucinations by contrasting logits from the final layer with those from earlier layers. Contrasting with higher layers works best for short-answer tasks, while contrasting with lower layers suits long-answer reasoning tasks, making the strategy a good fit for generative applications where output reliability matters.


DoLa: Decoding by Contrasting Layers

This model implements the Decoding by Contrasting Layers (DoLa) strategy, a technique designed to improve factuality and reduce hallucinations in language model outputs. DoLa works by contrasting the logits from the final layer of a language model with those from earlier layers, amplifying the factual knowledge that emerges in later layers and suppressing spurious information from earlier ones.
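To make the contrast step concrete, here is a minimal, illustrative sketch of the per-step scoring in PyTorch. The function name, the alpha threshold, and the use of a single fixed premature layer are simplifying assumptions; the full method also selects the premature layer dynamically (e.g. by divergence from the final-layer distribution), which is omitted here.

```python
import torch
import torch.nn.functional as F

def dola_contrast(final_logits: torch.Tensor,
                  premature_logits: torch.Tensor,
                  alpha: float = 0.1) -> torch.Tensor:
    """Sketch of DoLa's contrastive scoring for one decoding step.

    final_logits:     logits from the last layer's LM head, shape (batch, vocab)
    premature_logits: logits from an earlier layer's hidden state passed
                      through the same LM head, shape (batch, vocab)
    """
    final_logp = F.log_softmax(final_logits, dim=-1)          # mature (final-layer) distribution
    premature_logp = F.log_softmax(premature_logits, dim=-1)  # premature (early-layer) distribution

    # Plausibility constraint: keep only tokens the final layer already rates
    # as reasonably likely, so noise from the early layer is not rewarded.
    final_probs = final_logp.exp()
    keep = final_probs >= alpha * final_probs.max(dim=-1, keepdim=True).values

    # Contrast: emphasize knowledge that emerges between the early and final layers.
    scores = final_logp - premature_logp
    return scores.masked_fill(~keep, float("-inf"))
```

In the full strategy, these contrasted scores stand in for the raw final-layer logits before the usual decoding step (greedy selection or sampling).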

Key Capabilities

  • Enhanced Factual Accuracy: Reduces hallucinations by leveraging layer-specific knowledge.
  • Configurable Layer Contrast: Allows selecting 'low' layers (for long-answer reasoning), 'high' layers (for short-answer tasks), or specific layer indices (see the usage sketch after this list).
  • Repetition Control: Supports an optional repetition_penalty to further refine output quality.
  • Base Model: Built upon the Qwen/Qwen3-0.6B architecture.
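
As a usage illustration, the snippet below shows how these options might be passed to generate. It assumes a recent transformers release that can load custom decoding loops from the Hub via the custom_generate argument; the prompt, max_new_tokens, and the bfloat16 setting are illustrative choices, not requirements.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B", torch_dtype=torch.bfloat16)

prompt = "What happens to you if you eat watermelon seeds?"  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=False,                                # DoLa is commonly paired with greedy decoding
    custom_generate="transformers-community/dola",  # load this repo's decoding loop (assumed interface)
    trust_remote_code=True,
    dola_layers="high",                             # "low", "high", or a list of layer indices
    repetition_penalty=1.2,                         # optional, helps curb repetition
)
print(tokenizer.decode(outputs[0, inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

Loading a decoding loop from the Hub requires trust_remote_code=True, since the generation code is downloaded and executed locally.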

Good For

  • Short-answer tasks: Such as TruthfulQA, where precise and factual responses are critical.
  • Long-answer reasoning tasks: Including benchmarks like GSM8K, StrategyQA, FACTOR, and VicunaQA, which benefit from contrasting with lower layers (a simple task-to-layer mapping is sketched after this list).
  • Improving reliability: For applications where factual correctness and reduced hallucination are paramount, especially in smaller to medium-sized decoder-only transformer models.
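
As a rough illustration of this guidance, the following hypothetical helper maps a benchmark name to a dola_layers setting; the function and benchmark keys are made up for the example, and the "low"/"high" choices simply follow the recommendations above.

```python
def pick_dola_layers(task: str) -> str:
    """Hypothetical helper: choose a dola_layers value from a task name."""
    long_answer = {"gsm8k", "strategyqa", "factor", "vicunaqa"}  # long-form reasoning
    short_answer = {"truthfulqa"}                                # short factual answers
    task = task.lower()
    if task in long_answer:
        return "low"   # contrast with lower layers for long-answer reasoning
    if task in short_answer:
        return "high"  # contrast with higher layers for short-answer tasks
    return "high"      # illustrative default

# e.g. model.generate(**inputs, dola_layers=pick_dola_layers("gsm8k"), ...)
```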

DoLa is not recommended for very small models like GPT-2, as the performance gains may be minimal. This implementation matches the DoLa functionality found in transformers<4.53.0.