transformers-community/dola
transformers-community/dola is an implementation of Decoding by Contrasting Layers (DoLa), a contrastive decoding strategy applied to the Qwen/Qwen3-0.6B base model. This 0.8 billion parameter model with a 32768 token context length enhances factuality and reduces hallucinations by contrasting logits from the final layer with those from earlier layers. It is particularly effective for short-answer tasks (contrasting against higher layers) and long-answer reasoning tasks (contrasting against lower layers), making it suitable for improving output reliability in generative AI applications.
DoLa: Decoding by Contrasting Layers
This model implements the Decoding by Contrasting Layers (DoLa) strategy, a technique designed to improve factuality and reduce hallucinations in language model outputs. DoLa operates by contrasting the logits from the final layer of a language model with those from an earlier layer, amplifying factual knowledge that emerges in later layers while suppressing spurious information.
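The core contrast can be sketched in a few lines. This is a simplified illustration, not the repository's implementation: the `alpha` value follows the DoLa paper's adaptive plausibility constraint, and the early layer is passed in statically here, whereas the full method selects it dynamically (by maximum Jensen-Shannon divergence from the final-layer distribution).

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dola_contrast(final_logits, early_logits, alpha=0.1):
    """Score next-token candidates by contrasting two layers.

    Tokens whose final-layer probability falls below alpha * max(p_final)
    are masked out (the adaptive plausibility constraint); surviving tokens
    are scored by log p_final - log p_early, boosting tokens whose
    probability grew as the network deepened.
    """
    p_final = softmax(final_logits)
    p_early = softmax(early_logits)
    threshold = alpha * max(p_final)
    scores = []
    for pf, pe in zip(p_final, p_early):
        if pf >= threshold:
            scores.append(math.log(pf) - math.log(pe))
        else:
            scores.append(float("-inf"))
    return scores

# A token favored by the final layer but not by the early layer gets the
# highest contrastive score; implausible tokens are masked entirely.
scores = dola_contrast([4.0, 2.0, 0.0], [2.0, 2.0, 2.0])
```

The masked tokens matter: without the plausibility constraint, the log-ratio would reward any token the early layer happens to assign near-zero probability, including nonsense.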
Key Capabilities
- Enhanced Factual Accuracy: Reduces hallucinations by leveraging layer-specific knowledge.
- Configurable Layer Contrast: Allows selection of 'low' layers (for long-answer reasoning) or 'high' layers (for short-answer tasks), or specific layer indices.
- Repetition Control: Supports an optional `repetition_penalty` to further refine output quality.
- Base Model: Built upon the Qwen/Qwen3-0.6B architecture.
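A minimal usage sketch, assuming a recent transformers release that supports loading custom decoding loops from the Hub via `custom_generate`; the `dola_layers` and `repetition_penalty` values shown are illustrative (the DoLa paper suggests a repetition penalty around 1.2).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B", torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("What is the capital of France?", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    custom_generate="transformers-community/dola",
    trust_remote_code=True,      # the decoding loop is fetched from the Hub
    dola_layers="high",          # "high" for short answers, "low" for long-form reasoning
    repetition_penalty=1.2,      # optional; ~1.2 per the DoLa paper
    max_new_tokens=32,
    do_sample=False,
)
text = tokenizer.decode(
    outputs[0, inputs.input_ids.shape[-1]:], skip_special_tokens=True
)
print(text)
```

Besides `"high"` and `"low"`, `dola_layers` can also be given as an explicit list of candidate layer indices when finer control over the contrast is needed.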
Good For
- Short-answer tasks: Such as TruthfulQA, where precise and factual responses are critical.
- Long-answer reasoning tasks: Including benchmarks like GSM8K, StrategyQA, FACTOR, and VicunaQA, benefiting from deeper contextual contrast.
- Improving reliability: For applications where factual correctness and reduced hallucination are paramount, especially in smaller to medium-sized decoder-only transformer models.
DoLa is not recommended for very small models such as GPT-2, where the performance gains are minimal. This implementation matches the DoLa functionality found in `transformers<4.53.0`.