abacusai/Smaug-34B-v0.1
Text Generation | Concurrency Cost: 2 | Model Size: 34B | Quant: FP8 | Ctx Length: 32k | Published: Jan 25, 2024 | License: apache-2.0 | Architecture: Transformer | 0.1K | Open Weights | Cold
Smaug-34B-v0.1 is a 34 billion parameter language model developed by Abacus.AI, fine-tuned from jondurbin's bagel-34b-v0.2. It is trained with DPO-Positive (DPOP), a fine-tuning technique that improves on the standard DPO loss by avoiding reductions in the likelihood of preferred examples. DPOP improves performance across a range of datasets, math-based tasks in particular, which makes the model suitable for complex reasoning and general language understanding.
Smaug-34B-v0.1: Enhanced Performance via DPO-Positive Fine-tuning
Smaug-34B-v0.1 is a 34 billion parameter language model developed by Abacus.AI, built upon jondurbin's bagel-34b-v0.2. It is fine-tuned with DPO-Positive (DPOP), a modification of Direct Preference Optimization (DPO) introduced alongside the model.
Key Capabilities & Innovations
- DPO-Positive (DPOP) Fine-tuning: Smaug-34B-v0.1 was trained using DPOP, a new loss function and training procedure designed to overcome limitations of standard DPO, especially on datasets where the edit distance between the preferred and dispreferred completions is low (e.g., math problems). Standard DPO only rewards a larger gap between the two completions, so it can lower the likelihood of the preferred completion as long as the dispreferred one falls further. DPOP still widens that gap but prevents the likelihood of the preferred completion from being reduced.
- Benchmark Performance: The model averages 77.29 across the six benchmarks reported: 74.23 on ARC, 86.76 on HellaSwag, 76.66 on MMLU, 70.22 on TruthfulQA, 83.66 on Winogrande, and 72.18 on GSM8K.
- Research-Backed: The DPOP technique and full training details are outlined in the paper "Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive" (arXiv:2402.13228).
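The difference between standard DPO and a DPOP-style loss can be sketched for a single preference pair. This is a minimal illustration based on the description above, not the training code used for Smaug: the function names, the `beta` and `lam` values, and the example log-probabilities are all assumptions chosen for illustration, and the exact form of the penalty should be checked against the paper.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(pol_w, pol_l, ref_w, ref_l, beta=0.1):
    """Standard DPO loss for one pair.

    pol_* / ref_* are sequence log-probabilities under the policy and the
    frozen reference model; w = preferred, l = dispreferred completion.
    """
    margin = (pol_w - ref_w) - (pol_l - ref_l)
    return -log_sigmoid(beta * margin)


def dpop_loss(pol_w, pol_l, ref_w, ref_l, beta=0.1, lam=5.0):
    """DPOP-style loss (sketch): DPO margin minus a one-sided penalty.

    The penalty is positive only when the policy assigns the preferred
    completion a lower log-probability than the reference does.
    lam (penalty weight) is an illustrative value, not a documented setting.
    """
    margin = (pol_w - ref_w) - (pol_l - ref_l)
    penalty = max(0.0, ref_w - pol_w)
    return -log_sigmoid(beta * (margin - lam * penalty))


# Hypothetical pair: both completions became less likely than under the
# reference, but the dispreferred one dropped further.
pair = dict(pol_w=-12.0, pol_l=-15.0, ref_w=-10.0, ref_l=-10.0)
print("DPO loss: ", dpo_loss(**pair))
print("DPOP loss:", dpop_loss(**pair))
```

In this example the DPO loss falls below log 2 (its value when policy and reference agree), so DPO treats the update as an improvement even though the preferred completion became less likely. The DPOP-style loss rises above log 2 because of the penalty term. When the policy does not lower the preferred completion's log-probability, the penalty is zero and the two losses coincide.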
Good For
- Complex Reasoning Tasks: Particularly beneficial for tasks requiring precise reasoning, such as mathematical problem-solving, where DPOP's strengths in low-edit-distance scenarios are evident.
- General Language Understanding: Its broad performance across various benchmarks suggests suitability for a wide range of natural language processing applications.
- Research and Development: Developers interested in preference optimization and related fine-tuning techniques can study this model together with its associated research paper.