Name: abacusai/Smaug-34B-v0.1 API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: abacusai

Smaug-34B-v0.1: Enhanced Performance via DPO-Positive Fine-tuning

Smaug-34B-v0.1 is a 34-billion-parameter language model developed by Abacus.AI, built on jondurbin's bagel-34b-v0.2. It was fine-tuned with DPO-Positive (DPOP), a new preference-optimization technique introduced with the model.

Key Capabilities & Innovations

DPO-Positive (DPOP) Fine-tuning: Smaug-34B-v0.1 was trained using DPOP, a new loss function and training procedure designed to overcome limitations of standard DPO (Direct Preference Optimization), especially on datasets where the edit distance between paired completions is low (e.g., math problems). DPOP keeps the likelihood of preferred examples from decreasing while still increasing their probability relative to dispreferred examples.
Benchmark Performance: The model averages 77.29 across six benchmarks: 74.23 on ARC, 86.76 on HellaSwag, 76.66 on MMLU, 70.22 on TruthfulQA, 83.66 on Winogrande, and 72.18 on GSM8K.
Research-Backed: The DPOP technique and full training details are outlined in the paper "Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive" (arXiv:2402.13228).
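
The contrast between standard DPO and DPOP can be sketched for a single preference pair. The snippet below is a minimal illustration, not the authors' training code: it assumes the DPOP loss takes the usual DPO margin and subtracts a penalty, scaled by a factor lam, whenever the policy's log-probability of the preferred completion falls below the reference model's. The function names, argument names, and the values of beta and lam are assumptions chosen for illustration; the paper is the authoritative source for the exact formulation and hyperparameters.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(pol_w, pol_l, ref_w, ref_l, beta=0.1):
    """Standard DPO loss for one pair.

    pol_w / pol_l: policy log-probs of the preferred / dispreferred completion.
    ref_w / ref_l: reference-model log-probs of the same completions.
    """
    margin = (pol_w - ref_w) - (pol_l - ref_l)
    return -log_sigmoid(beta * margin)


def dpop_loss(pol_w, pol_l, ref_w, ref_l, beta=0.1, lam=50.0):
    """DPOP-style loss (sketch): DPO margin minus a penalty that is nonzero
    only when the preferred completion became less likely than under the
    reference model. beta and lam are illustrative values."""
    margin = (pol_w - ref_w) - (pol_l - ref_l)
    penalty = max(0.0, ref_w - pol_w)
    return -log_sigmoid(beta * (margin - lam * penalty))


# Hypothetical log-probs: both completions became less likely under the
# policy, but the dispreferred one dropped further, so the margin still grew.
ref_w, ref_l = -10.0, -12.0
pol_w, pol_l = -11.0, -15.0

print("DPO :", dpo_loss(pol_w, pol_l, ref_w, ref_l))
print("DPOP:", dpop_loss(pol_w, pol_l, ref_w, ref_l))
```

In this hypothetical case the plain DPO loss falls below its starting value of log 2 even though the preferred completion lost probability, whereas the DPOP-style loss rises above it, which pushes training away from that outcome. When the preferred completion is at least as likely as under the reference model, the penalty is zero and the two losses coincide.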

Good For

Complex Reasoning Tasks: Tasks requiring precise reasoning, such as mathematical problem-solving, where preferred and dispreferred completions have low edit distance, the setting DPOP was designed for.
General Language Understanding: Its broad performance across various benchmarks suggests suitability for a wide range of natural language processing applications.
Research and Development: Developers interested in advanced fine-tuning techniques and preference optimization can leverage this model and its associated research paper.
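
To make "low edit distance" concrete, the sketch below computes the character-level Levenshtein distance between a preferred and a dispreferred completion. The two completions are invented for illustration and are not taken from any training set, and character-level distance is only one possible way to measure how close two completions are.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical preference pair for a math prompt: the answers differ in
# only two characters, so the pair has a very low edit distance.
preferred = "12 * 7 = 84, so the answer is 84."
dispreferred = "12 * 7 = 74, so the answer is 74."

distance = levenshtein(preferred, dispreferred)
normalized = distance / max(len(preferred), len(dispreferred))
print(distance, round(normalized, 3))
```

Pairs like this share almost all of their text, so the training signal that separates the correct from the incorrect completion rests on a small number of positions.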
