abacusai/MetaMath-Bagel-DPO-34B
abacusai/MetaMath-Bagel-DPO-34B is a 34-billion-parameter language model developed by abacusai and fine-tuned with DPO (Direct Preference Optimization) on the Truthy DPO dataset. It is an instruction-tuned variant of the MetaMath SFT model, designed to strengthen reasoning and mathematical capabilities. It scores 76.46 on MMLU, 72.78 on GSM8K, and 67.58 on TruthfulQA, which makes it a candidate for complex analytical tasks.
abacusai/MetaMath-Bagel-DPO-34B: DPO Fine-tuned for Enhanced Reasoning
abacusai/MetaMath-Bagel-DPO-34B is a 34-billion-parameter language model built on the MetaMath SFT (Supervised Fine-Tuning) model. It was further trained with Direct Preference Optimization (DPO) on the Truthy DPO dataset, which is designed to improve truthfulness and reasoning abilities.
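For readers unfamiliar with DPO, the following is a minimal sketch of the standard DPO loss for a single preference pair. It is not the training code used for this model. The function name `dpo_loss`, its argument names, and the value `beta=0.1` are illustrative assumptions; the actual training configuration is not given here.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response under
    either the model being trained (policy) or the frozen reference model.
    beta=0.1 is an illustrative value, not one taken from this model card.
    """
    # How much more likely each response became relative to the reference.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) == softplus(-logits), computed in a stable form.
    x = -logits
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Hypothetical log-probabilities, for illustration only.
unchanged = dpo_loss(-10.0, -12.0, -10.0, -12.0)  # policy == reference
improved = dpo_loss(-8.0, -12.0, -10.0, -12.0)    # chosen response more likely
print(unchanged, improved)
```

The loss falls as the policy raises the probability of the preferred response relative to the reference model, which is the mechanism by which a preference dataset such as Truthy DPO shapes the model's outputs.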
Key Capabilities & Performance
The model targets tasks that require logical reasoning and factual accuracy. Its reported results on a suite of standard benchmarks are:
- Average Score: 75.54
- MMLU (Massive Multitask Language Understanding): 76.46
- GSM8K (Grade School Math 8K): 72.78
- TruthfulQA: 67.58
- ARC (AI2 Reasoning Challenge): 69.20
- HellaSwag: 84.34
- Winogrande: 82.87
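As a quick sanity check, the average score listed above matches the unweighted mean of the six individual benchmark scores. The short sketch below reproduces it; the dictionary name `scores` is only illustrative.

```python
# Benchmark scores as listed above.
scores = {
    "MMLU": 76.46,
    "GSM8K": 72.78,
    "TruthfulQA": 67.58,
    "ARC": 69.20,
    "HellaSwag": 84.34,
    "Winogrande": 82.87,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 75.54
```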
With a context length of 32768 tokens, it can accept long inputs, such as a multi-step problem together with its supporting material, in a single prompt.
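When planning requests, it can help to budget tokens against that limit. The sketch below assumes, as is typical for decoder-only models, that the prompt and the generated tokens share the same 32768-token window; the helper names `fits_in_context` and `max_new_tokens_for` are hypothetical and not part of any library.

```python
CONTEXT_LENGTH = 32768  # context length stated above


def fits_in_context(prompt_tokens, max_new_tokens,
                    context_length=CONTEXT_LENGTH):
    """Return True if the prompt plus the requested generation fits."""
    return prompt_tokens + max_new_tokens <= context_length


def max_new_tokens_for(prompt_tokens, context_length=CONTEXT_LENGTH):
    """Return how many tokens can still be generated after the prompt."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(context_length - prompt_tokens, 0)


# Example: a 30,000-token prompt leaves room for 2,768 generated tokens.
print(max_new_tokens_for(30000))
print(fits_in_context(30000, 4096))
```

Token counts depend on the model's tokenizer, so the prompt length should be measured with that tokenizer and not estimated from character or word counts.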
Good For
- Applications requiring strong mathematical and logical reasoning.
- Tasks where factual accuracy and truthfulness are critical.
- Complex question answering and analytical workloads.