What is udkai/Garrulus?
udkai/Garrulus is a 7-billion-parameter language model derived from mlabonne/NeuralMarcoro14-7B. Its distinguishing feature is that it was tuned with Direct Preference Optimization (DPO) on a subtly modified Winogrande dataset. This deliberate "DPO-contamination", run for two epochs, produced unexpected improvements on several reasoning benchmarks.
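The summary does not say how the Winogrande data was modified or formatted. As a minimal sketch, assuming the public Winogrande fields (`sentence`, `option1`, `option2`, `answer`) and the `prompt`/`chosen`/`rejected` record layout commonly used by DPO trainers, one item might be turned into a preference pair as follows. The function name, the prompt wording, and the example sentence are hypothetical and only for illustration.

```python
def winogrande_to_preference_pair(sentence, option1, option2, answer):
    """Build one DPO record from a Winogrande-style item (illustrative only).

    Assumption: `sentence` contains a single "_" blank and `answer` is the
    string "1" or "2", as in the public Winogrande schema.
    """
    if "_" not in sentence:
        raise ValueError("sentence must contain a '_' blank")
    if answer not in ("1", "2"):
        raise ValueError("answer must be '1' or '2'")

    correct, wrong = (option1, option2) if answer == "1" else (option2, option1)
    return {
        # Hypothetical prompt wording; the actual prompt used is not documented here.
        "prompt": f"Fill in the blank: {sentence}",
        # The preferred completion fills the blank with the correct option.
        "chosen": sentence.replace("_", correct, 1),
        # The dispreferred completion fills it with the distractor.
        "rejected": sentence.replace("_", wrong, 1),
    }


# Made-up item in Winogrande style, not taken from the dataset.
example = winogrande_to_preference_pair(
    sentence="The trophy did not fit in the suitcase because the _ was too big.",
    option1="trophy",
    option2="suitcase",
    answer="1",
)
print(example["chosen"])
print(example["rejected"])
```

Each record pairs the same prompt with a preferred and a dispreferred completion, which is the signal DPO needs.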
Key Capabilities & Performance:
- Enhanced Commonsense Reasoning: The model exhibits improved performance on Winogrande metrics, which are designed to test commonsense reasoning.
- Broader Reasoning Improvements: Local and leaderboard evaluations indicate that this DPO approach also raises scores on other, independent benchmarks, including TruthfulQA, HellaSwag, and ARC-Challenge.
- Leaderboard Achievement: The developers describe it as the first 7B model to score above 75% on leaderboard evaluations, which suggests strong overall performance for its size.
- Efficient Optimization: The DPO run was cheap, taking only a few minutes on an A40 GPU with tools such as the unsloth library.
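For readers unfamiliar with DPO, the objective being optimized can be sketched in a few lines. This is the standard per-example DPO loss written in plain Python, not the developers' training code; the `beta` value and the log-probabilities in the usage lines are illustrative assumptions.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from summed sequence log-probabilities.

    `beta` scales how strongly the policy is pushed away from the frozen
    reference model; 0.1 is an assumed, illustrative value.
    """
    # Implicit rewards: how much more likely each completion is under the
    # policy than under the reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) is the same as softplus(-margin).
    return softplus(-margin)


# Illustrative numbers only: the policy equals the reference model ...
loss_at_start = dpo_loss(-12.0, -12.5, -12.0, -12.5)
# ... versus a policy that has shifted probability toward the chosen answer.
loss_after_shift = dpo_loss(-10.0, -14.0, -12.0, -12.5)
print(loss_at_start, loss_after_shift)
```

The loss falls as the policy assigns relatively more probability to the chosen completion than the reference model does, which is how DPO steers the model without a separate reward model.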
Why is this model significant?
The developers argue that the gains across multiple reasoning benchmarks, obtained from DPO on Winogrande-derived data, have deeper theoretical significance for computer science in addition to their practical implications. The result suggests that targeted, subtle data contamination during DPO may be a novel way to improve model reasoning.