udkai/Garrulus

Text generation · Model size: 7B · Quantization: FP8 · Context length: 4k · Published: Jan 9, 2024 · License: apache-2.0 · Architecture: Transformer · Concurrency cost: 1

udkai/Garrulus is a 7 billion parameter causal language model developed by udkai, based on mlabonne/NeuralMarcoro14-7B. It was intentionally optimized using Direct Preference Optimization (DPO) with a modified Winogrande dataset, and demonstrates improved performance on commonsense reasoning benchmarks such as Winogrande, TruthfulQA, HellaSwag, and ARC-Challenge, making it suitable for applications that require stronger reasoning capabilities.


What is udkai/Garrulus?

udkai/Garrulus is a 7 billion parameter language model derived from mlabonne/NeuralMarcoro14-7B. Its distinguishing characteristic is the application of Direct Preference Optimization (DPO) using a subtly modified Winogrande dataset. This deliberate DPO contamination, run for two epochs, produced unexpected improvements across a range of reasoning benchmarks.
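The exact "modified Winogrande" preference data is not described here, so as a purely hypothetical illustration: a Winogrande item (a sentence with a blank, two candidate fillers, and a gold answer) could be turned into a DPO preference pair by treating the correct completion as `chosen` and the incorrect one as `rejected`. The field names follow the public Winogrande release; the helper below is an assumed sketch, not the authors' actual pipeline:

```python
def winogrande_to_dpo_pair(item):
    """Turn one Winogrande-style item into a DPO preference pair.

    `item` is assumed to look like the public Winogrande format:
      {"sentence": "... _ ...", "option1": "...", "option2": "...", "answer": "1" or "2"}
    """
    correct = item["option1"] if item["answer"] == "1" else item["option2"]
    wrong = item["option2"] if item["answer"] == "1" else item["option1"]
    # The prompt keeps the blank; the preferred response fills it correctly,
    # the dispreferred response fills it with the wrong referent.
    return {
        "prompt": item["sentence"],
        "chosen": item["sentence"].replace("_", correct),
        "rejected": item["sentence"].replace("_", wrong),
    }

example = {
    "sentence": "The trophy didn't fit in the suitcase because the _ was too big.",
    "option1": "trophy",
    "option2": "suitcase",
    "answer": "1",
}
pair = winogrande_to_dpo_pair(example)
```

A DPO trainer would then score `chosen` against `rejected` under the policy and a frozen reference model for each such pair.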

Key Capabilities & Performance:

  • Enhanced Commonsense Reasoning: The model exhibits improved performance on Winogrande metrics, which are designed to test commonsense reasoning.
  • Broader Reasoning Improvements: Local and leaderboard evaluations indicate that this DPO approach also boosts performance on other, independent metrics, including TruthfulQA, HellaSwag, and ARC-Challenge.
  • Leaderboard Achievement: It is noted as the first 7B model to achieve over 75% on leaderboard evaluations, suggesting strong overall performance for its size.
  • Efficient Optimization: The DPO adaptation was performed efficiently, taking only a few minutes on an A40 GPU, leveraging tools like the unsloth library.
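For context on what such a DPO run optimizes: the standard DPO objective is -log sigmoid(beta * ((log pi_theta(y_w) - log pi_ref(y_w)) - (log pi_theta(y_l) - log pi_ref(y_l)))), where y_w and y_l are the chosen and rejected completions. A minimal pure-Python sketch on toy log-probabilities; beta = 0.1 is an assumed value, as Garrulus's actual hyperparameters are not stated here:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen/rejected
    completions under the policy and the frozen reference model.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written via log1p for numerical stability.
    return math.log1p(math.exp(-margin))

# Toy numbers: the policy prefers the chosen completion slightly more
# than the reference does, so the loss falls below log(2) ~= 0.693.
loss = dpo_loss(-12.0, -15.0, -13.0, -14.0)
```

The loss shrinks as the policy widens the gap between chosen and rejected completions relative to the reference, which is why a few epochs over targeted preference pairs can shift benchmark behavior quickly.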

Why is this model significant?

The developers highlight that the performance gains across multiple reasoning benchmarks, obtained via DPO on a modified Winogrande dataset, carry not only practical implications but also deeper theoretical significance in computer science. They point to a novel approach: improving model reasoning through targeted, subtle data contamination during DPO.