UNA-TheBeagle-7b-v1: A DPO & UNA Fine-tuned 7B Model
UNA-TheBeagle-7b-v1 is a 7 billion parameter language model developed by fblgit, built upon Intel's neural-chat base model. It has been fine-tuned using a combination of Direct Preference Optimization (DPO) and UNA methods, applied to a curated set of DPO pairs derived from The Bagel dataset. The model notably achieved a #1 ranking on the Hugging Face Leaderboard at the time of its release, showcasing strong and balanced performance across various benchmarks.
Key Capabilities & Performance
- Strong Benchmark Scores: Achieves competitive results, including 73.29% on ARC Challenge (acc_norm), 72.10% on GSM8K (exact_match), and 87.92% on HellaSwag (acc_norm).
- DPO & UNA Fine-tuning: Combines DPO with UNA, applying UNA to the model's perceptron (MLP) layers at a learning rate of 3.5e-7, using the original Bagel training code.
- General-Purpose Utility: Designed to perform well across a variety of tasks, demonstrating good generalization capabilities.
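To make the DPO side of the fine-tuning recipe concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. This is a generic illustration, not the authors' actual training code; the `beta` value and the toy log-probabilities are assumptions for demonstration only.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Arguments are summed log-probabilities of the chosen/rejected
    completions under the policy (pi_*) and the frozen reference
    model (ref_*). beta scales the implicit reward margin.
    """
    # Log-ratio margin between chosen and rejected completions.
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    # Negative log-sigmoid of the scaled margin.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Toy numbers (assumed): the policy already prefers the chosen answer,
# so the margin is positive and the loss is below log(2).
loss = dpo_loss(pi_chosen=-10.0, pi_rejected=-14.0,
                ref_chosen=-12.0, ref_rejected=-13.0, beta=0.1)
print(round(loss, 4))  # → 0.5544
```

When the policy and reference agree exactly (zero margin), the loss is log(2) ≈ 0.6931; training on curated preference pairs pushes it toward zero.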
Limitations & Usage
- Academic & Research Use Only: This model is not intended for commercial use and is restricted to academic and research purposes.
- Prompt Format Flexibility: Although trained with the vanilla Bagel training code and its prompt format, the model is expected to generalize well to other prompt formats.
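Since the card claims robustness to prompt formats, the sketch below contrasts a plain completion-style prompt with an Alpaca-style instruction layout. The exact template used during Bagel training is not specified here; both layouts are illustrative assumptions, not the model's canonical format.

```python
def vanilla_prompt(instruction: str) -> str:
    # Plain completion-style prompt: just the raw instruction text.
    return instruction

def alpaca_prompt(instruction: str) -> str:
    # Alpaca-style layout, a common alternative instruction format.
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )

print(alpaca_prompt("Summarize DPO in one sentence."))
```

A format-robust model should produce comparable answers when either string is fed to its text-generation pipeline.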