argilla/notus-7b-v1
Argilla's Notus-7b-v1 is a 7-billion-parameter GPT-like causal language model, fine-tuned with Direct Preference Optimization (DPO) on a curated version of the UltraFeedback dataset. Based on Zephyr-7b-sft-full, it is optimized for chat and assistant-style interactions, and it surpasses Zephyr-7B-beta and Claude 2 on the AlpacaEval benchmark, making it suitable for high-quality conversational AI.
Notus 7B v1: DPO Fine-tuned Chat Model
Notus 7B v1, developed by Argilla, is a 7 billion parameter GPT-like model fine-tuned with Direct Preference Optimization (DPO). It builds upon zephyr-7b-sft-full, the base model for zephyr-7b-beta, but distinguishes itself through a meticulously curated preference dataset. Argilla identified and rectified data quality issues within the original UltraFeedback dataset, creating a binarized version based on preference ratings rather than critique scores.
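To make the DPO objective concrete, here is a minimal sketch of the per-pair DPO loss in plain Python. The function names and the `beta=0.1` default are illustrative assumptions, not values from the Notus training run; the loss form itself follows the standard DPO formulation (negative log-sigmoid of the scaled log-ratio margin between chosen and rejected responses).

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Inputs are summed token log-probabilities of the chosen and
    rejected responses under the policy being trained and under the
    frozen reference model. beta controls how far the policy may
    drift from the reference; 0.1 is a common choice and an
    assumption here, not the documented Notus setting.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log sigmoid(margin): shrinks as the policy prefers the chosen
    # response more strongly than the reference model does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy and reference agree exactly, the margin is zero and the loss is `log 2`; improving the chosen response's relative log-probability drives the loss toward zero.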
Key Capabilities & Performance
- Enhanced Chat Performance: Notus 7B v1 excels in chat-like applications, outperforming Zephyr-7B-beta and Claude 2 on the AlpacaEval benchmark with a 91.42% win rate, while maintaining comparable MT-Bench scores.
- Improved Academic Benchmarks: It shows stronger performance on the Open LLM Leaderboard, achieving a higher average score (52.89) and better results in ARC, HellaSwag, MMLU, and Winogrande compared to Zephyr 7B dDPO.
- Data-First Approach: The model's superior performance is attributed to Argilla's "data-first" strategy, focusing on high-quality, verified training data.
Training & Data Curation
Notus was trained using a new, curated version of the openbmb/UltraFeedback dataset, specifically argilla/ultrafeedback-binarized-preferences. This involved identifying and correcting mismatches between overall_score and actual response quality in the original dataset, leveraging Argilla's data annotation tools. The model primarily supports English and uses the same prompt template as HuggingFaceH4/zephyr-7b-beta.
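Since Notus reuses the zephyr-7b-beta prompt template, a small formatter can show what the model expects at inference time. This is a sketch based on the template published on the zephyr-7b-beta model card (`<|system|>`, `<|user|>`, `<|assistant|>` tags with `</s>` separators); in practice, prefer the tokenizer's built-in `apply_chat_template` and verify the exact template against the tokenizer config.

```python
def format_zephyr_prompt(messages):
    """Render chat messages in the zephyr-style prompt format that
    Notus shares with HuggingFaceH4/zephyr-7b-beta.

    messages: list of {"role": ..., "content": ...} dicts, where
    role is "system", "user", or "assistant". Template details are
    reproduced from the zephyr-7b-beta card and should be checked
    against the tokenizer's chat template before use.
    """
    prompt = ""
    for m in messages:
        prompt += f"<|{m['role']}|>\n{m['content']}</s>\n"
    # Trailing assistant tag cues the model to generate its reply.
    return prompt + "<|assistant|>\n"
```

For example, a system message plus a user turn yields a string ending in `<|assistant|>\n`, which is where the model's completion begins.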