Devi 7B: A DPO-Optimized 7B Assistant Model
Devi 7B is a 7-billion-parameter language model forked from Zephyr-7B-β, which is itself a fine-tuned version of mistralai/Mistral-7B-v0.1. It was developed by papahawk, building on HuggingFaceH4's work on Zephyr. The model is primarily English-language and released under the MIT license.
Key Capabilities & Training:
- Assistant-Oriented: Trained to function as a helpful assistant.
- Direct Preference Optimization (DPO): Fine-tuned using DPO on a mix of publicly available synthetic datasets, including a filtered version of UltraChat and openbmb/UltraFeedback.
- Performance Focus: The training process deliberately removed the in-built alignment of some datasets to boost performance on benchmarks such as MT-Bench.
Performance Highlights:
- Top-tier 7B Chat Model: At its release, Zephyr-7B-β (the base for Devi 7B) was the highest-ranked 7B chat model on the MT-Bench and AlpacaEval benchmarks.
- MT-Bench Score: Achieved a score of 7.34 on MT-Bench, outperforming larger models like Llama2-Chat-70B in several categories.
- AlpacaEval Win Rate: Demonstrated a 90.60% win rate on AlpacaEval.
Intended Uses & Limitations:
- Chat Applications: Ideal for chat and conversational AI due to its fine-tuning on diverse synthetic dialogues.
- Potential for Problematic Outputs: Because some safety alignment was removed during training, the model can generate problematic text when prompted to do so and should be deployed with additional guardrails.
- Complex Tasks: Lags behind proprietary models in complex tasks such as coding and mathematics.
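For chat applications, prompts follow Zephyr's chat format, which Devi 7B presumably inherits from its base model. A minimal sketch of that template is below; the exact format should be confirmed against the tokenizer's bundled chat template (e.g. via `tokenizer.apply_chat_template` in transformers):

```python
def format_chat(messages):
    """Render a list of {"role", "content"} dicts in the Zephyr-style
    chat format: each turn is "<|role|>\ncontent</s>", and the prompt
    ends with the assistant header to cue generation."""
    parts = [f"<|{m['role']}|>\n{m['content']}</s>" for m in messages]
    parts.append("<|assistant|>\n")
    return "\n".join(parts)

prompt = format_chat([
    {"role": "system", "content": "You are a friendly assistant."},
    {"role": "user", "content": "Explain DPO in one sentence."},
])
print(prompt)
```

The resulting string can be passed directly to a `text-generation` pipeline loaded from the model's repository.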