mlabonne/NeuralDaredevil-7B
mlabonne/NeuralDaredevil-7B is a 7 billion parameter language model, fine-tuned using Direct Preference Optimization (DPO) on the argilla/distilabel-intel-orca-dpo-pairs dataset. It is a preference-tuned variant of mlabonne/Daredevil-7B, designed to improve response quality through preference learning. It demonstrates competitive performance on benchmarks such as the Nous suite and the Open LLM Leaderboard, making it suitable for general-purpose conversational AI and instruction-following tasks.
NeuralDaredevil-7B: DPO Fine-tune for Enhanced Instruction Following
NeuralDaredevil-7B is a 7 billion parameter language model by mlabonne, created by applying Direct Preference Optimization (DPO) to the existing mlabonne/Daredevil-7B model. The DPO fine-tuning used the argilla/distilabel-intel-orca-dpo-pairs preference dataset, with the aim of improving the model's ability to follow instructions and generate high-quality responses.
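The card does not publish the exact training recipe, but a setup like this is commonly expressed with the TRL library. The sketch below is illustrative only: the hyperparameters (beta, batch size, epochs) and the dataset column mapping are assumptions, not values taken from the card.

```python
# Minimal sketch of a DPO fine-tune in the spirit described above, using TRL.
# Hyperparameters and column names are assumptions; check the dataset schema
# and the TRL version you have installed before running.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "mlabonne/Daredevil-7B"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# DPO expects prompt / chosen / rejected triples; the "input" column name
# below is an assumed mapping, not confirmed by the card.
dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")
dataset = dataset.map(
    lambda row: {
        "prompt": row["input"],   # assumed column name
        "chosen": row["chosen"],
        "rejected": row["rejected"],
    }
)

args = DPOConfig(
    output_dir="NeuralDaredevil-7B",
    beta=0.1,  # strength of the implicit KL penalty (assumed value)
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
)

# Note: recent TRL releases rename the tokenizer argument to processing_class.
trainer = DPOTrainer(model=model, args=args, train_dataset=dataset, tokenizer=tokenizer)
trainer.train()
```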
Key Capabilities & Performance
- DPO Fine-tuning: Leverages preference data to align model outputs more closely with human preferences.
- Competitive Benchmarking: Achieves an average score of 59.39 on the Nous suite (AGIEval: 45.23, GPT4All: 76.2, TruthfulQA: 67.61, Bigbench: 48.52), positioning it favorably against similar 7B models like mlabonne/Beagle14-7B and argilla/distilabeled-Marcoro14-7B-slerp.
- Open LLM Leaderboard: Records an average score of 74.12, with strong results in HellaSwag (87.62), Winogrande (82.08), and GSM8k (73.16).
- Instruction Following: Uses the same prompt template as mistralai/Mistral-7B-Instruct-v0.2, ensuring compatibility with established instruction formats (see the usage sketch after this list).
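Because the model follows the Mistral-7B-Instruct-v0.2 prompt template, it can be queried through transformers' standard chat-template API. The snippet below is a minimal sketch assuming the repo ships a chat template readable by apply_chat_template; the prompt and generation settings are illustrative, not recommendations from the card.

```python
# Minimal inference sketch; generation settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlabonne/NeuralDaredevil-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Summarize what DPO fine-tuning does."}]
# Renders the [INST] ... [/INST] format used by Mistral-7B-Instruct-v0.2.
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```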
Good For
- General-purpose instruction-following applications.
- Conversational AI where response quality and alignment are crucial.
- Developers seeking a DPO-tuned 7B model with solid benchmark performance.