macadeliccc/MBX-7B-v3-DPO
macadeliccc/MBX-7B-v3-DPO is a 7 billion parameter causal language model, fine-tuned from flemmingmiguel/MBX-7B-v3 using Direct Preference Optimization (DPO). This model is optimized for conversational tasks and general instruction following, demonstrating improved performance over its base model on benchmarks like EQ-Bench and the Open LLM Leaderboard. With a context length of 8192 tokens, it is suitable for applications requiring nuanced responses and enhanced truthfulness.
MBX-7B-v3-DPO Overview
MBX-7B-v3-DPO is a 7 billion parameter language model developed by macadeliccc, built upon the flemmingmiguel/MBX-7B-v3 base model. It has been further refined using Direct Preference Optimization (DPO) with the jondurbin/truthy-dpo-v0.1 dataset, aiming to enhance its instruction-following capabilities and response quality.
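Chat-tuned models like this one expect prompts in the template used during fine-tuning. A minimal sketch of assembling a ChatML-style prompt, assuming the model inherits the ChatML format common in this lineage (verify against the tokenizer's `chat_template` before relying on it):

```python
def build_chatml_prompt(messages):
    """Assemble a ChatML-style prompt from a list of role/content messages.

    Assumes ChatML delimiters (<|im_start|>/<|im_end|>); confirm this
    matches the model's actual chat template before use.
    """
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>")
    # Leave the assistant turn open so the model completes it.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize DPO in one sentence."},
])
print(prompt)
```

The resulting string can be passed to any completion endpoint serving the model; when using the `transformers` tokenizer, `tokenizer.apply_chat_template` is the safer option since it reads the template shipped with the model.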
Key Capabilities & Performance
- Improved Instruction Following: DPO fine-tuning yields a measurable improvement in conversational quality, with an EQ-Bench v2 score of 74.32 versus the base model's 73.87.
- Benchmark Performance: On the Open LLM Leaderboard, MBX-7B-v3-DPO achieves an average score of 76.13, with strong results in:
  - HellaSwag (10-shot): 89.11
  - Winogrande (5-shot): 85.56
  - TruthfulQA (0-shot): 74.00
  - GSM8k (5-shot): 69.67
- Context Length: Supports an 8192-token context window, allowing for processing longer prompts and generating more extensive responses.
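The 8192-token window must cover both the prompt and the generated tokens, so generation requests should cap `max_new_tokens` accordingly. A small helper illustrating the arithmetic (a hypothetical utility, not part of any library):

```python
CONTEXT_LENGTH = 8192  # the model's context window in tokens

def max_generation_budget(prompt_tokens: int,
                          context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many new tokens can still be generated given the prompt size."""
    if prompt_tokens >= context_length:
        raise ValueError("Prompt already fills or exceeds the context window.")
    return context_length - prompt_tokens

# A 6000-token prompt leaves room for 2192 generated tokens.
print(max_generation_budget(6000))  # → 2192
```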
Deployment Options
- Quantized Versions: Available in various quantized formats, including GGUF and Exllamav2, offering flexibility for deployment on hardware with limited VRAM. Exllamav2 quantizations range from 8.0-bit (8.4 GB VRAM) down to 3.5-bit (4.7 GB VRAM), with the 6.5-bit version recommended for a balance of quality and size.
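The quoted VRAM figures roughly track bits-per-weight times parameter count. A back-of-the-envelope estimator for the weights alone (it ignores KV cache and activation overhead, which is why real usage such as the 8.4 GB figure above runs higher than the raw weight size):

```python
def estimate_weight_vram_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough VRAM for the model weights alone: params * bits / 8, in GB."""
    return n_params * bits_per_weight / 8 / 1e9

# 7B parameters at the Exllamav2 quantization levels mentioned above.
for bpw in (8.0, 6.5, 3.5):
    gb = estimate_weight_vram_gb(7e9, bpw)
    print(f"{bpw:.1f}-bit: ~{gb:.1f} GB (weights only)")
```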
Good for
- General Chat Applications: Its DPO fine-tuning makes it well-suited for engaging in conversational AI and instruction-based tasks.
- Truthful and Coherent Responses: The focus on truthfulness during DPO training suggests improved factual accuracy and reduced hallucination.
- Resource-Constrained Environments: The availability of various quantized versions makes it adaptable for deployment on consumer-grade GPUs.