macadeliccc/MBX-7B-v3-DPO

Text generation · Concurrency cost: 1 · Model size: 7B · Quant: FP8 · Context length: 8K · Published: Jan 30, 2024 · License: CC · Architecture: Transformer

macadeliccc/MBX-7B-v3-DPO is a 7 billion parameter causal language model, fine-tuned from flemmingmiguel/MBX-7B-v3 using Direct Preference Optimization (DPO). This model is optimized for conversational tasks and general instruction following, demonstrating improved performance over its base model on benchmarks like EQ-Bench and the Open LLM Leaderboard. With a context length of 8192 tokens, it is suitable for applications requiring nuanced responses and enhanced truthfulness.


MBX-7B-v3-DPO Overview

MBX-7B-v3-DPO is a 7 billion parameter language model developed by macadeliccc, built upon the flemmingmiguel/MBX-7B-v3 base model. It has been further refined using Direct Preference Optimization (DPO) with the jondurbin/truthy-dpo-v0.1 dataset, aiming to enhance its instruction-following capabilities and response quality.
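For readers who want to see what such a DPO stage looks like in code, below is a minimal sketch using Hugging Face's trl library with the truthy-dpo-v0.1 dataset named above. The hyperparameters (beta, learning rate, batch sizes) are illustrative assumptions, not the author's published recipe.

```python
# Minimal DPO fine-tuning sketch with Hugging Face trl.
# Hyperparameters are illustrative assumptions, not the recipe
# actually used to produce MBX-7B-v3-DPO.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "flemmingmiguel/MBX-7B-v3"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

# truthy-dpo-v0.1 ships prompt/chosen/rejected columns, the format
# DPOTrainer expects; drop any extra bookkeeping columns.
dataset = load_dataset("jondurbin/truthy-dpo-v0.1", split="train")
dataset = dataset.remove_columns(
    [c for c in dataset.column_names if c not in ("prompt", "chosen", "rejected")]
)

config = DPOConfig(
    output_dir="mbx-7b-v3-dpo",
    beta=0.1,                        # assumed KL-penalty strength
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,              # assumed
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,                  # trl clones the policy as the frozen reference
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,      # named `tokenizer` in older trl versions
)
trainer.train()
```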

Key Capabilities & Performance

  • Improved Instruction Following: The DPO fine-tuning process has led to a notable improvement in conversational quality, as indicated by its EQ-Bench v2 score of 74.32, surpassing the base model's 73.87.
  • Benchmark Performance: On the Open LLM Leaderboard, MBX-7B-v3-DPO achieves an average score of 76.13, with strong results in:
    • HellaSwag (10-Shot): 89.11
    • Winogrande (5-shot): 85.56
    • TruthfulQA (0-shot): 74.00
    • GSM8k (5-shot): 69.67
  • Context Length: Supports an 8192-token context window, allowing longer prompts and more extensive responses (see the inference sketch after this list).
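As a quick illustration of basic usage (not taken from the model card), the sketch below drives the model through Hugging Face transformers; the prompt and generation settings are illustrative assumptions.

```python
# Minimal inference sketch with transformers; prompt and generation
# settings are illustrative, not recommendations from the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "macadeliccc/MBX-7B-v3-DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain DPO in two sentences."}]
# Assumes the tokenizer ships a chat template; otherwise format the
# prompt manually in the model's expected style.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```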

Deployment Options

  • Quantized Versions: Available in various quantized formats, including GGUF and ExLlamaV2, offering flexibility for deployment on hardware with limited VRAM. ExLlamaV2 quantizations range from 8.0-bit (8.4 GB VRAM) down to 3.5-bit (4.7 GB VRAM), with the 6.5-bit version recommended for a balance of quality and size. A GGUF loading sketch follows this list.
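For the GGUF route, the snippet below is a minimal sketch using llama-cpp-python; the quant repository id and filename are hypothetical placeholders, so substitute the actual GGUF repo published for this model.

```python
# Minimal GGUF inference sketch with llama-cpp-python.
# The repo id and filename below are hypothetical; replace them with
# the actual GGUF quant repo for this model.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="macadeliccc/MBX-7B-v3-DPO-GGUF",  # hypothetical repo id
    filename="*Q4_K_M.gguf",                   # pick a quant that fits your VRAM
    n_ctx=8192,                                # the model supports an 8k context
    n_gpu_layers=-1,                           # offload all layers if VRAM allows
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Give me one fun fact about owls."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```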

Good for

  • General Chat Applications: Its DPO fine-tuning makes it well-suited for engaging in conversational AI and instruction-based tasks.
  • Truthful and Coherent Responses: The focus on truthfulness during DPO training suggests improved factual accuracy and reduced hallucination.
  • Resource-Constrained Environments: The availability of various quantized versions makes it adaptable for deployment on consumer-grade GPUs.

Popular Sampler Settings

Featherless surfaces the top three sampler configurations its users apply to this model. The tunable parameters are temperature, top_p, top_k, min_p, frequency_penalty, presence_penalty, and repetition_penalty.
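As a sketch of how such sampler settings are applied in practice, the request below passes them through an OpenAI-compatible client; the base URL and every value shown are illustrative assumptions, not the user-derived top configurations.

```python
# Sketch: passing sampler settings through an OpenAI-compatible client.
# The base_url and all parameter values are illustrative assumptions,
# not the actual top configurations used by Featherless users.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",  # assumed OpenAI-compatible endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="macadeliccc/MBX-7B-v3-DPO",
    messages=[{"role": "user", "content": "Write a haiku about autumn."}],
    temperature=0.7,            # illustrative values throughout
    top_p=0.9,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    # Non-standard samplers (top_k, min_p, repetition_penalty) are passed
    # via extra_body on servers that support them.
    extra_body={"top_k": 40, "min_p": 0.05, "repetition_penalty": 1.1},
)
print(response.choices[0].message.content)
```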