Overview
jondurbin/bagel-dpo-7b-v0.4 is a 7-billion-parameter language model built on Mistral-7B-v0.1 and further trained with Direct Preference Optimization (DPO). It distinguishes itself through extensive, diverse training data spanning a wide range of SFT (Supervised Fine-Tuning) and DPO datasets. During training, each instruction was rendered in multiple prompt formats (Vicuna, Llama-2, Alpaca, ChatML) to improve generalization across different interaction styles.
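The multi-format idea can be sketched as rendering the same instruction in each supported template. The template strings below follow the common conventions for each format; verify the exact delimiters against the model card before relying on them.

```python
# Sketch: one instruction rendered in the four prompt formats bagel was
# trained on. Templates follow common conventions for each format and
# may differ in small details from the exact training variants.

def llama2_prompt(system: str, instruction: str) -> str:
    # Llama-2 chat style: [INST] ... [/INST] with an optional <<SYS>> block.
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{instruction} [/INST]"

def alpaca_prompt(instruction: str) -> str:
    # Alpaca style: preamble plus ### Instruction / ### Response headers.
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )

def vicuna_prompt(system: str, instruction: str) -> str:
    # Vicuna style: system line, then USER/ASSISTANT turns.
    return f"{system}\nUSER: {instruction}\nASSISTANT: "

def chatml_prompt(system: str, instruction: str) -> str:
    # ChatML style: <|im_start|>role ... <|im_end|> blocks.
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{instruction}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
```

Training on all four renderings of each example is what lets downstream users pick whichever template their stack already supports.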
Key Capabilities
- Diverse Instruction Following: Trained on a multitude of datasets covering reasoning, math, coding, reading comprehension, and more.
- Advanced DPO Training: Utilizes various DPO datasets, including those for creative writing, contextual understanding, truthfulness, and even de-censorship for academic purposes.
- Multi-format Prompting: Supports Llama-2 (recommended default), Alpaca, Vicuna, and ChatML formats, offering flexibility for integration.
- Specialized Prompting Strategies: Includes unique formats for:
  - Context-obedient question answering (RAG): Designed to answer questions strictly from provided context, minimizing hallucinations.
  - Summarization: Optimized for generating concise summaries from input text.
  - Function Calling: Supports two distinct formats for tool use and API interaction.
  - Chain of Thought: Enables the model to propose, reason through, and select optimal answers for complex problems.
  - Creative Writing: Capable of generating character cards, conversational memories, and novel-style chapters.
  - Boolean Questions & SQL Generation: Handles true/false statements and generates SQL queries from table definitions.
  - Emotion Detection: Provides Valence-Arousal-Dominance (VAD) scores for text.
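The context-obedient (RAG) strategy above uses a delimited prompt layout described in the bagel/airoboros model cards: each source document is wrapped in BEGININPUT/ENDINPUT with optional metadata, and the question follows in a BEGININSTRUCTION block. A minimal builder, assuming that delimiter scheme (verify against the model card):

```python
# Sketch of the context-obedient (RAG) prompt layout: source blocks with
# metadata, then the instruction. Delimiters are taken from the
# bagel/airoboros model card conventions.

def rag_prompt(blocks, instruction):
    """blocks: list of (metadata_dict, text) pairs; returns the full prompt."""
    parts = []
    for metadata, text in blocks:
        meta_lines = "\n".join(f"{k}: {v}" for k, v in metadata.items())
        parts.append(
            f"BEGININPUT\nBEGINCONTEXT\n{meta_lines}\nENDCONTEXT\n"
            f"{text}\nENDINPUT"
        )
    parts.append(f"BEGININSTRUCTION\n{instruction}\nENDINSTRUCTION")
    return "\n".join(parts)
```

A useful habit with this format is to include metadata keys such as `date` or `url` and ask the model to cite its source in the instruction, which makes it easy to check that the answer came from the provided context rather than from the model's own knowledge.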
Good for
- Developers requiring a versatile 7B model for a broad spectrum of generative and analytical tasks.
- Applications needing robust instruction following and specialized capabilities like RAG or function calling.
- Creative writing, roleplay, and conversational AI scenarios benefiting from diverse training data.
- Use cases where adaptability to different prompt formats is crucial.
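For integration, a minimal inference sketch with Hugging Face transformers, using the recommended Llama-2 chat layout. This is a sketch under assumptions: transformers (and accelerate, for `device_map="auto"`) installed, and roughly 14 GB of weights downloaded on first use, so the heavy work is kept inside a function rather than run at import time.

```python
# Hedged sketch: generating with bagel-dpo-7b-v0.4 via transformers,
# using a Llama-2 style prompt (the recommended default format).

MODEL_ID = "jondurbin/bagel-dpo-7b-v0.4"

def build_prompt(system: str, instruction: str) -> str:
    # Llama-2 chat layout; verify delimiters against the model card.
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{instruction} [/INST]"

def generate(system: str, instruction: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so the prompt helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(build_prompt(system, instruction), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
```

The same `generate` skeleton works for any of the supported formats; only `build_prompt` needs to change to switch between Llama-2, Alpaca, Vicuna, or ChatML.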