viethq188/Rabbit-7B-v2-DPO-Chat
viethq188/Rabbit-7B-v2-DPO-Chat is a 7-billion-parameter language model created by viethq188. It was built by merging AIDC-ai-business/Marcoroni-7B-v3 and Q-bert/MetaMath-Cybertron-Starling with the slerp merge method, then fine-tuned with DPO (Direct Preference Optimization) on datasets hosted on Hugging Face. It is intended for chat-based applications, where the merged weights and DPO training are meant to improve conversational performance.
Model Overview
viethq188/Rabbit-7B-v2-DPO-Chat is a 7-billion-parameter language model developed by viethq188. It was constructed by merging two source models: AIDC-ai-business/Marcoroni-7B-v3 and Q-bert/MetaMath-Cybertron-Starling. The merge used the slerp (spherical linear interpolation) method, configured to blend the self-attention and MLP layers of the source models with different interpolation weights.
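To make the slerp step concrete, here is a minimal sketch of spherical linear interpolation between two flattened weight vectors. It is an illustration of the general technique only: the function name `slerp`, the toy vectors, and the `t_by_layer_type` values are assumptions for this example and are not taken from the model's actual config.yaml.

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between weight vectors v0 (t=0) and v1 (t=1)."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the two vectors, clamped for acos.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / max(norm0 * norm1, eps)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    # Nearly (anti)parallel vectors: fall back to plain linear interpolation.
    if sin_theta < 1e-6:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]

# Hypothetical per-layer-type interpolation values (NOT the real config).
t_by_layer_type = {"self_attn": 0.3, "mlp": 0.7}

# Toy stand-ins for the same parameter taken from each source model.
weights_a = [1.0, 0.0]
weights_b = [0.0, 1.0]
merged = {name: slerp(t, weights_a, weights_b)
          for name, t in t_by_layer_type.items()}
```

With `t` close to 0 the merged weights stay near the first model, and with `t` close to 1 they move toward the second, which is how a different `t` per layer type lets each part of the network lean on a different source model.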
Key Development Steps
- Base Model Merging: The initial phase involved combining AIDC-ai-business/Marcoroni-7B-v3 and Q-bert/MetaMath-Cybertron-Starling. The config.yaml details a specific slerp merge strategy, applying varying interpolation values (t) across self-attention and MLP layers.
- DPO Fine-tuning: Following the merge, the model underwent further training using Direct Preference Optimization (DPO) on Hugging Face datasets. This step is crucial for aligning the model's outputs with human preferences, enhancing its conversational quality and instruction-following capabilities.
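For readers unfamiliar with DPO, the per-example objective can be sketched as below. This is the standard DPO loss for one preference pair, not the author's training code: the function name `dpo_loss`, the argument names, and the default `beta=0.1` are assumptions for illustration, and the inputs are summed token log-probabilities of each response under the trained (policy) model and a frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single (chosen, rejected) response pair."""
    # Implicit rewards: how much more likely each response became vs. the reference.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# The loss is lower when the policy favors the chosen response more than
# the reference does, and higher when it favors the rejected one.
good = dpo_loss(-1.0, -5.0, -2.0, -2.0)
bad = dpo_loss(-5.0, -1.0, -2.0, -2.0)
```

Minimizing this quantity over a preference dataset pushes the model toward the preferred responses while the reference term keeps it from drifting arbitrarily far from its starting point.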
Usage and Template
This model is designed to be used with an Alpaca-style instruction template. Users should format their prompts as follows:
{system}
### Instruction:
{prompt}
### Response:
Intended Use Cases
- Chat Applications: Optimized for generating coherent and contextually relevant responses in conversational settings.
- Instruction Following: Benefits from DPO training to better understand and execute user instructions.
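The Alpaca-style template from the Usage and Template section can be filled in with a small helper like the one below. This is a sketch under stated assumptions: `build_prompt` is a hypothetical name, and joining the template lines with single newlines is a guess based on how the template is displayed here, since the exact whitespace is not specified.

```python
def build_prompt(system: str, prompt: str) -> str:
    """Fill the Alpaca-style template with a system message and a user prompt."""
    # Assumption: template lines are separated by single newlines.
    return (
        f"{system}\n"
        "### Instruction:\n"
        f"{prompt}\n"
        "### Response:"
    )

text = build_prompt(
    "You are a helpful assistant.",
    "Summarize what a slerp merge does in one sentence.",
)
```

The resulting string is what would be passed to the tokenizer, with the model's reply expected to follow the final `### Response:` line.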