Model Overview
viethq188/Rabbit-7B-v2-DPO-Chat is a 7-billion-parameter language model developed by viethq188. It was built by merging two source models, AIDC-ai-business/Marcoroni-7B-v3 and Q-bert/MetaMath-Cybertron-Starling, using the slerp (spherical linear interpolation) merge method, configured to blend the self-attention and MLP layers of the source models with different interpolation weights.
Key Development Steps
- Base Model Merging: The initial phase combined AIDC-ai-business/Marcoroni-7B-v3 and Q-bert/MetaMath-Cybertron-Starling. The config.yaml specifies a slerp merge strategy that applies different interpolation values (t) to the self-attention and MLP layers.
- DPO Fine-tuning: After the merge, the model was further trained with Direct Preference Optimization (DPO) on Hugging Face datasets. This step is intended to align the model's outputs with human preferences, improving its conversational quality and instruction following.
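To make the slerp step more concrete, here is a minimal sketch of spherical linear interpolation between two weight vectors, with t chosen by parameter name. It is not the merge tool's actual implementation. The values in `T_BY_FILTER`, the `DEFAULT_T` fallback, and the helper names `pick_t` and `slerp` are all hypothetical; the real interpolation values are the ones in the model's config.yaml.

```python
import math

# Hypothetical per-filter interpolation values (NOT the model's real config).
T_BY_FILTER = {"self_attn": 0.3, "mlp": 0.7}
DEFAULT_T = 0.5  # assumed fallback for parameters matching no filter


def pick_t(param_name):
    """Return the interpolation value for a parameter, based on its name."""
    for key, t in T_BY_FILTER.items():
        if key in param_name:
            return t
    return DEFAULT_T


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between equal-length vectors v0 (t=0) and v1 (t=1)."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 < eps or norm1 < eps:
        # A zero vector has no direction: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    # Angle between the two vectors, clamped against rounding error.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if sin_theta < eps:
        # Nearly parallel vectors: linear interpolation is numerically safer.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


if __name__ == "__main__":
    name = "model.layers.0.self_attn.q_proj.weight"
    print(slerp(pick_t(name), [1.0, 0.0], [0.0, 1.0]))
```

With t = 0 the result equals the first model's weights and with t = 1 the second's, so a per-layer t controls how much each source model contributes to a given part of the merged network.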
Usage and Template
This model is designed to be used with an Alpaca-style instruction template. Users should format their prompts as follows:
{system}
### Instruction:
{prompt}
### Response:
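As an illustration, the template above could be filled in with a small helper like the one below. The function name `build_prompt` is hypothetical, and the exact newline placement (single newlines, with a trailing newline after "### Response:") is an assumption, since the template as shown does not specify blank lines.

```python
def build_prompt(system, prompt):
    """Fill the Alpaca-style template with a system message and a user prompt."""
    # Assumed layout: one newline between each template line.
    return f"{system}\n### Instruction:\n{prompt}\n### Response:\n"


if __name__ == "__main__":
    print(build_prompt("You are a helpful assistant.", "Summarize what slerp merging does."))
```

The resulting string is what would be tokenized and passed to the model for generation; the model's reply is expected to follow the "### Response:" line.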
Intended Use Cases
- Chat Applications: Optimized for generating coherent and contextually relevant responses in conversational settings.
- Instruction Following: DPO training is intended to improve how well the model understands and carries out user instructions.