Gemma 2B Zephyr DPO Overview
The wandb/gemma-2b-zephyr-dpo is a 2.6 billion parameter, primarily English language model developed by wandb. It is a GPT-like (decoder-only) model fine-tuned with Direct Preference Optimization (DPO) following the Zephyr recipe. It builds on the wandb/gemma-2b-zephyr-sft base, following the standard two-stage progression from Supervised Fine-Tuning (SFT) to DPO for improved alignment.
Key Capabilities
- Instruction Following: Optimized through DPO, making it suitable for tasks requiring precise adherence to instructions.
- Conversational AI: The Zephyr DPO recipe is known for improving conversational quality and alignment.
- Fine-tuned Gemma Base: Leverages the capabilities of the Gemma 2B model, a publicly available model from Google.
Training Details
The model was trained with the DPO script from the Hugging Face alignment-handbook recipe. Training took approximately 13 hours on 8x A100 80GB GPUs, with logging and monitoring handled by Weights & Biases. More details on the training run are available in the W&B workspace.
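For intuition about what the DPO stage optimizes, here is a minimal sketch of the per-pair DPO loss in plain Python. It is an illustration of the objective, not the alignment-handbook implementation; `beta=0.1` is a common default, and the log-probability arguments are assumed to be summed over response tokens.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the log-probability of a full response under the
    policy being trained or the frozen reference (here, the SFT) model.
    beta controls how far the policy may drift from the reference.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # -log sigmoid(beta * margin): minimized when the policy prefers the
    # chosen response more strongly than the reference model does.
    margin = beta * (chosen_logratio - rejected_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy and reference agree exactly, the margin is zero and the loss is log 2 ≈ 0.693; shifting probability mass toward the chosen response drives the loss down.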
Good For
- Applications requiring a compact yet capable model for instruction-tuned tasks.
- Developing chatbots or conversational agents where alignment and preference learning are crucial.
- Researchers and developers looking for a DPO-tuned model based on the Gemma architecture.
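For chatbot use cases like those above, prompts are typically assembled turn by turn. The sketch below assumes the model uses a Zephyr-style chat template (`<|user|>` / `<|assistant|>` headers with `</s>` turn terminators); in practice, prefer the tokenizer's `apply_chat_template` method, which reads the template shipped with the model and is authoritative.

```python
def build_zephyr_prompt(messages):
    """Assemble a Zephyr-style chat prompt from a list of
    {"role": ..., "content": ...} dicts, ending with the assistant
    header so the model generates the next reply.

    NOTE: this hand-rolls the template for illustration only;
    tokenizer.apply_chat_template is the reliable path.
    """
    parts = [f"<|{m['role']}|>\n{m['content']}</s>\n" for m in messages]
    parts.append("<|assistant|>\n")  # leave the assistant turn open
    return "".join(parts)

prompt = build_zephyr_prompt([
    {"role": "user", "content": "Summarize DPO in one sentence."},
])
```

The resulting string can be passed to a standard `transformers` text-generation pipeline loaded with the model's checkpoint.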