Overview
SakanaAI DiscoPOP-zephyr-7b-gemma: A Novel Preference Optimized LLM
This model, developed by SakanaAI, is an 8.5-billion-parameter language model based on the Gemma architecture, fine-tuned from HuggingFaceH4/zephyr-7b-gemma-sft-v0.1. Its core differentiator is its use of DiscoPOP (Discovered Preference Optimization), an objective function discovered via LLM-driven search, as an alternative to traditional Direct Preference Optimization (DPO).
Key Capabilities & Features
- Novel Optimization Algorithm: Employs DiscoPOP, a unique preference optimization method, for fine-tuning, as detailed in the paper "Discovering Preference Optimization Algorithms with and for Large Language Models".
- Base Model: Built upon the robust zephyr-7b-gemma-sft-v0.1 foundation.
- Context Length: Supports an 8192-token context window.
- Training Details: Fine-tuned for 2 epochs with a learning rate of 5e-07 and a total batch size of 128, using the Adam optimizer with a cosine learning-rate schedule.
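The objective reported in the DiscoPOP paper is a Log-Ratio Modulated Loss (LRML), which blends DPO's logistic loss with an exponential loss, gated by a sigmoid of the scaled policy/reference log-ratio difference. The sketch below is a minimal, unofficial rendering of that idea, not the authors' reference implementation; the gating direction and the `beta` and `tau` defaults are assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lrml_loss(log_ratio_diff, beta=0.05, tau=0.05):
    """Sketch of a Log-Ratio Modulated Loss in the spirit of DiscoPOP.

    log_ratio_diff is the difference of policy-vs-reference log-ratios
    between the chosen and rejected responses:
        (log pi(y_w|x) - log pi_ref(y_w|x)) - (log pi(y_l|x) - log pi_ref(y_l|x))
    """
    rho = beta * log_ratio_diff
    logistic = math.log(1.0 + math.exp(-rho))  # DPO's sigmoid/logistic loss
    exponential = math.exp(-rho)               # exponential-style loss term
    gate = sigmoid(rho / tau)                  # modulates between the two
    return (1.0 - gate) * logistic + gate * exponential
```

As with DPO, the loss decreases as the policy's margin for the chosen response over the rejected one grows; the gate simply shifts how aggressively large margins are rewarded.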
When to Consider This Model
- Exploring Advanced Preference Optimization: Ideal for researchers and developers interested in evaluating or utilizing novel preference optimization techniques beyond DPO.
- General Language Tasks: Suitable for a wide range of applications where a 7B-class model with strong instruction-following capabilities is required.
- Reproducibility and Research: The associated paper and codebase provide transparency for research and experimentation.
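If the model fits your use case, it can be loaded like any other Hugging Face chat model. A minimal sketch, assuming the repository id SakanaAI/DiscoPOP-zephyr-7b-gemma, a recent transformers release, and a bfloat16-capable GPU (the repository id and generation settings are assumptions, not taken from this card):

```python
import torch
from transformers import pipeline

# Repository id assumed from the model name on this card.
pipe = pipeline(
    "text-generation",
    model="SakanaAI/DiscoPOP-zephyr-7b-gemma",
    torch_dtype=torch.bfloat16,  # halves memory vs. fp32
    device_map="auto",           # place weights across available devices
)

messages = [
    {"role": "user", "content": "Summarize preference optimization in two sentences."},
]
out = pipe(messages, max_new_tokens=128, do_sample=False)
print(out[0]["generated_text"][-1]["content"])
```

The pipeline applies the model's own chat template to the `messages` list, so no manual prompt formatting is needed.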