SakanaAI/DiscoPOP-zephyr-7b-gemma

Text Generation · Model Size: 8.5B · Quant: FP8 · Ctx Length: 8k · Published: Jun 12, 2024 · License: gemma · Architecture: Transformer

SakanaAI's DiscoPOP-zephyr-7b-gemma is an 8.5-billion-parameter language model fine-tuned from HuggingFaceH4/zephyr-7b-gemma-sft-v0.1 with an 8192-token context length. It distinguishes itself by using DiscoPOP, a Discovered Preference Optimization algorithm, in place of standard Direct Preference Optimization (DPO). The model targets general language tasks, with the new objective intended to improve preference alignment over DPO-trained baselines.


SakanaAI DiscoPOP-zephyr-7b-gemma: A Novel Preference Optimized LLM

This model, developed by SakanaAI, is an 8.5-billion-parameter language model based on the Gemma architecture, fine-tuned from HuggingFaceH4/zephyr-7b-gemma-sft-v0.1. Its core differentiator is its training objective: DiscoPOP (Discovered Preference Optimization), an algorithm found through an LLM-driven search over objective functions, used as an alternative to traditional Direct Preference Optimization (DPO).
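
The paper's best-performing discovered objective is the Log-Ratio Modulated Loss (LRML), which adaptively blends DPO's logistic loss with an exponential loss, gated by a sigmoid of the scaled policy-vs-reference log-ratio difference. The sketch below is one reading of that formulation, not the official implementation; the `beta` and `tau` defaults are illustrative assumptions, and the reference code in SakanaAI's DiscoPOP repository is authoritative.

```python
import torch
import torch.nn.functional as F

def discopop_loss(policy_chosen_logps, policy_rejected_logps,
                  reference_chosen_logps, reference_rejected_logps,
                  beta=0.05, tau=0.05):
    """Log-Ratio Modulated Loss (LRML), sketched from the DiscoPOP paper.

    Each *_logps tensor holds per-example sequence log-probabilities.
    beta and tau are illustrative defaults, not confirmed training values.
    """
    # Scaled difference of policy vs. reference log-ratios (the DPO "logits").
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = reference_chosen_logps - reference_rejected_logps
    rho = beta * (pi_logratios - ref_logratios)

    # Sigmoid gate: decides, per example, how much each component contributes.
    mix = torch.sigmoid(rho / tau)

    logistic = -F.logsigmoid(rho)  # DPO's logistic loss
    exponential = torch.exp(-rho)  # exponential loss

    # Blend the two components according to the gate.
    return (1.0 - mix) * logistic + mix * exponential
```

Intuitively, the gate lets the objective behave like DPO in one regime and like an exponential loss in the other, which the paper reports as outperforming fixed objectives on held-out evaluations.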

Key Capabilities & Features

  • Novel Optimization Algorithm: Employs DiscoPOP, a unique preference optimization method, for fine-tuning, as detailed in the paper "Discovering Preference Optimization Algorithms with and for Large Language Models".
  • Base Model: Fine-tuned from HuggingFaceH4/zephyr-7b-gemma-sft-v0.1.
  • Context Length: Supports an 8192-token context window.
  • Training Details: Fine-tuned for 2 epochs with a learning rate of 5e-07 and a total batch size of 128, using the Adam optimizer with a cosine learning-rate schedule (sketched below).
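
To make those hyperparameters concrete, here is a self-contained toy training skeleton in plain PyTorch, reusing the `discopop_loss` sketch above. The tiny linear "policy" and random batches are stand-ins so the loop runs end to end; the actual run fine-tuned the full 8.5B model on real preference data with its reference model frozen.

```python
import torch

# Toy stand-in for the policy; the real run trained the full 8.5B model.
policy = torch.nn.Linear(16, 1)

LR, EPOCHS, STEPS_PER_EPOCH = 5e-7, 2, 4  # lr and epochs match the card

optimizer = torch.optim.Adam(policy.parameters(), lr=LR)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=EPOCHS * STEPS_PER_EPOCH
)

for epoch in range(EPOCHS):
    for _ in range(STEPS_PER_EPOCH):
        x = torch.randn(128, 16)  # effective batch size 128
        # Toy "log-probs": a real loop would compute sequence log-probs for
        # chosen/rejected completions under the policy and reference models.
        pol_chosen = policy(x).squeeze(-1)
        pol_rejected = policy(-x).squeeze(-1)
        ref_chosen = torch.zeros(128)   # frozen reference model outputs
        ref_rejected = torch.zeros(128)

        loss = discopop_loss(pol_chosen, pol_rejected,
                             ref_chosen, ref_rejected).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()  # cosine learning-rate schedule
```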

When to Consider This Model

  • Exploring Advanced Preference Optimization: Ideal for researchers and developers interested in evaluating or utilizing novel preference optimization techniques beyond DPO.
  • General Language Tasks: Suitable for a wide range of applications where a 7B-class model with strong instruction-following capabilities is required (a minimal usage sketch follows this list).
  • Reproducibility and Research: The associated paper and codebase provide transparency for research and experimentation.
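
For completeness, a minimal quickstart assuming the standard transformers chat-template workflow; the prompt and generation settings are illustrative, not taken from the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SakanaAI/DiscoPOP-zephyr-7b-gemma"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build a chat prompt using the tokenizer's built-in template.
messages = [{"role": "user", "content": "Summarize DPO in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256,
                        do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```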