SakanaAI/DiscoPOP-zephyr-7b-gemma

Public · 8.5B parameters · FP8 · 8192-token context · License: gemma · Hugging Face
Overview

SakanaAI DiscoPOP-zephyr-7b-gemma: A Novel Preference Optimized LLM

This model, developed by SakanaAI, is an 8.5-billion-parameter language model based on the Gemma architecture, fine-tuned from HuggingFaceH4/zephyr-7b-gemma-sft-v0.1. Its core differentiator is its use of DiscoPOP (Discovered Preference Optimization), an algorithm developed by SakanaAI as an alternative to traditional Direct Preference Optimization (DPO).
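As we read the DiscoPOP paper, its best-performing discovered objective (the "log-ratio modulated loss", LRML) gates between DPO's logistic loss and an exponential loss using a sigmoid of the scaled policy/reference log-ratio. The scalar sketch below is our illustrative rendering of that idea, not code from this model card; the function names and the default `beta` and `tau` values are assumptions:

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def discopop_loss(log_ratio: float, beta: float = 0.05, tau: float = 0.05) -> float:
    """Sketch of a log-ratio modulated preference loss (our reading of DiscoPOP).

    log_ratio is the difference of policy-vs-reference log-probabilities between
    the chosen and rejected completions, as in DPO:
        (log pi(y_w|x) - log pi_ref(y_w|x)) - (log pi(y_l|x) - log pi_ref(y_l|x))
    """
    x = beta * log_ratio
    logistic = -math.log(sigmoid(x))   # DPO-style logistic loss
    exponential = math.exp(-x)         # exponential loss
    gate = sigmoid(x / tau)            # sigmoid mixing weight on the scaled log-ratio
    return gate * logistic + (1.0 - gate) * exponential
```

At `log_ratio = 0` the gate is 0.5 and the loss is the average of the two component losses; as the margin between chosen and rejected grows, the loss decreases, as expected for a preference objective.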

Key Capabilities & Features

  • Novel Optimization Algorithm: Employs DiscoPOP, a unique preference optimization method, for fine-tuning, as detailed in the paper "Discovering Preference Optimization Algorithms with and for Large Language Models".
  • Base Model: Built upon the robust zephyr-7b-gemma-sft-v0.1 foundation.
  • Context Length: Supports an 8192-token context window.
  • Training Details: Fine-tuned for 2 epochs with a learning rate of 5e-07 and a total batch size of 128, using the Adam optimizer with a cosine learning-rate scheduler.
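The cosine schedule mentioned above decays the learning rate from its peak down to zero over training. A minimal sketch, assuming a standard cosine decay with optional linear warmup (only the 5e-07 peak rate comes from this card; `total_steps` and `warmup_steps` are hypothetical):

```python
import math


def cosine_lr(step: int, total_steps: int,
              peak_lr: float = 5e-07, warmup_steps: int = 0) -> float:
    """Learning rate at `step` under linear warmup followed by cosine decay to 0."""
    if step < warmup_steps:
        # Linear warmup from 0 to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from peak_lr (progress 0) to 0 (progress 1).
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```

For example, with no warmup the rate starts at the full 5e-07, halves at the schedule midpoint, and reaches zero at the final step.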

When to Consider This Model

  • Exploring Advanced Preference Optimization: Ideal for researchers and developers interested in evaluating or utilizing novel preference optimization techniques beyond DPO.
  • General Language Tasks: Suitable for a wide range of applications that call for a 7B-class model with strong instruction-following capabilities.
  • Reproducibility and Research: The associated paper and codebase provide transparency for research and experimentation.