UCLA-AGI/zephyr-7b-sft-full-SPIN-iter2

Text generation · Model size: 7B · Quantization: FP8 · Context length: 8K · Published: Jan 5, 2024 · License: MIT · Architecture: Transformer · Open weights

UCLA-AGI/zephyr-7b-sft-full-SPIN-iter2 is a 7 billion parameter GPT-like language model developed by UCLA-AGI, fine-tuned using a self-play approach. This model is the second iteration of fine-tuning from alignment-handbook/zephyr-7b-sft-full, leveraging synthetic data derived from the HuggingFaceH4/ultrachat_200k dataset. It is primarily English-language and demonstrates competitive performance on various benchmarks, including an average score of 63.54 on the Open LLM Leaderboard, making it suitable for general conversational AI and instruction-following tasks.


UCLA-AGI/zephyr-7b-sft-full-SPIN-iter2 Overview

This model is a 7 billion parameter language model developed by UCLA-AGI, representing the second iteration of a self-play fine-tuning (SPIN) process. It builds upon the alignment-handbook/zephyr-7b-sft-full base model, which itself is derived from mistralai/Mistral-7B-v0.1. The fine-tuning process utilizes synthetic data generated from the HuggingFaceH4/ultrachat_200k dataset, aiming to convert a weaker language model into a stronger one through iterative self-improvement.
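At the core of SPIN is a DPO-style pairwise objective: at each iteration, the model being trained is pushed to assign higher likelihood to real SFT responses than to responses generated by the previous iterate. The sketch below is a simplified per-example version of that objective, assuming the logistic loss form described in the SPIN paper; the `lam` value is a placeholder, not a hyperparameter reported for this model.

```python
import math

def logistic_loss(t: float) -> float:
    # l(t) = log(1 + exp(-t)): the logistic loss used as the
    # pairwise objective in the SPIN paper.
    return math.log(1.0 + math.exp(-t))

def spin_loss(logp_real_new: float, logp_real_old: float,
              logp_syn_new: float, logp_syn_old: float,
              lam: float = 0.1) -> float:
    """Per-example SPIN objective (illustrative sketch).

    logp_*_new: log-likelihood under the model being trained (iterate t+1).
    logp_*_old: log-likelihood under the frozen previous iterate (iterate t).
    'real' is a response from the SFT data (ultrachat_200k here);
    'syn'  is a response generated by the previous iterate.
    lam is a regularization weight (assumed value for illustration).
    """
    margin = lam * ((logp_real_new - logp_real_old)
                    - (logp_syn_new - logp_syn_old))
    return logistic_loss(margin)
```

The loss shrinks as the new model raises the likelihood of real responses relative to the synthetic ones, which is what drives the "weak to strong" improvement across iterations.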

Key Capabilities & Performance

  • Self-Play Fine-Tuning (SPIN): Employs an advanced training methodology detailed in the paper "Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models" (arXiv:2401.01335).
  • Benchmark Performance: Achieves an average score of 63.54 on the Open LLM Leaderboard. Notable scores include:
    • ARC (25-shot): 66.47
    • HellaSwag (10-shot): 85.82
    • MMLU (5-shot): 61.48
    • TruthfulQA (0-shot): 57.75
    • Winogrande (5-shot): 76.95
    • GSM8K (5-shot): 32.75
  • Language Support: Primarily English.
  • Context Length: Supports a context length of 8192 tokens.
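For inference, the model can be loaded with the Hugging Face transformers library like any other causal LM. The snippet below is a minimal sketch: the hand-written Zephyr-style chat markup in `build_prompt` is an assumption carried over from the zephyr-7b-sft-full base model, and in practice `tokenizer.apply_chat_template` should be preferred since it reads the template shipped with the checkpoint.

```python
MODEL_ID = "UCLA-AGI/zephyr-7b-sft-full-SPIN-iter2"

def build_prompt(user_message: str) -> str:
    # Zephyr-style chat format (assumed from the base model);
    # prefer tokenizer.apply_chat_template in real use.
    return f"<|user|>\n{user_message}</s>\n<|assistant|>\n"

if __name__ == "__main__":
    # Imported lazily so the prompt helper stays dependency-free.
    # Note: this downloads a 7B checkpoint and needs sufficient GPU memory.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(build_prompt("Explain self-play fine-tuning briefly."),
                       return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                           skip_special_tokens=True))
```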

Training Details

The model was trained with a learning rate of 1e-07, a per-device batch size of 8 across 8 GPUs (total batch size 64), using the RMSProp optimizer and a linear learning rate scheduler over 2 epochs. This iterative fine-tuning approach, leveraging synthetic data, is a key differentiator in its development.
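The reported hyperparameters can be made concrete with a little arithmetic. The sketch below computes the effective batch size and a simple linear learning-rate decay matching the reported scheduler; the dataset size is a hypothetical value for illustration only (warmup is also omitted for simplicity).

```python
# Reported setup: per-device batch 8 on 8 GPUs -> effective batch 64.
PER_DEVICE_BATCH = 8
NUM_GPUS = 8
TOTAL_BATCH = PER_DEVICE_BATCH * NUM_GPUS

def linear_lr(step: int, total_steps: int, peak_lr: float = 1e-07) -> float:
    # Linear decay from the reported peak LR (1e-07) down to 0,
    # matching a linear scheduler without warmup.
    return peak_lr * (1.0 - step / total_steps)

# Hypothetical dataset size, just to make the step count concrete.
num_examples = 49_600
epochs = 2
steps_per_epoch = num_examples // TOTAL_BATCH
total_steps = steps_per_epoch * epochs
```

With these assumed numbers, training runs for `total_steps` optimizer updates while the learning rate falls linearly from 1e-07 toward zero.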