UCLA-AGI/zephyr-7b-sft-full-SPIN-iter0

Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Ctx Length: 8k | Published: Jan 4, 2024 | License: MIT | Architecture: Transformer | Open Weights

UCLA-AGI/zephyr-7b-sft-full-SPIN-iter0 is a 7-billion-parameter GPT-like language model developed by UCLA-AGI and fine-tuned with Self-Play Fine-Tuning (SPIN). It is based on alignment-handbook/zephyr-7b-sft-full, which is itself derived from Mistral-7B-v0.1, and it is primarily an English-language model. Training used synthetic data generated from the HuggingFaceH4/ultrachat_200k dataset so that the model improves through iterative self-play. On the Open LLM Leaderboard it averages 62.37 across six benchmarks, including ARC, HellaSwag, and MMLU, which makes it suitable for general language understanding and generation tasks.


Model Overview

UCLA-AGI/zephyr-7b-sft-full-SPIN-iter0 is a 7-billion-parameter language model developed by UCLA-AGI and trained with the Self-Play Fine-Tuning (SPIN) method. It is the iteration-0 checkpoint, fine-tuned from alignment-handbook/zephyr-7b-sft-full, which is itself built on mistralai/Mistral-7B-v0.1. During fine-tuning, the model generates synthetic responses to prompts from the HuggingFaceH4/ultrachat_200k dataset and is trained to prefer the dataset's responses over its own generations. The goal is to turn a weaker language model into a stronger one through self-improvement.
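The self-play idea can be illustrated with a small sketch of a per-example loss in the style of the SPIN paper cited in Training Details. This is a simplified reading of that objective, not the authors' training code: it assumes a logistic loss applied to the difference between the log-likelihood gain on a real (dataset) response and on a synthetic (self-generated) response, each measured relative to the previous-iteration model. The function names and the scaling value `lam=0.1` are hypothetical and chosen only for illustration.

```python
import math


def logistic_loss(t: float) -> float:
    """Return log(1 + exp(-t)), evaluated in a numerically stable way."""
    if t >= 0:
        return math.log1p(math.exp(-t))
    return -t + math.log1p(math.exp(t))


def spin_pair_loss(
    logp_real_new: float,
    logp_real_old: float,
    logp_synth_new: float,
    logp_synth_old: float,
    lam: float = 0.1,  # hypothetical scaling factor, not taken from the model card
) -> float:
    """Loss for one prompt with a real response and a self-generated response.

    Each argument is a sequence log-probability log p(response | prompt)
    under either the model being trained ("new") or the frozen model from
    the previous iteration ("old"), which also produced the synthetic response.
    """
    real_margin = logp_real_new - logp_real_old
    synth_margin = logp_synth_new - logp_synth_old
    return logistic_loss(lam * (real_margin - synth_margin))


if __name__ == "__main__":
    # Before any update the new and old models agree, so the loss is log(2).
    print(spin_pair_loss(-10.0, -10.0, -12.0, -12.0))
    # Raising the real response's likelihood and lowering the synthetic one
    # reduces the loss.
    print(spin_pair_loss(-8.0, -10.0, -14.0, -12.0))
```

Under these assumptions, the loss falls only when the model being trained assigns relatively more probability to the dataset response than to its own earlier generation, which is the sense in which the model plays against its previous self.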

Key Capabilities & Performance

This model is designed primarily for general English language understanding and generation tasks. It has been evaluated on the Open LLM Leaderboard, with the following results:

  • Average Score: 62.37
  • ARC (25-shot): 63.65
  • HellaSwag (10-shot): 84.44
  • MMLU (5-shot): 61.01
  • TruthfulQA (0-shot): 50.48
  • Winogrande (5-shot): 77.98
  • GSM8K (5-shot): 36.69
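As a quick consistency check, the reported average appears to be the unweighted mean of the six benchmark scores listed above. The short sketch below recomputes it; the dictionary keys are just labels chosen for this example.

```python
# Benchmark scores as listed above (Open LLM Leaderboard).
scores = {
    "ARC (25-shot)": 63.65,
    "HellaSwag (10-shot)": 84.44,
    "MMLU (5-shot)": 61.01,
    "TruthfulQA (0-shot)": 50.48,
    "Winogrande (5-shot)": 77.98,
    "GSM8K (5-shot)": 36.69,
}

# Unweighted mean across the six benchmarks.
mean_score = sum(scores.values()) / len(scores)
print(f"Mean of {len(scores)} benchmarks: {mean_score:.3f}")
```

The six scores sum to 374.25, so the mean is about 62.375, which agrees with the reported 62.37 up to rounding.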

Training Details

The model was trained for 2 epochs with a learning rate of 5e-07, the RMSProp optimizer, and a linear learning rate scheduler. The batch size was 8 per GPU across 8 GPUs, for a total batch size of 64. The SPIN method is described in the paper "Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models" (arXiv:2401.01335).
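The arithmetic behind these settings can be sketched in a few lines. The example below derives the total batch size from the per-GPU batch size and GPU count, and shows what a linear schedule looks like when it decays from the peak learning rate to zero. It assumes no warmup phase, and the step count used in the usage section is a hypothetical value, since the text above states neither.

```python
PER_DEVICE_BATCH_SIZE = 8   # from the training details above
NUM_GPUS = 8                # from the training details above
PEAK_LR = 5e-7              # from the training details above


def total_batch_size(per_device: int, num_devices: int) -> int:
    """Examples processed per optimizer step across all devices."""
    return per_device * num_devices


def linear_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Learning rate under a linear decay from peak_lr to 0.

    Assumption: no warmup, which the text above does not specify.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    remaining = max(0.0, 1.0 - step / total_steps)
    return peak_lr * remaining


if __name__ == "__main__":
    print(total_batch_size(PER_DEVICE_BATCH_SIZE, NUM_GPUS))
    hypothetical_total_steps = 1000  # illustrative only, not from the model card
    for step in (0, 500, 1000):
        print(step, linear_lr(step, hypothetical_total_steps))
```

With a real run, the total step count would follow from the dataset size, the total batch size of 64, and the 2 training epochs.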