UCLA-AGI/zephyr-7b-sft-full-SPIN-iter0
UCLA-AGI/zephyr-7b-sft-full-SPIN-iter0 is a 7 billion parameter GPT-like language model developed by UCLA-AGI, fine-tuned using a self-play fine-tuning (SPIN) approach. This model is based on alignment-handbook/zephyr-7b-sft-full, which in turn is derived from Mistral-7B-v0.1, and is primarily English-language focused. It leverages synthetic data from the HuggingFaceH4/ultrachat_200k dataset to enhance its capabilities through iterative self-improvement. The model demonstrates competitive performance on various benchmarks, including ARC, HellaSwag, and MMLU, making it suitable for general language understanding and generation tasks.
Loading preview...
Model Overview
UCLA-AGI/zephyr-7b-sft-full-SPIN-iter0 is a 7 billion parameter language model developed by UCLA-AGI, utilizing a novel Self-Play Fine-Tuning (SPIN) method. This model is an iteration 0 fine-tune of the alignment-handbook/zephyr-7b-sft-full base model, which itself is built upon mistralai/Mistral-7B-v0.1. The fine-tuning process involved generating synthetic data from the HuggingFaceH4/ultrachat_200k dataset, aiming to convert a weaker language model into a stronger one through self-improvement.
Key Capabilities & Performance
This model is primarily designed for general English language understanding and generation tasks. Its performance has been evaluated on the Open LLM Leaderboard, showcasing competitive results:
- Average Score: 62.37
- ARC (25-shot): 63.65
- HellaSwag (10-shot): 84.44
- MMLU (5-shot): 61.01
- TruthfulQA (0-shot): 50.48
- Winogrande (5-shot): 77.98
- GSM8K (5-shot): 36.69
Training Details
The model was trained with a learning rate of 5e-07, a batch size of 8 across 8 GPUs (total batch size 64), using the RMSProp optimizer and a linear learning rate scheduler over 2 epochs. The SPIN methodology, detailed in the paper "Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models" (arXiv:2401.01335), is the core innovation behind its development.