UCLA-AGI/Mistral7B-PairRM-SPPO-Iter1
Task: Text Generation · Model Size: 7B · Quantization: FP8 · Context Length: 4k · Concurrency Cost: 1 · Published: May 4, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

UCLA-AGI/Mistral7B-PairRM-SPPO-Iter1 is a 7-billion-parameter, decoder-only (GPT-style) language model developed by UCLA-AGI and fine-tuned with the first iteration of Self-Play Preference Optimization (SPPO). Built on Mistral-7B-Instruct-v0.2, the model is aligned using synthetic preference data derived from UltraFeedback prompts, with pairwise preferences scored by the PairRM reward model. It is intended primarily to demonstrate the effectiveness of the SPPO method for language model alignment, as detailed in the associated research paper.
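To make the alignment method concrete, below is a minimal, dependency-free sketch of the per-response SPPO objective as described in the SPPO paper: the policy's log-probability ratio against the reference model is regressed toward a target proportional to the response's estimated win probability minus 1/2. The function name, the scalar inputs, and the `eta` default are illustrative assumptions, not code from the UCLA-AGI training pipeline.

```python
def sppo_loss(logp_theta: float, logp_ref: float, win_prob: float, eta: float = 1000.0) -> float:
    """Squared-error form of the per-response SPPO objective (sketch).

    logp_theta: log-probability of the response under the current policy
    logp_ref:   log-probability of the same response under the reference model
    win_prob:   estimated probability this response beats the policy's average
                response (e.g. from a PairRM-style preference model)
    eta:        scaling constant from the SPPO formulation (value here is illustrative)
    """
    log_ratio = logp_theta - logp_ref
    # Push the log ratio toward eta * (win_prob - 1/2):
    # preferred responses (win_prob > 0.5) gain probability mass,
    # dispreferred ones (win_prob < 0.5) lose it.
    return (log_ratio - eta * (win_prob - 0.5)) ** 2


# Toy check: a response with exactly average win probability (0.5)
# and an unchanged log ratio incurs zero loss.
print(sppo_loss(0.0, 0.0, 0.5, eta=10.0))  # → 0.0
```

In training, this per-response term would be averaged over sampled responses per prompt; the sketch only shows the scalar objective each sample contributes.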
