RLHFlow/Qwen2.5-7B-SFT
Text generation · Concurrency cost: 1 · Model size: 7.6B · Quantization: FP8 · Context length: 32k · Published: Feb 16, 2025 · Architecture: Transformer

RLHFlow/Qwen2.5-7B-SFT is a 7.6 billion parameter instruction-tuned model from RLHFlow, built on Qwen2.5-MATH-7B-base. It is fine-tuned with supervised fine-tuning (SFT) on mathematical datasets and serves as the starting checkpoint for subsequent reinforcement learning methods such as DPO and RAFT. The model targets mathematical reasoning and shows significant gains over its base model on benchmarks such as AIME 2024, MATH 500, and OlympiadBench.
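A minimal usage sketch with the Hugging Face `transformers` library is shown below. The chat-template call and generation parameters follow the standard Qwen2.5 conventions and are assumptions, not taken from an official RLHFlow example; the first call downloads roughly 15 GB of weights.

```python
# Hypothetical usage sketch for RLHFlow/Qwen2.5-7B-SFT via Hugging Face
# transformers. Assumes the model ships a standard Qwen2.5 chat template.
MODEL_ID = "RLHFlow/Qwen2.5-7B-SFT"


def build_messages(problem: str) -> list[dict]:
    """Wrap a math problem as a single-turn chat message list."""
    return [{"role": "user", "content": problem}]


def generate(problem: str, max_new_tokens: int = 512) -> str:
    """Load the model and generate a solution (heavy: downloads weights)."""
    # Deferred import so the helpers above stay usable without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    prompt = tokenizer.apply_chat_template(
        build_messages(problem), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


if __name__ == "__main__":
    print(generate("Find the sum of the first 10 positive integers."))
```

Because the SFT checkpoint is math-specialized, prompts phrased as self-contained problems (as in the example above) tend to match its training distribution better than open-ended chat.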
