Yuqian-Fu/SRFT

Hugging Face
Text generation · Model size: 7.6B · Quantization: FP8 · Context length: 32k · License: MIT · Architecture: Transformer · Open weights

Yuqian-Fu/SRFT is a 7.6 billion parameter language model developed by Yuqian-Fu and trained with a novel Supervised Reinforcement Fine-Tuning (SRFT) method. This single-stage approach unifies the supervised and reinforcement learning paradigms through entropy-aware weighting mechanisms, with the aim of improving performance across a range of language tasks.


SRFT Model Overview

Yuqian-Fu/SRFT introduces a 7.6 billion parameter language model distinguished by its innovative Supervised Reinforcement Fine-Tuning (SRFT) method. This approach represents a significant departure from traditional multi-stage fine-tuning by unifying both supervised and reinforcement learning paradigms into a single, cohesive stage.

Key Differentiator

The core innovation of SRFT lies in its use of entropy-aware weighting mechanisms. These mechanisms allow the model to dynamically balance the contributions of supervised learning signals and reinforcement learning rewards during the fine-tuning process, leading to a more integrated and potentially more efficient training regimen.
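The README gives no formulas, so the following is only an illustrative sketch of what an entropy-aware mixing of the two training signals could look like. The function names, the choice of normalized Shannon entropy as the weight, and the linear interpolation between losses are all assumptions for exposition, not the paper's actual objective (see arXiv:2506.19767 for the real formulation).

```python
import math

def entropy(probs):
    # Shannon entropy (in nats) of a discrete token distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_weight(probs, vocab_size):
    # Normalize entropy into [0, 1] by its maximum, log(vocab_size).
    # Intuition (assumed here): when the policy is uncertain (high
    # entropy), lean more on the supervised signal.
    return entropy(probs) / math.log(vocab_size)

def combined_loss(sft_loss, rl_loss, probs, vocab_size):
    # Hypothetical single-stage objective: one entropy-dependent
    # coefficient interpolates between the SFT and RL losses.
    w = entropy_weight(probs, vocab_size)
    return w * sft_loss + (1.0 - w) * rl_loss

# Toy example: a uniform distribution over 4 tokens has maximal
# entropy, so the weight is 1.0 and the SFT loss dominates entirely.
probs = [0.25, 0.25, 0.25, 0.25]
loss = combined_loss(sft_loss=2.0, rl_loss=0.5, probs=probs, vocab_size=4)
# -> 2.0
```

A peaked distribution (low entropy) would instead shift the mix toward the reinforcement-learning term, which is the dynamic balancing behavior the paragraph above describes.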

Research and Development

This model is based on research detailed in the paper: arXiv:2506.19767. Further information and project details are available on the SRFT Project Website.

Potential Applications

While the README does not detail specific applications, the unified fine-tuning approach suggests potential benefits for tasks that require robust, adaptable language understanding and generation, where both explicit supervision and iterative reward-driven refinement are valuable.