SII-Enigma/Qwen2.5-7B-Ins-SFT-GRPO

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Published: Sep 28, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

SII-Enigma/Qwen2.5-7B-Ins-SFT-GRPO is a 7.6-billion-parameter language model based on the Qwen2.5 architecture and fine-tuned with the AMPO (Adaptive Multi-Guidance Policy Optimization) framework. AMPO draws guidance from multiple teacher models but intervenes only when the on-policy model fails, improving reasoning efficiency and learning effectiveness. Its two core mechanisms, Adaptive Multi-Guidance Replacement and Comprehension-based Guidance Selection, are reported to deliver better performance and efficiency than conventional RL or SFT training. The model targets tasks that demand robust reasoning and efficient learning from external knowledge.

Model Overview

SII-Enigma/Qwen2.5-7B-Ins-SFT-GRPO is a 7.6-billion-parameter language model developed by SII-Enigma and fine-tuned with the AMPO (Adaptive Multi-Guidance Policy Optimization) framework. AMPO improves performance and training efficiency by integrating guidance from diverse teacher models, intervening only when the on-policy model fails, so the model's capacity for self-discovery is preserved while its reasoning is strengthened.
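
Since the checkpoint follows the Qwen2.5-Instruct format, it should load through the standard Hugging Face transformers APIs. The snippet below is a hedged example rather than official usage: the dtype and device settings are assumptions, and you should verify chat-template behavior against the model repository.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SII-Enigma/Qwen2.5-7B-Ins-SFT-GRPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # assumption: let the repo's weights set the dtype
    device_map="auto",   # requires the accelerate package
)

messages = [{"role": "user", "content": "Solve: if 3x + 5 = 20, what is x?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

device_map="auto" shards the model across available devices; on a single GPU you can instead load normally and call .to("cuda").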

Key Capabilities & Innovations

  • Adaptive Multi-Guidance Replacement: Minimizes external intervention by injecting teacher guidance only when every on-policy rollout fails. This preserves the model's capacity for self-discovery while improving reasoning efficiency.
  • Comprehension-based Guidance Selection: Maximizes learning effectiveness by steering the model toward the external solution it finds most comprehensible, which is reported to improve overall performance (see the sketch after this list).
  • Superior Performance: The AMPO framework is reported to achieve better performance and efficiency compared to models trained solely with Reinforcement Learning (RL) or Supervised Fine-Tuning (SFT).
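
Taken together, the two mechanisms amount to a simple decision rule at each training step. The Python sketch below is a hedged illustration of that rule, not the authors' implementation: the function names, the correctness check, and the use of student log-probability as the "comprehensibility" score are assumptions made for clarity.

```python
from typing import Callable, List, Optional

def select_guidance(
    prompt: str,
    on_policy_samples: List[str],
    teacher_solutions: List[str],
    is_correct: Callable[[str, str], bool],
    student_logprob: Callable[[str, str], float],
) -> Optional[str]:
    """One step of the adaptive replacement logic described above.

    Returns None (no intervention) if any on-policy sample succeeds;
    otherwise returns the teacher solution the student finds most
    "comprehensible", proxied here by the solution's log-probability
    under the student policy (an assumption, not AMPO's exact metric).
    """
    # Adaptive Multi-Guidance Replacement: intervene only on complete failure.
    if any(is_correct(prompt, s) for s in on_policy_samples):
        return None

    # Comprehension-based Guidance Selection: keep only correct teacher
    # solutions, then pick the one the student assigns the highest score.
    correct = [t for t in teacher_solutions if is_correct(prompt, t)]
    if not correct:
        return None
    return max(correct, key=lambda t: student_logprob(prompt, t))

# Toy usage with stub scorers (illustrative only):
pick = select_guidance(
    prompt="2 + 2 = ?",
    on_policy_samples=["5"],               # every rollout wrong -> intervene
    teacher_solutions=["4", "2 + 2 equals 4"],
    is_correct=lambda p, s: "4" in s,
    student_logprob=lambda p, s: -len(s),  # stub: shorter = higher score
)
print(pick)  # -> "4"
```

Returning None mirrors the "intervene only on complete failure" design: whenever any on-policy sample succeeds, training proceeds purely on-policy, and teacher data never displaces the model's own exploration.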

Good For

  • Applications requiring enhanced reasoning capabilities through adaptive external guidance.
  • Scenarios where efficient learning from diverse knowledge sources is critical.
  • Tasks benefiting from a model that balances self-discovery with targeted external correction.