SII-Enigma/Qwen2.5-7B-Ins-SFT-GRPO

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Published: Sep 28, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

SII-Enigma/Qwen2.5-7B-Ins-SFT-GRPO is a 7.6-billion-parameter language model based on the Qwen2.5 architecture and fine-tuned with the AMPO (Adaptive Multi-Guidance Policy Optimization) framework. AMPO draws guidance from multiple teacher models but intervenes only when the on-policy model fails, improving reasoning efficiency and learning effectiveness. Its two core mechanisms, Adaptive Multi-Guidance Replacement and Comprehension-based Guidance Selection, are reported to deliver better performance and efficiency than conventional RL or SFT training. The model targets tasks that demand robust reasoning and efficient learning from external knowledge.

Model Overview

SII-Enigma/Qwen2.5-7B-Ins-SFT-GRPO is a 7.6-billion-parameter language model developed by SII-Enigma and fine-tuned with the AMPO (Adaptive Multi-Guidance Policy Optimization) framework. AMPO improves performance and training efficiency by integrating guidance from diverse teacher models, intervening only when the on-policy model fails, so the model's capacity for self-discovery is preserved while its reasoning is strengthened.
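
Since the checkpoint follows the Qwen2.5-Instruct format, it should load through the standard Hugging Face transformers APIs. The snippet below is a hedged example rather than official usage: the dtype and device settings are assumptions, and you should verify chat-template behavior against the model repository.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SII-Enigma/Qwen2.5-7B-Ins-SFT-GRPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # assumption: let the repo's weights set the dtype
    device_map="auto",   # requires the accelerate package
)

messages = [{"role": "user", "content": "Solve: if 3x + 5 = 20, what is x?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

device_map="auto" shards the model across available devices; on a single GPU you can instead load normally and call .to("cuda").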

Key Capabilities & Innovations

  • Adaptive Multi-Guidance Replacement: Minimizes external intervention by injecting teacher guidance only when every on-policy rollout fails. This preserves the model's capacity for self-discovery while improving reasoning efficiency.
  • Comprehension-based Guidance Selection: Maximizes learning effectiveness by steering the model toward the external solution it finds most comprehensible, which is reported to improve overall performance (see the sketch after this list).
  • Superior Performance: The AMPO framework is reported to achieve better performance and efficiency compared to models trained solely with Reinforcement Learning (RL) or Supervised Fine-Tuning (SFT).
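
Taken together, the two mechanisms amount to a simple decision rule at each training step. The Python sketch below is a hedged illustration of that rule, not the authors' implementation: the function names, the correctness check, and the use of student log-probability as the "comprehensibility" score are assumptions made for clarity.

```python
from typing import Callable, List, Optional

def select_guidance(
    prompt: str,
    on_policy_samples: List[str],
    teacher_solutions: List[str],
    is_correct: Callable[[str, str], bool],
    student_logprob: Callable[[str, str], float],
) -> Optional[str]:
    """One step of the adaptive replacement logic described above.

    Returns None (no intervention) if any on-policy sample succeeds;
    otherwise returns the teacher solution the student finds most
    "comprehensible", proxied here by the solution's log-probability
    under the student policy (an assumption, not AMPO's exact metric).
    """
    # Adaptive Multi-Guidance Replacement: intervene only on complete failure.
    if any(is_correct(prompt, s) for s in on_policy_samples):
        return None

    # Comprehension-based Guidance Selection: keep only correct teacher
    # solutions, then pick the one the student assigns the highest score.
    correct = [t for t in teacher_solutions if is_correct(prompt, t)]
    if not correct:
        return None
    return max(correct, key=lambda t: student_logprob(prompt, t))

# Toy usage with stub scorers (illustrative only):
pick = select_guidance(
    prompt="2 + 2 = ?",
    on_policy_samples=["5"],               # every rollout wrong -> intervene
    teacher_solutions=["4", "2 + 2 equals 4"],
    is_correct=lambda p, s: "4" in s,
    student_logprob=lambda p, s: -len(s),  # stub: shorter = higher score
)
print(pick)  # -> "4"
```

Returning None mirrors the "intervene only on complete failure" design: whenever any on-policy sample succeeds, training proceeds purely on-policy, and teacher data never displaces the model's own exploration.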

Good For

  • Applications requiring enhanced reasoning capabilities through adaptive external guidance.
  • Scenarios where efficient learning from diverse knowledge sources is critical.
  • Tasks benefiting from a model that balances self-discovery with targeted external correction.