Name: SII-Enigma/Qwen2.5-7B-Ins-AMPO API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: SII-Enigma

Overview

SII-Enigma/Qwen2.5-7B-Ins-AMPO is a 7.6-billion-parameter instruction-tuned model from SII-Enigma, built on the Qwen2.5 architecture. It is trained with the Adaptive Multi-Guidance Policy Optimization (AMPO) framework, which improves performance and efficiency by drawing on knowledge from multiple teacher models. Rather than applying external guidance uniformly, AMPO intervenes only when the on-policy model fails to solve a problem on its own. This preserves the model's capacity for self-discovery while strengthening its reasoning.
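The "intervene only on failure" rule can be sketched in a few lines of Python. This is a minimal illustration, not AMPO's actual implementation. The `Rollout` type, the function names, the binary 0/1 verifier reward, and the choice to swap in a fixed number of external solutions (`max_replace`) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rollout:
    text: str
    reward: float    # assumption: binary verifier reward, 1.0 = correct, 0.0 = incorrect
    on_policy: bool  # True if sampled from the model being trained


def needs_guidance(group: List[Rollout]) -> bool:
    """True only when every rollout in the group for a prompt failed."""
    return all(r.reward == 0.0 for r in group)


def apply_adaptive_guidance(
    group: List[Rollout], external: List[Rollout], max_replace: int
) -> List[Rollout]:
    """Hypothetical helper: if the whole group failed, replace up to
    `max_replace` on-policy rollouts with external (teacher) solutions.
    Otherwise return the group unchanged, so no guidance is injected."""
    if not needs_guidance(group):
        return list(group)
    n = min(max_replace, len(external), len(group))
    # Keep the remaining on-policy rollouts so the model still learns from its own attempts.
    return list(group[: len(group) - n]) + list(external[:n])


group = [Rollout(f"attempt {i}", 0.0, True) for i in range(4)]
teacher = [Rollout("teacher solution", 1.0, False)]
mixed = apply_adaptive_guidance(group, teacher, max_replace=2)
print([r.on_policy for r in mixed])  # → [True, True, True, False]
```

Because the check runs per prompt, a group with even one correct on-policy rollout is left untouched. External knowledge only enters the batch when the model has no successful attempt of its own to learn from.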

Key Capabilities

Adaptive Multi-Guidance Replacement: Keeps external intervention to a minimum, supplying teacher guidance only when all of the model's own (on-policy) attempts fail. This preserves self-discovery and improves reasoning efficiency.
Comprehension-based Guidance Selection: Steers learning toward the external solutions the model can most readily understand and assimilate, which improves performance.
Superior Performance: Delivers better performance and efficiency than models trained solely with Reinforcement Learning (RL) or Supervised Fine-Tuning (SFT).
Multi-Guidance Pool: Draws on a diverse set of teacher models, including AceReason-Nemotron-1.1-7B, DeepSeek-R1-Distill-Qwen-7B, OpenR1-Qwen-7B, and Qwen3-8B (thinking mode), to provide robust external knowledge.
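Comprehension-based selection from the multi-guidance pool can be sketched as a ranking step. The scoring rule here is an assumption for illustration and may differ from AMPO's. Each teacher solution is scored by the geometric-mean probability the student assigns to the reference answer tokens when conditioned on that teacher's reasoning, and the top `k` are kept. The function names are hypothetical, and the log-probabilities in the usage example are made-up toy values, not measurements.

```python
import math
from typing import Dict, List


def comprehension_score(answer_token_logprobs: List[float]) -> float:
    """Assumed proxy for how 'comprehensible' a teacher solution is to the student:
    the geometric-mean probability of the reference answer tokens, computed from
    the student's per-token log-probabilities given the teacher's reasoning."""
    if not answer_token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(answer_token_logprobs) / len(answer_token_logprobs))


def select_guidance(pool: Dict[str, List[float]], k: int) -> List[str]:
    """Hypothetical helper: rank teacher solutions by comprehension score, keep the top k."""
    ranked = sorted(pool, key=lambda name: comprehension_score(pool[name]), reverse=True)
    return ranked[:k]


# Toy log-probabilities (illustrative only), keyed by the teacher models named above.
pool = {
    "AceReason-Nemotron-1.1-7B": [-0.9, -1.2],
    "DeepSeek-R1-Distill-Qwen-7B": [-0.2, -0.4],
    "OpenR1-Qwen-7B": [-2.0, -1.5],
    "Qwen3-8B (thinking)": [-0.5, -0.3],
}
print(select_guidance(pool, k=2))
# → ['DeepSeek-R1-Distill-Qwen-7B', 'Qwen3-8B (thinking)']
```

Using a length-normalized score keeps long and short answers comparable. A raw sum of log-probabilities would systematically favor solutions with shorter answers.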

Good For

Complex Reasoning Tasks: Excels in scenarios requiring intricate problem-solving and logical deduction, benefiting from its adaptive guidance mechanism.
Efficiency-focused Applications: Offers improved efficiency by selectively applying external knowledge, reducing unnecessary computational overhead.
Research and Development: Provides a strong foundation for further exploration into multi-teacher learning and adaptive policy optimization techniques, as detailed in its associated paper.

Full Model Card (README)