SII-Enigma/Llama3.2-8B-Ins-AMPO: Adaptive Multi-Guidance Policy Optimization
This model, developed by SII-Enigma, is an 8-billion-parameter instruction-tuned variant of the Llama 3.2 architecture with a 32,768-token context window. Its core innovation is the Adaptive Multi-Guidance Policy Optimization (AMPO) framework, which adaptively integrates guidance from multiple diverse teacher models during training.
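Since this is an instruction-tuned causal language model, it can presumably be run with the standard Hugging Face Transformers API. The sketch below is an assumption-laden minimal example: the model id comes from this card, but the dtype, generation settings, and sample prompt are illustrative choices, not documented defaults.

```python
MODEL_ID = "SII-Enigma/Llama3.2-8B-Ins-AMPO"

def build_messages(question: str) -> list[dict]:
    """Wrap a user question in the chat format expected by
    instruction-tuned Llama models."""
    return [{"role": "user", "content": question}]

def main() -> None:
    # Heavy imports kept inside main() so the helpers above stay importable
    # without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumes a GPU with ~16 GB+ memory
        device_map="auto",
    )
    prompt = tokenizer.apply_chat_template(
        build_messages("If 3x + 7 = 22, what is x?"),
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.6)
    # Decode only the newly generated tokens, not the echoed prompt.
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

if __name__ == "__main__":
    main()
```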
Key Capabilities & Innovations
- Adaptive Multi-Guidance Replacement: AMPO minimizes intervention, providing external guidance only when the on-policy model fails completely. This preserves the model's capacity for self-discovery while improving reasoning efficiency.
- Comprehension-based Guidance Selection: when guidance is needed, the model is steered toward the external solution it can best comprehend, improving how effectively it assimilates teacher knowledge.
- Superior Performance: AMPO-trained models achieve better overall performance and efficiency than models trained with Reinforcement Learning (RL) or Supervised Fine-Tuning (SFT) alone.
- Multi-Guidance Pool: It leverages a diverse set of teacher models, including AceReason-Nemotron-1.1-7B, DeepSeek-R1-Distill-Qwen-7B, OpenR1-Qwen-7B, and Qwen3-8B (thinking), to provide robust external knowledge.
Use Cases
This model is particularly well-suited to tasks requiring multi-step reasoning and problem-solving. Because the external teacher knowledge is absorbed during training rather than required at inference time, it can produce more accurate and robust outputs in scenarios where traditional fine-tuning falls short, offering a more dynamic learning approach.