swadeshb/Llama-3.2-3B-Instruct-AMPO-V1-6
swadeshb/Llama-3.2-3B-Instruct-AMPO-V1-6 is a 3-billion-parameter instruction-tuned causal language model fine-tuned from Meta's Llama-3.2-3B-Instruct. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement-learning method designed to enhance mathematical reasoning, and is optimized for tasks requiring robust logical and mathematical problem-solving, with a 32,768-token context length.
Model Overview
swadeshb/Llama-3.2-3B-Instruct-AMPO-V1-6 is a 3-billion-parameter instruction-tuned model building on the meta-llama/Llama-3.2-3B-Instruct base. It was fine-tuned with the TRL library using GRPO (Group Relative Policy Optimization), the method introduced in the DeepSeekMath paper, to strengthen its reasoning abilities.
Key Capabilities
- Enhanced Mathematical Reasoning: The primary differentiator of this model is its training with the GRPO method, specifically aimed at improving performance on mathematical and logical reasoning tasks.
- Instruction Following: As an instruction-tuned model, it is designed to understand and execute user prompts effectively.
- Large Context Window: Supports a context length of 32,768 tokens, allowing it to process and generate long sequences of text.
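As a minimal sketch, the model can be queried through the Hugging Face `transformers` text-generation pipeline. The helper below assembles a standard chat-format prompt; the system-message wording is illustrative and not part of this model card:

```python
def build_messages(question: str) -> list[dict]:
    """Build a chat-format prompt list; the system text is an assumed example."""
    return [
        {"role": "system", "content": "You are a careful mathematical reasoner. Show your steps."},
        {"role": "user", "content": question},
    ]

if __name__ == "__main__":
    # Heavy dependencies are imported here so the helper stays importable without them.
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="swadeshb/Llama-3.2-3B-Instruct-AMPO-V1-6",
    )
    # Recent transformers pipelines accept chat-format message lists directly.
    out = generator(build_messages("What is 17 * 24?"), max_new_tokens=256)
    print(out[0]["generated_text"][-1]["content"])
```

The chat format lets the tokenizer apply the Llama 3.2 chat template automatically, which instruction-tuned checkpoints generally expect.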
Training Details
The model was trained with the TRL framework (version 0.23.0) and PyTorch (version 2.8.0+cu126). GRPO, introduced in the DeepSeekMath research, optimizes the policy against group-normalized rewards rather than a learned value function, reflecting a focus on robust problem-solving over general conversational fluency.
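For reference, GRPO as described in the DeepSeekMath paper samples a group of $G$ responses per prompt, scores each with a reward function, and uses the group-normalized reward as the advantage estimate:

$$\hat{A}_i = \frac{r_i - \operatorname{mean}(\{r_1, \ldots, r_G\})}{\operatorname{std}(\{r_1, \ldots, r_G\})}$$

This advantage then drives a PPO-style clipped policy-gradient update with a KL penalty toward the reference model, avoiding the cost of training a separate value network.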
Ideal Use Cases
This model is particularly well-suited for applications requiring:
- Mathematical Problem Solving: Tasks involving arithmetic, algebra, calculus, or other quantitative reasoning.
- Logical Deduction: Scenarios where the model needs to follow complex rules or infer conclusions from given premises.
- Instruction-based Generation: General instruction-following tasks where a strong reasoning backbone is beneficial.
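For math and logic workloads, deterministic or near-deterministic decoding is a common starting point. The settings below are an illustrative sketch, not values published with this model:

```python
# Illustrative generation settings for reasoning tasks (assumed, not from this card).
reasoning_generation_kwargs = {
    "max_new_tokens": 512,     # leave room for step-by-step working
    "do_sample": False,        # greedy decoding for reproducible answers
    "repetition_penalty": 1.05,  # mildly discourage looping on long derivations
}
```

These can be passed as keyword arguments to a `transformers` pipeline call or to `model.generate`; sampling with a low temperature is a reasonable alternative when some diversity is desired.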