swadeshb/Llama-3.2-3B-Instruct-AMPO-V1-6

Hugging Face · Text Generation

Concurrency Cost: 1 · Model Size: 3.2B · Quant: BF16 · Context Length: 32k · Published: Dec 30, 2025 · Architecture: Transformer

swadeshb/Llama-3.2-3B-Instruct-AMPO-V1-6 is a 3.2-billion-parameter instruction-tuned causal language model, fine-tuned from Meta's Llama-3.2-3B-Instruct. It was trained with the GRPO method, which is designed to enhance mathematical reasoning, and is optimized for tasks requiring robust logical and mathematical problem-solving, with a 32,768-token context length.


Model Overview

swadeshb/Llama-3.2-3B-Instruct-AMPO-V1-6 is a 3.2-billion-parameter instruction-tuned model built on the meta-llama/Llama-3.2-3B-Instruct base. It was fine-tuned with the TRL library using GRPO (Group Relative Policy Optimization), the method introduced in the DeepSeekMath paper, to enhance its reasoning abilities.

Key Capabilities

  • Enhanced Mathematical Reasoning: The primary differentiator of this model is its training with the GRPO method, specifically aimed at improving performance on mathematical and logical reasoning tasks.
  • Instruction Following: As an instruction-tuned model, it is designed to understand and execute user prompts effectively.
  • Large Context Window: Supports a context length of 32768 tokens, allowing for processing and generating longer sequences of text.
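As an instruction-tuned Llama 3.2 derivative, the model expects prompts in the Llama 3 chat format. The sketch below builds that format by hand for illustration; in practice you would let the tokenizer's `apply_chat_template` do this so the special tokens stay in sync with the checkpoint. The system/user messages are illustrative, and the commented generation call assumes the standard `transformers` pipeline API.

```python
# Illustrative sketch of the Llama 3.x instruct chat format this model expects.
# In real use, prefer tokenizer.apply_chat_template over hand-building the string.

def build_llama3_prompt(messages):
    """Render a list of {'role', 'content'} dicts into the Llama 3 chat format."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
        )
    # Open an assistant turn so the model generates the answer next.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a careful mathematical assistant."},
    {"role": "user", "content": "What is 17 * 24? Show your reasoning."},
]
prompt = build_llama3_prompt(messages)

# Actual generation (downloads the ~3B checkpoint):
# from transformers import pipeline
# pipe = pipeline("text-generation", model="swadeshb/Llama-3.2-3B-Instruct-AMPO-V1-6")
# print(pipe(messages, max_new_tokens=512)[0]["generated_text"])
```

The 32k context window leaves ample room for long multi-step derivations in a single turn.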

Training Details

The model's training procedure leveraged the TRL framework (version 0.23.0) and PyTorch (version 2.8.0+cu126). The application of GRPO, a technique highlighted in the DeepSeekMath research, suggests a focus on robust problem-solving rather than general conversational fluency.
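GRPO's core idea, per the DeepSeekMath paper, is to sample a group of completions per prompt and normalize each completion's reward against the group, rather than training a separate value network as PPO does. A minimal sketch of that group-relative advantage (the binary reward here is an illustrative assumption):

```python
# Minimal sketch of GRPO's group-relative advantage: rewards for a group of
# sampled completions are standardized within the group, replacing PPO's
# learned value-function baseline.

from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-6):
    """Advantage of each completion relative to its sampled group."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: 4 completions for one math prompt; assumed reward of 1.0 when the
# final answer is correct, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
# Correct completions get positive advantage, incorrect ones negative.
```

Because the baseline is the group mean, the advantages always sum to zero within a group: the policy is pushed toward completions that beat their siblings, which suits verifiable tasks like math.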

Ideal Use Cases

This model is particularly well-suited for applications requiring:

  • Mathematical Problem Solving: Tasks involving arithmetic, algebra, calculus, or other quantitative reasoning.
  • Logical Deduction: Scenarios where the model needs to follow complex rules or infer conclusions from given premises.
  • Instruction-based Generation: General instruction-following tasks where a strong reasoning backbone is beneficial.
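For readers wanting to reproduce a similar fine-tune, TRL exposes GRPO through `GRPOTrainer`. The reward function below (name, answer-extraction rule, and dataset wiring are illustrative assumptions, not this model's published recipe) shows the kind of verifiable math reward GRPO pairs well with:

```python
# Hedged sketch of a GRPO fine-tuning setup with TRL. The reward rule is an
# illustrative assumption: score 1.0 when the last number in a completion
# matches the reference answer.

import re

def exact_answer_reward(completions, ground_truths):
    """1.0 if the last number in each completion matches the reference answer."""
    rewards = []
    for completion, truth in zip(completions, ground_truths):
        numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
        rewards.append(1.0 if numbers and numbers[-1] == str(truth) else 0.0)
    return rewards

# Wiring into TRL (requires a recent trl release and the base checkpoint):
# from trl import GRPOConfig, GRPOTrainer
# trainer = GRPOTrainer(
#     model="meta-llama/Llama-3.2-3B-Instruct",
#     reward_funcs=lambda completions, **kw: exact_answer_reward(completions, kw["answer"]),
#     args=GRPOConfig(output_dir="grpo-out"),
#     train_dataset=...,  # prompt dataset with an "answer" column
# )
# trainer.train()
```

A rule-based reward like this is cheap and unambiguous for arithmetic and algebra tasks, which is one reason GRPO-style training concentrates on quantitative reasoning.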