swadeshb/Llama-3.2-3B-Instruct-AMPO-V1

Hugging Face
Text Generation · Model Size: 3.2B · Quant: BF16 · Ctx Length: 32k · Published: Dec 18, 2025 · Architecture: Transformer

swadeshb/Llama-3.2-3B-Instruct-AMPO-V1 is a 3.2-billion-parameter instruction-tuned causal language model, fine-tuned by swadeshb from Meta's Llama-3.2-3B-Instruct. It was trained with the GRPO method, which is designed to enhance mathematical reasoning, making it suited to applications where precise logical and mathematical problem-solving is crucial.


Model Overview

swadeshb/Llama-3.2-3B-Instruct-AMPO-V1 is an instruction-tuned language model based on Meta's Llama-3.2-3B-Instruct architecture, with 3.2 billion parameters and a 32,768-token context length. It was developed by swadeshb and fine-tuned using the TRL library.

Key Differentiator

The primary distinction of this model lies in its training methodology. It was fine-tuned using GRPO (Group Relative Policy Optimization), a method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This approach specifically aims to improve the model's capabilities in mathematical reasoning and complex problem-solving.

Training Details

  • Base Model: meta-llama/Llama-3.2-3B-Instruct
  • Fine-tuning Method: GRPO, as detailed in the DeepSeekMath paper.
  • Frameworks Used: TRL (version 0.23.0), Transformers (version 4.57.1), PyTorch (version 2.8.0+cu126), Datasets (version 3.3.2), Tokenizers (version 0.22.1).
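The core idea of GRPO is to replace a learned value-function baseline with group statistics: several completions are sampled per prompt, scored by a reward function, and each completion's advantage is its reward normalized against the group's mean and standard deviation. A minimal sketch of that normalization step (a simplified illustration of the DeepSeekMath formulation, not the actual TRL implementation used to train this model):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each completion's reward against its sampling group.

    GRPO uses the group mean as the baseline, so the advantage of
    completion i is (r_i - mean(r)) / std(r). Advantages always sum
    to zero within a group: completions better than the group average
    are reinforced, worse ones are penalized.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against all-identical rewards
    return [(r - mu) / sigma for r in rewards]

# Example: three completions for one math prompt, scored 0/1 for
# whether the final answer was correct.
print(group_relative_advantages([0.0, 1.0, 1.0]))
```

Because the baseline is computed per group rather than by a critic network, GRPO avoids training a separate value model, which is what makes it practical for resource-constrained fine-tuning of a 3B model.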

Potential Use Cases

  • Mathematical Problem Solving: Ideal for applications requiring robust mathematical reasoning.
  • Logical Deduction: Suitable for tasks that benefit from enhanced logical processing.
  • Instruction Following: Benefits from its instruction-tuned base, making it responsive to user prompts.
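A minimal inference sketch for these use cases via the Transformers `pipeline` API. The model ID comes from this card; the system prompt and generation settings are illustrative assumptions, not part of the model's documented configuration:

```python
def build_messages(question: str) -> list[dict]:
    """Wrap a math question in the chat-message format expected by
    instruct-tuned models (an assumed system prompt for illustration)."""
    return [
        {"role": "system",
         "content": "You are a careful mathematical reasoner. Show your steps."},
        {"role": "user", "content": question},
    ]

if __name__ == "__main__":
    # Imported lazily so the prompt helper above stays dependency-free.
    # First run downloads the BF16 weights (several GB) from the Hub.
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="swadeshb/Llama-3.2-3B-Instruct-AMPO-V1",
        torch_dtype="bfloat16",
    )
    out = generator(build_messages("What is 17 * 24?"), max_new_tokens=256)
    # Recent transformers versions return the full chat history; the
    # assistant's reply is the last message.
    print(out[0]["generated_text"][-1]["content"])
```

The pipeline applies the model's bundled chat template automatically, so the instruction-tuned formatting inherited from Llama-3.2-3B-Instruct is preserved without manual prompt assembly.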