ahme0599/meta-llama_Llama-3.2-3B-Instruct-GRPO-vanilla_G_4

Hugging Face
Text Generation · Model size: 3.2B params · Precision: BF16 · Context length: 32k · Published: Dec 15, 2025 · Architecture: Transformer

The ahme0599/meta-llama_Llama-3.2-3B-Instruct-GRPO-vanilla_G_4 model is a 3.2-billion-parameter instruction-tuned language model, fine-tuned from meta-llama/Llama-3.2-3B-Instruct. It was trained with GRPO, a reinforcement-learning method designed to enhance mathematical reasoning, making it particularly suited to tasks that require logical and mathematical problem-solving.


Model Overview

This model, ahme0599/meta-llama_Llama-3.2-3B-Instruct-GRPO-vanilla_G_4, is a specialized instruction-tuned variant of the Meta Llama-3.2-3B-Instruct base model, with 3.2 billion parameters and a 32,768-token context length. It was fine-tuned using Hugging Face's TRL library.

Key Differentiator: GRPO Training

The primary distinction of this model lies in its training methodology. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." GRPO samples a group of completions per prompt and scores each one against the group average instead of a learned value baseline, an approach aimed specifically at complex mathematical reasoning and problem-solving.
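To make the idea concrete, here is a minimal sketch of the group-relative advantage at the heart of GRPO: several completions are sampled for the same prompt, each is scored by a reward function (e.g. 1.0 for a correct final answer), and each completion's advantage is its reward normalized by the group's mean and standard deviation. The function name and the binary reward scheme below are illustrative, not taken from this model's training setup.

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages for one group of sampled completions.

    Instead of a learned value baseline, GRPO normalizes each
    completion's reward by the mean and standard deviation of the
    rewards within its own group.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    eps = 1e-8  # avoid division by zero when all rewards are equal
    return [(r - mean) / (std + eps) for r in rewards]

# Example: 4 completions sampled for one math prompt, scored
# 1.0 if the final answer is correct and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Correct completions get a positive advantage, incorrect ones
# a negative advantage; the group as a whole is centered at zero.
```

Completions with above-average reward are reinforced and below-average ones penalized, which is why a verifiable signal like answer correctness suits this method well.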

Intended Use Cases

Given its GRPO-enhanced training, this model is particularly well-suited for applications requiring:

  • Mathematical problem-solving: Excelling in tasks that demand logical deduction and numerical computation.
  • Reasoning tasks: Improved performance on challenges that benefit from structured thought processes.
  • Instruction following: Maintaining strong instruction-following capabilities inherited from its Llama-3.2-3B-Instruct base.

Developers looking for a compact yet capable model with a focus on mathematical and logical reasoning will find this model a strong candidate.
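As a quick usage sketch, prompts for this model should follow the standard Llama 3.x chat format (assumed here to be unchanged by the fine-tune). The helper below assembles such a prompt by hand for illustration; in practice, prefer the tokenizer's `apply_chat_template`, which applies the template shipped with the model.

```python
def build_llama3_prompt(system: str, user: str) -> str:
    """Assemble a Llama 3.x chat prompt string by hand.

    Illustrative only -- in real code, use
    tokenizer.apply_chat_template(messages, add_generation_prompt=True)
    so the model's own template is applied.
    """
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama3_prompt(
    "You are a careful math tutor. Reason step by step.",
    "A train travels 180 km in 2.5 hours. What is its average speed?",
)
```

The resulting string ends with an open assistant header, so generation continues as the assistant's reply; a system message that asks for step-by-step reasoning plays to the model's GRPO-enhanced strengths.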