tungpl/gsm8k-llama3-grpo

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:3.2BQuant:BF16Ctx Length:32kPublished:May 3, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The tungpl/gsm8k-llama3-grpo is a 3.2 billion parameter Llama-3.2-3B-Instruct model, developed by tungpl and fine-tuned for specific tasks. This model was trained using Unsloth and Huggingface's TRL library, enabling faster fine-tuning. It is designed for applications requiring a compact yet capable language model, leveraging the Llama 3 architecture.

Loading preview...

Model Overview

The tungpl/gsm8k-llama3-grpo is a 3.2 billion parameter language model, fine-tuned from the unsloth/Llama-3.2-3B-Instruct base model. Developed by tungpl, this model leverages the Llama 3 architecture and was optimized for training speed using the Unsloth library in conjunction with Huggingface's TRL library.

Key Characteristics

  • Base Model: Fine-tuned from unsloth/Llama-3.2-3B-Instruct.
  • Training Efficiency: Utilizes Unsloth for 2x faster fine-tuning.
  • Parameter Count: 3.2 billion parameters, offering a balance between performance and computational efficiency.
  • Context Length: Supports a context length of 32768 tokens.
  • License: Released under the Apache-2.0 license.

Use Cases

This model is suitable for developers looking for a Llama 3-based instruction-tuned model that benefits from optimized training. Its compact size makes it a good candidate for applications where resource efficiency is important, while its Llama 3 foundation provides strong language understanding capabilities. It is particularly relevant for tasks that align with its fine-tuning objectives, though specific task performance would require further evaluation.