tungpl/gsm8k-llama3-grpo
The tungpl/gsm8k-llama3-grpo is a 3.2 billion parameter Llama-3.2-3B-Instruct model, developed by tungpl and fine-tuned for specific tasks. This model was trained using Unsloth and Huggingface's TRL library, enabling faster fine-tuning. It is designed for applications requiring a compact yet capable language model, leveraging the Llama 3 architecture.
Loading preview...
Model Overview
The tungpl/gsm8k-llama3-grpo is a 3.2 billion parameter language model, fine-tuned from the unsloth/Llama-3.2-3B-Instruct base model. Developed by tungpl, this model leverages the Llama 3 architecture and was optimized for training speed using the Unsloth library in conjunction with Huggingface's TRL library.
Key Characteristics
- Base Model: Fine-tuned from
unsloth/Llama-3.2-3B-Instruct. - Training Efficiency: Utilizes Unsloth for 2x faster fine-tuning.
- Parameter Count: 3.2 billion parameters, offering a balance between performance and computational efficiency.
- Context Length: Supports a context length of 32768 tokens.
- License: Released under the Apache-2.0 license.
Use Cases
This model is suitable for developers looking for a Llama 3-based instruction-tuned model that benefits from optimized training. Its compact size makes it a good candidate for applications where resource efficiency is important, while its Llama 3 foundation provides strong language understanding capabilities. It is particularly relevant for tasks that align with its fine-tuning objectives, though specific task performance would require further evaluation.