kendrickfff/Qwen2.5-1.5B-Indonesian-Assistant-GRPO
The kendrickfff/Qwen2.5-1.5B-Indonesian-Assistant-GRPO is a 1.5-billion-parameter Qwen2.5 model developed by kendrickfff and fine-tuned with Group Relative Policy Optimization (GRPO). It is optimized for Indonesian-language assistance and uses explicit reasoning tags to improve response quality, making it suited to tasks that require a structured thought process and accurate Indonesian text generation.
Overview
The model was fine-tuned from the kendrickfff/Qwen2.5-1.5B-Indonesian-Assistant base model using Group Relative Policy Optimization (GRPO) for 100 training steps. The training setup combines four distinct reward functions covering output format, reasoning length, answer correctness, and language.
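To make the four reward signals concrete, here is a minimal sketch of what such reward functions could look like. The exact scoring rules, thresholds, and the word-list heuristic for the language check are illustrative assumptions, not the author's actual implementation:

```python
# Hedged sketch of the four GRPO reward signals (format, reasoning
# length, correctness, language). All thresholds and heuristics here
# are assumptions for illustration, not the model card's exact code.
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def format_reward(completion: str) -> float:
    """Reward completions containing exactly one <think>...</think> block."""
    return 1.0 if len(THINK_RE.findall(completion)) == 1 else 0.0

def reasoning_length_reward(completion: str, min_words: int = 20) -> float:
    """Scale reward up to 1.0 as the reasoning reaches `min_words` (assumed threshold)."""
    match = THINK_RE.search(completion)
    if match is None:
        return 0.0
    return min(len(match.group(1).split()) / min_words, 1.0)

def correctness_reward(completion: str, reference: str) -> float:
    """Reward answers that match a reference after stripping the reasoning block.

    Exact match is a simplification; real training may score partial credit.
    """
    answer = THINK_RE.sub("", completion).strip()
    return 1.0 if answer == reference.strip() else 0.0

def language_reward(completion: str) -> float:
    """Crude Indonesian check via common function words (illustrative heuristic)."""
    markers = {"yang", "dan", "adalah", "untuk", "dengan", "ini", "itu"}
    words = set(re.findall(r"[a-z]+", completion.lower()))
    return 1.0 if words & markers else 0.0
```

In GRPO, rewards like these are computed per completion within a sampled group, and each completion's advantage is its reward relative to the group average.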
Key Differentiator
A core feature of this model is its ability to learn and utilize <think>...</think> reasoning tags, which guide its internal thought process to produce more structured and accurate outputs. This method aims to enhance the model's reasoning capabilities, particularly for complex tasks.
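In practice, a caller typically wants to separate the model's reasoning from its final answer. A minimal helper for that, assuming the single `<think>...</think>` tag format described above (the helper itself is not part of the model's code):

```python
# Split a model response into its <think> reasoning and final answer.
# Assumes the tag format described in the model card; this helper is
# an illustrative sketch, not part of the released model.
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no tags are present."""
    match = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    answer = (response[: match.start()] + response[match.end():]).strip()
    return reasoning, answer
```

For example, `split_reasoning("<think>langkah satu</think>Jawabannya 42.")` yields the reasoning `"langkah satu"` and the answer `"Jawabannya 42."`.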
Training Efficiency
Training was accelerated with Unsloth and Hugging Face's TRL library, which the author reports made fine-tuning roughly 2x faster than conventional methods.
Use Cases
This model is particularly well-suited for applications requiring:
- Indonesian language assistance with improved reasoning.
- Tasks benefiting from structured thought processes.
- Generating accurate and contextually relevant Indonesian text.