kendrickfff/Qwen2.5-1.5B-Indonesian-Assistant-GRPO

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Apr 23, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

The kendrickfff/Qwen2.5-1.5B-Indonesian-Assistant-GRPO is a 1.5 billion parameter Qwen2.5 model developed by kendrickfff, fine-tuned using Group Relative Policy Optimization (GRPO). This model is specifically optimized for Indonesian language assistance, incorporating reasoning tags to improve response quality. It is designed for tasks requiring structured thought processes and accurate Indonesian language generation.


Overview

The kendrickfff/Qwen2.5-1.5B-Indonesian-Assistant-GRPO is a 1.5 billion parameter Qwen2.5 model, developed by kendrickfff. It was fine-tuned from the kendrickfff/Qwen2.5-1.5B-Indonesian-Assistant base model using Group Relative Policy Optimization (GRPO) for 100 training steps. The training combines four distinct reward functions, targeting output format, reasoning length, answer correctness, and language consistency.
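The author's actual reward code is not published on this page, but the four reward signals named above can be sketched as plain Python callables in the shape TRL's GRPO trainer expects (each takes a batch of completions and returns one score per completion). All function names, thresholds, and heuristics below are illustrative assumptions, not the model's real training code:

```python
import re

# Hypothetical sketches of the four GRPO reward signals described in the
# model card: format, reasoning length, correctness, and language.
# Names, weights, and heuristics are illustrative only.

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def format_reward(completions, **kwargs):
    """Reward completions that wrap reasoning in <think>...</think> tags."""
    return [1.0 if THINK_RE.search(c) else 0.0 for c in completions]

def reasoning_length_reward(completions, min_words=20, **kwargs):
    """Reward reasoning spans long enough to be substantive (capped at 1.0)."""
    rewards = []
    for c in completions:
        m = THINK_RE.search(c)
        n_words = len(m.group(1).split()) if m else 0
        rewards.append(min(n_words / min_words, 1.0))
    return rewards

def correctness_reward(completions, answers=None, **kwargs):
    """Reward completions whose text outside the reasoning block contains
    the reference answer (a crude substring check)."""
    rewards = []
    for c, ans in zip(completions, answers or []):
        final = THINK_RE.sub("", c).strip()
        rewards.append(1.0 if ans.lower() in final.lower() else 0.0)
    return rewards

def language_reward(completions, **kwargs):
    """Crude proxy for 'answered in Indonesian': check for common
    Indonesian function words."""
    markers = ("yang", "dan", "adalah", "dengan", "untuk")
    return [1.0 if any(w in c.lower() for w in markers) else 0.0
            for c in completions]
```

In recent versions of TRL, a list of such callables can be passed to `GRPOTrainer` via its `reward_funcs` argument; GRPO then normalizes rewards within each group of sampled completions rather than training a separate value model.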

Key Differentiator

A core feature of this model is its ability to learn and utilize <think>...</think> reasoning tags, which guide its internal thought process to produce more structured and accurate outputs. This method aims to enhance the model's reasoning capabilities, particularly for complex tasks.
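Because the model emits its reasoning inside `<think>...</think>` tags, downstream applications typically want to separate that reasoning from the final answer before showing output to users. A minimal sketch, assuming a single reasoning block precedes the answer (the helper name is ours, not part of the model's tooling):

```python
import re

# Split a model response into (reasoning, answer), assuming the model
# emits one <think>...</think> block as described in the model card.

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is '' if no tag is present."""
    m = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if not m:
        return "", text.strip()
    reasoning = m.group(1).strip()
    answer = (text[:m.start()] + text[m.end():]).strip()
    return reasoning, answer
```

For example, `split_reasoning("<think>langkah 1</think> Jawaban akhir.")` yields the reasoning `"langkah 1"` and the answer `"Jawaban akhir."`, letting an application log the reasoning while displaying only the answer.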

Training Efficiency

The model's training was accelerated using Unsloth and Hugging Face's TRL library, achieving 2x faster training compared to conventional fine-tuning methods.

Use Cases

This model is particularly well-suited for applications requiring:

  • Indonesian language assistance with improved reasoning.
  • Tasks benefiting from structured thought processes.
  • Generating accurate and contextually relevant Indonesian text.