kendrickfff/Qwen2.5-1.5B-Indonesian-Assistant-GRPO
The kendrickfff/Qwen2.5-1.5B-Indonesian-Assistant-GRPO is a 1.5-billion-parameter Qwen2.5 model developed by kendrickfff and fine-tuned with Group Relative Policy Optimization (GRPO). It is optimized for Indonesian-language assistance and uses explicit reasoning tags to improve response quality, making it suited to tasks that require a structured thought process and accurate Indonesian text generation.
Overview
The model was fine-tuned from the kendrickfff/Qwen2.5-1.5B-Indonesian-Assistant base model using Group Relative Policy Optimization (GRPO) for 100 training steps. The training setup combines four distinct reward functions covering output format, reasoning length, answer correctness, and language.
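To make the four reward signals concrete, here is a minimal sketch of what such reward functions could look like. The exact scoring rules, thresholds, and the word-list heuristic for the language check are illustrative assumptions, not the author's actual implementation:

```python
# Hedged sketch of the four GRPO reward signals (format, reasoning
# length, correctness, language). All thresholds and heuristics here
# are assumptions for illustration, not the model card's exact code.
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def format_reward(completion: str) -> float:
    """Reward completions containing exactly one <think>...</think> block."""
    return 1.0 if len(THINK_RE.findall(completion)) == 1 else 0.0

def reasoning_length_reward(completion: str, min_words: int = 20) -> float:
    """Scale reward up to 1.0 as the reasoning reaches `min_words` (assumed threshold)."""
    match = THINK_RE.search(completion)
    if match is None:
        return 0.0
    return min(len(match.group(1).split()) / min_words, 1.0)

def correctness_reward(completion: str, reference: str) -> float:
    """Reward answers that match a reference after stripping the reasoning block.

    Exact match is a simplification; real training may score partial credit.
    """
    answer = THINK_RE.sub("", completion).strip()
    return 1.0 if answer == reference.strip() else 0.0

def language_reward(completion: str) -> float:
    """Crude Indonesian check via common function words (illustrative heuristic)."""
    markers = {"yang", "dan", "adalah", "untuk", "dengan", "ini", "itu"}
    words = set(re.findall(r"[a-z]+", completion.lower()))
    return 1.0 if words & markers else 0.0
```

In GRPO, rewards like these are computed per completion within a sampled group, and each completion's advantage is its reward relative to the group average.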
Key Differentiator
A core feature of this model is its ability to learn and utilize <think>...</think> reasoning tags, which guide its internal thought process to produce more structured and accurate outputs. This method aims to enhance the model's reasoning capabilities, particularly for complex tasks.
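In practice, a caller typically wants to separate the model's reasoning from its final answer. A minimal helper for that, assuming the single `<think>...</think>` tag format described above (the helper itself is not part of the model's code):

```python
# Split a model response into its <think> reasoning and final answer.
# Assumes the tag format described in the model card; this helper is
# an illustrative sketch, not part of the released model.
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no tags are present."""
    match = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    answer = (response[: match.start()] + response[match.end():]).strip()
    return reasoning, answer
```

For example, `split_reasoning("<think>langkah satu</think>Jawabannya 42.")` yields the reasoning `"langkah satu"` and the answer `"Jawabannya 42."`.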
Training Efficiency
Training was accelerated with Unsloth and Hugging Face's TRL library, which the author reports made fine-tuning roughly 2x faster than conventional methods.
Use Cases
This model is particularly well-suited for applications requiring:
- Indonesian language assistance with improved reasoning.
- Tasks benefiting from structured thought processes.
- Generating accurate and contextually relevant Indonesian text.