yahid/triage-agent-qwen3b

TEXT GENERATIONConcurrency Cost:1Model Size:3.1BQuant:BF16Ctx Length:32kPublished:Apr 25, 2026Architecture:Transformer Cold

The yahid/triage-agent-qwen3b model is a 3.1 billion parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-3B-Instruct. Developed by yahid, this model utilizes the GRPO training method, which is designed to enhance mathematical reasoning capabilities. It is optimized for tasks requiring robust reasoning, leveraging its 32768-token context length.

Loading preview...

Overview

The yahid/triage-agent-qwen3b is a 3.1 billion parameter instruction-tuned language model, building upon the Qwen/Qwen2.5-3B-Instruct architecture. It has been specifically fine-tuned using the TRL library and incorporates the GRPO (Generative Reinforcement Learning with Policy Optimization) training method.

Key Capabilities

  • Enhanced Reasoning: The model's training with GRPO, a method introduced in the DeepSeekMath paper, suggests a focus on improving mathematical and general reasoning abilities.
  • Instruction Following: As an instruction-tuned model, it is designed to understand and execute user prompts effectively.
  • Large Context Window: Benefits from a 32768-token context length, allowing it to process and generate longer, more complex sequences of text.

Training Details

The model was trained using GRPO, a technique highlighted for its effectiveness in mathematical reasoning tasks. The training procedure leveraged specific versions of key frameworks:

  • TRL: 1.2.0
  • Transformers: 4.57.6
  • Pytorch: 2.10.0
  • Datasets: 4.8.4
  • Tokenizers: 0.22.2

Use Cases

This model is suitable for applications requiring strong reasoning capabilities and accurate instruction following, particularly in scenarios where the GRPO training method's benefits in mathematical or logical tasks could be advantageous.