beyoru/Qwen3-4B-I-1209

4B parameters · BF16 · 40960-token context · Sep 24, 2025 · License: apache-2.0

Model Overview

beyoru/Qwen3-4B-I-1209 is a 4 billion parameter instruction-tuned model built on the Qwen3-4B-Instruct-2507 base. It was fine-tuned by Beyoru using reinforcement learning with GRPO (Group Relative Policy Optimization), combining multiple reward functions to enhance its capabilities.

Key Capabilities & Training

This model is primarily optimized for tool-use and function call generation. Its training regimen utilized a multi-signal reward system, including:

  • Rule-based Reward: Checks the correctness of function call names and arguments, providing partial credit for matching subsets.
  • Self-Certainty Reward: Encourages the model to make confident predictions.
  • Tool-Call Reward: Validates the structural correctness of generated tool calls.
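The rule-based and tool-call rewards above can be sketched as a single scoring function. This is an illustrative reconstruction, not the training code: the 0.5/0.5 weighting between name match and argument overlap is an assumption, since the card does not publish exact reward values.

```python
import json

def rule_based_reward(predicted: str, expected: dict) -> float:
    """Score a generated tool call against a reference call.

    Gives no credit for structurally invalid JSON or a wrong function
    name, and partial credit for matching argument subsets. The 0.5/0.5
    split is hypothetical -- the model card does not state the weights.
    """
    try:
        call = json.loads(predicted)          # tool-call reward: must parse
    except json.JSONDecodeError:
        return 0.0
    if call.get("name") != expected["name"]:  # wrong function -> no credit
        return 0.0
    score = 0.5                               # correct name earns half credit
    exp_args = expected.get("arguments", {})
    pred_args = call.get("arguments", {})
    if not exp_args:
        return score + 0.5                    # nothing else to match
    matched = sum(1 for k, v in exp_args.items() if pred_args.get(k) == v)
    return score + 0.5 * matched / len(exp_args)  # partial credit per argument

# One correct argument out of two: name credit plus half the argument credit.
reward = rule_based_reward(
    '{"name": "get_weather", "arguments": {"city": "Paris", "unit": "C"}}',
    {"name": "get_weather", "arguments": {"city": "Paris", "unit": "F"}},
)
print(reward)  # 0.75
```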

The training employed an AdamW optimizer with a learning rate of 5e-6 and a cosine decay scheduler.

Performance

On the ACEBench evaluation, Qwen3-4B-I-1209 demonstrates strong performance in tool-use and function calling:

  • Overall Accuracy: 0.7233
  • This surpasses both its base model, Qwen3-4B-Instruct-2507 (0.635), and Salesforce/Llama-xLAM-2-8b-fc-r (0.5792).

Use Cases

This model is particularly well-suited for applications requiring precise and reliable function call generation and tool interaction, making it a strong candidate for agentic workflows and automated task execution.
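In an agentic workflow, the model's generated tool calls are typically parsed and routed to registered functions. A minimal dispatch sketch, assuming the model emits JSON objects with "name" and "arguments" fields (the registry and its tools here are hypothetical, not part of the model card):

```python
import json

# Hypothetical tool registry: maps tool names to Python callables.
TOOLS = {
    "add": lambda args: args["a"] + args["b"],
}

def dispatch(tool_call_json: str):
    """Parse a model-generated tool call and invoke the matching tool."""
    call = json.loads(tool_call_json)
    fn = TOOLS[call["name"]]          # KeyError here signals an unknown tool
    return fn(call.get("arguments", {}))

result = dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}')
print(result)  # 5
```

In practice this loop would feed the tool's return value back to the model as the next turn, repeating until the model produces a final answer instead of a tool call.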