beyoru/Qwen3-4B-I-1209

Text Generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: Sep 24, 2025 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

Qwen3-4B-I-1209 is a 4-billion-parameter instruction-tuned causal language model developed by Beyoru, fine-tuned from Qwen3-4B-Instruct-2507. It specializes in tool use and function-call generation and was trained with Group Relative Policy Optimization (GRPO) using a composite reward. The training targets accurate function names and arguments, which makes the model suitable for applications that need reliable programmatic interaction.


Overview

Qwen3-4B-I-1209 is a 4-billion-parameter instruction-tuned model developed by Beyoru, fine-tuned from Qwen3-4B-Instruct-2507. It is optimized for tool use and function-call generation with Group Relative Policy Optimization (GRPO), a reinforcement learning method that samples a group of completions for each prompt and scores each completion relative to the rest of its group.
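The group-relative scoring at the heart of GRPO can be illustrated with a minimal sketch. It assumes the commonly described form, in which each reward is standardized against the mean and standard deviation of its own group; the function name, the epsilon term, and the sample rewards are illustrative and are not taken from this model's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against the other completions for the same prompt.

    `eps` is an assumed small constant that avoids division by zero when
    every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled from one prompt.
advantages = group_relative_advantages([1.0, 0.5, 0.0, 0.5])
```

Completions that score above the group mean receive a positive advantage and those below it a negative one, so no separate value model is needed to provide a baseline.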

Key Capabilities & Training

The model's specialization comes from the reward design used during training, which combines three signals:

  • Rule-based reward: Ensures correctness of function names and arguments, with partial credit for argument subsets.
  • Self-certainty reward: Promotes confident and well-calibrated predictions.
  • Tool-call reward: Validates the structural integrity of generated tool calls.

This multi-faceted reward system enhances the model's ability to produce accurate and reliable function calls.
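One way such a composite reward might be put together is sketched below. The card does not publish the actual scoring rules or weights, so everything here is an assumption made for illustration: the partial-credit formula, the JSON shape of a tool call, the weights, and the function names. The self-certainty term is passed in as a precomputed number because deriving it requires the model's token probabilities.

```python
import json


def rule_based_reward(predicted, expected):
    """Assumed rule: 0 for a wrong function name, else the fraction of
    expected arguments reproduced exactly (partial credit for subsets)."""
    if predicted.get("name") != expected["name"]:
        return 0.0
    expected_args = expected.get("arguments", {})
    if not expected_args:
        return 1.0
    predicted_args = predicted.get("arguments", {})
    matched = sum(
        1 for key, value in expected_args.items()
        if key in predicted_args and predicted_args[key] == value
    )
    return matched / len(expected_args)


def tool_call_reward(raw_text):
    """Assumed structural check: valid JSON object with a string `name`
    and a dict `arguments`."""
    try:
        call = json.loads(raw_text)
    except json.JSONDecodeError:
        return 0.0
    well_formed = (
        isinstance(call, dict)
        and isinstance(call.get("name"), str)
        and isinstance(call.get("arguments"), dict)
    )
    return 1.0 if well_formed else 0.0


def composite_reward(raw_text, expected, self_certainty=0.0,
                     weights=(1.0, 0.5, 0.5)):
    """Weighted sum of the three signals; the weights are hypothetical."""
    structure = tool_call_reward(raw_text)
    rule = rule_based_reward(json.loads(raw_text), expected) if structure else 0.0
    w_rule, w_certainty, w_structure = weights
    return w_rule * rule + w_certainty * self_certainty + w_structure * structure


expected_call = {"name": "get_weather",
                 "arguments": {"city": "Paris", "unit": "celsius"}}
partial_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
score = composite_reward(partial_output, expected_call)
```

In this sketch a well-formed call that gets one of two expected arguments right still earns half of the rule-based credit, while malformed output earns nothing from either the structural or the rule-based term.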

Performance

Evaluated on ACEBench, Qwen3-4B-I-1209 outperforms its base model in tool-use scenarios, an absolute gain of 0.0883 in overall accuracy (about 13.9% relative):

  • Qwen3-4B-I-1209 (this model): 0.7233 Overall Accuracy
  • Qwen3-4B-Instruct-2507 (base model): 0.6350 Overall Accuracy

Ideal Use Cases

  • Automated API interaction: Generating precise function calls for external tools and APIs.
  • Agentic workflows: Developing AI agents that can reliably use tools to accomplish tasks.
  • Code generation for function stubs: Creating accurate function signatures and argument structures.
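For the API-interaction and agent use cases above, the calling application has to pull tool calls out of the generated text and route them to real functions. The sketch below assumes the model keeps the `<tool_call>{...}</tool_call>` JSON convention used by Qwen chat templates, which this card does not confirm; `get_weather`, the registry, and the sample output are hypothetical.

```python
import json
import re

# Assumed output convention: a JSON object wrapped in <tool_call> tags.
TOOL_CALL_PATTERN = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text):
    """Return every parseable tool call found in the generated text."""
    calls = []
    for payload in TOOL_CALL_PATTERN.findall(text):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip malformed calls instead of failing the whole turn
        if isinstance(call, dict) and "name" in call:
            calls.append(call)
    return calls


def dispatch(call, registry):
    """Look up the named function in the registry and invoke it."""
    func = registry[call["name"]]
    return func(**call.get("arguments", {}))


def get_weather(city, unit="celsius"):
    """Hypothetical tool used only for this example."""
    return f"Weather for {city} in {unit}"


sample_output = (
    '<tool_call>\n'
    '{"name": "get_weather", "arguments": {"city": "Paris"}}\n'
    '</tool_call>'
)
calls = extract_tool_calls(sample_output)
result = dispatch(calls[0], {"get_weather": get_weather})
```

Restricting dispatch to an explicit registry means the model can only trigger functions the application has chosen to expose.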

Popular Sampler Settings

The top 3 parameter combinations used by Featherless users for this model cover the following sampler parameters: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.