beyoru/Qwen3-4B-I-1209
Qwen3-4B-I-1209 is a 4 billion parameter instruction-tuned causal language model developed by Beyoru, fine-tuned from Qwen3-4B-Instruct-2507. This model specializes in tool-use and function call generation, leveraging Group Relative Policy Optimization (GRPO) with a composite reward system. It is specifically optimized for accurately generating function names and arguments, making it suitable for applications requiring reliable programmatic interaction.
Overview
Qwen3-4B-I-1209 is a 4-billion-parameter instruction-tuned model developed by Beyoru and fine-tuned from Qwen3-4B-Instruct-2507. It is optimized for tool use and function call generation with Group Relative Policy Optimization (GRPO), a reinforcement learning method that scores each sampled completion relative to the other completions drawn for the same prompt.
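The group-relative idea behind GRPO can be illustrated with a minimal sketch. It assumes the commonly described formulation in which each completion's reward is normalized by the mean and standard deviation of its own group; the model card does not publish the actual training code, and the function name, the `eps` term, and the sample rewards are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar per completion sampled for the same prompt.
    `eps` (an assumption of this sketch) avoids division by zero when
    every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled from one prompt.
advantages = group_relative_advantages([1.0, 0.5, 0.5, 0.0])
```

Completions that score above their group's average get a positive advantage and are reinforced; those below get a negative one. A group with identical rewards yields all-zero advantages and so contributes no learning signal.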
Key Capabilities & Training
The model's specialization comes from the composite reward used during training, which includes:
- Rule-based reward: Ensures correctness of function names and arguments, with partial credit for argument subsets.
- Self-certainty reward: Promotes confident and well-calibrated predictions.
- Tool-call reward: Validates the structural integrity of generated tool calls.
Together, these rewards target both what is called (correct function names and arguments) and how it is expressed (well-formed, confidently produced tool calls).
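How the three components might combine can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the JSON tool-call format, the partial-credit formula, the weights, and the idea of passing self-certainty in as a precomputed score in [0, 1] are all hypothetical choices made for the example.

```python
import json


def parse_tool_call(text):
    """Return the call as a dict if it is structurally valid, else None.

    Assumed format (not from the model card):
    {"name": <str>, "arguments": <object>}
    """
    try:
        call = json.loads(text)
    except json.JSONDecodeError:
        return None
    if (isinstance(call, dict)
            and isinstance(call.get("name"), str)
            and isinstance(call.get("arguments"), dict)):
        return call
    return None


def rule_based_reward(pred, gold):
    """Zero for a wrong function name; otherwise the fraction of matching arguments."""
    if pred["name"] != gold["name"]:
        return 0.0
    gold_args, pred_args = gold["arguments"], pred["arguments"]
    if not gold_args and not pred_args:
        return 1.0
    matched = sum(1 for k, v in gold_args.items()
                  if k in pred_args and pred_args[k] == v)
    # Dividing by the larger argument count also penalizes spurious extras.
    return matched / max(len(gold_args), len(pred_args))


def composite_reward(text, gold, self_certainty, weights=(0.6, 0.2, 0.2)):
    """Weighted sum of rule-based, self-certainty, and tool-call rewards.

    `self_certainty` is supplied by the caller in this sketch; the weights
    are hypothetical.
    """
    w_rule, w_cert, w_struct = weights
    call = parse_tool_call(text)
    structure = 1.0 if call is not None else 0.0
    rule = rule_based_reward(call, gold) if call is not None else 0.0
    return w_rule * rule + w_cert * self_certainty + w_struct * structure


gold = {"name": "get_weather", "arguments": {"city": "Paris", "unit": "celsius"}}
partial = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
score = composite_reward(partial, gold, self_certainty=0.5)
```

In this sketch, a well-formed call with the right function name but only one of two expected arguments still earns part of the rule-based reward, which mirrors the "partial credit for argument subsets" described above.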
Performance
Evaluated on ACEBench, Qwen3-4B-I-1209 scores 0.0883 higher in overall accuracy than its base model (about a 13.9% relative improvement):
- Qwen3-4B-I-1209 (this model): 0.7233 Overall Accuracy
- Qwen3-4B-Instruct-2507 (base model): 0.6350 Overall Accuracy
Ideal Use Cases
- Automated API interaction: Generating precise function calls for external tools and APIs.
- Agentic workflows: Developing AI agents that can reliably use tools to accomplish tasks.
- Code generation for function stubs: Creating accurate function signatures and argument structures.
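For the API-interaction and agent use cases above, the application still has to extract the generated call and run it. The sketch below assumes the model wraps each call in `<tool_call>...</tool_call>` tags containing a JSON object with `name` and `arguments` keys, as Qwen-family chat templates commonly do; verify this against the actual chat template before relying on it. The `add` tool and the registry are hypothetical.

```python
import json
import re

# Assumed output format: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_calls(text):
    """Return every well-formed tool call found in the model output."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(raw.strip())
        except json.JSONDecodeError:
            continue  # skip malformed calls instead of crashing the agent loop
        if isinstance(call, dict) and "name" in call and "arguments" in call:
            calls.append(call)
    return calls


def dispatch(call, registry):
    """Look up the named tool and invoke it with the generated arguments."""
    if call["name"] not in registry:
        raise KeyError(f"unknown tool: {call['name']}")
    return registry[call["name"]](**call["arguments"])


# Hypothetical tool registry and model output, for illustration only.
registry = {"add": lambda a, b: a + b}
output = 'Sure.\n<tool_call>\n{"name": "add", "arguments": {"a": 2, "b": 3}}\n</tool_call>'
results = [dispatch(call, registry) for call in extract_tool_calls(output)]
```

Restricting dispatch to an explicit registry means a hallucinated function name fails loudly rather than executing arbitrary code.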