Gen-Verse/Qwen3-4B-RA-SFT

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Sep 6, 2025Architecture:Transformer0.0K Warm

Gen-Verse/Qwen3-4B-RA-SFT is a 4 billion parameter agentic reasoning model, fine-tuned from Qwen3-4B-Instruct-2507, developed by Gen-Verse. This model is specifically optimized for complex reasoning tasks by leveraging high-quality agentic SFT data and reinforcement learning techniques. It demonstrates enhanced performance in agentic reasoning, even outperforming larger models on challenging benchmarks. The model is designed for applications requiring sophisticated decision-making and problem-solving capabilities.

Loading preview...

Gen-Verse/Qwen3-4B-RA-SFT: Agentic Reasoning Model

Gen-Verse/Qwen3-4B-RA-SFT is a 4 billion parameter model, fine-tuned from Qwen3-4B-Instruct-2507, specifically designed for agentic reasoning tasks. Developed by Gen-Verse, this model incorporates insights from systematic investigations into agentic Reinforcement Learning (RL), focusing on data quality, training efficiency, and reasoning strategies.

Key Capabilities & Differentiators

  • Optimized for Agentic Reasoning: Fine-tuned using a 3K agentic SFT dataset and further enhanced with a 30K agentic RL dataset, emphasizing real end-to-end trajectories and high data diversity.
  • Performance: Demonstrates the ability for a 4B model to outperform 32B models on challenging reasoning benchmarks, attributed to effective RL recipes.
  • Training Efficiency: Utilizes exploration-friendly techniques like reward clipping and entropy maintenance to boost training efficiency.
  • Reasoning Strategy: Employs deliberative reasoning with selective tool calls, proving more effective than frequent invocation or verbose self-reasoning.

Ideal Use Cases

  • Complex Problem Solving: Suitable for applications requiring sophisticated multi-step reasoning and decision-making.
  • Agent-based Systems: Excellent for integrating into AI agents that need to plan, execute, and adapt based on environmental feedback.
  • Research in Agentic AI: Provides a strong baseline and tool for further research into reinforcement learning for agentic capabilities.