Name: Gen-Verse/Qwen3-4B-RA-SFT API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Gen-Verse

Gen-Verse/Qwen3-4B-RA-SFT: Agentic Reasoning Model

Gen-Verse/Qwen3-4B-RA-SFT is a 4 billion parameter model, fine-tuned from Qwen3-4B-Instruct-2507, specifically designed for agentic reasoning tasks. Developed by Gen-Verse, this model incorporates insights from systematic investigations into agentic Reinforcement Learning (RL), focusing on data quality, training efficiency, and reasoning strategies.

Key Capabilities & Differentiators

Optimized for Agentic Reasoning: Fine-tuned using a 3K agentic SFT dataset and further enhanced with a 30K agentic RL dataset, emphasizing real end-to-end trajectories and high data diversity.
Performance: Demonstrates the ability for a 4B model to outperform 32B models on challenging reasoning benchmarks, attributed to effective RL recipes.
Training Efficiency: Utilizes exploration-friendly techniques like reward clipping and entropy maintenance to boost training efficiency.
Reasoning Strategy: Employs deliberative reasoning with selective tool calls, proving more effective than frequent invocation or verbose self-reasoning.

Ideal Use Cases

Complex Problem Solving: Suitable for applications requiring sophisticated multi-step reasoning and decision-making.
Agent-based Systems: Excellent for integrating into AI agents that need to plan, execute, and adapt based on environmental feedback.
Research in Agentic AI: Provides a strong baseline and tool for further research into reinforcement learning for agentic capabilities.

Overview

Gen-Verse/Qwen3-4B-RA-SFT: Agentic Reasoning Model

Key Capabilities & Differentiators

Ideal Use Cases

Full Model Card (README)