Name: y-ohtani/qwen3-4b-ra-sft-epoch3 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: y-ohtani

Overview

This model, y-ohtani/qwen3-4b-ra-sft-epoch3, is a 4 billion parameter Qwen3-based model that has undergone full fine-tuning (not LoRA) using the Open-AgentRL framework. It is the third epoch checkpoint from a total of 10 training epochs.

Key Capabilities

Multi-turn Agentic Reasoning: Specifically trained to handle complex problems requiring multiple interaction turns.
Tool Use: Proficient in utilizing a code_interpreter tool to solve mathematical and coding challenges.
Agentic Loop Learning: Optimized to learn the full agentic process: Think → Code → Execute → Observe → Answer, by applying loss to all assistant turns.
Foundation for RL: Designed as a "cold-start" model for further reinforcement learning stages, such as GRPO.

Training Details

The model was fine-tuned from Qwen/Qwen3-4B-Instruct-2507 with a maximum sequence length of 32,768 tokens. It was trained on 2,000 multi-turn conversations from the y-ohtani/open_agentrl_like_sft dataset, which is derived from swordfaith/ReTool-SFT-multi-turn and focuses on mathematical reasoning with a code interpreter. All training data is Apache-2.0 licensed.

Intended Use & Limitations

Intended: Primarily for agentic reasoning tasks involving tool use, particularly in math and coding. It is an intermediate checkpoint for further RL training.
Not Intended: For production deployment without additional evaluation or for tasks outside of its specialized domain, as performance on non-math/non-coding tasks may be degraded compared to the base instruct model.

Overview

Overview

Key Capabilities

Training Details

Intended Use & Limitations

Full Model Card (README)