y-ohtani/qwen3-4b-agent-sft-true

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Mar 2, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Warm

The y-ohtani/qwen3-4b-agent-sft-true model is a 4 billion parameter, full fine-tuned variant of Qwen3-4B-Instruct-2507, developed by y-ohtani. It is specifically trained for multi-turn agentic interactions using the Open-AgentRL framework and a dataset of 3,000 real end-to-end agentic trajectories. With a 32,768 token context length, this model excels at complex, multi-step conversational tasks requiring agent-like reasoning.

Loading preview...

Overview

This model, y-ohtani/qwen3-4b-agent-sft-true, is a full fine-tuned version of the Qwen3-4B-Instruct-2507 base model, developed by y-ohtani. Unlike LoRA adapters, this model has undergone comprehensive training to adapt its entire parameter set. It leverages the Open-AgentRL framework for its fine-tuning process, specifically designed for agentic capabilities.

Key Capabilities

  • Agentic SFT: Trained with multi-turn agentic Supervised Fine-Tuning (SFT) using real End-to-End agentic trajectories.
  • Extended Context: Supports a maximum sequence length of 32,768 tokens, enabling processing of lengthy and complex conversations.
  • Specialized Dataset: Fine-tuned on the Gen-Verse/Open-AgentRL-SFT-3K dataset, comprising 3,000 multi-turn conversations tailored for agent behavior.
  • Full Fine-tuning: Utilizes FSDP with bfloat16 for robust and efficient full model fine-tuning.

Ideal Use Cases

This model is particularly well-suited for applications requiring:

  • Complex Multi-turn Dialogues: Excels in scenarios where an AI agent needs to maintain context and perform multi-step reasoning over several turns.
  • Agent-based Systems: Designed for integration into systems that require an AI to act as an agent, performing tasks or solving problems through interactive conversation.
  • Instruction Following: Enhanced ability to follow intricate instructions and generate coherent, contextually relevant responses in agentic workflows.

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p