choco800/qwen3-4b-agent-v11

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Mar 1, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The choco800/qwen3-4b-agent-v11 is a 4 billion parameter Qwen3-based model, fine-tuned from Qwen/Qwen3-4B-Instruct-2507. This fully merged model is specifically optimized for multi-turn agent task performance, excelling in environments like ALFWorld and DBBench. It learns environment observation, action selection, tool use, and error recovery, making it suitable for complex interactive agent applications.

Loading preview...

Model Overview

This model, choco800/qwen3-4b-agent-v11, is a 4 billion parameter language model fine-tuned from the Qwen/Qwen3-4B-Instruct-2507 base model. It is provided as a fully merged model, eliminating the need to load a separate base model.

Key Capabilities

  • Multi-turn Agent Task Performance: Specifically trained to enhance performance in multi-turn agent scenarios.
  • Environment Interaction: Optimized for tasks requiring environment observation and action selection.
  • Tool Use: Designed to effectively utilize tools within agent trajectories.
  • Error Recovery: Capable of learning to recover from errors during complex tasks.
  • Targeted Domains: Demonstrates improved performance on benchmarks like ALFWorld (household tasks) and DBBench (database operations).

Training Details

The model was fine-tuned using LoRA with Unsloth, with a maximum sequence length of 8192 tokens. Loss was applied exclusively to the assistant's responses, focusing the training on generating effective agent actions and observations. The training data includes u-10bei/dbbench_sft_dataset_react, which is distributed under the MIT License.