choco800/qwen3-4b-agent-v4

Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Feb 28, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

choco800/qwen3-4b-agent-v4 is a 4-billion-parameter, Qwen3-based, instruction-tuned language model fine-tuned by choco800. The fully merged model is optimized specifically for multi-turn agent tasks, excelling in environments such as ALFWorld and DBBench. It is trained on environment observation, action selection, tool use, and error recovery, making it well suited to complex interactive agent applications.


Overview

This model, choco800/qwen3-4b-agent-v4, is a 4-billion-parameter language model based on Qwen/Qwen3-4B-Instruct-2507. It was fine-tuned with Unsloth and the adapter weights were merged back into the base checkpoint, so no separate base model or adapter needs to be loaded at inference time. The primary objective of training was to significantly improve multi-turn agent task performance.
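Because the adapter is already merged, the checkpoint loads like any standard causal LM. Below is a minimal loading sketch using the transformers library; the chat-template usage and the agent-style prompt are illustrative assumptions, not prescribed by this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "choco800/qwen3-4b-agent-v4"

# Fully merged checkpoint: no PEFT/adapter loading step is needed.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 quant listed above
    device_map="auto",
)

# Hypothetical agent-style prompt; the card does not document a system prompt.
messages = [
    {"role": "system", "content": "You are an agent in a household environment."},
    {"role": "user", "content": "Observation: You are in the kitchen. Task: heat some water."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```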

Key Capabilities

  • Multi-turn Agent Trajectory Learning: The model is trained to improve performance across entire multi-turn agent trajectories, applying loss to all assistant turns.
  • Environment Interaction: It learns to process environment observations and select appropriate actions (see the loop sketch after this list).
  • Tool Use: The model can invoke tools and incorporate their results during task execution.
  • Error Recovery: A key focus of its training includes the ability to recover from errors encountered during task execution.
  • Specialized Task Domains: Demonstrated proficiency in tasks related to ALFWorld (household tasks) and DBBench (database operations).
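
The observation-to-action cycle described above can be driven by a simple episode loop. The sketch below is a hypothetical harness: `env` and its `reset`/`step` interface stand in for whatever ALFWorld or DBBench wrapper you actually use.

```python
def run_episode(model, tokenizer, env, max_turns=30):
    """Drive the model through one multi-turn agent episode.

    `env` is a hypothetical wrapper exposing reset() -> observation and
    step(action) -> (observation, done); swap in your own harness.
    """
    messages = [{"role": "system",
                 "content": "You are a household agent. Respond with one action per turn."}]
    observation = env.reset()

    for _ in range(max_turns):
        # Feed the latest environment observation as the user turn.
        messages.append({"role": "user", "content": f"Observation: {observation}"})
        inputs = tokenizer.apply_chat_template(
            messages, add_generation_prompt=True, return_tensors="pt"
        ).to(model.device)
        outputs = model.generate(inputs, max_new_tokens=128)
        action = tokenizer.decode(
            outputs[0][inputs.shape[-1]:], skip_special_tokens=True
        ).strip()

        messages.append({"role": "assistant", "content": action})
        # Error recovery happens naturally: a failed action returns an
        # error observation, which the model sees on the next turn.
        observation, done = env.step(action)
        if done:
            break
    return messages
```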

Training Details

The model was trained for one epoch with a maximum sequence length of 8,192 tokens. Training used LoRA with r=16 and alpha=32, and loss was computed exclusively on the assistant's responses, masking out user prompts and environment observations. The training data comprises u-10bei/sft_alfworld_trajectory_dataset_v3 and u-10bei/dbbench_sft_dataset_react_v4, both distributed under the MIT License.
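
Assistant-only loss is conventionally implemented by setting masked label positions to -100, which Hugging Face loss computation ignores. A minimal sketch of that masking, alongside a matching peft LoRA configuration, is below; `target_modules` and the span-computation helper are assumptions, since the card does not publish the actual training code.

```python
import torch
from peft import LoraConfig

# LoRA hyperparameters from the card; target_modules is an assumption,
# as the card does not list which projections were adapted.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

IGNORE_INDEX = -100  # label positions with this value are skipped by the loss

def mask_non_assistant(input_ids: torch.Tensor, assistant_spans) -> torch.Tensor:
    """Build labels so that only assistant-turn tokens contribute to the loss.

    Everything outside the assistant spans (user prompts, environment
    observations) stays at IGNORE_INDEX. `assistant_spans` is a list of
    (start, end) token offsets per assistant turn; computing those offsets
    depends on your chat template (illustrative helper, not the author's code).
    """
    labels = torch.full_like(input_ids, IGNORE_INDEX)
    for start, end in assistant_spans:
        labels[start:end] = input_ids[start:end]
    return labels
```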