choco800/qwen3-4b-agent-v13

Text Generation · Open Weights

  • Model Size: 4B
  • Quantization: BF16
  • Context Length: 32K
  • Concurrency Cost: 1
  • Published: Mar 1, 2026
  • License: apache-2.0
  • Architecture: Transformer

choco800/qwen3-4b-agent-v13 is a 4-billion-parameter, Qwen3-based, instruction-tuned causal language model, fine-tuned from Qwen/Qwen3-4B-Instruct-2507. It is optimized for multi-turn agent tasks (environment observation, action selection, tool use, and error recovery) in complex scenarios such as ALFWorld household tasks. The model has a 32K context length and ships as a fully merged checkpoint, so no separate base model needs to be loaded.


choco800/qwen3-4b-agent-v13: Agent Trajectory Model

This model is a fully merged 4 billion parameter Qwen3-based instruction-tuned model, fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Unsloth. Unlike standard adapter repositories, this release includes the merged weights, simplifying deployment as it does not require loading a separate base model.
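Because the weights are already merged, the model can be loaded with the standard transformers API alone; there is no PEFT adapter-attachment step. A minimal sketch, assuming a standard transformers install (the dtype follows the BF16 quantization listed above; everything else here is an illustrative default, not a value from the model card):

```python
MODEL_ID = "choco800/qwen3-4b-agent-v13"

def load_model(model_id: str = MODEL_ID):
    """Load the merged checkpoint directly; no base model or adapter needed."""
    # Import kept local so this module can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="bfloat16",  # the card lists BF16 weights
        device_map="auto",
    )
    return tokenizer, model
```

A typical call site would then use `tokenizer.apply_chat_template(...)` on a multi-turn message list and pass the result to `model.generate`, exactly as with the base Qwen3 instruct model.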

Key Capabilities & Training Focus

The primary objective of this model's training was to significantly improve multi-turn agent task performance. It is specifically optimized for scenarios requiring:

  • Environment observation
  • Action selection
  • Effective tool use
  • Recovery from errors within multi-turn trajectories
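Concretely, a multi-turn trajectory of this kind is just an alternating chat history: the environment's observations arrive as user turns and the model's actions (including corrections after a failed action) are assistant turns. A hypothetical sketch, with invented observation and action strings purely for illustration:

```python
# Hypothetical ALFWorld-style trajectory: environment observations as
# "user" turns, model actions as "assistant" turns. The second assistant
# turn illustrates error recovery after an action that had no effect.
trajectory = [
    {"role": "system",    "content": "You are an agent in a household environment."},
    {"role": "user",      "content": "Obs: You are in the kitchen. You see a cabinet and a countertop."},
    {"role": "assistant", "content": "Action: open cabinet"},
    {"role": "user",      "content": "Obs: Nothing happens. (The cabinet is locked.)"},
    {"role": "assistant", "content": "Action: examine countertop"},  # recover from the failed action
    {"role": "user",      "content": "Obs: There is a key on the countertop."},
    {"role": "assistant", "content": "Action: take key from countertop"},
]

# The assistant turns are the actions the fine-tuning supervises.
assistant_turns = [m for m in trajectory if m["role"] == "assistant"]
print(len(assistant_turns))  # → 3
```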

Training focused on tasks within the ALFWorld (household tasks) environment, applying loss to all assistant turns to reinforce learning across the entire interaction sequence. The model was trained for 1 epoch with a maximum sequence length of 8192 tokens, utilizing LoRA (r=16, alpha=32) and a learning rate of 5e-06.
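For a sense of scale on the r=16 setting: a rank-r LoRA adapter on one (d_out × d_in) weight matrix adds two factors of shapes (r, d_in) and (d_out, r), so it trains r·(d_in + d_out) parameters instead of d_in·d_out, with outputs scaled by alpha/r (here 32/16 = 2). A quick arithmetic sketch; the 4096×4096 projection below is an illustrative round number, not an actual Qwen3-4B layer shape:

```python
def lora_trainable_params(d_in: int, d_out: int, r: int = 16) -> int:
    """Trainable parameters added by a rank-r LoRA adapter on one
    (d_out x d_in) weight matrix: A is (r, d_in), B is (d_out, r)."""
    return r * d_in + d_out * r

# Illustrative square projection, not the real Qwen3-4B hidden size.
full = 4096 * 4096
lora = lora_trainable_params(4096, 4096, r=16)
print(lora)                  # 131072 trainable parameters
print(f"{lora / full:.2%}")  # 0.78% of the full matrix
```

This is why merging the adapter back into the base weights, as this release does, costs nothing at inference time: the low-rank update is simply added into each affected matrix.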

Data & Licensing

The model was trained on a combination of datasets including u-10bei/dbbench_sft_dataset_react (v1-v4), which are available on Hugging Face Hub under the MIT License. Users must adhere to both the dataset licenses and the base model's original Apache 2.0 terms of use.