choco800/qwen3-4b-agent-v24

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Mar 2, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

choco800/qwen3-4b-agent-v24 is a 4 billion parameter Qwen3-based instruction-tuned language model, fine-tuned by choco800 using Unsloth. This fully merged model is specifically optimized for multi-turn agent task performance, excelling in environments like ALFWorld by learning environment observation, action selection, tool use, and error recovery. It is designed for applications requiring robust agentic capabilities in complex, interactive tasks.

Loading preview...

Model Overview

choco800/qwen3-4b-agent-v24 is a 4 billion parameter language model, fully merged and fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Unsloth. Unlike adapter repositories, this model contains the complete merged weights, eliminating the need to load a separate base model.

Key Capabilities

This model is specifically trained to enhance multi-turn agent task performance, particularly within environments like ALFWorld (household tasks). Its training objective focuses on enabling the model to:

  • Learn environment observation: Understand and interpret the state of an interactive environment.
  • Perform action selection: Choose appropriate actions based on observations and task goals.
  • Utilize tools: Integrate and effectively use external tools within a task trajectory.
  • Recover from errors: Adapt and correct its behavior in response to unexpected outcomes or failures.

Loss was applied to all assistant turns in the multi-turn trajectory, ensuring comprehensive learning across the entire interaction sequence.

Training Details

The model was trained for 1 epoch with a maximum sequence length of 8192, using a learning rate of 7e-06. It leveraged LoRA (r=8, alpha=16) and incorporated techniques like NEFTUNE_NOISE_ALPHA=5.0 to improve training stability and performance. The training data primarily consisted of ALFWorld trajectory datasets (v3, v4, v5) from u-10bei, with loss masking applied only to the assistant's responses.

Good For

  • Developing AI agents for interactive, multi-step tasks.
  • Applications requiring robust tool use and error recovery in simulated or real-world environments.
  • Research into agentic LLMs and their performance in complex task execution.