choco800/qwen3-4b-agent-v16

Text generation · Model size: 4B · Quant: BF16 · Context length: 32k · Published: Mar 1, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

choco800/qwen3-4b-agent-v16 is a 4 billion parameter language model fine-tuned from Qwen/Qwen3-4B-Instruct-2507. It is optimized for multi-turn agent task performance, particularly in environments like ALFWorld, learning environment observation, action selection, tool use, and error recovery from complete trajectories. The model is designed to enhance an agent's ability to navigate and complete household tasks through improved reasoning over multi-turn interactions.


Qwen3-4B Agent Trajectory (v16) Overview

This model, choco800/qwen3-4b-agent-v16, is a 4 billion parameter language model derived from Qwen/Qwen3-4B-Instruct-2507. It has been fine-tuned using LoRA and Unsloth, with the merged weights provided directly, eliminating the need to load a separate base model.
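Because the merged weights are published directly, the model should load like any standard causal LM, with no adapter or base-model step. A minimal loading sketch (the helper name `load_agent_model` and its defaults are illustrative, not part of the card):

```python
MODEL_ID = "choco800/qwen3-4b-agent-v16"

def load_agent_model(device_map="auto", torch_dtype="bfloat16"):
    """Load the merged model and tokenizer via standard transformers APIs."""
    # Imports kept inside the function so the sketch is inert until called.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        device_map=device_map,
        torch_dtype=torch_dtype,
    )
    return model, tokenizer
```

Since the weights are already merged, there is no `PeftModel.from_pretrained` or adapter-merging step involved.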

Key Capabilities & Training Focus

The primary objective of this model's training was to significantly improve multi-turn agent task performance, specifically within the ALFWorld environment. The training methodology focused on:

  • Learning from full trajectories: Loss was applied to every assistant turn in a trajectory, not just the final one, so the model learns environment observation, action selection, tool usage, and error recovery across the whole episode.
  • Response-only loss masking: Within each turn, loss was computed only on the assistant's tokens; environment observations and user turns were masked out, focusing learning on generating appropriate actions and dialogue.
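The two points above combine into one labeling rule: keep labels for assistant tokens in every turn, and set the standard ignore index (-100) everywhere else so cross-entropy skips those positions. A toy sketch of that masking (token IDs and the `roles` representation are illustrative, not the card's actual preprocessing code):

```python
IGNORE_INDEX = -100  # positions with this label are skipped by cross-entropy

def build_labels(token_ids, roles):
    """Return labels with non-assistant tokens masked out.

    roles[i] names the speaker of token i: 'system', 'user' (environment
    observations), or 'assistant'. Assistant tokens keep their IDs as labels;
    all other tokens are masked, across every turn of the trajectory.
    """
    return [
        tok if role == "assistant" else IGNORE_INDEX
        for tok, role in zip(token_ids, roles)
    ]

# A tiny two-action trajectory: observation, action, observation, action.
tokens = [101, 102, 103, 104, 105, 106]
roles  = ["user", "user", "assistant", "assistant", "user", "assistant"]
labels = build_labels(tokens, roles)
# labels -> [-100, -100, 103, 104, -100, 106]
```

Note that both assistant turns contribute labels, which is what distinguishes full-trajectory training from last-turn-only fine-tuning.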

Training Details

The model was trained for 1 epoch with a maximum sequence length of 8192 tokens, utilizing a learning rate of 1e-05. The training data consisted of several versions of the sft_alfworld_trajectory_dataset from u-10bei, licensed under MIT. Users must also comply with the base model's Apache 2.0 license.
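The stated hyperparameters can be collected into a single config for reference (the dict keys below are illustrative groupings, not the exact Unsloth or TRL argument names):

```python
# Training setup as described on the card; values are from the card itself.
TRAINING_CONFIG = {
    "base_model": "Qwen/Qwen3-4B-Instruct-2507",
    "method": "LoRA via Unsloth, weights merged after training",
    "num_epochs": 1,
    "max_seq_length": 8192,
    "learning_rate": 1e-05,
    "dataset": "u-10bei/sft_alfworld_trajectory_dataset (MIT license)",
    "loss": "response-only masking over all assistant turns",
}
```

The 8192-token sequence length is notable given the 32k context window: training trajectories fit well within what the model can consume at inference time.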

Ideal Use Cases

This model is particularly well-suited for applications requiring:

  • Agentic behavior: Developing AI agents that can perform complex, multi-step tasks.
  • Interactive environments: Scenarios where an agent needs to observe, act, and adapt over multiple turns.
  • Tool use and error recovery: Systems that benefit from an agent's ability to utilize tools and recover from mistakes in a structured environment.
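The interaction shape these use cases share is a simple observe-act loop, where the model proposes an action from the latest observation plus the running history. A minimal sketch with a hypothetical stub policy and mock environment standing in for the fine-tuned model and ALFWorld (none of these names come from the card):

```python
def run_episode(env, policy, max_turns=30):
    """Drive one episode: observe, let the policy act, repeat until done."""
    history = []
    obs = env.reset()
    for _ in range(max_turns):
        action = policy(obs, history)  # the model picks the next action
        history.append((obs, action))
        obs, done = env.step(action)   # the environment responds
        if done:
            break
    return history

class MockEnv:
    """Toy stand-in for ALFWorld: the task is done once the apple is taken."""
    def reset(self):
        return "You see an apple on the table."
    def step(self, action):
        if action == "take apple":
            return "Task complete.", True
        return "Nothing happens.", False

def stub_policy(obs, history):
    """Hypothetical policy stub; a real agent would query the model here."""
    return "take apple" if "apple" in obs else "look"

episode = run_episode(MockEnv(), stub_policy)
# episode -> [("You see an apple on the table.", "take apple")]
```

In a real deployment the `policy` callable would format `obs` and `history` into the model's chat template, generate, and parse the action from the response; the loop structure stays the same.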