choco800/qwen3-4b-agent-v27

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Mar 2, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The choco800/qwen3-4b-agent-v27 is a 4 billion parameter Qwen3-based instruction-tuned model, fine-tuned from Qwen/Qwen3-4B-Instruct-2507. It is specifically optimized for multi-turn agent task performance, particularly in household tasks like those found in ALFWorld. This model excels at environment observation, action selection, tool use, and error recovery within complex multi-turn trajectories, offering a fully merged solution for agentic applications.

Loading preview...

Qwen3-4B Agent Trajectory (v27) Overview

This model, choco800/qwen3-4b-agent-v27, is a fully merged 4 billion parameter model fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Unsloth. Unlike adapter repositories, it provides merged weights, eliminating the need to load a separate base model.

Key Capabilities & Training Focus

The primary objective of this model's training was to significantly enhance multi-turn agent task performance, specifically within environments like ALFWorld (household tasks). The training methodology applied loss to all assistant turns in a multi-turn trajectory, enabling the model to learn and improve across several critical agentic functions:

  • Environment observation: Interpreting and understanding the state of its surroundings.
  • Action selection: Choosing appropriate actions based on observations and goals.
  • Tool use: Effectively utilizing available tools to complete tasks.
  • Error recovery: Adapting and correcting its trajectory when encountering errors.

Training involved a maximum sequence length of 8192 tokens over 1 epoch, with loss computed only on the assistant's responses, masking user prompts and observations.

Datasets and Licensing

The model was trained using several versions of the sft_alfworld_trajectory_dataset (v3, v4, v5) from u-10bei, all distributed under the MIT License. Users must adhere to both the dataset licenses and the base model's original Apache 2.0 terms of use.