Qwen3-4B Agent Trajectory (v27) Overview
This model, choco800/qwen3-4b-agent-v27, is a fully merged 4-billion-parameter model fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Unsloth. Unlike adapter repositories, it ships merged weights, so there is no need to load a separate base model.
Key Capabilities & Training Focus
The primary objective of this model's training was to significantly enhance multi-turn agent task performance, specifically within environments like ALFWorld (household tasks). The training methodology applied loss to all assistant turns in a multi-turn trajectory, enabling the model to learn and improve across several critical agentic functions:
- Environment observation: Interpreting and understanding the state of its surroundings.
- Action selection: Choosing appropriate actions based on observations and goals.
- Tool use: Effectively utilizing available tools to complete tasks.
- Error recovery: Adapting and correcting its trajectory when encountering errors.
Training involved a maximum sequence length of 8192 tokens over 1 epoch, with loss computed only on the assistant's responses, masking user prompts and observations.
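The assistant-only loss described above is typically implemented by setting the label of every non-assistant token to the cross-entropy ignore index. The sketch below is illustrative, not the actual training code: the `build_labels` helper, the `(role, text)` turn format, and the toy character-level tokenizer are assumptions for demonstration; in practice this would operate on the real tokenizer's output and the model's chat template.

```python
# Sketch: assistant-only loss masking for a multi-turn trajectory.
# Tokens from user turns and environment observations get IGNORE_INDEX,
# so cross-entropy loss is computed only on assistant tokens.
IGNORE_INDEX = -100  # PyTorch CrossEntropyLoss default ignore_index


def build_labels(turns, tokenize):
    """turns: list of (role, text) pairs; tokenize: text -> list[int]."""
    input_ids, labels = [], []
    for role, text in turns:
        ids = tokenize(text)
        input_ids.extend(ids)
        if role == "assistant":
            labels.extend(ids)  # loss applied to every assistant turn
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # mask user/observation
    return input_ids, labels


# Toy tokenizer (one "token" per character) for illustration only.
toy = lambda s: [ord(c) for c in s]
ids, labels = build_labels(
    [("user", "go"), ("assistant", "ok"),
     ("observation", "obs"), ("assistant", "done")],
    toy,
)
```

With this scheme the model still conditions on the full trajectory (observations remain in `input_ids`), but gradient updates come only from the assistant's own responses.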
Datasets and Licensing
The model was trained on several versions of the sft_alfworld_trajectory_dataset (v3, v4, v5) from u-10bei, all distributed under the MIT License. Users must comply with both the dataset licenses and the base model's Apache 2.0 license.