Qwen3-4B Agent Trajectory (v16) Overview
This model, choco800/qwen3-4b-agent-v16, is a 4-billion-parameter language model derived from Qwen/Qwen3-4B-Instruct-2507. It was fine-tuned with LoRA using Unsloth, and the adapter weights have been merged into the checkpoint, so there is no need to load the base model and an adapter separately.
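Because the LoRA weights are already merged, the checkpoint loads like any standard causal language model. A minimal loading sketch using the Hugging Face transformers API (the model ID comes from this card; the dtype and device settings are illustrative defaults, not requirements):

```python
MODEL_ID = "choco800/qwen3-4b-agent-v16"

def load_agent(model_id: str = MODEL_ID):
    """Load tokenizer and model for the merged checkpoint."""
    # Imports deferred so the sketch reads without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Merged weights: no PEFT / adapter-loading step is needed.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    return tokenizer, model
```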
Key Capabilities & Training Focus
The primary objective of this model's training was to significantly improve multi-turn agent task performance, specifically within the ALFWorld environment. The training methodology focused on:
- Learning from full trajectories: Each training example contained a complete multi-turn episode, so the model learned from environment observations, action selection, tool usage, and error recovery in context, with loss applied across every assistant turn in the trajectory.
- Response-only loss masking: Within each trajectory, loss was computed only on tokens the assistant generated; system prompts, user turns, and environment observations were masked out, focusing learning on producing appropriate actions and dialogue.
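The masking scheme above can be sketched in a few lines. This is a toy illustration, not the training code: it assumes the common transformers convention that label positions set to -100 are ignored by the cross-entropy loss, and the token IDs and role names are made up for clarity.

```python
# Labels equal to IGNORE_INDEX are skipped by cross-entropy loss
# (the convention used by Hugging Face transformers).
IGNORE_INDEX = -100

def mask_non_assistant(token_ids, roles):
    """Return labels where only assistant-generated tokens keep their ID.

    token_ids: one token ID per position
    roles:     the role that produced each token
               ("system", "user", "environment", or "assistant")
    """
    return [
        tid if role == "assistant" else IGNORE_INDEX
        for tid, role in zip(token_ids, roles)
    ]

# Example: a short trajectory -- an observation, then an agent action.
ids = [11, 12, 13, 21, 22]
roles = ["user", "user", "user", "assistant", "assistant"]
print(mask_non_assistant(ids, roles))  # [-100, -100, -100, 21, 22]
```

Loss is still computed on every assistant turn of the trajectory; only the non-assistant tokens are masked out.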
Training Details
The model was trained for 1 epoch with a maximum sequence length of 8192 tokens and a learning rate of 1e-05. The training data consisted of several versions of the sft_alfworld_trajectory_dataset from u-10bei, licensed under MIT. Users must also comply with the base model's Apache 2.0 license.
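The reported hyperparameters can be summarized as a configuration fragment. Only the three values stated above come from this card; the key names follow common SFT-trainer conventions and are otherwise an assumption.

```python
# Hyperparameters reported on this card; key names are assumed
# SFT-trainer-style conventions, not taken from the training script.
training_config = {
    "num_train_epochs": 1,
    "max_seq_length": 8192,
    "learning_rate": 1e-05,
}
```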
Ideal Use Cases
This model is particularly well-suited for applications requiring:
- Agentic behavior: Developing AI agents that can perform complex, multi-step tasks.
- Interactive environments: Scenarios where an agent needs to observe, act, and adapt over multiple turns.
- Tool use and error recovery: Systems that benefit from an agent's ability to utilize tools and recover from mistakes in a structured environment.
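The observe-act-adapt pattern these use cases share can be sketched as a simple driver loop. This is a minimal sketch with a stub environment and a stub policy; in practice the policy would call the fine-tuned model on the full conversation history, and the commands shown are illustrative, not actual ALFWorld actions.

```python
def run_episode(env_step, policy, first_obs, max_turns=10):
    """Drive a text-environment episode, feeding each observation back
    to the policy so it can adapt (e.g. recover from a failed action)."""
    history = [{"role": "user", "content": first_obs}]
    for _ in range(max_turns):
        action = policy(history)
        history.append({"role": "assistant", "content": action})
        obs, done = env_step(action)
        history.append({"role": "user", "content": obs})
        if done:
            break
    return history

# Stub environment: succeeds only once the agent opens the drawer.
def env_step(action):
    if action == "open drawer 1":
        return ("The drawer is open. You see a key.", True)
    return ("Nothing happens.", False)

# Stub policy: retries with a different action after a failure,
# standing in for the model's error-recovery behavior.
def policy(history):
    if any("Nothing happens" in m["content"] for m in history):
        return "open drawer 1"
    return "go to drawer 1"

transcript = run_episode(env_step, policy, "You are in a room with a drawer.")
print(transcript[-1]["content"])  # The drawer is open. You see a key.
```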