Overview
This model, choco800/qwen3-4b-agent-v4, is a 4-billion-parameter language model based on Qwen/Qwen3-4B-Instruct-2507. It was fine-tuned with Unsloth and released as a fully merged model, so no separate base model needs to be loaded. The primary training objective was to significantly improve multi-turn agent task performance.
Key Capabilities
- Multi-turn Agent Trajectory Learning: The model is trained to improve performance across entire multi-turn agent trajectories, applying loss to all assistant turns.
- Environment Interaction: It learns to process environment observations and select appropriate actions.
- Tool Use: The model can invoke and use tools within its agent loop.
- Error Recovery: Training places particular emphasis on recovering from errors encountered during task execution.
- Specialized Task Domains: Demonstrates proficiency on ALFWorld (household tasks) and DBBench (database operations).
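The multi-turn trajectory structure the capabilities above describe can be sketched as an act-observe loop. The environment and policy below are toy stand-ins (hypothetical names, not the ALFWorld or DBBench APIs); the point is only the shape of the trajectory that training optimizes over, where assistant actions alternate with environment observations.

```python
# Toy sketch of a multi-turn agent trajectory: the model (toy_policy) picks an
# action, the environment (toy_env_step) returns an observation, and the full
# alternating sequence of turns forms the training trajectory.

def toy_env_step(state, action):
    """Hypothetical environment: counts an integer state down to a goal of 0."""
    if action == "decrement":
        state -= 1
    done = state == 0
    observation = f"state is now {state}"
    return state, observation, done

def toy_policy(observation):
    """Stand-in for the model's action selection given the latest observation."""
    return "decrement"

def run_trajectory(initial_state, max_turns=10):
    """Roll out one multi-turn trajectory as a list of (role, content) turns."""
    trajectory = []
    state = initial_state
    obs = f"state is {initial_state}"
    for _ in range(max_turns):
        action = toy_policy(obs)
        trajectory.append(("assistant", action))
        state, obs, done = toy_env_step(state, action)
        trajectory.append(("observation", obs))
        if done:
            break
    return trajectory

traj = run_trajectory(3)
# traj alternates assistant/observation turns and ends when the goal is reached:
# [..., ("observation", "state is now 0")]
```

In the real setting, the policy is the fine-tuned model and the observations come from ALFWorld or a database environment, but the trajectory layout is the same.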
Training Details
The model was trained for 1 epoch with a maximum sequence length of 8192 tokens, using LoRA with r=16 and alpha=32. Loss was computed exclusively on the assistant's responses, with user prompts and environment observations masked out. The training data includes u-10bei/sft_alfworld_trajectory_dataset_v3 and u-10bei/dbbench_sft_dataset_react_v4, both distributed under the MIT License.
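The assistant-only loss masking described above is typically implemented by setting the labels of non-assistant tokens to the cross-entropy ignore index (-100 in PyTorch). The helper below is a minimal sketch of that idea, assuming the trajectory is already tokenized into per-turn lists; it is illustrative, not the exact Unsloth training code.

```python
# Sketch of assistant-only loss masking: keep label ids for assistant turns,
# replace all user/observation token labels with IGNORE_INDEX so the loss
# function skips them.
IGNORE_INDEX = -100  # PyTorch cross-entropy default ignore_index

def build_labels(turns):
    """turns: list of (role, token_ids). Returns (input_ids, labels)."""
    input_ids, labels = [], []
    for role, ids in turns:
        input_ids.extend(ids)
        if role == "assistant":
            labels.extend(ids)  # loss computed on these tokens
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # masked out
    return input_ids, labels

# Example trajectory with illustrative token ids:
turns = [
    ("user", [1, 2, 3]),
    ("assistant", [4, 5]),
    ("observation", [6, 7]),
    ("assistant", [8, 9, 10]),
]
inp, lab = build_labels(turns)
# lab == [-100, -100, -100, 4, 5, -100, -100, 8, 9, 10]
```

Because the mask is applied per turn, loss covers every assistant turn in the trajectory, matching the multi-turn training objective stated above.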