# SELEE/qwen3-4b-agent-v3
SELEE/qwen3-4b-agent-v3 is a 4-billion-parameter, fully fine-tuned causal language model based on Qwen/Qwen3-4B-Instruct-2507. Developed by SELEE, it is optimized for multi-turn agent tasks and performs well in complex environments such as household tasks (ALFWorld) and database operations (DBBench). During training, loss was applied to every assistant turn in each multi-turn trajectory, teaching the model environment observation, action selection, tool use, and error recovery.
## Overview
SELEE/qwen3-4b-agent-v3 is a 4-billion-parameter model, fully fine-tuned from Qwen/Qwen3-4B-Instruct-2507. Unlike a LoRA adapter or other partial fine-tuning artifact, this release contains the full parameter weights, so it can be loaded directly without merging into a base model. Training ran for 2 epochs with a learning rate of 2e-06 and a maximum sequence length of 4096 tokens.
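Because the checkpoint is a full fine-tune, it loads like any standalone causal LM. A minimal loading sketch with Hugging Face transformers (the prompt is illustrative, not taken from the training data; this assumes the checkpoint ships a standard chat template):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SELEE/qwen3-4b-agent-v3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # place layers on available GPUs/CPU
)

# No adapter merge step is needed: these are the full fine-tuned weights.
messages = [
    {"role": "user", "content": "You are in a kitchen. Find a mug and put it on the table."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```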
## Key Capabilities
- Multi-turn Agent Performance: Specifically trained to improve performance in multi-turn agent tasks.
- Complex Task Handling: Excels in environments requiring sequential decision-making, such as household tasks (ALFWorld) and database operations (DBBench).
- Comprehensive Learning: Learns to observe environments, select appropriate actions, utilize tools effectively, and recover from errors within multi-turn interactions.
- Full Fine-tuning: All parameters were updated during training, so the learned agent behaviors are integrated directly into the weights rather than kept in a separate adapter.
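The observe/act/recover cycle described above can be sketched as a plain Python loop. This is a toy illustration with a stubbed policy standing in for the model and a three-state mock environment; none of the names or actions here come from the actual training environments.

```python
def policy(history):
    """Stand-in for model.generate: picks the next action from the latest observation."""
    last = history[-1]["content"]
    if "error" in last:
        return "look"        # recover by re-observing the environment
    if "mug" in last:
        return "take mug"
    return "open cabinet"

def step(action):
    """Toy environment transition: maps an action to the next observation."""
    transitions = {
        "open cabinet": "You see a mug inside the cabinet.",
        "take mug": "You are carrying the mug.",
        "look": "You are in a kitchen. A cabinet is closed.",
    }
    return transitions.get(action, "error: unknown action")

history = [{"role": "user", "content": "You are in a kitchen. A cabinet is closed."}]
for _ in range(3):
    action = policy(history)
    history.append({"role": "assistant", "content": action})
    history.append({"role": "user", "content": step(action)})

print(history[-1]["content"])  # -> You are carrying the mug.
```

In a real deployment the `policy` call would be a generation pass over the full chat history, and `step` would be the ALFWorld or DBBench environment returning the next observation.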
## Training Objective
The primary objective was to improve the model's ability to act as an agent in interactive environments. Loss was applied to every assistant turn in each multi-turn trajectory, so the model learns from the entire interaction flow, including error states and recovery behavior. The training data includes the datasets u-10bei/sft_alfworld_trajectory_dataset_v4 and u-10bei/dbbench_sft_dataset_react_v4, both released under the MIT License.
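Applying loss to every assistant turn is typically implemented by masking non-assistant tokens in the label tensor with the ignore index (-100, which cross-entropy skips). A minimal sketch of that label construction, with tokenization faked via `str.split()` for illustration; this is a generic SFT pattern, not the project's actual preprocessing code:

```python
IGNORE_INDEX = -100  # ignored by cross-entropy loss

def build_labels(trajectory):
    """Build (input_ids, labels): assistant tokens are supervised, all others masked."""
    input_ids, labels = [], []
    for turn in trajectory:
        tokens = turn["content"].split()  # stand-in for a real tokenizer
        ids = list(range(len(input_ids), len(input_ids) + len(tokens)))
        input_ids.extend(ids)
        if turn["role"] == "assistant":
            labels.extend(ids)                          # loss applies to these tokens
        else:
            labels.extend([IGNORE_INDEX] * len(ids))    # masked out of the loss
    return input_ids, labels

trajectory = [
    {"role": "user", "content": "Observation: cabinet is closed"},
    {"role": "assistant", "content": "Action: open cabinet"},
    {"role": "user", "content": "Observation: you see a mug"},
    {"role": "assistant", "content": "Action: take mug"},
]
input_ids, labels = build_labels(trajectory)
supervised = sum(label != IGNORE_INDEX for label in labels)
print(supervised)  # -> 6  (tokens from both assistant turns contribute to the loss)
```

Because both assistant turns are supervised, the model receives gradient signal on the recovery action as well as the initial one, which is what lets it learn error handling from full trajectories.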