tussiiiii/Qwen3-4B-AgentBench-Merged
The tussiiiii/Qwen3-4B-AgentBench-Merged model is a 4 billion parameter language model fine-tuned from Qwen/Qwen3-4B-Instruct-2507, optimized for multi-turn agent task performance. It excels at household tasks (ALFWorld) and database operations (DBBench) by learning environment observation, action selection, and tool use. This model is specifically designed to improve agentic capabilities and error recovery in complex, multi-turn interactions, leveraging a 32K context length.
Loading preview...
Overview
The tussiiiii/Qwen3-4B-AgentBench-Merged is a 4 billion parameter model derived from Qwen/Qwen3-4B-Instruct-2507. It has been fine-tuned using LoRA (merged into the base model weights) to enhance its capabilities in multi-turn agent tasks. The model focuses on improving performance in environments requiring sequential decision-making and interaction.
Key Capabilities
- Multi-turn Agent Task Performance: Specifically trained to excel in complex, multi-turn interactions.
- Environment Observation: Learns to interpret and understand environmental states.
- Action Selection & Tool Use: Optimized for choosing appropriate actions and utilizing tools effectively within agentic workflows.
- Error Recovery: Designed to recover from errors during multi-turn trajectories, improving robustness.
- Targeted Domains: Demonstrates improved performance on household tasks (ALFWorld) and database operations (DBBench).
Training Details
The model was fine-tuned for 2 epochs with a learning rate of 2e-06, using a maximum sequence length of 2048. The training objective applied loss to all assistant turns, emphasizing learning across the entire multi-turn trajectory. The training data, tussiiiii/agentbench_sft_mix_alfworld_dbbench_v1, combines datasets from ALFWorld and DBBench, focusing on agentic SFT trajectories.
Good For
- Developing AI agents that require robust multi-turn interaction.
- Applications involving complex task execution in simulated or real-world environments.
- Scenarios demanding effective tool use and error handling in automated systems.