Overview
The tussiiiii/Qwen3-4B-AgentBench-Merged is a 4 billion parameter model derived from Qwen/Qwen3-4B-Instruct-2507. It has been fine-tuned using LoRA (merged into the base model weights) to enhance its capabilities in multi-turn agent tasks. The model focuses on improving performance in environments requiring sequential decision-making and interaction.
Key Capabilities
- Multi-turn Agent Task Performance: Specifically trained to excel in complex, multi-turn interactions.
- Environment Observation: Learns to interpret and understand environmental states.
- Action Selection & Tool Use: Optimized for choosing appropriate actions and utilizing tools effectively within agentic workflows.
- Error Recovery: Designed to recover from errors during multi-turn trajectories, improving robustness.
- Targeted Domains: Demonstrates improved performance on household tasks (ALFWorld) and database operations (DBBench).
Training Details
The model was fine-tuned for 2 epochs with a learning rate of 2e-06, using a maximum sequence length of 2048. The training objective applied loss to all assistant turns, emphasizing learning across the entire multi-turn trajectory. The training data, tussiiiii/agentbench_sft_mix_alfworld_dbbench_v1, combines datasets from ALFWorld and DBBench, focusing on agentic SFT trajectories.
Good For
- Developing AI agents that require robust multi-turn interaction.
- Applications involving complex task execution in simulated or real-world environments.
- Scenarios demanding effective tool use and error handling in automated systems.