choco800/qwen3-4b-agent-v10
The choco800/qwen3-4b-agent-v10 is a 4 billion parameter model, fine-tuned from Qwen/Qwen3-4B-Instruct-2507, specifically designed for multi-turn agent task performance. This fully merged model, optimized using Unsloth, excels in complex environments like ALFWorld and DBBench by learning environment observation, action selection, tool use, and error recovery. It is particularly suited for applications requiring robust agentic capabilities in household tasks and database operations.
Loading preview...
Overview
The choco800/qwen3-4b-agent-v10 is a 4 billion parameter language model, fully merged and fine-tuned from the Qwen/Qwen3-4B-Instruct-2507 base model using Unsloth. Unlike adapter repositories, this model provides merged weights, eliminating the need to load a separate base model.
Key Capabilities
- Multi-turn Agent Task Performance: Specifically trained to improve performance in multi-turn agentic scenarios.
- Environment Interaction: Learns to process environment observations and select appropriate actions.
- Tool Use: Developed with a focus on effective tool integration and utilization.
- Error Recovery: Designed to handle and recover from errors within complex task trajectories.
- Targeted Domains: Optimized for tasks within ALFWorld (household tasks) and DBBench (database operations).
Training Details
The model was trained with a focus on applying loss to all assistant turns in multi-turn trajectories, ensuring comprehensive learning across observation, action, and error handling. Key training configurations include a maximum sequence length of 8192, 1 epoch, and a learning rate of 1e-05, utilizing LoRA with r=16 and alpha=32. Loss was computed exclusively on the assistant's responses, masking user prompts and observations.
Good For
- Developing AI agents that require robust multi-turn interaction.
- Applications involving complex task execution in simulated environments.
- Scenarios demanding precise tool use and error handling capabilities.