Overview
The choco800/qwen3-4b-agent-v10 is a 4 billion parameter language model, fully merged and fine-tuned from the Qwen/Qwen3-4B-Instruct-2507 base model using Unsloth. Unlike adapter repositories, this model provides merged weights, eliminating the need to load a separate base model.
Key Capabilities
- Multi-turn Agent Task Performance: Specifically trained to improve performance in multi-turn agentic scenarios.
- Environment Interaction: Learns to process environment observations and select appropriate actions.
- Tool Use: Developed with a focus on effective tool integration and utilization.
- Error Recovery: Designed to handle and recover from errors within complex task trajectories.
- Targeted Domains: Optimized for tasks within ALFWorld (household tasks) and DBBench (database operations).
Training Details
The model was trained with a focus on applying loss to all assistant turns in multi-turn trajectories, ensuring comprehensive learning across observation, action, and error handling. Key training configurations include a maximum sequence length of 8192, 1 epoch, and a learning rate of 1e-05, utilizing LoRA with r=16 and alpha=32. Loss was computed exclusively on the assistant's responses, masking user prompts and observations.
Good For
- Developing AI agents that require robust multi-turn interaction.
- Applications involving complex task execution in simulated environments.
- Scenarios demanding precise tool use and error handling capabilities.