Overview
This model, choco800/qwen3-4b-agent-v14, is a 4 billion parameter Qwen3-Instruct variant, specifically fine-tuned to enhance multi-turn agent task performance. Unlike typical adapter repositories, this is a fully merged model, meaning it contains all necessary weights and does not require loading a separate base model. It was trained using LoRA and Unsloth, with a maximum sequence length of 8192 tokens.
Key Capabilities
- Multi-turn Agent Task Performance: Optimized for complex, sequential tasks requiring multiple interactions.
- Environment Observation: Capable of processing and understanding environmental cues.
- Action Selection & Tool Use: Designed to make appropriate decisions and utilize tools within an agentic workflow.
- Error Recovery: Trained to handle and recover from errors encountered during task execution.
- Efficient Deployment: Provided as a fully merged model, simplifying integration and usage.
Training Focus
The model's training objective was to improve performance on agent tasks, particularly within environments like ALFWorld (household tasks). Loss was applied to all assistant turns in the multi-turn trajectory, ensuring comprehensive learning across observation, action, tool use, and error handling. The training utilized several versions of the dbbench_sft_dataset_react dataset, licensed under MIT.