melon1891/agentbench-qwen3-4b-lr5e6-20260224v2
The melon1891/agentbench-qwen3-4b-lr5e6-20260224v2 is a 4 billion parameter language model fine-tuned from Qwen/Qwen3-4B-Instruct-2507. It is specifically optimized for multi-turn agent task performance, focusing on household tasks (ALFWorld) and database operations (DBBench). This model excels at learning environment observation, action selection, tool use, and error recovery within complex multi-turn trajectories, making it suitable for autonomous agent applications.
Loading preview...
Overview
melon1891/agentbench-qwen3-4b-lr5e6-20260224v2 is a 4 billion parameter language model, fine-tuned from the Qwen/Qwen3-4B-Instruct-2507 base model. This model leverages LoRA (merged into the base) and Unsloth for efficient training, with a focus on enhancing its capabilities for complex, multi-turn agentic tasks.
Key Capabilities
- Multi-turn Agent Task Performance: Specifically trained to improve performance in scenarios requiring sequential decision-making and interaction.
- Environment Interaction: Designed to learn from environment observations and select appropriate actions.
- Tool Use: Optimized for effective integration and utilization of tools within agent workflows.
- Error Recovery: Capable of learning to recover from errors encountered during multi-turn trajectories.
- Specialized Training: Loss is applied to all assistant turns, enabling comprehensive learning across an entire multi-turn interaction.
Good for
- Autonomous Agents: Ideal for developing agents that need to perform household tasks (e.g., ALFWorld) or database operations (e.g., DBBench).
- Complex Task Automation: Suitable for applications requiring models to manage multi-step processes, interact with environments, and utilize tools.
- Research in Agentic AI: Provides a specialized base for further experimentation and development in agent-based language models.