melon1891/agentbench-qwen3-4b-2stage-reasoning-20260228

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Feb 28, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The melon1891/agentbench-qwen3-4b-2stage-reasoning-20260228 is a 4 billion parameter language model fine-tuned from melon1891/agentbench-qwen3-4b-lr5e6-20260224v2, specifically optimized for multi-turn agent task performance. It excels in complex environments like ALFWorld and DBBench by learning environment observation, action selection, tool use, and error recovery. This model is designed for applications requiring robust reasoning and sequential decision-making capabilities within agentic workflows.

Loading preview...

Overview

This model, melon1891/agentbench-qwen3-4b-2stage-reasoning-20260228, is a 4 billion parameter language model fine-tuned from melon1891/agentbench-qwen3-4b-lr5e6-20260224v2. It leverages LoRA (merged into the base model) to enhance its capabilities, with a maximum sequence length of 8192 tokens.

Key Capabilities

  • Multi-turn Agent Task Performance: Specifically trained to improve performance in complex, multi-turn agentic tasks.
  • Environment Interaction: Learns to process environment observations and select appropriate actions.
  • Tool Use: Developed to effectively utilize tools within agent workflows.
  • Error Recovery: Designed to recover from errors during task execution, contributing to more robust agent behavior.
  • Targeted Domains: Optimized for tasks in ALFWorld (household tasks) and DBBench (database operations).

Training Details

The model was trained for 3 epochs with a learning rate of 1e-06, using the melon1891/reasoning-chain-distilled-317 dataset. Loss was applied to all assistant turns in the multi-turn trajectory to reinforce learning across the entire task sequence.

Good For

  • Developing AI agents that require sequential reasoning and decision-making.
  • Applications involving complex, multi-step interactions with environments.
  • Tasks in household automation (ALFWorld-like scenarios) or database management (DBBench-like scenarios).