yoei/qwen3-4b-agentbench-merged-B
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Feb 21, 2026License:apache-2.0Architecture:Transformer Open Weights Cold
The yoei/qwen3-4b-agentbench-merged-B is a 4 billion parameter LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507, developed by yoei. This adapter is specifically optimized for multi-turn agent task performance, excelling in environments like ALFWorld (household tasks) and DBBench (database operations). It learns environment observation, action selection, tool use, and error recovery within complex agent trajectories, making it suitable for autonomous agent applications.
Loading preview...
Overview
This repository provides a LoRA adapter fine-tuned from the Qwen/Qwen3-4B-Instruct-2507 base model, utilizing LoRA + Unsloth for efficient training. It contains only the adapter weights, requiring the base model to be loaded separately.
Key Capabilities
- Enhanced Multi-Turn Agent Performance: Specifically trained to improve the model's ability to handle complex, multi-turn agent tasks.
- Task Domains: Optimized for performance in ALFWorld (household tasks) and DBBench (database operations).
- Trajectory Learning: The training objective applies loss to all assistant turns in a multi-turn trajectory, enabling the model to learn:
- Environment observation
- Action selection
- Tool use
- Error recovery
Training Details
- Base Model: Qwen/Qwen3-4B-Instruct-2507
- Method: LoRA (full precision base)
- Max Sequence Length: 2048 tokens
- Epochs: 15
- Learning Rate: 2e-06
- LoRA Configuration: r=64, alpha=128
- Training Data: Utilizes datasets such as
u-10bei/dbbench_sft_dataset_react_v3andu-10bei/sft_alfworld_trajectory_dataset_v5, both distributed under the MIT License.
Good For
- Developers building autonomous agents that require robust multi-turn interaction and task completion.
- Applications involving complex environments where agents need to observe, act, use tools, and recover from mistakes.
- Research into agentic LLM capabilities and trajectory-based learning.