choco800/qwen3-4b-agent-v8
The choco800/qwen3-4b-agent-v8 is a 4 billion parameter model fine-tuned from Qwen/Qwen3-4B-Instruct-2507, designed for enhanced multi-turn agent task performance. This fully merged model, trained with Unsloth, excels in environments like ALFWorld and DBBench by learning environment observation, action selection, and error recovery. It is specifically optimized for agentic workflows, processing up to 32768 tokens.
Loading preview...
Model Overview
The choco800/qwen3-4b-agent-v8 is a 4 billion parameter language model, fine-tuned from the Qwen/Qwen3-4B-Instruct-2507 base model. This repository provides a fully merged model, meaning it includes the base model weights and does not require separate loading of adapters. It was trained using LoRA with Unsloth, resulting in a 16-bit merged model.
Key Capabilities
- Enhanced Agentic Performance: Specifically trained to improve multi-turn agent task performance.
- Task Domains: Optimized for tasks in ALFWorld (household tasks) and DBBench (database operations).
- Learning Trajectory: The model learns from all assistant turns in a multi-turn trajectory, covering environment observation, action selection, tool use, and error recovery.
- Context Length: Supports a maximum sequence length of 8192 tokens during training.
Training Details
The model was trained for 1 epoch with a learning rate of 1e-05. Loss was applied exclusively to the assistant's responses, masking user prompts and observations. The training utilized specific datasets including u-10bei/dbbench_sft_dataset_react, u-10bei/dbbench_sft_dataset_react_v3, and u-10bei/dbbench_sft_dataset_react_v4, all distributed under the MIT License. Users must comply with both dataset and base model (Apache 2.0) licenses.