Model Overview
choco800/qwen3-4b-agent-v8 is a 4-billion-parameter language model fine-tuned from the Qwen/Qwen3-4B-Instruct-2507 base model. It was trained using LoRA with Unsloth, and the adapters were merged back into the base weights to produce a 16-bit merged model. This repository therefore provides the full model weights, so no separate adapter loading is required.
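A minimal loading sketch with Hugging Face transformers is shown below. Because the repository ships fully merged 16-bit weights, no PEFT/LoRA adapter step is needed; the model ID is taken from this card, while the dtype and device settings are illustrative.

```python
# Hedged usage sketch: load the merged model directly with transformers.
# No PEFT/LoRA adapter loading is needed because the weights are merged.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "choco800/qwen3-4b-agent-v8"

def load(model_id=MODEL_ID):
    # Tokenizer and weights come from the same merged repository.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # keeps the merged 16-bit precision
        device_map="auto",    # place layers on available devices
    )
    return tokenizer, model
```

Generation then follows the standard Qwen3 chat-template workflow (`tokenizer.apply_chat_template` followed by `model.generate`).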
Key Capabilities
- Enhanced Agentic Performance: Specifically trained to improve multi-turn agent task performance.
- Task Domains: Optimized for tasks in ALFWorld (household tasks) and DBBench (database operations).
- Learning Trajectory: The model learns from all assistant turns in a multi-turn trajectory, covering environment observation, action selection, tool use, and error recovery.
- Context Length: Trained with a maximum sequence length of 8192 tokens.
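The trajectory structure described above can be sketched as a list of role-tagged turns, where all assistant turns (thoughts, actions, tool calls, recoveries) are training targets. The turn format and sample content here are hypothetical, modeled on a ReAct-style ALFWorld episode:

```python
# Illustrative multi-turn agent trajectory; only assistant turns are
# used as training targets. Content below is a made-up example.
def assistant_turns(trajectory):
    """Return the turns the model learns from (all assistant turns)."""
    return [t for t in trajectory if t["role"] == "assistant"]

trajectory = [
    {"role": "user", "content": "Task: put a clean mug on the desk."},
    {"role": "assistant", "content": "Thought: I need a mug. Action: go to countertop 1"},
    {"role": "user", "content": "Observation: On countertop 1, you see a mug 1."},
    {"role": "assistant", "content": "Action: take mug 1 from countertop 1"},
]
```

Every assistant turn in the episode, not just the final answer, contributes to the loss, which is what the "learning trajectory" bullet above refers to.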
Training Details
The model was trained for 1 epoch with a learning rate of 1e-05. Loss was applied only to the assistant's responses; user prompts and environment observations were masked out. Training used the u-10bei/dbbench_sft_dataset_react, u-10bei/dbbench_sft_dataset_react_v3, and u-10bei/dbbench_sft_dataset_react_v4 datasets, all distributed under the MIT License. Users must comply with both the dataset licenses and the base model's Apache 2.0 license.
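The assistant-only loss described above is conventionally implemented by setting masked positions in the label tensor to the ignore index (-100 for PyTorch cross-entropy). A minimal sketch, assuming tokenized turns are available as (role, token_ids) pairs:

```python
# Sketch of assistant-only loss masking. Tokens from user prompts and
# environment observations get IGNORE_INDEX labels, so they contribute
# no gradient; only assistant tokens are trained on.
IGNORE_INDEX = -100  # ignore index of PyTorch cross-entropy loss

def build_labels(turns):
    """turns: list of (role, token_ids) pairs for one trajectory.

    Returns flat (input_ids, labels) lists of equal length.
    """
    input_ids, labels = [], []
    for role, ids in turns:
        input_ids.extend(ids)
        if role == "assistant":
            labels.extend(ids)  # supervised: predict assistant tokens
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # masked out
    return input_ids, labels
```

Unsloth and TRL's SFT utilities provide equivalent masking out of the box; this sketch only shows the underlying idea.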