motobrew/qwen3-adv-comp-v34
motobrew/qwen3-adv-comp-v34 is a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507, a 4-billion-parameter Qwen3 model. The adapter is optimized for multi-turn agent tasks, particularly in environments such as ALFWorld (household tasks) and DBBench (database operations). It improves the model's handling of environment observations, action selection, tool use, and error recovery across complex multi-turn trajectories.
Overview
This repository provides a LoRA adapter fine-tuned from the Qwen/Qwen3-4B-Instruct-2507 base model, trained with LoRA + Unsloth for efficiency. It contains only the LoRA adapter weights; the base model must be loaded separately.
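Since only the adapter weights are shipped, loading typically pairs the base model with the adapter via the PEFT library. A minimal sketch (the imports are deferred into the function so the snippet loads even without `transformers`/`peft` installed; actual loading requires both, plus the model downloads):

```python
BASE_MODEL = "Qwen/Qwen3-4B-Instruct-2507"
ADAPTER = "motobrew/qwen3-adv-comp-v34"

def load_model():
    # Lazy imports: transformers and peft are only needed when actually loading.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    # Load the full-precision base model first...
    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL, torch_dtype="auto", device_map="auto"
    )
    # ...then attach the LoRA adapter weights on top of it.
    model = PeftModel.from_pretrained(base, ADAPTER)
    return tokenizer, model
```

For inference-only use, `model.merge_and_unload()` can fold the adapter into the base weights.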
Key Capabilities
- Enhanced Multi-Turn Agent Performance: Specifically trained to improve performance in complex, multi-turn agent tasks.
- Task Domains: Optimized for tasks within ALFWorld (household environments) and DBBench (database operations).
- Learning Trajectories: The training loss is applied to all assistant turns, so the model learns from environment observations, action selection, tool use, and error recovery within each multi-turn sequence.
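Applying loss to all assistant turns usually means masking every non-assistant token out of the labels. A minimal sketch of this masking (the `-100` ignore index is the PyTorch cross-entropy convention; the turn structure shown is illustrative, not the dataset's actual format):

```python
# Tokens in non-assistant turns get this label so they contribute no loss.
IGNORE_INDEX = -100

def build_labels(turns):
    """turns: list of (role, token_ids) pairs for one trajectory.

    Returns a flat label list: assistant tokens are kept as supervision
    targets; user/environment tokens are masked with IGNORE_INDEX.
    """
    labels = []
    for role, token_ids in turns:
        if role == "assistant":
            labels.extend(token_ids)
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))
    return labels

# Toy multi-turn trajectory (token ids are placeholders).
turns = [
    ("user", [1, 2, 3]),        # environment observation
    ("assistant", [4, 5]),      # action / tool call
    ("user", [6, 7]),           # environment feedback (e.g. an error)
    ("assistant", [8, 9, 10]),  # recovery action
]
labels = build_labels(turns)
# → [-100, -100, -100, 4, 5, -100, -100, 8, 9, 10]
```

Because every assistant turn carries loss, the model is supervised on mid-trajectory decisions (including error recovery), not just the final answer.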
Training Details
- Base Model: Qwen/Qwen3-4B-Instruct-2507 (4 billion parameters).
- Method: LoRA (full precision base) with a maximum sequence length of 2048.
- Configuration: Trained for 2 epochs with a learning rate of 2e-6, using LoRA parameters r=64 and alpha=128.
- Training Data: The u-10bei/sft_alfworld_trajectory_dataset_v5 dataset, licensed under the MIT License.
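The hyperparameters above can be collected into a config sketch; the `target_modules` list is an assumption (the projections typically targeted in Qwen-family models), not confirmed by this card:

```python
# LoRA hyperparameters stated in the card; target_modules is assumed.
lora_config = {
    "r": 64,
    "lora_alpha": 128,
    "target_modules": [  # assumption: typical Qwen attention/MLP projections
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}

# Training hyperparameters stated in the card.
train_config = {
    "num_train_epochs": 2,
    "learning_rate": 2e-6,
    "max_seq_length": 2048,
}

# Effective LoRA scaling factor is alpha / r.
scaling = lora_config["lora_alpha"] / lora_config["r"]  # → 2.0
```

With alpha = 2r, adapter updates are scaled by a factor of 2, a common choice that keeps the update magnitude stable as r changes.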
Usage Considerations
Users must comply with the MIT license of the training dataset and the original terms of use for the Qwen3 base model.