Overview
This repository provides a LoRA adapter (r=64) fine-tuned from the Qwen/Qwen3-4B-Instruct-2507 base model. It is designed to enhance the base model's capabilities in complex, multi-turn agent tasks.
Key Capabilities
- Improved Agent Task Performance: Specifically trained on ALFWorld (household tasks) and DBBench (database operations) datasets.
- Multi-Turn Trajectory Learning: The training objective applies loss to all assistant turns, enabling the model to learn from environment observations, action selection, tool use, and error recovery within a sequence of interactions.
- LoRA Fine-tuning: Utilizes LoRA with a full precision base model, configured with r=64 and alpha=128, over 3 epochs.
Training Details
- Base Model: Qwen/Qwen3-4B-Instruct-2507
- Method: LoRA (full precision base) with Unsloth
- Max Sequence Length: 4096 tokens
- Learning Rate: 2e-04
- Training Data: Combines
u-10bei/sft_alfworld_trajectory_dataset_v5 and u-10bei/dbbench_sft_dataset_react_v4, both licensed under MIT.
Usage Notes
This repository contains LoRA adapter weights only. Users must load the specified base model (Qwen/Qwen3-4B-Instruct-2507) separately and then apply this adapter using the peft library. Compliance with the MIT license for the datasets and the base model's original terms of use is required.