Overview

This repository provides a LoRA adapter (r=64) fine-tuned from the Qwen/Qwen3-4B-Instruct-2507 base model. It is designed to enhance the base model's capabilities in complex, multi-turn agent tasks.

Key Capabilities

Improved Agent Task Performance: Specifically trained on ALFWorld (household tasks) and DBBench (database operations) datasets.
Multi-Turn Trajectory Learning: The training objective applies loss to all assistant turns, enabling the model to learn from environment observations, action selection, tool use, and error recovery within a sequence of interactions.
LoRA Fine-tuning: Utilizes LoRA with a full precision base model, configured with r=64 and alpha=128, over 3 epochs.

Training Details

Base Model: Qwen/Qwen3-4B-Instruct-2507
Method: LoRA (full precision base) with Unsloth
Max Sequence Length: 4096 tokens
Learning Rate: 2e-04
Training Data: Combines u-10bei/sft_alfworld_trajectory_dataset_v5 and u-10bei/dbbench_sft_dataset_react_v4, both licensed under MIT.

Usage Notes

This repository contains LoRA adapter weights only. Users must load the specified base model (Qwen/Qwen3-4B-Instruct-2507) separately and then apply this adapter using the peft library. Compliance with the MIT license for the datasets and the base model's original terms of use is required.

Overview

Overview

Key Capabilities

Training Details

Usage Notes

Full Model Card (README)