Overview
This repository provides a LoRA adapter (r=64) fine-tuned from the Qwen/Qwen3-4B-Instruct-2507 base model. It improves the base model's performance on multi-turn agent tasks by applying the training loss to all assistant turns in a trajectory. This approach lets the model learn to interpret environment observations, select actions, use tools, and recover from errors.
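The "loss on all assistant turns" scheme can be sketched as a labeling step: assistant tokens keep their IDs as targets, while system, user, and environment-observation tokens are masked with -100 so they contribute no loss. The token IDs and `turns` structure below are illustrative only; actual training would use the Qwen chat template and tokenizer.

```python
IGNORE_INDEX = -100  # value ignored by PyTorch cross-entropy loss


def build_labels(turns):
    """Given (role, token_ids) pairs for one trajectory, return
    (input_ids, labels) where every assistant turn is supervised and
    all other roles are masked out with IGNORE_INDEX."""
    input_ids, labels = [], []
    for role, ids in turns:
        input_ids.extend(ids)
        if role == "assistant":
            labels.extend(ids)  # supervise all assistant tokens
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # mask non-assistant tokens
    return input_ids, labels


# Example: both assistant turns in the trajectory contribute to the loss,
# including the one that follows an environment observation.
trajectory = [
    ("user", [1, 2, 3]),
    ("assistant", [4, 5]),
    ("user", [6]),  # e.g. an environment observation
    ("assistant", [7, 8, 9]),
]
ids, labels = build_labels(trajectory)
```

Because intermediate assistant turns are supervised too (not just the final answer), the model sees gradient signal on tool calls and error-recovery steps throughout the trajectory.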
Key Capabilities
- Enhanced Agentic Performance: Specifically trained to improve performance in complex, multi-step agent tasks.
- Multi-turn Task Specialization: Excels in scenarios requiring sequential decision-making and interaction.
- Domain-Specific Improvement: Fine-tuned on datasets for household tasks (ALFWorld) and database operations (DBBench).
- Error Recovery: Designed to learn from and recover from errors within multi-turn trajectories.
Training Details
The adapter was trained with LoRA on the full-precision base model, using a maximum sequence length of 4096 tokens, 3 epochs, a learning rate of 2e-04, and LoRA parameters r=64, alpha=128. The training data comprised u-10bei/sft_alfworld_trajectory_dataset_v5 and u-10bei/dbbench_sft_dataset_react_v4, both distributed under the MIT License.
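The stated hyperparameters map onto a `peft` LoRA configuration roughly as follows. This is a sketch, not the exact training script: the target modules and dropout are assumptions, as the card does not state them.

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,                # LoRA rank, as stated in the card
    lora_alpha=128,      # LoRA alpha, as stated in the card
    lora_dropout=0.05,   # assumption; not stated in the card
    target_modules=[     # assumption; typical attention/MLP projections
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```

The remaining settings (learning rate 2e-04, 3 epochs, max sequence length 4096) would be passed to the trainer rather than to `LoraConfig`.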
Usage
Users can integrate this adapter with the Qwen/Qwen3-4B-Instruct-2507 base model using the peft library: load the base model first, then apply the adapter weights on top.
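A minimal loading sketch with `transformers` and `peft` is shown below. The adapter repository ID is a placeholder, since the card does not give the exact Hub path.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "Qwen/Qwen3-4B-Instruct-2507"
ADAPTER = "path/to/this-adapter"  # placeholder: replace with the actual adapter repo id

# Load the full-precision base model and its tokenizer.
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype="auto",
    device_map="auto",
)

# Apply the LoRA adapter weights on top of the base model.
model = PeftModel.from_pretrained(base, ADAPTER)

# Multi-turn inputs should go through the model's chat template, e.g.:
messages = [{"role": "user", "content": "You are in a kitchen. Find the mug."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For faster inference without the peft wrapper, the adapter can also be merged into the base weights with `model.merge_and_unload()`.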