thetmon/c15

  • Task: Text Generation
  • Concurrency Cost: 1
  • Model Size: 4B
  • Quant: BF16
  • Ctx Length: 32k
  • Published: Feb 23, 2026
  • License: apache-2.0
  • Architecture: Transformer
  • Open Weights
  • Warm

thetmon/c15 is a LoRA adapter fine-tuned from the 4-billion-parameter Qwen/Qwen3-4B-Instruct-2507 model to improve performance on multi-turn agent tasks. It specializes in household tasks (ALFWorld) and database operations (DBBench), and is trained on environment observation, action selection, tool use, and error recovery so that the base model handles complex, multi-step agentic workflows better.


Overview

This repository provides a LoRA adapter (r=64) fine-tuned from the Qwen/Qwen3-4B-Instruct-2507 base model. It targets multi-turn agent tasks: the training loss is applied to every assistant turn in a trajectory, so the model is trained on each step of observing the environment, selecting an action, using tools, and recovering from errors.
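As a rough illustration of what applying loss to all assistant turns can look like, the sketch below builds training labels for a toy trajectory. It is not the author's training code: the function name `build_labels`, the `(role, token_ids)` turn format, and the toy token ids are assumptions made for this example. It relies only on the common convention that label value -100 is ignored by PyTorch's cross-entropy loss by default.

```python
# Hypothetical sketch: mask every non-assistant token so that only
# assistant turns contribute to the loss. Not the author's training code.

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(turns):
    """Return (input_ids, labels) for a trajectory.

    `turns` is a list of (role, token_ids) pairs. Tokens from assistant
    turns keep their ids as labels; all other tokens are masked out.
    """
    input_ids, labels = [], []
    for role, token_ids in turns:
        input_ids.extend(token_ids)
        if role == "assistant":
            labels.extend(token_ids)  # supervised: every assistant turn
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))  # not supervised
    return input_ids, labels


# Toy trajectory with made-up token ids.
trajectory = [
    ("system", [1, 2]),
    ("user", [3, 4, 5]),        # environment observation
    ("assistant", [6, 7]),      # first action
    ("user", [8]),              # environment feedback, e.g. an error
    ("assistant", [9, 10, 11])  # recovery action
]
input_ids, labels = build_labels(trajectory)
print(labels)
# [-100, -100, -100, -100, -100, 6, 7, -100, 9, 10, 11]
```

Both assistant turns keep their labels, including the one that follows the feedback message, so under this scheme intermediate actions and recovery steps are supervised alongside the final answer.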

Key Capabilities

  • Enhanced Agentic Performance: Specifically trained to improve performance in complex, multi-step agent tasks.
  • Multi-turn Task Specialization: Targets scenarios that require sequential decision-making and interaction.
  • Domain-Specific Improvement: Fine-tuned on datasets for household tasks (ALFWorld) and database operations (DBBench).
  • Error Recovery: Trained to recover from errors that occur within multi-turn trajectories.

Training Details

The adapter was trained with LoRA on a full-precision base model for 3 epochs, with a maximum sequence length of 4096 tokens, a learning rate of 2e-4, and LoRA parameters r=64 and alpha=128. The training data consisted of u-10bei/sft_alfworld_trajectory_dataset_v5 and u-10bei/dbbench_sft_dataset_react_v4, both distributed under the MIT License.
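The hyperparameters above can be collected in one place, as in the sketch below. The `TrainingSettings` class and the `to_peft_config` helper are hypothetical names introduced here, not part of the original training setup. The card does not state the target modules or dropout, so the sketch leaves them at the peft library defaults, which may differ from what was actually used.

```python
# Hypothetical sketch collecting the stated hyperparameters.
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSettings:
    base_model: str = "Qwen/Qwen3-4B-Instruct-2507"
    max_seq_length: int = 4096
    num_epochs: int = 3
    learning_rate: float = 2e-4
    lora_r: int = 64
    lora_alpha: int = 128

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / r.
        return self.lora_alpha / self.lora_r


def to_peft_config(settings):
    """Build a peft LoraConfig from the settings (requires peft installed)."""
    from peft import LoraConfig  # imported lazily; heavy optional dependency

    # target_modules and dropout are NOT given in the card; library defaults
    # are an assumption and may not match the original run.
    return LoraConfig(
        r=settings.lora_r,
        lora_alpha=settings.lora_alpha,
        task_type="CAUSAL_LM",
    )


settings = TrainingSettings()
print(settings.lora_scaling)  # 2.0
```

With r=64 and alpha=128, the low-rank update is scaled by a factor of 2 under the standard alpha / r rule.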

Usage

To use the adapter, load the Qwen/Qwen3-4B-Instruct-2507 base model and then apply the adapter weights to it with the peft library.
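A minimal sketch of that loading flow follows, assuming the adapter is published under the repository id thetmon/c15 and that torch, transformers, and peft are installed. The helper names `load_adapted_model` and `generate_reply` and the default `max_new_tokens` value are illustrative choices, not part of the original card.

```python
# Hypothetical usage sketch: load the base model, then attach the adapter.
BASE_MODEL_ID = "Qwen/Qwen3-4B-Instruct-2507"  # base model named in this card
ADAPTER_ID = "thetmon/c15"  # assumed adapter repository id


def load_adapted_model():
    """Load tokenizer and base model, then apply the LoRA adapter weights."""
    # Heavy dependencies are imported lazily, inside the function.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL_ID, torch_dtype=torch.bfloat16
    )
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    model.eval()
    return tokenizer, model


def generate_reply(tokenizer, model, messages, max_new_tokens=256):
    """Generate the next assistant turn for a list of chat messages."""
    import torch

    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    ).to(model.device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, dropping the prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `load_adapted_model()` downloads the base model and the adapter on first use. A conversation can then be passed as a list of dictionaries with `role` and `content` keys, for example a single user message describing the current environment observation, and `generate_reply` returns the model's next action as text.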