thetmon/c8

TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Feb 20, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

The thetmon/c8 is a 4 billion parameter LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507. This adapter specializes in improving multi-turn agent task performance across household tasks (ALFWorld) and database operations (DBBench). It enhances the base model's ability to learn environment observation, action selection, tool use, and error recovery in complex agent trajectories.

Loading preview...

Overview

This repository provides a LoRA adapter (r=64) fine-tuned from the Qwen/Qwen3-4B-Instruct-2507 base model. It is designed to enhance the base model's capabilities in complex, multi-turn agent tasks.

Key Capabilities

  • Improved Agent Task Performance: Specifically trained on ALFWorld (household tasks) and DBBench (database operations) datasets.
  • Multi-Turn Trajectory Learning: The training objective applies loss to all assistant turns, enabling the model to learn from environment observations, action selection, tool use, and error recovery within a sequence of interactions.
  • LoRA Fine-tuning: Utilizes LoRA with a full precision base model, configured with r=64 and alpha=128, over 3 epochs.

Training Details

  • Base Model: Qwen/Qwen3-4B-Instruct-2507
  • Method: LoRA (full precision base) with Unsloth
  • Max Sequence Length: 4096 tokens
  • Learning Rate: 2e-04
  • Training Data: Combines u-10bei/sft_alfworld_trajectory_dataset_v5 and u-10bei/dbbench_sft_dataset_react_v4, both licensed under MIT.

Usage Notes

This repository contains LoRA adapter weights only. Users must load the specified base model (Qwen/Qwen3-4B-Instruct-2507) separately and then apply this adapter using the peft library. Compliance with the MIT license for the datasets and the base model's original terms of use is required.