thetmon/c23

TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Feb 26, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

The thetmon/c23 is a 4 billion parameter LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507, designed to enhance multi-turn agent task performance. This adapter specifically improves capabilities in household tasks (ALFWorld) and database operations (DBBench) by optimizing environment observation, action selection, tool use, and error recovery. It leverages LoRA with full precision base training and a 32768 token context length, making it suitable for complex, multi-step agentic workflows.

Loading preview...

Overview

The thetmon/c23 is a LoRA adapter for the Qwen/Qwen3-4B-Instruct-2507 base model, developed by thetmon. This adapter, with 4 billion parameters, is specifically fine-tuned to improve multi-turn agent task performance across diverse domains.

Key Capabilities

  • Enhanced Agentic Reasoning: Optimized for complex, multi-step tasks requiring sequential decision-making.
  • Multi-turn Task Proficiency: Excels in scenarios where the agent needs to interact with an environment over multiple turns, such as household tasks (ALFWorld) and database operations (DBBench).
  • Error Recovery: Training includes loss application to all assistant turns, enabling the model to learn from and recover from errors within a trajectory.
  • Tool Use Integration: Designed to facilitate effective tool use and action selection based on environmental observations.

Training Details

The adapter was trained using LoRA (r=64, alpha=128) on a full precision base model, with a maximum sequence length of 4096 tokens over 3 epochs. The training data combined u-10bei/sft_alfworld_trajectory_dataset_v5 and u-10bei/dbbench_sft_dataset_react_v4.

Good For

  • Developing AI agents that require robust multi-turn interaction.
  • Applications involving complex task automation in simulated or real-world environments.
  • Scenarios demanding improved tool use and error handling in agentic workflows.