da1ch812/advanced-comp-model

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Feb 16, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

da1ch812/advanced-comp-model is a 4 billion parameter Qwen3-based instruction-tuned language model, merged with a LoRA adapter. It is specifically fine-tuned to enhance multi-turn agent task performance, excelling in complex environments like household tasks (ALFWorld) and database operations (DBBench). This model is optimized for learning environment observation, action selection, tool use, and error recovery within agent trajectories.

Loading preview...

Overview

This model, da1ch812/advanced-comp-model, is a 4 billion parameter Qwen3-based instruction-tuned language model. It integrates the unsloth/Qwen3-4B-Instruct-2507 base model with a LoRA adapter, eliminating the need for separate LoRA loading. The primary objective of this adapter training was to significantly improve the model's capabilities in multi-turn agent tasks.

Key Capabilities

  • Enhanced Agent Performance: Specifically trained to excel in complex, multi-turn agent environments.
  • Task Domains: Demonstrated proficiency in household tasks (ALFWorld) and database operations (DBBench).
  • Trajectory Learning: The training process focused on applying loss to all assistant turns, enabling the model to learn:
    • Environment observation
    • Strategic action selection
    • Effective tool use
    • Robust error recovery mechanisms

Training Details

The model was fine-tuned using LoRA with a maximum sequence length of 1024 and a learning rate of 2e-06 over 1 epoch. The training data includes several versions of ALFWorld trajectory datasets and DBBench SFT datasets, all licensed under the MIT License. Users must comply with both the MIT license and the base model's original terms of use.