ichi234/exp002_stage2_s2_db_merged

Text Generation

  • Concurrency Cost: 1
  • Model Size: 7.6B
  • Quant: FP8
  • Ctx Length: 32k
  • Published: Mar 1, 2026
  • Architecture: Transformer

The ichi234/exp002_stage2_s2_db_merged model is a 7.6-billion-parameter language model based on Qwen2.5-7B-Instruct, fine-tuned for competitive agent benchmarks. It is optimized to stabilize output formats and maintain a high legal-action rate in ALFWorld environments, and to improve SQL and answer consistency on DBBench tasks. The model balances agentic reasoning across ALFWorld's two-line THOUGHT + ACTION format and DBBench's Action: Operation / Action: Answer format. It was built with a multi-stage LoRA fine-tuning process and offline distillation from openai/gpt-oss-120b.


Model Overview

ichi234/exp002_stage2_s2_db_merged is a 7.6-billion-parameter language model derived from Qwen/Qwen2.5-7B-Instruct. It has been fine-tuned to perform well in competitive agent environments, particularly ALFWorld and DBBench.
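
The merged model can be loaded like any other Qwen2.5-style causal LM. Below is a minimal sketch using Hugging Face transformers; the repo id comes from this card, while the example prompt and generation settings are illustrative assumptions rather than a prescribed interface.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ichi234/exp002_stage2_s2_db_merged"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Illustrative ALFWorld-style prompt; the real environment supplies
# its own observation and task text.
messages = [
    {"role": "user", "content": "You are in a kitchen. Put a clean mug on the coffee table."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```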

Key Capabilities

  • ALFWorld Optimization: Stabilizes the two-line THOUGHT + ACTION output format and maintains a high rate of legal actions within ALFWorld environments.
  • DBBench Enhancement: Improves the stability of the Action: Operation / Action: Answer formats and the consistency of generated SQL queries and answers for DBBench (a format-check sketch for both schemas follows this list).
  • Multi-stage Fine-tuning: Utilizes a multi-phase LoRA (bfloat16) training strategy, with varying learning rates and epochs, to progressively refine performance across different aspects of the target tasks.
  • Data Augmentation: Incorporates offline distillation using openai/gpt-oss-120b to expand and enhance the training dataset, contributing to improved model robustness and accuracy.
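
Both output schemas are simple enough to validate mechanically. The sketch below assumes the literal prefixes ("THOUGHT:", "ACTION:", "Action: Operation", "Action: Answer") implied by the format names above; the actual evaluation harness may differ.

```python
import re

def is_legal_alfworld(output: str) -> bool:
    """Accept only the two-line THOUGHT + ACTION shape described above."""
    lines = output.strip().splitlines()
    return (
        len(lines) == 2
        and lines[0].startswith("THOUGHT:")
        and lines[1].startswith("ACTION:")
    )

# The exact prefix strings are assumptions inferred from this card,
# not a confirmed harness specification.
DBBENCH_PREFIX = re.compile(r"^Action:\s*(Operation|Answer)\b", re.MULTILINE)

def is_legal_dbbench(output: str) -> bool:
    """Accept replies that declare either a SQL Operation or a final Answer."""
    return DBBENCH_PREFIX.search(output) is not None
```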

Training Methodology

The model underwent a two-stage training process. Stage 1 ran multiple LoRA phases with a maximum sequence length of 2048, first stabilizing ALFWorld output formats and then training on DBBench. Stage 2 trained on data augmented via offline distillation from openai/gpt-oss-120b, with a maximum sequence length of 4096, to further refine the model's specialized capabilities.
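
For concreteness, here is a hedged sketch of the kind of multi-phase LoRA setup this describes, using the peft library. The rank, target modules, learning rates, and epoch counts are illustrative placeholders; the card only states that these varied across phases and that training ran in bfloat16 with the sequence lengths noted above.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Base model named on this card; trained in bfloat16 per the description.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct", torch_dtype=torch.bfloat16
)

lora_cfg = LoraConfig(
    r=16,                      # illustrative rank, not the card's actual value
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)

# Phase schedule; the learning rates and epoch counts are placeholders.
phases = [
    {"name": "stage1_alfworld", "lr": 2e-4, "epochs": 2, "max_seq_len": 2048},
    {"name": "stage1_dbbench",  "lr": 1e-4, "epochs": 1, "max_seq_len": 2048},
    {"name": "stage2_distill",  "lr": 5e-5, "epochs": 1, "max_seq_len": 4096},
]

for phase in phases:
    # Each phase would run a supervised fine-tuning pass (e.g. with trl's
    # SFTTrainer) over its own dataset, truncated to phase["max_seq_len"].
    ...
```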

Good for

  • Developing agents for complex interactive environments like ALFWorld.
  • Applications requiring precise SQL generation and answer consistency in database interaction tasks (DBBench).
  • Research into multi-task learning and agentic behavior in structured environments.