ichi234/exp002_stage2_s2_db_merged
The ichi234/exp002_stage2_s2_db_merged model is a 7.6 billion parameter language model based on Qwen2.5-7B-Instruct, fine-tuned for advanced competitive tasks. It is specifically optimized to stabilize output formats and maintain legal action rates for ALFWorld environments, and to improve SQL/answer consistency for DBBench tasks. This model excels at agentic reasoning in structured environments, balancing performance across both ALFWorld's THOUGHT+ACTION format and DBBench's Action: Operation/Answer format. It leverages a multi-stage LoRA fine-tuning process and offline distillation using openai/gpt-oss-120b to achieve its specialized capabilities.
Model Overview
ichi234/exp002_stage2_s2_db_merged is a 7.6 billion parameter language model derived from Qwen/Qwen2.5-7B-Instruct. It has been specifically fine-tuned to excel in advanced competitive environments, particularly ALFWorld and DBBench tasks.
Key Capabilities
- ALFWorld Optimization: Stabilizes the `THOUGHT+ACTION` two-line output format and maintains a high rate of legal actions within ALFWorld environments.
- DBBench Enhancement: Improves the stability of the `Action: Operation` / `Action: Answer` formats and enhances the consistency of generated SQL queries and answers for DBBench.
- Multi-stage Fine-tuning: Utilizes a multi-phase LoRA (bfloat16) training strategy, with varying learning rates and epochs, to progressively refine performance across different aspects of the target tasks.
- Data Augmentation: Incorporates offline distillation using `openai/gpt-oss-120b` to expand and enhance the training dataset, contributing to improved model robustness and accuracy.
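The two output formats above can be checked programmatically when wiring the model into an agent loop. The exact grammars are not specified in this card, so the sketch below assumes the common conventions: a `THOUGHT:` line followed by an `ACTION:` line for ALFWorld, and a reply whose first line starts with `Action: Operation` or `Action: Answer` for DBBench. The function names are illustrative, not part of any published API.

```python
def is_valid_alfworld_output(text: str) -> bool:
    """Check the two-line THOUGHT+ACTION format (assumed convention)."""
    lines = text.strip().split("\n", 1)
    return (
        len(lines) == 2
        and lines[0].startswith("THOUGHT:")
        and lines[1].startswith("ACTION:")
    )


def dbbench_action_kind(text: str):
    """Classify a DBBench reply by its first line: 'operation', 'answer', or None."""
    stripped = text.strip()
    head = stripped.splitlines()[0] if stripped else ""
    if head.startswith("Action: Operation"):
        return "operation"
    if head.startswith("Action: Answer"):
        return "answer"
    return None
```

A harness can use these checks to retry or repair generations that drift from the expected format before passing them to the environment.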
Training Methodology
The model underwent a two-stage training process. Stage 1 first stabilized ALFWorld output formats and then trained on DBBench, using multiple LoRA phases with a maximum sequence length of 2048. Stage 2 leveraged data augmented via offline distillation from openai/gpt-oss-120b, training with a maximum sequence length of 4096 to further refine the model's specialized capabilities.
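The schedule above can be summarized as a configuration sketch. Only values stated in this card (LoRA in bfloat16, per-stage maximum sequence lengths, the distillation teacher) are filled in; learning rates and epoch counts varied per phase and are not published, so they are left as explicit placeholders rather than guessed.

```python
# Hedged summary of the two-stage recipe described in this card.
STAGES = [
    {
        "name": "stage1",
        "focus": "ALFWorld format stabilization, then DBBench learning",
        "method": "LoRA (multiple phases)",
        "dtype": "bfloat16",
        "max_seq_len": 2048,
        "learning_rate": None,  # varies per phase; not specified in the card
        "epochs": None,         # varies per phase; not specified in the card
    },
    {
        "name": "stage2",
        "focus": "refinement on distilled data",
        "teacher": "openai/gpt-oss-120b",  # offline distillation source
        "method": "LoRA",
        "dtype": "bfloat16",
        "max_seq_len": 4096,
        "learning_rate": None,  # not specified in the card
        "epochs": None,         # not specified in the card
    },
]
```

This is a descriptive record of the recipe, not a runnable training configuration for any particular framework.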
Good for
- Developing agents for complex interactive environments like ALFWorld.
- Applications requiring precise SQL generation and answer consistency in database interaction tasks (DBBench).
- Research into multi-task learning and agentic behavior in structured environments.