satoyutaka/Qwen2.5-7B-AgentBench-V4-BF16

Text generation | Concurrency cost: 1 | Model size: 7.6B | Quant: FP8 | Ctx length: 32k | Published: Feb 28, 2026 | Architecture: Transformer

satoyutaka/Qwen2.5-7B-AgentBench-V4-BF16 is a 7.6-billion-parameter agent model developed by satoyutaka and fine-tuned from Qwen2.5-7B-Instruct. It is tuned for extreme accuracy and long-context understanding in the AgentBench-comp evaluation environment, with its training context length extended to 4096 tokens. Through strict data curation and optimized training, it targets complex multi-step tasks, particularly SQL aggregation commands and intricate ALFWorld navigation.


Overview

satoyutaka/Qwen2.5-7B-AgentBench-V4-BF16 is an advanced agent model, built upon the Qwen2.5-7B-Instruct architecture, specifically engineered for the AgentBench-comp evaluation environment. This V4 variant prioritizes extreme accuracy and long-context understanding, aiming to resolve complex multi-step tasks without generation errors.
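Since the card gives 4096 tokens as V4's training context length, an agent harness may want to keep each prompt within that budget as trajectories grow. The following is a minimal sketch, not part of the released model: it keeps the system prompt and drops the oldest turns first. The helper names, the ALFWorld-style action strings, and the whitespace-based token count (a stand-in for the real Qwen2.5 tokenizer) are all assumptions for illustration.

```python
from typing import Callable, List

MAX_TOKENS = 4096  # V4 training context length stated in the card


def approx_token_count(text: str) -> int:
    """Hypothetical stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def fit_trajectory(
    system_prompt: str,
    turns: List[str],
    budget: int = MAX_TOKENS,
    count: Callable[[str], int] = approx_token_count,
) -> List[str]:
    """Return the system prompt plus the newest turns that fit within `budget` tokens."""
    remaining = budget - count(system_prompt)
    if remaining < 0:
        raise ValueError("system prompt alone exceeds the token budget")
    kept: List[str] = []
    # Walk from the newest turn backwards so recent observations survive truncation.
    for turn in reversed(turns):
        cost = count(turn)
        if cost > remaining:
            break  # this turn and all older ones are dropped
        kept.append(turn)
        remaining -= cost
    kept.reverse()  # restore chronological order
    return [system_prompt] + kept


# Illustrative ALFWorld-style actions (hypothetical trajectory).
history = ["go to countertop 1", "take apple 1 from countertop 1", "go to fridge 1"]
print(fit_trajectory("You are an ALFWorld agent.", history, budget=14))
# prints ['You are an ALFWorld agent.', 'go to fridge 1']
```

Dropping from the oldest end preserves the most recent observations, which are often the most relevant to the next action; a real harness would pass the model's own tokenizer in place of `approx_token_count`.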

Key Enhancements from V3

  • Extended Context Length: Training context length increased from 2048 to 4096 tokens, so the model can learn from longer ALFWorld trajectories that capture extensive trial-and-error processes.
  • "Iron Guard" Protocol Dataset: Trained exclusively on 171 meticulously curated, high-quality trajectories, which replace the standard SFT dataset and are intended to eliminate hallucinations and formatting errors.
  • Optimized Training Method: Uses SFT with batch_size=1 and grad_accumulation=4 (an effective batch size of 4 samples per optimizer update), with validation disabled to maximize learning efficiency on the small curated dataset; the final training loss reached 0.192.
  • Targeted Logic: Heavily focuses on specific complex planning tasks, including SQL aggregation commands (SUM/COUNT) and intricate ALFWorld navigation patterns, with strategically added Japanese logic (JP Spice).
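The batch_size=1, grad_accumulation=4 setting gives an effective batch size of 4: gradients from four single-sample micro-batches are summed (each scaled by 1/4) before one optimizer update. The toy pure-Python sketch below uses a one-parameter least-squares model, not the actual fine-tuning code, and checks that this accumulation matches a plain batch-of-4 update. Discarding a trailing partial group is an assumption of the sketch; training frameworks may handle it differently.

```python
import math
from typing import List, Tuple

BATCH_SIZE = 1        # micro-batch size per forward/backward pass (batch_size=1)
GRAD_ACCUM_STEPS = 4  # micro-batches accumulated per optimizer update (grad_accumulation=4)
EFFECTIVE_BATCH = BATCH_SIZE * GRAD_ACCUM_STEPS  # 4 samples per update


def grad(w: float, x: float, y: float) -> float:
    """Gradient of the squared error (w * x - y) ** 2 with respect to w."""
    return 2.0 * x * (w * x - y)


def train_with_accumulation(data: List[Tuple[float, float]], w: float, lr: float) -> float:
    """One epoch of SGD using micro-batches of 1 and gradient accumulation."""
    accumulated = 0.0
    for step, (x, y) in enumerate(data, start=1):
        # Scale each micro-batch gradient so the accumulated sum equals the batch mean.
        accumulated += grad(w, x, y) / GRAD_ACCUM_STEPS
        if step % GRAD_ACCUM_STEPS == 0:
            w -= lr * accumulated  # one optimizer update per EFFECTIVE_BATCH samples
            accumulated = 0.0
    # Any trailing partial group is discarded in this sketch.
    return w


def train_full_batch(data: List[Tuple[float, float]], w: float, lr: float) -> float:
    """Reference: one epoch of SGD with true batches of EFFECTIVE_BATCH samples."""
    for start in range(0, len(data) - EFFECTIVE_BATCH + 1, EFFECTIVE_BATCH):
        batch = data[start:start + EFFECTIVE_BATCH]
        mean_grad = sum(grad(w, x, y) for x, y in batch) / len(batch)
        w -= lr * mean_grad
    return w


toy_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)] * 2
w_accum = train_with_accumulation(toy_data, w=0.0, lr=0.01)
w_batch = train_full_batch(toy_data, w=0.0, lr=0.01)
print(math.isclose(w_accum, w_batch))  # prints True

n_samples = 171  # size of the curated dataset stated in the card
print(n_samples // EFFECTIVE_BATCH, n_samples % EFFECTIVE_BATCH)  # prints 42 3
```

Under this drop-the-remainder rule, one epoch over the 171 trajectories would contain 42 optimizer updates, with 3 samples left over.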

Ideal Use Cases

  • AgentBench-comp Evaluation: Designed and optimized for performance within this specific competitive environment.
  • Complex Multi-step Reasoning: Excels in tasks requiring sequential decision-making and adherence to strict instructions over long contexts.
  • SQL Aggregation and Navigation: Particularly strong in scenarios involving database queries with aggregation and complex environmental exploration like ALFWorld.
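The kind of SQL aggregation the card highlights (SUM/COUNT) can be illustrated with Python's built-in sqlite3 module. The `orders` table and its rows below are hypothetical and are not taken from the AgentBench data; the sketch only shows the query shapes involved.

```python
import sqlite3

# Hypothetical in-memory table for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 30), ("bob", 20), ("alice", 50), ("carol", 10)],
)

# Grouped aggregation: order count and total amount per customer.
rows = conn.execute(
    "SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
print(rows)  # prints [('alice', 2, 80), ('bob', 1, 20), ('carol', 1, 10)]

# Scalar aggregation with a filter, using a bound parameter.
alice_total = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE customer = ?", ("alice",)
).fetchone()[0]
print(alice_total)  # prints 80
```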