Overview
satoyutaka/Qwen2.5-7B-AgentBench-V4-BF16 is an advanced agent model built on the Qwen2.5-7B-Instruct architecture and engineered specifically for the AgentBench-comp evaluation environment. This V4 variant prioritizes accuracy and long-context understanding, aiming to resolve complex multi-step tasks without generation errors.
Key Enhancements from V3
- Extended Context Length: Increased from 2048 to 4096 tokens, enabling the model to handle longer ALFWorld trajectories and capture extensive trial-and-error processes.
- "Iron Guard" Protocol Dataset: Trained exclusively on 171 meticulously curated, high-quality trajectories to eliminate hallucinations and formatting errors, replacing the standard SFT dataset.
- Optimized Training Method: Utilizes an optimized SFT approach with batch_size=1, grad_accumulation=4, and validation disabled to maximize learning efficiency on the curated data, achieving a final training loss of 0.192.
- Targeted Logic: Heavily focuses on specific complex planning tasks, including SQL aggregation commands (SUM/COUNT) and intricate ALFWorld navigation patterns, with strategically added Japanese logic ("JP Spice").
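The training setup above can be summarized as a configuration sketch. The field names below are assumptions (the actual training script is not published); they simply mirror the stated hyperparameters — batch size 1, gradient accumulation 4, validation disabled, BF16 weights:

```python
# Illustrative SFT configuration reflecting the hyperparameters stated above.
# Field names are hypothetical; the actual training script is not published.
sft_config = {
    "per_device_train_batch_size": 1,  # batch_size=1, as stated
    "gradient_accumulation_steps": 4,  # grad_accumulation=4
    "do_eval": False,                  # validation disabled
    "bf16": True,                      # BF16 weights, per the model name
}

# With batch size 1 and 4 accumulation steps, gradients are averaged over
# 4 trajectories before each optimizer update.
effective_batch_size = (
    sft_config["per_device_train_batch_size"]
    * sft_config["gradient_accumulation_steps"]
)
print(effective_batch_size)  # → 4
```

A batch size of 1 with accumulation keeps per-step memory low while still smoothing each update over several of the 171 curated trajectories.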
Ideal Use Cases
- AgentBench-comp Evaluation: Designed and optimized for performance within this specific competitive environment.
- Complex Multi-step Reasoning: Excels in tasks requiring sequential decision-making and adherence to strict instructions over long contexts.
- SQL Aggregation and Navigation: Particularly strong in scenarios involving database queries with aggregation and complex environmental exploration like ALFWorld.
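To make the SQL aggregation use case concrete, the snippet below runs the kind of SUM/COUNT queries the model is tuned to emit, against a throwaway in-memory SQLite table. The schema and data are invented for illustration and are not part of AgentBench itself:

```python
import sqlite3

# Toy table standing in for an AgentBench-style database task (schema invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 30), ("bob", 20), ("alice", 50)],
)

# Aggregation queries of the kind V4's training emphasizes (SUM/COUNT).
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
alice_orders = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE customer = 'alice'"
).fetchone()[0]
print(total, alice_orders)  # → 100 2
conn.close()
```

In the evaluation environment the model produces queries like these as actions; the curated trajectories reinforce emitting a single well-formed aggregation statement rather than iterating row by row.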