Overview
This model, kamaboko2007/llm_advance_024_enhanced_rules, is a 4-billion-parameter model based on Qwen3-4B-Instruct-2507, fine-tuned specifically for high performance on AgentBench tasks, particularly ALFWorld and DBBench. It addresses common problems in multi-task agent fine-tuning, such as catastrophic forgetting and output-format collisions between tasks, through an unconventional routing approach.
Key Innovation: Jinja2 Contextual Routing & Heuristics Injection
The core differentiator of this model is its custom tokenizer_config.json, whose chat_template contains Jinja2 logic that inspects the conversation at render time. This mechanism acts as an "Absolute Defense Shield" and "Dynamic Heuristics Injector": it intercepts user prompts and injects task-specific system prompts ("Cheat Sheets") immediately before inference, letting the model adapt its behavior to the detected task.
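As a rough, hypothetical sketch of what such a routing chat_template might look like (the variable names, markers, and cheat-sheet contents below are illustrative placeholders, not the model's actual template):

```jinja
{#- Hypothetical sketch: inspect the latest user turn and prepend a task-specific system prompt -#}
{%- set last_user = (messages | selectattr('role', 'equalto', 'user') | list | last).content -%}
{%- if 'MySQL' in last_user or 'SQL' in last_user -%}
    {{- '<|im_start|>system\n' ~ db_cheat_sheet ~ '<|im_end|>\n' -}}
{%- elif 'household' in last_user or 'Interact with a' in last_user -%}
    {{- '<|im_start|>system\n' ~ alfworld_cheat_sheet ~ '<|im_end|>\n' -}}
{%- endif -%}
{#- ...followed by the normal ChatML rendering of all messages... -#}
```

Because the logic lives entirely in the template, the injection happens transparently whenever the tokenizer renders a conversation; no inference-side code changes are required.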
Task-Specific Enhancements:
- DB Bench (MySQL) Mode: Automatically detects "MySQL" or "SQL" in prompts and injects rules for error recovery (e.g., running DESCRIBE table_name; after a SQL error) and loop prevention.
- ALFWorld (Household) Mode: Detects "household" or "Interact with a" and enforces a stable Think:/Act: format, overriding evaluation-system traps. It also injects exploration logic, such as analyzing failed actions and avoiding re-searching receptacles already found empty.
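The detection logic described above can be sketched in plain Python for clarity. This is a simulation of the keyword routing the Jinja2 template performs, not the actual template code, and the cheat-sheet texts are illustrative placeholders:

```python
def route_system_prompt(user_message: str) -> str:
    """Sketch of the keyword routing performed by the custom chat_template.

    The returned cheat-sheet strings are illustrative placeholders,
    not the model's actual injected prompts.
    """
    # DB Bench mode: triggered by SQL-related keywords in the prompt.
    if "MySQL" in user_message or "SQL" in user_message:
        return ("DBBench rules: on a SQL error, run DESCRIBE table_name; "
                "to inspect the schema; never repeat a failing query verbatim.")
    # ALFWorld mode: triggered by household-task phrasing.
    if "household" in user_message or "Interact with a" in user_message:
        return ("ALFWorld rules: always reply in Think:/Act: format; "
                "track failed actions and skip receptacles already found empty.")
    # No task detected: fall back to the default template behavior.
    return ""

# Example routing decisions:
assert "DESCRIBE" in route_system_prompt("You are given a MySQL database ...")
assert "Think:" in route_system_prompt("Interact with a household to solve a task.")
assert route_system_prompt("Hello!") == ""
```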
Training Configuration
The model was trained with LoRA on a highly curated "Golden Ratio" dataset of 494 high-quality trajectories (ALFWorld v5 and distilled DBBench). Loss was computed only on the assistant turns of each multi-turn trajectory. Key hyperparameters: max sequence length 8192, 2 epochs, learning rate 1e-6.
Usage
To leverage the model's unique capabilities, it is critical to load the tokenizer bundled with this repository: the Jinja2 chat_template is integral to the model's dynamic behavior, and substituting a stock Qwen3 tokenizer would disable the task routing entirely.
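A typical loading pattern uses the standard Transformers API; since the routing lives in the chat_template, apply_chat_template handles the injection automatically. The snippet below is a sketch: the helper function is ours, and the commented usage requires network access to the Hub:

```python
from typing import Dict, List


def build_chat(user_prompt: str) -> List[Dict[str, str]]:
    """Assemble the messages list; the custom chat_template does the routing."""
    return [{"role": "user", "content": user_prompt}]


# Typical usage with Transformers (requires downloading the model):
#
#   from transformers import AutoTokenizer, AutoModelForCausalLM
#   repo = "kamaboko2007/llm_advance_024_enhanced_rules"
#   tok = AutoTokenizer.from_pretrained(repo)   # loads the custom chat_template
#   model = AutoModelForCausalLM.from_pretrained(repo)
#   prompt = tok.apply_chat_template(
#       build_chat("You are given a MySQL database ..."),
#       tokenize=False,
#       add_generation_prompt=True,
#   )
```

Note that only the tokenizer call changes the prompt; generation itself proceeds as with any other Qwen3-based model.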