IcyFish/Qwen3-4B-EnvTuning

Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Apr 14, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

IcyFish/Qwen3-4B-EnvTuning is a 4-billion-parameter causal language model, further trained by IcyFish from the Qwen3-4B-Instruct-2507 base. The model was developed with an "Environment Tuning" paradigm, which targets agent learning under data scarcity by enhancing exploration through a structured curriculum, actionable environment augmentation, and fine-grained progress rewards. It is optimized for multi-turn tool-use tasks and demonstrates improved out-of-distribution generalization compared to traditional fine-tuning methods.


Overview

IcyFish/Qwen3-4B-EnvTuning is built on the Qwen3-4B-Instruct-2507 base model and implements the "Environment Tuning" paradigm, an approach to agent training that emphasizes environment-based exploration over static trajectory imitation and is particularly effective under extreme data scarcity.

Key Capabilities & Training Philosophy

  • Environment Tuning: Shifts agent learning from policy fine-tuning to optimizing the learning environment itself.
  • Structured Curriculum: Trains agents from simple to complex multi-turn tool-use behaviors.
  • Actionable Environment Augmentation: Provides corrective hints for failures, revealing tool dependencies and constraints.
  • Fine-grained Progress Rewards: Offers denser, turn-level learning signals instead of sparse episode-level success metrics (see the sketch after this list).
  • Improved Generalization: Designed to achieve better out-of-distribution generalization with limited training data.
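
To make the reward-shaping idea concrete, below is a minimal, hypothetical sketch contrasting a sparse episode-level success signal with denser turn-level progress rewards. None of these names (TurnResult, subgoals_completed, the 0.1 penalty) come from the Environment Tuning paper or this checkpoint's training code; they are illustrative assumptions only.

```python
from dataclasses import dataclass

@dataclass
class TurnResult:
    """Outcome of one agent turn (hypothetical structure for illustration)."""
    subgoals_completed: int   # cumulative subgoals satisfied after this turn
    tool_call_valid: bool     # whether the emitted tool call was well-formed

def sparse_episode_reward(turns: list[TurnResult], total_subgoals: int) -> float:
    """Baseline: a single success/failure signal at the end of the episode."""
    return 1.0 if turns and turns[-1].subgoals_completed == total_subgoals else 0.0

def progress_rewards(turns: list[TurnResult], total_subgoals: int) -> list[float]:
    """Denser turn-level signal: reward the increment in completed subgoals
    each turn, with a small penalty for malformed tool calls."""
    rewards, prev = [], 0
    for t in turns:
        delta = (t.subgoals_completed - prev) / total_subgoals
        rewards.append(delta - (0.0 if t.tool_call_valid else 0.1))
        prev = t.subgoals_completed
    return rewards

# Example: a 3-turn episode toward 4 subgoals, with one malformed call.
episode = [TurnResult(1, True), TurnResult(1, False), TurnResult(3, True)]
print(sparse_episode_reward(episode, 4))  # 0.0 -- says nothing about partial progress
print(progress_rewards(episode, 4))       # [0.25, -0.1, 0.5] -- per-turn feedback
```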

Use Cases & Performance

This model is particularly suited to multi-turn tool-use settings where training data is scarce, where the Environment Tuning recipe aims to produce competitive agents efficiently. While this specific checkpoint was not part of the original research paper, it follows the same training philosophy. Evaluation on 400 unseen BFCL V3 instances shows an overall accuracy of 63.50% across multi-turn categories, including long-context tasks and handling missing functions or parameters. The model keeps the Qwen3 architecture, with a native context length of 262,144 tokens (this deployment serves a 32k context window).
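
A minimal, untested loading sketch with Hugging Face transformers is shown below. The tool schema is a placeholder, and the tool-calling flow follows the generic Qwen3 chat-template conventions rather than anything documented specifically for this checkpoint.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "IcyFish/Qwen3-4B-EnvTuning"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="bfloat16", device_map="auto"
)

# Placeholder tool schema, purely for illustration.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Berlin right now?"}]
input_ids = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```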