EnvScaler-Qwen3-4B: Tool-Enhanced Agent Model
EnvScaler-Qwen3-4B is a 4 billion parameter language model built upon the Qwen3-4B (Thinking Mode) architecture, developed by XXHStudyHard. Its core distinction lies in its specialized training using the EnvScaler framework, which focuses on enhancing the model's capabilities for tool-interactive agent tasks.
Key Capabilities & Training:
- Tool Interaction: Designed to perform complex tasks by interacting with external tools and environments.
- Two-Stage Training: Undergoes a rigorous two-stage training process:
- Supervised Fine-Tuning (SFT): Trained on 9,022 trajectories from agent-environment interactions, utilizing 4,684 SFT scenarios and 141 synthesized environments from datasets like EnvScaler-SFT-Traj-9K.
- Reinforcement Learning (RL): Further refined using 2,550 RL scenarios and 50 synthesized environments, based on the ROLL framework.
- Context Length: Supports a substantial context window of 40960 tokens.
Ideal Use Cases:
- Agent Development: Suitable for building and experimenting with AI agents that require dynamic tool use.
- Interactive Environments: Excels in scenarios where the model needs to interact with and learn from simulated or real-world environments.
- Complex Problem Solving: Applicable to tasks that benefit from a model's ability to leverage external tools for problem-solving, rather than relying solely on internal knowledge.