allenai/tmax-4b

VISION
Concurrency Cost: 1
Model Size: 4.5B
Quant: BF16
Ctx Length: 32k
Tool Calling: Supported
Published: Jun 17, 2026
License: apache-2.0
Architecture: Transformer
Open Weights

allenai/tmax-4b is a 4.5 billion parameter language model developed by Ai2, fine-tuned from Qwen 3.5 4B using DPPO. It is optimized as a terminal agent, achieving 18.9% on Terminal Bench 2.0 (daytona) and 42.6% on TB Lite. The model is built to execute commands and interact within a terminal environment, making it suitable for automation and agentic tasks.


TMax 4B: A Specialized Terminal Agent

TMax 4B, developed by Ai2, is a 4.5 billion parameter model fine-tuned from Qwen 3.5 4B using Deep Proximal Policy Optimization (DPPO). Its primary design goal is to function as a terminal agent: a model that issues commands in a terminal environment and acts on their output.

Key Capabilities & Performance

This model improves on its base model, Qwen 3.5 4B, on terminal-based benchmarks. After 200 steps of RL training, TMax 4B achieved:

  • 42.6% on TB Lite, compared to 31.8% for Qwen 3.5 4B (a gain of 10.8 points).
  • 19.9% on TB 2.1, outperforming Qwen 3.5 4B.
  • 18.9% on Terminal Bench 2.0 (daytona), compared to 16.6% for Qwen 3.5 4B (a gain of 2.3 points).

The model was trained on the TMax-15k dataset and is part of a larger collection of terminal agents. It supports a maximum overall token length of 65536, with a maximum per-turn token limit of 16384.
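The two token limits above interact: a turn is bounded both by the per-turn cap and by whatever remains of the overall budget. The following is a minimal sketch of that bookkeeping, not part of any TMax tooling. The class name TokenBudget and its methods are hypothetical, and it assumes that every token of a turn counts against both limits, which the description above does not spell out.

```python
from dataclasses import dataclass

MAX_TOTAL_TOKENS = 65536  # overall token limit quoted above
MAX_TURN_TOKENS = 16384   # per-turn token limit quoted above


@dataclass
class TokenBudget:
    """Hypothetical helper that tracks token usage across one episode."""

    used: int = 0

    def remaining(self) -> int:
        """Tokens left in the overall budget."""
        return MAX_TOTAL_TOKENS - self.used

    def turn_limit(self) -> int:
        """Largest turn allowed now: the per-turn cap or what is left, whichever is smaller."""
        return max(0, min(MAX_TURN_TOKENS, self.remaining()))

    def record(self, n_tokens: int) -> None:
        """Account for a finished turn, rejecting turns that exceed either limit."""
        if n_tokens < 0:
            raise ValueError("token count must be non-negative")
        if n_tokens > self.turn_limit():
            raise ValueError(
                f"turn of {n_tokens} tokens exceeds the current limit of {self.turn_limit()}"
            )
        self.used += n_tokens
```

Under these assumptions, an episode fits four full-length turns (4 × 16384 = 65536); after that, turn_limit() returns 0 and the agent has to stop or start a fresh context.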

When to Use TMax 4B

  • Terminal Automation: Ideal for tasks requiring automated interaction with command-line interfaces.
  • Agentic Workflows: Suitable for building agents that need to execute commands, navigate systems, or perform operations within a terminal.
  • Research in RL for Agents: Provides a strong baseline for further research into reinforcement learning applications for language models in agentic settings.
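The use cases above share one pattern: the model proposes a command, a harness runs it, and the output is fed back for the next turn. The sketch below illustrates that loop using only the Python standard library. It is an assumption-laden illustration, not the harness used to train or evaluate TMax 4B: the Policy type, run_command, and agent_loop are hypothetical names, and the policy is left as a plain callable so that any model client can be plugged in.

```python
import subprocess
from typing import Callable, List, Optional, Tuple

# Hypothetical policy interface: given the (command, output) history so far,
# return the next shell command, or None to end the episode.
Policy = Callable[[List[Tuple[str, str]]], Optional[str]]


def run_command(command: str, timeout_s: float = 10.0) -> str:
    """Run one shell command and return its exit code and combined output as text."""
    try:
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return f"[timed out after {timeout_s}s]"
    return f"[exit {result.returncode}]\n{result.stdout}{result.stderr}"


def agent_loop(policy: Policy, max_turns: int = 8) -> List[Tuple[str, str]]:
    """Alternate between the policy and the shell until the policy stops or turns run out."""
    history: List[Tuple[str, str]] = []
    for _ in range(max_turns):
        command = policy(history)
        if command is None:
            break
        history.append((command, run_command(command)))
    return history


def scripted_policy(history: List[Tuple[str, str]]) -> Optional[str]:
    """Stand-in for a model call: replays a fixed list of commands."""
    script = ["echo hello"]
    return script[len(history)] if len(history) < len(script) else None
```

Calling agent_loop(scripted_policy) runs the single scripted command and returns its output. In a real setup the policy would wrap a call to the model, and because the commands come from model output, the loop should run inside a sandbox or container rather than directly on a host machine.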

Note that the vision head was removed during training, so this model is intended for language-only tasks.