TMax 4B: A Specialized Terminal Agent

TMax 4B, developed by Ai2, is a 4.5 billion parameter model fine-tuned from Qwen 3.5 4B using Deep Proximal Policy Optimization (DPPO). It is designed to act as a terminal agent: a model that interacts with a terminal environment and executes commands in it.

Key Capabilities & Performance

TMax 4B outperforms its base model on terminal-based tasks. After 200 steps of RL training, it achieved:

42.6% on TB Lite, compared to 31.8% for Qwen 3.5 4B.
19.9% on TB 2.1, outperforming Qwen 3.5 4B.
18.9% on Terminal Bench 2.0 (daytona), compared to 16.6% for Qwen 3.5 4B.
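
To put the reported scores in perspective, the gains can be expressed both in percentage points and as relative improvements. The short Python sketch below does that arithmetic using only the two benchmarks for which both scores are given above; TB 2.1 is left out because no baseline score is listed. The dictionary layout and the function name are illustrative choices, not part of the model card.

```python
# Scores in percent, as reported above: (TMax 4B, Qwen 3.5 4B).
SCORES = {
    "TB Lite": (42.6, 31.8),
    "Terminal Bench 2.0 (daytona)": (18.9, 16.6),
}


def gains(tuned: float, base: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = tuned - base
    relative = 100.0 * absolute / base
    return round(absolute, 1), round(relative, 1)


for name, (tuned, base) in SCORES.items():
    absolute, relative = gains(tuned, base)
    print(f"{name}: +{absolute} points ({relative}% relative)")
```

This works out to a gain of 10.8 points on TB Lite (roughly 34% relative) and 2.3 points on Terminal Bench 2.0 (roughly 14% relative).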

The model was trained on the TMax-15k dataset and is part of a larger collection of terminal agents. It supports a maximum total sequence length of 65,536 tokens, with at most 16,384 tokens per turn.
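
The two token limits can be checked before a conversation is sent to the model. The Python sketch below is a minimal illustration under stated assumptions: it takes per-turn token counts that were already computed with the model's tokenizer, and it treats each turn as a single count, since the description above does not say whether the per-turn limit applies to input, output, or both. The function name `check_budget` is hypothetical.

```python
MAX_TOTAL_TOKENS = 65536  # overall limit stated above
MAX_TURN_TOKENS = 16384   # per-turn limit stated above


def check_budget(turn_token_counts: list[int]) -> list[str]:
    """Return human-readable limit violations (an empty list if none)."""
    problems = []
    # Check each turn against the per-turn limit.
    for index, count in enumerate(turn_token_counts):
        if count > MAX_TURN_TOKENS:
            problems.append(
                f"turn {index} has {count} tokens (limit {MAX_TURN_TOKENS})"
            )
    # Check the whole conversation against the overall limit.
    total = sum(turn_token_counts)
    if total > MAX_TOTAL_TOKENS:
        problems.append(
            f"conversation has {total} tokens (limit {MAX_TOTAL_TOKENS})"
        )
    return problems
```

Note that four turns at the per-turn maximum already fill the overall budget exactly (4 × 16,384 = 65,536), so long sessions will likely need truncation or summarization of earlier turns.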

When to Use TMax 4B

Terminal Automation: Ideal for tasks requiring automated interaction with command-line interfaces.
Agentic Workflows: Suitable for building agents that need to execute commands, navigate systems, or perform operations within a terminal.
Research in RL for Agents: Provides a strong baseline for further research into reinforcement learning applications for language models in agentic settings.
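
As a rough picture of the agentic workflows mentioned above, a terminal agent alternates between proposing a command and reading its output. The Python sketch below shows that loop in its simplest form. It is not TMax 4B's actual harness: `propose_command` is a hypothetical stand-in for a call to the model, and the prompt format and tool-call protocol the model expects are not described here. Commands should only be run inside a sandbox or container.

```python
import subprocess
from typing import Callable, Optional

History = list[tuple[str, str]]  # (command, observed output) pairs


def run_command(command: str, timeout: float = 30.0) -> str:
    """Run one shell command and return its stdout followed by its stderr."""
    try:
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return f"[timed out after {timeout} seconds]"
    return result.stdout + result.stderr


def agent_loop(
    propose_command: Callable[[History], Optional[str]],
    max_steps: int = 10,
) -> History:
    """Alternate between proposing a command and observing its output."""
    history: History = []
    for _ in range(max_steps):
        command = propose_command(history)
        if command is None:  # the policy signals that it is finished
            break
        history.append((command, run_command(command)))
    return history


def scripted_policy(history: History) -> Optional[str]:
    """Hypothetical stand-in for the model: one fixed command, then stop."""
    return "echo hello" if not history else None
```

Calling `agent_loop(scripted_policy)` runs the single scripted command and returns the recorded history; in a real setup the policy would instead send the history to the model and parse a command from its reply.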

Note that the vision head was removed during training, so this model is intended for language-only tasks.
