allenai/tmax-2b
allenai/tmax-2b is a 2.3 billion parameter language model developed by AllenAI, fine-tuned from Qwen 3.5 2B using DPPO. It is optimized as a terminal agent and scores higher than its base model on the Terminal Bench 2.0 and TB Lite benchmarks. The model is trained to execute commands and interact within a terminal environment, which suits it to automated scripting and agentic workflows.
TMax 2B: A Specialized Terminal Agent
TMax 2B, developed by AllenAI, is a 2.3 billion parameter model fine-tuned from Qwen 3.5 2B using Deep Proximal Policy Optimization (DPPO) to function as a terminal agent. It is designed to issue and execute commands within a terminal environment, which suits it to automated tasks and agentic applications.
Key Capabilities & Performance
- Terminal Agent Specialization: TMax 2B is trained explicitly for terminal-based interaction and scores higher than its base model on terminal benchmarks.
- Enhanced Benchmark Scores: It outperforms its base model, Qwen 3.5 2B, on the Terminal Bench (TB) Lite, TB 2.1, and TB 2.0 (daytona) evaluations. For instance, TMax 2B scores 11.8 +/- 1.4 on TB Lite, roughly double Qwen 3.5 2B's 5.71 +/- 1.6.
- DPPO Fine-tuning: The model is trained with DPPO reinforcement learning. The main checkpoint comes from 100 steps of RL training, which performed best on TB Lite.
- Context Length: Supports up to 65536 tokens overall, with at most 16384 tokens per turn.
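The two context limits above can be checked before a conversation is sent to the model. The following is a minimal sketch, not part of the TMax release: the function names are hypothetical, the token counts are assumed to come from whatever tokenizer you use, and dropping the oldest turns first is only one possible trimming strategy. The model card does not say how the per-turn limit is counted (for example, generated tokens only versus every message), so here it is assumed to apply to each turn individually.

```python
MAX_TOTAL_TOKENS = 65536  # overall limit stated in the model card
MAX_TURN_TOKENS = 16384   # per-turn limit stated in the model card


def fits_budget(turn_token_counts):
    """Return True if every turn and the running total stay within the limits."""
    if any(n > MAX_TURN_TOKENS for n in turn_token_counts):
        return False
    return sum(turn_token_counts) <= MAX_TOTAL_TOKENS


def trim_oldest(turn_token_counts):
    """Drop the oldest turns until the remaining total fits the overall limit.

    Assumption: discarding from the front is acceptable for the task at hand.
    """
    kept = list(turn_token_counts)
    while kept and sum(kept) > MAX_TOTAL_TOKENS:
        kept.pop(0)
    return kept
```

For example, four turns of 16384 tokens each exactly fill the overall budget, while a fifth turn of the same size would require trimming.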
Use Cases & Considerations
- Automated Scripting: Ideal for scenarios requiring an AI to understand and execute terminal commands.
- Agentic Workflows: Can be integrated into systems that need an agent to interact with operating system shells or command-line interfaces.
- Research & Development: Useful for researchers exploring reinforcement learning for agent control in terminal environments. The model's training details, including hyperparameters and dataset (TMax-15k), are openly provided.
- No Vision Head: The vision head was removed during training, so the model is intended for language-model-only use.
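The scripting and agentic use cases above share a common shape: a policy proposes a command, the shell runs it, and the output is fed back. The sketch below illustrates that loop using only the Python standard library. It is not the TMax harness: `propose_command` is a hypothetical hook where a call to the model would go, `scripted_policy` is a stand-in so the example runs without the model, and the history format is an assumption made for illustration.

```python
import subprocess


def run_command(command, timeout=10):
    """Run one shell command; return its combined stdout/stderr and exit code."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return result.stdout + result.stderr, result.returncode


def agent_loop(propose_command, task, max_turns=5):
    """Alternate between a policy proposing commands and the shell executing them."""
    history = [{"role": "task", "content": task}]
    for _ in range(max_turns):
        command = propose_command(history)  # hypothetical hook for the model call
        if command is None:                 # the policy signals that it is done
            break
        output, exit_code = run_command(command)
        history.append({"role": "command", "content": command})
        history.append({"role": "output", "content": output, "exit_code": exit_code})
    return history


def scripted_policy(history):
    """Stand-in policy: issue a single echo, then stop."""
    return "echo hello" if len(history) == 1 else None


if __name__ == "__main__":
    for entry in agent_loop(scripted_policy, "say hello"):
        print(entry)
```

Because the loop executes whatever the policy returns, it is best run inside a sandbox or container when the commands come from a model.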
For more in-depth technical details, refer to the TMax paper.