allenai/tmax-2b

Vision | Concurrency Cost: 1 | Model Size: 2.3B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: Jun 17, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights

allenai/tmax-2b is a 2.3-billion-parameter language model from AllenAI, fine-tuned from Qwen 3.5 2B with DPPO. It is optimized to act as a terminal agent and scores higher than its base model on the Terminal Bench 2.0 and TB Lite benchmarks. It is trained to issue commands and respond to their output in a terminal environment, which suits it to automated scripting and agentic workflows.


TMax 2B: A Specialized Terminal Agent

TMax 2B, developed by AllenAI, is a 2.3-billion-parameter model fine-tuned from Qwen 3.5 2B using Deep Proximal Policy Optimization (DPPO) to function as a terminal agent. It is designed to interact with a terminal environment and execute commands there, which makes it well suited to automated tasks and agentic applications.
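As background only: DPPO is described here as a proximal policy optimization method, and the standard PPO clipped surrogate objective that such methods build on is shown below. This is generic PPO, not the exact DPPO objective, which is specified in the TMax paper and may differ. In a terminal-agent reading, s_t would be the transcript so far (task plus terminal output), a_t the model's next action (at whatever granularity the method defines, e.g. token or turn), \hat{A}_t an advantage estimate derived from task reward, and \epsilon the clipping range.

```latex
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)},
\qquad
L^{\text{CLIP}}(\theta) = \mathbb{E}_t\!\left[\min\!\Big(r_t(\theta)\,\hat{A}_t,\;
  \operatorname{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t\Big)\right]
```

Clipping the probability ratio r_t keeps each update close to the previous policy, which is the "proximal" part of the name.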

Key Capabilities & Performance

  • Terminal Agent Specialization: TMax 2B is trained specifically for terminal-based interaction.
  • Enhanced Benchmark Scores: It outperforms its base model, Qwen 3.5 2B, on the Terminal Bench (TB) Lite, TB 2.1, and TB 2.0 (daytona) evaluations. On TB Lite, for instance, TMax 2B scores 11.8 ± 1.4 versus 5.71 ± 1.6 for Qwen 3.5 2B, roughly twice the base model's score.
  • DPPO Fine-tuning: The model is trained with DPPO, a reinforcement learning method. The main released checkpoint comes from 100 steps of RL training, which gave the best performance on TB Lite.
  • Context Length: Supports a maximum total length of 65,536 tokens, with at most 16,384 tokens per turn.
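When driving the model over many turns, it can help to check a trajectory against the stated limits (65,536 tokens overall, 16,384 per turn) before sending it. The sketch below is hypothetical: `fits_budget` and `turns_that_fit` are illustrative names, token counts are supplied by the caller (no particular tokenizer is assumed), and it assumes the per-turn limit applies to every turn. The card does not say whether it covers only model-generated turns or also terminal output.

```python
# Hypothetical budget check for TMax 2B's stated limits.
# Assumption: the per-turn cap applies to every turn, not just model outputs.
from typing import Sequence

MAX_TOTAL_TOKENS = 65_536  # overall limit stated on the card
MAX_TURN_TOKENS = 16_384   # per-turn limit stated on the card


def fits_budget(turn_token_counts: Sequence[int],
                max_total: int = MAX_TOTAL_TOKENS,
                max_turn: int = MAX_TURN_TOKENS) -> bool:
    """True if no turn exceeds max_turn and the total stays within max_total."""
    if any(n < 0 for n in turn_token_counts):
        raise ValueError("token counts must be non-negative")
    if any(n > max_turn for n in turn_token_counts):
        return False
    return sum(turn_token_counts) <= max_total


def turns_that_fit(turn_token_counts: Sequence[int],
                   max_total: int = MAX_TOTAL_TOKENS,
                   max_turn: int = MAX_TURN_TOKENS) -> int:
    """Number of leading turns that fit before either limit is exceeded."""
    used = 0
    for i, n in enumerate(turn_token_counts):
        if n > max_turn or used + n > max_total:
            return i
        used += n
    return len(turn_token_counts)


# Four maximal turns exactly fill the overall budget; a fifth does not fit.
print(fits_budget([16_384] * 4))     # True
print(turns_that_fit([16_000] * 5))  # 4
```

Returning a count rather than raising lets a caller decide whether to truncate early turns or summarize them when a trajectory overflows.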

Use Cases & Considerations

  • Automated Scripting: Ideal for scenarios requiring an AI to understand and execute terminal commands.
  • Agentic Workflows: Can be integrated into systems that need an agent to interact with operating system shells or command-line interfaces.
  • Research & Development: Useful for researchers exploring reinforcement learning for agent control in terminal environments. The model's training details, including hyperparameters and dataset (TMax-15k), are openly provided.
  • No Vision Head: The vision head was removed during training, so it is intended for language-model-only use.
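For the scripting and agentic use cases above, the basic control flow is a loop: the model proposes a command, the harness runs it, and the output goes back into the transcript. The sketch below assumes a hypothetical `policy` callable standing in for TMax 2B; it does not use the model's actual prompt format or tool-calling schema, which are not given here. A scripted stub replaces the model so the loop can run on its own.

```python
# Minimal terminal-agent loop. `policy` is a hypothetical stand-in for the
# model: it sees the transcript and returns the next shell command, or None
# to stop. Prompt format, tool-call schema, and stopping rule are assumptions.
import subprocess
from typing import Callable, List, Optional, Tuple

Transcript = List[Tuple[str, str]]
Policy = Callable[[Transcript], Optional[str]]


def run_command(cmd: str, timeout: float = 30.0) -> str:
    """Run a shell command; return stdout, stderr, and the exit code as text."""
    try:
        proc = subprocess.run(cmd, shell=True, capture_output=True,
                              text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return f"[timed out after {timeout}s]"
    return f"{proc.stdout}{proc.stderr}[exit {proc.returncode}]"


def agent_loop(task: str, policy: Policy, max_steps: int = 10) -> Transcript:
    """Alternate between asking the policy for a command and feeding back its output."""
    transcript: Transcript = [("task", task)]
    for _ in range(max_steps):
        cmd = policy(transcript)
        if cmd is None:  # policy signals the task is finished
            break
        transcript.append(("command", cmd))
        transcript.append(("output", run_command(cmd)))
    return transcript


def scripted_policy(transcript: Transcript) -> Optional[str]:
    """Stub in place of the model: issue one command, then stop."""
    return "echo hello" if len(transcript) == 1 else None


log = agent_loop("print a greeting", scripted_policy)
for role, text in log:
    print(f"{role}: {text}")
```

The `max_steps` cap and per-command timeout keep a misbehaving policy from looping forever; because the commands come from a model and run with `shell=True`, a real harness would execute them inside a sandbox or container.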

For more in-depth technical details, refer to the TMax paper.