allenai/tmax-9b
allenai/tmax-9b is a 9-billion-parameter language model developed by Ai2, fine-tuned from Qwen 3.5 9B using DPPO. It is optimized as a terminal agent and scores approximately 27% on Terminal Bench 2.0 after 200 steps of RL training. The model is built to execute commands and interact with a terminal environment, which makes it a candidate for automated system administration and development tasks.
TMax 9B: A Specialized Terminal Agent
TMax 9B, developed by Ai2, is a 9-billion-parameter model fine-tuned from Qwen 3.5 9B. Its primary distinction is its optimization as a terminal agent, achieved through 200 steps of DPPO (Distributed Proximal Policy Optimization) training on the TMax-15k dataset.
Key Capabilities & Performance
- Terminal Interaction: Designed to operate effectively within a terminal environment, capable of executing commands and responding to system outputs.
- Enhanced Agent Performance: Achieves approximately 27% on Terminal Bench 2.0, about 6 percentage points above its base model, Qwen 3.5 9B.
- DPPO Training: Utilizes DPPO for reinforcement learning, focusing on agentic capabilities rather than general language generation.
- Part of a Model Collection: TMax 9B is one of several terminal agents released by allenai, offering various sizes for different computational needs.
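The terminal interaction described above (issue a command, read the system output, decide the next command) can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the TMax harness: `run_agent`, the `Policy` interface, and `scripted_policy` are hypothetical names, and the scripted policy stands in for the model so the example runs without downloading any weights.

```python
import subprocess
from typing import Callable, List, Optional, Tuple

# Hypothetical interface (not from the TMax codebase): a policy maps the
# transcript so far to the next shell command, or None when it is finished.
Transcript = List[Tuple[str, str]]
Policy = Callable[[Transcript], Optional[str]]


def run_agent(policy: Policy, max_steps: int = 5, timeout: float = 10.0) -> Transcript:
    """Run a command/observe loop and return (command, output) pairs."""
    transcript: Transcript = []
    for _ in range(max_steps):
        command = policy(transcript)
        if command is None:
            break  # the policy signals that the task is complete
        try:
            result = subprocess.run(
                command, shell=True, capture_output=True, text=True, timeout=timeout
            )
            # Feed both stdout and stderr back so the policy can react to errors.
            output = result.stdout + result.stderr
        except subprocess.TimeoutExpired:
            output = f"[timed out after {timeout}s]\n"
        transcript.append((command, output))
    return transcript


def scripted_policy(transcript: Transcript) -> Optional[str]:
    """Stand-in for the model: replays a fixed list of commands."""
    script = ["echo hello", "echo done"]
    return script[len(transcript)] if len(transcript) < len(script) else None


if __name__ == "__main__":
    for command, output in run_agent(scripted_policy):
        print(f"$ {command}\n{output}", end="")
```

In a real setup the policy would call the model with the transcript rendered into a prompt and parse a command from its reply. The step limit and per-command timeout keep a misbehaving policy from running indefinitely, and commands should run inside a sandbox or container rather than on a host that matters.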
Use Cases & Recommendations
- Automated System Administration: Suited to tasks that require automated command execution and interaction with operating system shells.
- Development & Testing: Can be employed in automated testing environments or for scripting complex development workflows.
- Research in Agentic LLMs: Provides a strong baseline for further research into language models acting as autonomous agents in technical environments.
For detailed evaluation methodology and further insights, refer to the TMax paper and the codebase.