LLM-OS-Models/Qwen3.5-9B-Terminal-SFT-2Epoch-FullFT-2BData
LLM-OS-Models/Qwen3.5-9B-Terminal-SFT-2Epoch-FullFT-2BData is a 9-billion-parameter model based on Qwen3.5 and fine-tuned for terminal automation. It is trained to generate the next terminal command in JSON format from the input task and the previous terminal states. It targets automated terminal operation and is tuned for stable command reproduction.
Overview
This model, LLM-OS-Models/Qwen3.5-9B-Terminal-SFT-2Epoch-FullFT-2BData, is a 9-billion-parameter Qwen3.5-based language model fine-tuned for terminal automation. Given a user task and the current terminal state, it generates the next command as structured JSON. It was trained with full fine-tuning for 2 epochs using the 2BData setting.
Key Capabilities
- Terminal Command Generation: Generates the next terminal command as JSON, facilitating automation of command-line tasks.
- High Stability in Command Reproduction: Scores 38.26 on the corrected TB2-lite benchmark (rank 3 of 56), indicating comparatively reliable command generation.
- Conservative Command Output: Favors precision over recall, tending to emit fewer commands that are more likely to be correct rather than many incorrect ones.
- Fast Inference: Operates at approximately 0.293 seconds per step, making it efficient for automated workflows.
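Since the model returns each step as JSON, a caller has to turn the raw reply into a command string. The following is a minimal sketch of that step, not the model's documented interface: the card does not specify the JSON fields, so the `command` key and the `parse_step` helper are hypothetical names chosen for illustration.

```python
import json
from typing import Optional

# Hypothetical schema: the model card does not document the JSON fields,
# so the "command" key is an assumption made for this sketch.
COMMAND_KEY = "command"


def parse_step(raw: str) -> Optional[str]:
    """Return the command string from one model reply, or None if unusable."""
    start = raw.find("{")
    if start == -1:
        return None  # no JSON object in the reply at all
    try:
        # raw_decode parses the first JSON value and ignores trailing text.
        obj, _end = json.JSONDecoder().raw_decode(raw[start:])
    except json.JSONDecodeError:
        return None  # malformed or truncated JSON
    if not isinstance(obj, dict):
        return None
    command = obj.get(COMMAND_KEY)
    if not isinstance(command, str) or not command.strip():
        return None  # missing, non-string, or empty command
    return command.strip()
```

Returning `None` instead of raising keeps the calling loop simple: every unusable reply, whatever the cause, can be handled by the same retry or fallback path.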
Performance Highlights
Evaluated on the corrected TB2-lite replay set, this model ranks 3rd of 56 with a score of 38.26 (computed as 100 * avg_command_f1). Its average command F1 is 0.3826 and its command precision is 0.4620, and 64.4% of its outputs are valid JSON.
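To make the score formula concrete, here is a rough sketch of how a per-step command F1 could be computed and then averaged into the reported score. The card only states that the score is 100 * avg_command_f1; the matching rule used below (exact string match over a multiset of commands) and the function names are assumptions, and the actual TB2-lite scorer may differ.

```python
from collections import Counter
from typing import List, Tuple


def command_prf(predicted: List[str], reference: List[str]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for one step.

    Assumption: commands match by exact string equality, counted as a multiset.
    """
    if not predicted or not reference:
        return 0.0, 0.0, 0.0
    overlap = sum((Counter(predicted) & Counter(reference)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def benchmark_score(f1_per_step: List[float]) -> float:
    """Score as stated in the card: 100 * avg_command_f1."""
    if not f1_per_step:
        raise ValueError("need at least one step to average")
    return 100 * sum(f1_per_step) / len(f1_per_step)
```

Under this assumed rule, a step that predicts one correct command out of two reference commands has precision 1.0 but recall 0.5, which is the conservative pattern the card describes. Because F1 is averaged per step, the reported average precision and average F1 cannot be combined to recover an average recall.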
Limitations and Considerations
- Lower Recall: May occasionally omit necessary commands.
- JSON Format Failures: Only 64.4% of outputs were valid JSON on the corrected TB2-lite replay set, so callers need to validate the parsed output and retry on failure.
- Specialized Use: This model is an SFT model for terminal operation assistance and does not guarantee general conversational or reasoning performance.
- Safety: Always run generated commands in a sandbox, restrict them with an allowlist, or have a human review them before execution.
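The validation, retry, and allowlist points above can be combined in a small wrapper around the model call. This is an illustrative sketch under stated assumptions, not a vetted security control: the `command` JSON key, the contents of `ALLOWED_PROGRAMS`, and the `generate` callable (a stand-in for whatever invokes the model) are all hypothetical.

```python
import json
import shlex
from typing import Callable, Optional

# Hypothetical values: neither the allowlist nor the JSON key comes from the card.
ALLOWED_PROGRAMS = {"ls", "pwd", "cat", "echo"}
COMMAND_KEY = "command"
SHELL_OPERATORS = (";", "|", "&", "`", "$(", ">", "<")


def is_allowed(command: str) -> bool:
    """Accept only one allowlisted program with no shell control operators."""
    if any(op in command for op in SHELL_OPERATORS):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    return bool(argv) and argv[0] in ALLOWED_PROGRAMS


def next_safe_command(generate: Callable[[], str], max_attempts: int = 3) -> Optional[str]:
    """Retry on malformed JSON; return None if no safe command is produced."""
    for _ in range(max_attempts):
        raw = generate()
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON: ask the model again
        command = obj.get(COMMAND_KEY) if isinstance(obj, dict) else None
        if not isinstance(command, str):
            continue  # valid JSON but not the assumed shape: retry
        # A well-formed but disallowed command is not retried; the caller
        # should route it to human review instead.
        return command if is_allowed(command) else None
    return None
```

Retrying only on format failures, and stopping on a disallowed command, keeps the loop from sampling repeatedly until something slips past the filter. An allowlist like this complements a sandbox rather than replacing it.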