LLM-OS-Models/LFM2.5-1.2B-Terminal-SFT-2Epoch-LiquidCLI-TemplateHoldout
LLM-OS-Models/LFM2.5-1.2B-Terminal-SFT-2Epoch-LiquidCLI-TemplateHoldout is a 1.2-billion-parameter instruction-tuned model based on LiquidAI/LFM2.5-1.2B-Instruct and fine-tuned for terminal automation. It generates JSON-formatted commands from user input and previous terminal states, with a focus on predicting the next action in a terminal environment. With a 32,768-token context length, the model is optimized for efficient inference in terminal operation assistance and prioritizes conservative, accurate command generation over high recall.
Overview
LLM-OS-Models/LFM2.5-1.2B-Terminal-SFT-2Epoch-LiquidCLI-TemplateHoldout is a 1.2 billion parameter model derived from LiquidAI/LFM2.5-1.2B-Instruct, specifically trained for terminal automation. It processes user requests and terminal states to output the next command in a structured JSON format. The model was trained over 2 epochs using Liquid-CLI style preprocessing and a chat-template aligned holdout split.
Key Capabilities
- Terminal Automation: Generates JSON-formatted commands for terminal operations based on input tasks and prior terminal states.
- Efficient Inference: Designed for cost-effective and fast inference, with a reported speed of 0.086 seconds per step.
- Conservative Command Generation: Tends to issue fewer incorrect commands, prioritizing accuracy over a high volume of suggestions.
- Structured Output: Produces commands within a recommended JSON format including analysis, plan, commands, and task_complete fields.
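As a rough illustration of the structured output described above, the following Python sketch parses a model response and checks that the four named fields are present. The field names come from this card; the value types in the sample (a list of strings for commands, a boolean for task_complete) and the helper name `parse_model_output` are assumptions made for illustration, not a documented schema.

```python
import json
from typing import Any, Optional

# Field names taken from the model card; their value types are not specified there.
REQUIRED_FIELDS = ("analysis", "plan", "commands", "task_complete")


def parse_model_output(raw: str) -> Optional[dict[str, Any]]:
    """Return the parsed JSON object if all required fields exist, else None."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    if any(field not in obj for field in REQUIRED_FIELDS):
        return None
    return obj


# Illustrative response only; real outputs may use different value shapes.
sample = (
    '{"analysis": "The working directory is unknown.", '
    '"plan": "List files to inspect the project.", '
    '"commands": ["ls -la"], '
    '"task_complete": false}'
)

parsed = parse_model_output(sample)
```

Checking only for field presence keeps the sketch independent of details the card does not state; a stricter validator could be added once the actual output shapes are confirmed.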
Evaluation Highlights
Evaluated on the corrected TB2-lite replay set, the model achieved a Command F1 score of 0.2864 and a first-command exact-match rate of 29.0%. Although it ranks 31 of 56 overall, its low seconds-per-step cost and significant SFT uplift over its base model make it a strong candidate for iterative evaluation and reinforcement learning experiments.
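The card does not state how Command F1 is computed. As a point of reference only, the sketch below uses one common set-based definition (the harmonic mean of precision and recall over exact command strings); the function name `command_f1` and the matching rule are assumptions and may differ from the TB2-lite scoring.

```python
def command_f1(predicted: list[str], reference: list[str]) -> float:
    """Set-based F1 over exact command strings (an assumed definition)."""
    pred, ref = set(predicted), set(reference)
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)  # share of predicted commands that are correct
    recall = true_positives / len(ref)      # share of reference commands that were predicted
    return 2 * precision * recall / (precision + recall)


# A conservative prediction: the single command issued is correct, but one is missing.
score = command_f1(["ls -la"], ["ls -la", "cd /tmp"])
```

Under this definition, a model that issues few but correct commands keeps high precision while recall pulls the F1 score down, which matches the conservative behavior described in this card.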
Limitations
- Lower Recall: May omit some necessary commands due to a relatively low recall score.
- JSON Format Failures: Requires parsing validation and potential retries, as only 50.5% of JSON outputs were valid in evaluation.
- Specialized Use: This model is specifically for terminal automation and does not guarantee general conversation or reasoning performance. Generated commands require safety measures like sandboxing or human review before execution.
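The validation-and-retry requirement above can be sketched as a small wrapper. Here `generate` is a hypothetical callable standing in for whatever inference call is used (it takes a prompt and returns raw text), and `max_attempts` is an illustrative parameter; neither comes from the model card. The wrapper only returns a validated object and never executes anything, so sandboxing or human review still has to happen afterwards.

```python
import json
from typing import Callable, Optional

# Field names taken from the model card.
REQUIRED_FIELDS = ("analysis", "plan", "commands", "task_complete")


def generate_with_retries(
    generate: Callable[[str], str],  # hypothetical inference function
    prompt: str,
    max_attempts: int = 3,
) -> Optional[dict]:
    """Call the model until it returns a JSON object with all required fields."""
    for _ in range(max_attempts):
        raw = generate(prompt)
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON: try again
        if isinstance(obj, dict) and all(f in obj for f in REQUIRED_FIELDS):
            return obj
    return None  # caller should fall back or escalate to a human
```

If retries were independent and each succeeded at the reported 50.5% rate, three attempts would still fail roughly 12% of the time (0.495 cubed), so a fallback path is worth keeping. Independence is an assumption here: with deterministic decoding, a retry on the same prompt would return the same output, so sampling or a modified prompt would be needed.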