LLM-OS-Models/LFM2.5-1.2B-Terminal-SFT-2Epoch-LiquidCLI-TemplateHoldout

Text Generation | Concurrency Cost: 1 | Model Size: 1.2B | Quant: BF16 | Ctx Length: 32k | Published: May 7, 2026 | Architecture: Transformer | Cold

LLM-OS-Models/LFM2.5-1.2B-Terminal-SFT-2Epoch-LiquidCLI-TemplateHoldout is a 1.2 billion parameter instruction-tuned model based on LiquidAI/LFM2.5-1.2B-Instruct and fine-tuned for terminal automation. Given a user request and the previous terminal state, it predicts the next action in a terminal session and emits it as JSON-formatted commands. It supports a 32,768-token context length and is optimized for efficient inference in terminal operation assistance, favoring conservative, accurate command generation over high recall.

Overview

LLM-OS-Models/LFM2.5-1.2B-Terminal-SFT-2Epoch-LiquidCLI-TemplateHoldout is a 1.2 billion parameter model derived from LiquidAI/LFM2.5-1.2B-Instruct and trained specifically for terminal automation. It takes a user request together with the current terminal state and outputs the next command in a structured JSON format. Training ran for 2 epochs with Liquid-CLI-style preprocessing and a chat-template-aligned holdout split.

Key Capabilities

  • Terminal Automation: Generates JSON-formatted commands for terminal operations based on input tasks and prior terminal states.
  • Efficient Inference: Designed for cost-effective and fast inference, with a reported speed of 0.086 seconds per step.
  • Conservative Command Generation: Tends to issue fewer incorrect commands, prioritizing accuracy over a high volume of suggestions.
  • Structured Output: Produces commands within a recommended JSON format including analysis, plan, commands, and task_complete fields.
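
The structured output described in the last bullet can be checked before anything is executed. The sketch below is a minimal example, not the project's own parser. It assumes `commands` is a list and `task_complete` is a boolean; the card names the four fields but not their types. The function name `parse_step` and the sample response are hypothetical.

```python
import json

# Field names taken from the card; their types are an assumption.
REQUIRED_FIELDS = ("analysis", "plan", "commands", "task_complete")


def parse_step(raw: str) -> dict:
    """Parse one model response and check the four documented fields."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Assumed types: the card lists field names only.
    if not isinstance(data["commands"], list):
        raise ValueError("'commands' should be a list")
    if not isinstance(data["task_complete"], bool):
        raise ValueError("'task_complete' should be a boolean")
    return data


# Hypothetical response, written by hand for illustration.
sample = (
    '{"analysis": "No directory listing yet.", "plan": "List the directory.", '
    '"commands": ["ls -la"], "task_complete": false}'
)
step = parse_step(sample)
print(step["commands"])
```

Raising `ValueError` for every failure mode lets a caller treat malformed JSON and missing fields the same way, for example by requesting a new generation.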

Evaluation Highlights

On the corrected TB2-lite replay set, the model achieved a Command F1 score of 0.2864 and a first-command exact rate of 29.0%. It ranks 31 of 56 overall, but its low seconds-per-step cost and the significant uplift that SFT gives over the base model make it an efficient candidate for iterative evaluation and reinforcement learning experiments.
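
The card does not define how Command F1 is computed. As a rough illustration only, the sketch below treats predicted and reference commands as sets and scores their overlap; this set-based reading, the function name `command_f1`, and the example commands are all assumptions, not the benchmark's actual scoring code.

```python
def command_f1(predicted: list[str], reference: list[str]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) over exact-string command overlap."""
    pred, ref = set(predicted), set(reference)
    overlap = len(pred & ref)
    if overlap == 0:
        # Covers empty inputs and disjoint sets, avoiding division by zero.
        return 0.0, 0.0, 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# A conservative prediction: one correct command, one reference command omitted.
precision, recall, f1 = command_f1(["ls -la"], ["ls -la", "cat README.md"])
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Under this reading, a model that issues few but correct commands scores high precision and lower recall, which matches the conservative behavior the card describes.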

Limitations

  • Lower Recall: Recall is relatively low, so the model may omit commands that a task needs.
  • JSON Format Failures: Only 50.5% of JSON outputs were valid in evaluation, so callers should validate every response and be prepared to retry.
  • Specialized Use: This model is specifically for terminal automation and does not guarantee general conversation or reasoning performance. Generated commands require safety measures like sandboxing or human review before execution.
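
The validation, retry, and review steps mentioned above could be wired together roughly as follows. This is a sketch under stated assumptions: `generate` stands in for whatever function calls the model, `confirm` stands in for a human reviewer or policy check, and both names, along with `next_step_with_retries` and `approve`, are hypothetical.

```python
import json
from typing import Callable, Optional


def next_step_with_retries(
    generate: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> Optional[dict]:
    """Call the model until it returns a JSON object with a 'commands' field."""
    for _ in range(max_attempts):
        raw = generate(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON: try again
        if isinstance(data, dict) and "commands" in data:
            return data
    return None  # caller decides what to do when every attempt fails


def approve(commands: list, confirm: Callable[[str], bool]) -> list:
    """Keep only the commands that the reviewer callback accepts."""
    return [cmd for cmd in commands if confirm(str(cmd))]


# Stand-in generator: the first reply is truncated JSON, the second is valid.
replies = iter([
    '{"commands": ["ls"',
    '{"analysis": "", "plan": "", "commands": ["ls"], "task_complete": false}',
])
step = next_step_with_retries(lambda _prompt: next(replies), "list files")
print(step["commands"] if step else "no valid output")
```

Retrying only helps if generation is sampled; with deterministic decoding the same prompt would yield the same invalid output. If attempts were independent at the reported 50.5% validity rate, three attempts would produce at least one valid response about 88% of the time (1 - 0.495^3), so a fallback for the `None` case is still needed. Approved commands should still run inside a sandbox.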