yasserrmd/glm5.1-distill

TEXT GENERATIONConcurrency Cost:1Model Size:1.2BQuant:BF16Ctx Length:32kPublished:Apr 30, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

The yasserrmd/glm5.1-distill is a 1.2 billion parameter instruction-tuned chat model developed by Mohamed Yasser. Built on LiquidAI's LFM2.5-1.2B-Base architecture, it is supervised-fine-tuned on a 50k subset of reasoning-style chat data distilled from the GLM-5.1 family. This model is optimized to bring conversational reasoning to small, efficient architectures, making it suitable for on-device and edge deployments requiring lightweight reasoning capabilities.

Loading preview...

Overview

yasserrmd/glm5.1-distill is a 1.2 billion parameter instruction-tuned chat model, independently fine-tuned by Mohamed Yasser. It leverages the efficient LFM2.5-1.2B-Base architecture from LiquidAI and is trained on a 50k subset of the GLM-5.1-Reasoning-1M-Cleaned dataset, which contains reasoning-style chat data distilled from larger GLM-5.1 models. The primary goal of this distillation is to enable conversational reasoning behavior in a compact model that can run on consumer GPUs, edge devices, or via quantized runtimes.

Key Capabilities

  • Lightweight Reasoning: Designed for general assistant-style chat with a focus on step-by-step answers and explanations.
  • Efficient Architecture: Built on the LFM2 (hybrid conv + attention) architecture, making it suitable for resource-constrained environments.
  • Instruction-Tuned: Supervised-fine-tuned (SFT) to follow instructions and engage in chat-based interactions.
  • Flexible Deployment: Supports various deployment methods including ONNX, GGUF, or MLX for optimized inference.

Intended Use Cases

  • General Assistant Chat: Ideal for basic conversational tasks and answering questions.
  • On-Device/Edge Deployment: Excellent choice for applications where a small, efficient 1.2B parameter model is necessary.
  • Further Fine-tuning: Can serve as a strong base checkpoint for domain-specific fine-tuning.

Limitations

  • Inherits biases and limitations from its base model and training data.
  • Performance on complex reasoning, long-context tasks, or code generation will be weaker compared to larger models.
  • Primarily English-centric; performance in other languages may vary.
  • Not safety-aligned or production-ready; can confidently hallucinate facts.