yasserrmd/glm5.1-distill
The yasserrmd/glm5.1-distill is a 1.2 billion parameter instruction-tuned chat model developed by Mohamed Yasser. Built on LiquidAI's LFM2.5-1.2B-Base architecture, it is supervised-fine-tuned on a 50k subset of reasoning-style chat data distilled from the GLM-5.1 family. This model is optimized to bring conversational reasoning to small, efficient architectures, making it suitable for on-device and edge deployments requiring lightweight reasoning capabilities.
Loading preview...
Overview
yasserrmd/glm5.1-distill is a 1.2 billion parameter instruction-tuned chat model, independently fine-tuned by Mohamed Yasser. It leverages the efficient LFM2.5-1.2B-Base architecture from LiquidAI and is trained on a 50k subset of the GLM-5.1-Reasoning-1M-Cleaned dataset, which contains reasoning-style chat data distilled from larger GLM-5.1 models. The primary goal of this distillation is to enable conversational reasoning behavior in a compact model that can run on consumer GPUs, edge devices, or via quantized runtimes.
Key Capabilities
- Lightweight Reasoning: Designed for general assistant-style chat with a focus on step-by-step answers and explanations.
- Efficient Architecture: Built on the LFM2 (hybrid conv + attention) architecture, making it suitable for resource-constrained environments.
- Instruction-Tuned: Supervised-fine-tuned (SFT) to follow instructions and engage in chat-based interactions.
- Flexible Deployment: Supports various deployment methods including ONNX, GGUF, or MLX for optimized inference.
Intended Use Cases
- General Assistant Chat: Ideal for basic conversational tasks and answering questions.
- On-Device/Edge Deployment: Excellent choice for applications where a small, efficient 1.2B parameter model is necessary.
- Further Fine-tuning: Can serve as a strong base checkpoint for domain-specific fine-tuning.
Limitations
- Inherits biases and limitations from its base model and training data.
- Performance on complex reasoning, long-context tasks, or code generation will be weaker compared to larger models.
- Primarily English-centric; performance in other languages may vary.
- Not safety-aligned or production-ready; can confidently hallucinate facts.