sarrington/qwen2.5-0.5b-spliced

Text Generation | Concurrency Cost: 1 | Model Size: 0.5B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: May 25, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

The sarrington/qwen2.5-0.5b-spliced model is a 0.5-billion-parameter variant of the Qwen 2.5 architecture, developed by sarrington. It has been pruned from 16 layers to 15 and uses a mixed-precision design. It targets efficient local execution on mobile and edge platforms, including Apple Silicon Macs, and supports a context length of 32,768 tokens.


Model Overview

The sarrington/qwen2.5-0.5b-spliced model is a 15-layer variant of the Qwen 2.5 0.5B architecture, pruned from the original 16 layers to reduce compute and memory cost on its target hardware. The repository includes the standard configuration and tokenizer files, so the model can be loaded and run in local environments.

Key Characteristics

  • Architecture: Spliced Qwen 2.5 0.5B, optimized to 15 layers.
  • Format: Available in Safetensors and GGUF formats, including Q4_K_M and IQ4_XS quantizations.
  • Size: The GGUF file is 284 MB and the IQ4_XS quantization is 257 MB.
  • Context Length: 32,768 tokens (32k).
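
The size and context figures above lend themselves to simple budgeting arithmetic. The following is a minimal sketch, not part of the model's tooling: the helper name `max_new_tokens_for` and the constant names are hypothetical, and the only inputs taken from this card are the 32,768-token window and the two file sizes.

```python
# Figures quoted on this card.
CONTEXT_LENGTH = 32768   # tokens
GGUF_MB = 284            # the card does not say which quantization this file is
IQ4_XS_MB = 257

# Relative size reduction of the IQ4_XS file versus the 284 MB GGUF file.
iq4_xs_saving = 1 - IQ4_XS_MB / GGUF_MB


def max_new_tokens_for(prompt_tokens: int, requested: int,
                       context_length: int = CONTEXT_LENGTH) -> int:
    """Clamp a generation request so prompt + output fits in the context window.

    Hypothetical helper for illustration only.
    """
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = context_length - prompt_tokens
    # A prompt that already fills (or exceeds) the window leaves no room.
    return max(0, min(requested, remaining))


print(f"IQ4_XS is about {iq4_xs_saving:.1%} smaller than the 284 MB file")
print(max_new_tokens_for(prompt_tokens=32000, requested=1024))
```

With a 32,000-token prompt, only 768 tokens of output fit, so a request for 1,024 is clamped to 768.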

Target Platforms & Use Cases

This model is specifically designed for efficient local execution on resource-constrained devices.

  • Target Platforms: Optimized for Apple Silicon MacBooks (M1/M2/M3/M4) and other standard CPU/GPU local runtimes.
  • Primary Use Case: Applications that need a compact language model running natively on mobile and edge devices, where on-device inference must keep computational overhead low.
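
As a rough illustration of local execution, the sketch below loads the Safetensors weights with the Hugging Face `transformers` library and picks a device in the order Apple Metal (MPS), CUDA, CPU. It assumes the repository loads through the standard `AutoTokenizer` and `AutoModelForCausalLM` classes, which this card implies but does not state outright; `pick_device` and `generate` are hypothetical helper names, and the GGUF files would need a different runtime.

```python
MODEL_ID = "sarrington/qwen2.5-0.5b-spliced"  # repository id from this card


def pick_device(mps_available: bool, cuda_available: bool) -> str:
    """Prefer Apple's Metal backend, then CUDA, then plain CPU."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the checkpoint and complete `prompt` (downloads weights on first use)."""
    # Imported lazily so pick_device stays usable without these packages installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = pick_device(torch.backends.mps.is_available(),
                         torch.cuda.is_available())
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # torch_dtype="auto" keeps whatever dtype the checkpoint was saved in
    # (assumed here to be BF16, per the card's metadata).
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    model = model.to(device)

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Explain layer pruning in one sentence.")` would then run a single completion on whichever device was selected. BF16 support on MPS depends on the installed PyTorch and macOS versions, so falling back to CPU may be necessary on older setups.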