Name: sarrington/qwen2.5-0.5b-spliced API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: sarrington

Model Overview

The sarrington/qwen2.5-0.5b-spliced model is a 15-layer variant of the Qwen 2.5 0.5B architecture. It has been pruned from its original 16 layers to 15 to improve efficiency on specific hardware targets. It ships with standard configuration files and tokenizer support, so it can be loaded and run in local environments.

Key Characteristics

Architecture: Spliced Qwen 2.5 0.5B, optimized to 15 layers.
Format: Available in Safetensors and GGUF formats, including Q4_K_M and IQ4_XS quantizations.
Size: The GGUF variant is 284 MB and the IQ4_XS variant is 257 MB, so both are very compact.
Context Length: Supports a context window of 32,768 tokens.
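To give a sense of what the 15-layer depth and 32,768-token context mean for memory, the following is a rough sketch of a key/value cache size estimate. Only the layer count and context length come from the characteristics above. The 2 key/value heads, the head dimension of 64, and the 16-bit cache precision are assumptions based on the base Qwen 2.5 0.5B configuration and should be checked against the model's own config.json.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Estimate KV cache size: one key and one value vector per layer, head, and token."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens


NUM_LAYERS = 15          # from the model card
CONTEXT_TOKENS = 32768   # from the model card
NUM_KV_HEADS = 2         # assumption: unchanged from base Qwen 2.5 0.5B
HEAD_DIM = 64            # assumption: unchanged from base Qwen 2.5 0.5B

full_bytes = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, CONTEXT_TOKENS)
full_mib = full_bytes / 2**20
print(f"KV cache at full context: {full_mib:.0f} MiB")  # prints 240 MiB
```

Under these assumptions, a full-length context needs a cache of about the same size as the quantized weights themselves, so shorter contexts are worth considering on memory-limited devices.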

Target Platforms & Use Cases

This model is specifically designed for efficient local execution on resource-constrained devices.

Target Platforms: Optimized for Apple Silicon MacBooks (M1/M2/M3/M4) and other standard CPU/GPU local runtimes.
Primary Use Case: Applications that need a compact language model running natively on mobile and edge devices, with on-device inference and reduced computational overhead.
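A minimal sketch of local inference with the Safetensors weights might look like the following. It assumes the checkpoint is published on the Hugging Face Hub under the id in the model name and loads through the standard transformers auto classes, which this card does not confirm. The generate helper and its parameters are hypothetical names chosen for illustration.

```python
MODEL_ID = "sarrington/qwen2.5-0.5b-spliced"  # assumption: Hub repository id matches the model name


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Hypothetical helper: load the model and return a completion for the prompt."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Use the Apple Silicon GPU backend when present, otherwise fall back to CPU.
    device = "mps" if torch.backends.mps.is_available() else "cpu"

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID).to(device)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling generate("Explain layer pruning in one sentence.") would download the weights on first use. The GGUF quantizations would need a GGUF-capable runtime instead, which is not shown here.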
