Model Overview
The continuum-ai/qwen2.5-3b-general-forged model is a specialized variant of the Qwen2.5-3B architecture, developed by continuum-ai. Its primary distinction is its optimization process: 30% head pruning followed by retraining for general tasks using Experiential Plasticity. This process slightly improved perplexity (from 2.30 to 2.29) while substantially reducing the model's size.
Key Characteristics
- Efficiency: Achieves a 30% reduction in parameters through head pruning, making it more compact.
- Performance: Maintains or slightly improves general perplexity despite significant pruning.
- Provenance: Features cryptographic provenance via the ForgeAlloy chain of custody, allowing for verifiable claims regarding its development and modifications.
- Methodology: Developed through a prune → train pipeline over 3 cycles, with detailed methodology available in a companion paper.
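The prune → train loop can be sketched schematically. This is an illustrative outline only, not the actual Experiential Plasticity implementation: `forge` and its per-cycle pruning fraction are invented here, derived from the card's stated 30% total pruning spread over 3 cycles under the assumption that each cycle removes the same fraction of the heads that remain.

```python
# Schematic sketch of an iterative prune -> retrain pipeline.
# Illustrative only: the loop body is a hypothetical stand-in, not the
# actual Experiential Plasticity method described in the companion paper.

TOTAL_PRUNE = 0.30  # 30% of heads removed overall (from the model card)
CYCLES = 3          # prune -> train repeated over 3 cycles

# If each cycle removes the same fraction f of the *remaining* heads,
# then (1 - f)^CYCLES = 1 - TOTAL_PRUNE, so:
per_cycle_fraction = 1.0 - (1.0 - TOTAL_PRUNE) ** (1.0 / CYCLES)

def forge(model_heads: int) -> int:
    """Run the prune -> train loop and return the surviving head count."""
    heads = model_heads
    for cycle in range(CYCLES):
        remove = round(heads * per_cycle_fraction)
        heads -= remove
        # A retraining step would go here, recovering perplexity lost
        # to pruning before the next cycle prunes again.
    return heads
```

For example, a model starting with 1000 heads ends the three cycles with about 700, matching the overall 30% reduction.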
Use Cases & Deployment
This model is suitable for general language understanding and generation tasks where efficiency and verifiable development are important. It is designed to run on various devices, including:
- MacBook Pro (16GB/32GB) in fp16 format.
- MacBook Air (16GB) in Q8_0 format.
- MacBook Air (8GB) and mobile devices (iPhone/Android) in Q4_K_M format, requiring approximately 2.5GB.
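The per-device memory figures above follow roughly from bits-per-weight arithmetic. The sketch below is a rough estimator, assuming ~3.1B parameters and typical GGUF bits-per-weight values; these numbers are assumptions, not from the model card, and real usage is higher because the KV cache and runtime overhead are not counted (which is why Q4_K_M lands near the quoted 2.5GB rather than the raw weight size).

```python
# Rough memory estimate from bits-per-weight. KV cache and runtime
# overhead are NOT included, so real usage is higher than these figures.
# Parameter count and bpw values are assumptions, not from the model card.

PARAMS = 3.1e9  # approximate parameter count (assumed)

BITS_PER_WEIGHT = {
    "fp16": 16.0,
    "Q8_0": 8.5,     # 8-bit weights plus one fp16 scale per 32-weight block
    "Q4_K_M": 4.85,  # typical effective bpw for Q4_K_M (approximate)
}

def weights_gb(fmt: str, params: float = PARAMS) -> float:
    """Gigabytes needed just for the weights in the given format."""
    return params * BITS_PER_WEIGHT[fmt] / 8 / 1e9
```

Under these assumptions the weights alone come to about 6.2GB in fp16, 3.3GB in Q8_0, and 1.9GB in Q4_K_M, consistent with the device pairings listed above once working memory is added.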
Developers can also use the Continuum framework to design and forge custom models, applying context extension, pruning, LoRA, and quantization tailored to specific hardware targets.
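As an illustration of the kind of recipe such a framework might accept, here is a hypothetical configuration. Every key and value name below is invented for illustration; the Continuum framework's actual interface is not documented here.

```yaml
# Hypothetical forge recipe -- all keys are illustrative, not Continuum's API.
base_model: Qwen/Qwen2.5-3B
target_hardware: macbook-air-8gb
steps:
  - prune:
      method: head_pruning
      fraction: 0.30
      cycles: 3
  - train:
      objective: general
  - quantize:
      format: Q4_K_M
```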