Crownelius/Crow-4B-Opus-4.6-Distill-Heretic_Qwen3.5
Crow-4B-Opus-4.6-Distill-Heretic_Qwen3.5 is a 4.5 billion parameter distilled language model developed by Crownelius, built on the Qwen 3.5 architecture with a 32,768-token context length. It is distilled from Claude Opus 4.6, with the goal of capturing that model's reasoning, formatting, and instruction-following behavior. The model is designed for efficient deployment on consumer hardware and targets reasoning, creative writing, agentic coding, and security research tasks.
Model Overview
Crownelius's Crow-4B-Opus-4.6-Distill-Heretic_Qwen3.5 is a 4.5 billion parameter language model built on the Qwen 3.5 architecture. It is a distillation of Claude Opus 4.6 that aims to reproduce its reasoning, detailed formatting, and instruction-following abilities in a much smaller footprint. It keeps the 32,768-token context window of the Qwen 3.5 backbone.
Key Capabilities
- Distilled Intelligence: Inherits the sophisticated reasoning and instruction-following from Claude Opus 4.6.
- Ultra-Compact Efficiency: Operates effectively on consumer GPUs and CPUs, including laptops and edge devices, due to its 4.5B parameter size.
- Broad Task Proficiency: Trained on 15 diverse datasets, encompassing over 25,000 examples across categories like reasoning, creative writing, agentic coding, security research, and roleplay.
- Multilingual Support: Benefits from the Qwen 3.5 architecture's inherent multilingual capabilities.
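To make the consumer-hardware claim above more concrete, here is a rough, weight-only memory estimate for a 4.5B parameter model. It is a back-of-the-envelope sketch, not a measurement of this model: the bytes-per-parameter values are standard storage sizes assumed for illustration, and the figures exclude the KV cache, activations, and runtime overhead.

```python
# Rough weight-only memory estimate; the parameter count is taken from the card.
PARAMS = 4.5e9  # 4.5 billion parameters

# Bytes per parameter for common storage formats (assumed, not from the card).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "4-bit": 0.5}


def weight_memory_gib(params: float, bytes_per_param: float) -> float:
    """Return the memory needed for the weights alone, in GiB."""
    return params * bytes_per_param / 2**30


if __name__ == "__main__":
    for name, size in BYTES_PER_PARAM.items():
        print(f"{name:>10}: {weight_memory_gib(PARAMS, size):.1f} GiB")
```

Under these assumptions the weights take roughly 8.4 GiB at 16-bit precision and about 2.1 GiB at 4-bit, which is why quantized builds are the more likely fit for laptops and edge devices.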
Training Details
The model was fine-tuned with Unsloth and the TRL SFTTrainer, starting from the base model tvall43/Qwen3.5-4B-heretic. Training used LoRA with rank r=32 and α=32, a learning rate of 2e-4 on a cosine schedule, and a single epoch on an NVIDIA A100 40GB GPU. The maximum sequence length during training was 2048 tokens, and the data was drawn from a mix of datasets chosen for broad applicability.
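The hyperparameters above can be collected in a small plain-Python sketch that also shows what they imply. This is not the Unsloth or TRL API and not the author's training script: the `TrainConfig` class and both helper functions are hypothetical names, the α/r scaling is the conventional LoRA formulation, and the schedule assumes no warmup because the card does not mention any.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as stated on the card (class name is hypothetical)."""
    base_model: str = "tvall43/Qwen3.5-4B-heretic"
    lora_r: int = 32
    lora_alpha: int = 32
    learning_rate: float = 2e-4
    num_epochs: int = 1
    max_seq_length: int = 2048


def lora_scale(cfg: TrainConfig) -> float:
    """Conventional LoRA scaling factor alpha / r applied to the adapter update."""
    return cfg.lora_alpha / cfg.lora_r


def cosine_lr(step: int, total_steps: int, base_lr: float) -> float:
    """Cosine decay from base_lr to 0 over total_steps (no warmup assumed)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    cfg = TrainConfig()
    total = 1000  # hypothetical step count; the card does not state one
    print("LoRA scale:", lora_scale(cfg))
    for step in (0, total // 2, total):
        print(step, f"{cosine_lr(step, total, cfg.learning_rate):.2e}")
```

With r=32 and α=32 the scaling factor is 1, so the adapter update is applied unscaled. The cosine schedule starts at 2e-4, passes through half that value at the midpoint, and decays to zero at the final step.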