Crownelius/Crow-4B-Opus-4.6-Distill-Heretic_Qwen3.5

VISIONConcurrency Cost:1Model Size:4.5BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Mar 3, 2026Architecture:Transformer0.1K Cold

Crow-4B-Opus-4.6-Distill-Heretic_Qwen3.5 is a 4.5 billion parameter distilled language model developed by Crownelius, built on the Qwen 3.5 architecture with a 32768 token context length. It is meticulously distilled from Claude Opus 4.6, capturing its deep reasoning, nuanced formatting, and instruction-following capabilities. This ultra-compact model is designed for efficient deployment on consumer hardware, excelling in diverse tasks including reasoning, creative writing, agentic coding, and security research.

Loading preview...

Model Overview

Crownelius's Crow-4B-Opus-4.6-Distill-Heretic_Qwen3.5 is a 4.5 billion parameter language model built on the robust Qwen 3.5 architecture. This model is a distillation of Claude Opus 4.6, aiming to replicate its advanced reasoning, detailed formatting, and strong instruction-following abilities in a significantly smaller footprint. It maintains a large context window of 32768 tokens, characteristic of the Qwen 3.5 backbone.

Key Capabilities

  • Distilled Intelligence: Inherits the sophisticated reasoning and instruction-following from Claude Opus 4.6.
  • Ultra-Compact Efficiency: Operates effectively on consumer GPUs and CPUs, including laptops and edge devices, due to its 4.5B parameter size.
  • Broad Task Proficiency: Trained on 15 diverse datasets, encompassing over 25,000 examples across categories like reasoning, creative writing, agentic coding, security research, and roleplay.
  • Multilingual Support: Benefits from the Qwen 3.5 architecture's inherent multilingual capabilities.

Training Details

The model was trained using Unsloth and TRL SFTTrainer, with a base model of tvall43/Qwen3.5-4B-heretic. It utilized a LoRA rank of r=32, α=32, a learning rate of 2e-4 (cosine schedule), and was trained for 1 epoch on an NVIDIA A100 40GB GPU. The training incorporated a maximum sequence length of 2048 tokens, drawing from a comprehensive mix of datasets to ensure broad applicability.