kurai021/Llama-3.2-3B-Instruct-4bit-fable5
The kurai021/Llama-3.2-3B-Instruct-4bit-fable5 model is a 3.2-billion-parameter Llama-3.2-Instruct variant, upcast to FP16 from a 4-bit quantized base. It incorporates distillation traces from Fable 5, which induce an instruction-guided Chain of Thought (CoT) behavior via <think>...</think> tags. The model is tuned to lay out its analysis before delivering a final response, which suits tasks requiring detailed reasoning, particularly in English and Spanish.
Model Overview
kurai021/Llama-3.2-3B-Instruct-4bit-fable5 is a 3.2-billion-parameter instruction-tuned model, derived from mlx-community/Llama-3.2-3B-Instruct-4bit and upcast to FP16 safetensors. Its core differentiator is the integration of Fable 5 distillation traces, which induce a Chain of Thought (CoT) behavior: the model articulates its analytical process within <think>...</think> tags before providing a final answer.
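As a rough illustration of how a client might consume this output format, the sketch below separates the reasoning from the final answer in a complete response. The function name `split_think` and the sample string are hypothetical, and the code assumes at most one <think>...</think> block per response.

```python
import re

# Assumption: a response contains at most one <think>...</think> block.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a complete model response."""
    match = THINK_RE.search(text)
    if match is None:
        # No reasoning block found: treat the whole text as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the matched block is the user-facing answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


# Hypothetical response, written by hand for illustration.
raw = "<think>2 apples + 3 apples = 5 apples.</think>\nThe answer is 5."
reasoning, answer = split_think(raw)
print("reasoning:", reasoning)
print("answer:", answer)
```

A response without tags falls through to the answer unchanged, so the same helper can sit in front of models that do not emit a reasoning block.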
Key Capabilities & Features
- Instruction-Guided Chain of Thought (CoT): The model explicitly breaks down its reasoning process, simulating an internal monologue. This enhances transparency and can improve the quality of complex problem-solving.
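- FP16 Precision: Although it originates from a 4-bit quantized base, the model has been upcast to FP16. The upcast changes the storage format and does not by itself restore precision lost to quantization; how much analytical ability the model retains is reported under Performance Highlights.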
- Multilingual Support: Primarily supports English and Spanish.
- Optimized Output Structure: Designed for UI environments like Open WebUI, where the analytical <think> blocks can be automatically parsed and collapsed for a cleaner user experience, while remaining fully visible in raw streaming environments.
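To make the FP16 bullet above concrete, here is a toy round trip through 4-bit quantization. It is only a sketch: the min/max affine scheme, the helper names, and the sample weights are assumptions for illustration, not the scheme used to produce this model, and Python floats stand in for FP16. It covers the upcast step alone; any fine-tuning applied afterwards would move the weights off the 16-level grid.

```python
def quantize_4bit(values):
    """Map floats to integer codes 0..15 with one scale and offset per group."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 15 or 1.0  # 16 levels; avoid a zero scale for flat groups
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Upcast step: turn the 4-bit codes back into floats."""
    return [lo + c * scale for c in codes]


# Hypothetical weight group, chosen by hand.
weights = [-0.30, -0.11, 0.02, 0.07, 0.19, 0.30]
codes, scale, lo = quantize_4bit(weights)
restored = dequantize(codes, scale, lo)

for w, r in zip(weights, restored):
    print(f"original={w:+.3f}  restored={r:+.3f}  error={abs(w - r):.3f}")
```

The restored values can only take one of 16 levels per group, so the rounding error introduced at quantization time (at most half a step) stays in the weights after the upcast.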
Performance Highlights
Evaluated on the GSM8K mathematical reasoning dataset (5-shot configuration), the model achieved a 54.51% exact match score under the flexible-extract filter, which pulls the final answer out of the reasoning stream. Strict-match scores are lower because of the model's custom output formatting. The result is reported as close to that of Meta's native BF16 model, despite the quantized origin.
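The gap between the two scores can be illustrated with a small sketch. The filter names match those used by EleutherAI's lm-evaluation-harness, but the regular expressions, helper names, and sample output below are simplified assumptions, not the harness's exact implementation. The strict variant expects a GSM8K-style "#### 22" answer line, while the flexible variant takes the last number found anywhere in the text.

```python
import re

# Assumption: simplified stand-ins for the two answer-extraction filters.
STRICT_RE = re.compile(r"#### (-?[0-9.,]+)")
NUMBER_RE = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def normalize(num: str) -> str:
    """Drop thousands separators and any trailing period."""
    return num.replace(",", "").rstrip(".")


def strict_match(text: str):
    """Return the number after '#### ', or None if that marker is absent."""
    m = STRICT_RE.search(text)
    return normalize(m.group(1)) if m else None


def flexible_extract(text: str):
    """Return the last number appearing anywhere in the text, or None."""
    nums = NUMBER_RE.findall(text)
    return normalize(nums[-1]) if nums else None


# Hypothetical model output in the <think> format; gold answer assumed to be 22.
output = "<think>18 - 3 - 4 = 11 eggs; 11 * 2 = 22 dollars.</think>\nShe makes $22 per day."
print("strict:", strict_match(output))
print("flexible:", flexible_extract(output))
```

Because the response never emits the "####" marker, the strict filter finds nothing even though the answer is present, which is consistent with the lower strict-match score described above.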
Ideal Use Cases
- Applications requiring transparent, step-by-step reasoning.
- Educational tools where understanding the thought process is as important as the final answer.
- Tasks benefiting from detailed problem analysis and logical breakdown.
- Environments where the structured output (with <think> tags) can be leveraged for enhanced user interaction.
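For streamed output, a client has to cope with tags that arrive split across chunks. The class below is a minimal sketch of one way to route streamed text into a reasoning pane and an answer pane. The class name `ThinkStreamSplitter` and the chunk boundaries are hypothetical, and this is not how Open WebUI or any particular front end implements it.

```python
class ThinkStreamSplitter:
    """Route streamed chunks into reasoning and answer text."""

    OPEN, CLOSE = "<think>", "</think>"

    def __init__(self):
        self.buffer = ""        # text not yet routed (may hold a partial tag)
        self.in_think = False   # True while inside a <think> block
        self.reasoning = []
        self.answer = []

    def feed(self, chunk: str) -> None:
        self.buffer += chunk
        while True:
            tag = self.CLOSE if self.in_think else self.OPEN
            sink = self.reasoning if self.in_think else self.answer
            idx = self.buffer.find(tag)
            if idx != -1:
                # Full tag found: flush text before it and switch state.
                sink.append(self.buffer[:idx])
                self.buffer = self.buffer[idx + len(tag):]
                self.in_think = not self.in_think
                continue
            # Hold back the longest suffix that could be the start of the tag.
            keep = 0
            for n in range(min(len(tag) - 1, len(self.buffer)), 0, -1):
                if tag.startswith(self.buffer[-n:]):
                    keep = n
                    break
            cut = len(self.buffer) - keep
            sink.append(self.buffer[:cut])
            self.buffer = self.buffer[cut:]
            return

    def finish(self):
        """Flush leftover text and return (reasoning, answer)."""
        sink = self.reasoning if self.in_think else self.answer
        sink.append(self.buffer)
        self.buffer = ""
        return "".join(self.reasoning).strip(), "".join(self.answer).strip()


# Hypothetical chunks, with both tags deliberately split across boundaries.
splitter = ThinkStreamSplitter()
for chunk in ["<thi", "nk>3 * 4", " = 12</th", "ink>The answer", " is 12."]:
    splitter.feed(chunk)
result = splitter.finish()
print(result)
```

Holding back a possible partial tag adds a few characters of latency at chunk boundaries, but it prevents fragments such as "<thi" from leaking into the visible answer.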