Name: kurai021/Llama-3.2-3B-Instruct-4bit-fable5 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: kurai021

Model Overview

kurai021/Llama-3.2-3B-Instruct-4bit-fable5 is a 3.2 billion parameter instruction-tuned model, derived from mlx-community/Llama-3.2-3B-Instruct-4bit and upcast to FP16 safetensors. Its core differentiator is the integration of Fable 5 distillation traces, which induce a distinctive Chain of Thought (CoT) behavior: the model articulates its analytical process inside <think>...</think> tags before providing a final answer.

Key Capabilities & Features

Instruction-Guided Chain of Thought (CoT): The model explicitly breaks down its reasoning process, simulating an internal monologue. This enhances transparency and can improve the quality of complex problem-solving.
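FP16 Precision: The weights originate from a 4-bit quantized base and have been upcast to FP16. Upcasting changes the storage format but does not restore precision lost to quantization; the model nonetheless retains much of its analytical capability (see Performance Highlights).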
Multilingual Support: Primarily supports English and Spanish.
Optimized Output Structure: Designed for UI environments like Open WebUI, where the analytical <think> blocks can be automatically parsed and collapsed for a cleaner user experience, while remaining fully visible in raw streaming environments.
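For a client that receives a complete response and wants to do the collapsing itself, the <think> handling described above can be sketched roughly as follows. This is a minimal illustration, not Open WebUI's actual parser: the function name split_reasoning is hypothetical, and it assumes the tags appear literally as <think> and </think> and are never nested.

```python
import re

# Non-greedy match so that several separate <think> blocks are each captured.
# DOTALL lets the reasoning span multiple lines.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Split a raw model response into (reasoning, answer).

    Hypothetical helper: assumes literal, non-nested <think>...</think> tags.
    """
    # Collect the contents of every think block, trimmed of padding whitespace.
    reasoning = "\n".join(block.strip() for block in THINK_BLOCK.findall(text))
    # Whatever remains once the blocks are removed is treated as the answer.
    answer = THINK_BLOCK.sub("", text).strip()
    return reasoning, answer


raw = "<think>\n12 * 3 = 36, then 36 + 4 = 40.\n</think>\nThe total is 40."
reasoning, answer = split_reasoning(raw)
```

A UI could render answer directly and place reasoning behind a collapsible element; a response with no tags at all comes back with an empty reasoning string and the full text as the answer.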

Performance Highlights

Evaluated on the GSM8K mathematical reasoning dataset (5-shot configuration), the model achieved a 54.51% exact match score when the final answer is extracted from its reasoning stream (flexible-extract). Strict-match scores are lower because the model's custom output formatting does not follow the answer pattern that strict matching expects. The model card describes the flexible-extract result as close to that of Meta's native BF16 model, despite the quantized origin.
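The gap between the two scoring modes is easier to see with a small example. The sketch below is a simplified stand-in for the two extraction styles, not the evaluation harness's own code: the regular expressions and the function names strict_match and flexible_extract are assumptions chosen for illustration. Strict matching only accepts an answer written in the GSM8K reference style ("#### <number>"), while flexible extraction takes the last number that appears anywhere in the response.

```python
import re
from typing import Optional

# Assumed patterns for illustration only.
STRICT = re.compile(r"#### (-?[0-9.,]+)")       # GSM8K-style final answer line
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")    # any integer or decimal


def normalize(num: str) -> str:
    """Drop thousands separators and a trailing period before comparing."""
    return num.replace(",", "").rstrip(".")


def strict_match(response: str) -> Optional[str]:
    """Return the number after '#### ', or None if that format is absent."""
    m = STRICT.search(response)
    return normalize(m.group(1)) if m else None


def flexible_extract(response: str) -> Optional[str]:
    """Return the last number found anywhere in the response, if any."""
    nums = NUMBER.findall(response)
    return normalize(nums[-1]) if nums else None


# A response in the model's <think> style, with no '####' marker.
response = (
    "<think>\n16 - 3 - 4 = 9 eggs left. 9 * 2 = 18.\n</think>\n"
    "She makes $18 every day."
)
strict_result = strict_match(response)        # no '####' line to find
flexible_result = flexible_extract(response)  # last number in the text
```

Under these assumptions the response is arithmetically correct but scores as a miss in strict mode, because the expected marker never appears; the flexible mode still recovers the final number.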

Ideal Use Cases

Applications requiring transparent, step-by-step reasoning.
Educational tools where understanding the thought process is as important as the final answer.
Tasks benefiting from detailed problem analysis and logical breakdown.
Environments where the structured output (with <think> tags) can be leveraged for enhanced user interaction.
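In a streaming interface the tags can arrive split across chunks (for example "<th" followed by "ink>"), so a client that wants to route reasoning and answer text to different UI elements needs a small amount of state. The following is one possible sketch under stated assumptions: the class name ThinkStreamSplitter and its feed/flush methods are hypothetical, the tags are assumed to be literal and non-nested, and the chunk list stands in for whatever a real streaming client yields.

```python
OPEN, CLOSE = "<think>", "</think>"


class ThinkStreamSplitter:
    """Incrementally label streamed text as 'think' or 'answer' (hypothetical helper)."""

    def __init__(self) -> None:
        self.buffer = ""       # text received but not yet emitted
        self.in_think = False  # True while inside a <think> block

    def feed(self, chunk: str) -> list[tuple[str, str]]:
        """Consume one chunk and return a list of (kind, text) events."""
        self.buffer += chunk
        events = []
        while True:
            tag = CLOSE if self.in_think else OPEN
            kind = "think" if self.in_think else "answer"
            idx = self.buffer.find(tag)
            if idx != -1:
                # Full tag found: emit the text before it and switch mode.
                if idx > 0:
                    events.append((kind, self.buffer[:idx]))
                self.buffer = self.buffer[idx + len(tag):]
                self.in_think = not self.in_think
                continue
            # No full tag yet: hold back any suffix that might be a tag prefix.
            keep = 0
            for n in range(min(len(tag) - 1, len(self.buffer)), 0, -1):
                if self.buffer.endswith(tag[:n]):
                    keep = n
                    break
            cut = len(self.buffer) - keep
            if cut > 0:
                events.append((kind, self.buffer[:cut]))
            self.buffer = self.buffer[cut:]
            return events

    def flush(self) -> list[tuple[str, str]]:
        """Emit whatever is still buffered once the stream has ended."""
        kind = "think" if self.in_think else "answer"
        events = [(kind, self.buffer)] if self.buffer else []
        self.buffer = ""
        return events


# Simulated stream with both tags split across chunk boundaries.
chunks = ["<th", "ink>2+2", " is 4</thi", "nk>The answer", " is 4."]
splitter = ThinkStreamSplitter()
events = [ev for c in chunks for ev in splitter.feed(c)] + splitter.flush()
thinking = "".join(text for kind, text in events if kind == "think")
answer = "".join(text for kind, text in events if kind == "answer")
```

Holding back only the shortest possible tag prefix keeps latency low: ordinary text is emitted as soon as it arrives, and at most a few characters wait for the next chunk.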
