Ebumping/Qwen3-32B-Fable-Distill

TEXT GENERATION
Concurrency Cost: 2
Model Size: 32B
Quant: FP8
Ctx Length: 32k
Tool Calling: Supported
Published: Jun 15, 2026
Architecture: Transformer

Ebumping/Qwen3-32B-Fable-Distill is a 32 billion parameter Qwen3 model fine-tuned with Supervised Fine-Tuning (SFT) on curated reasoning traces. It preserves distinct reasoning steps, marked by <think> blocks, which were distilled from frontier models. Training loss is computed only on assistant tokens. The model is suited to tasks that benefit from an explicit, structured reasoning process.

Model Overview

Ebumping/Qwen3-32B-Fable-Distill is a 32 billion parameter Qwen3 model, version 0.2, developed by Ebumping. It has been fine-tuned using Supervised Fine-Tuning (SFT) with the TRL framework, specifically on a dataset of 4,207 examples containing reasoning traces distilled from frontier models.

Key Differentiators (v0.2)

  • Preserved Reasoning Traces: Unlike its predecessor (v0.1), this version maintains distinct <think> blocks for reasoning, preventing reasoning steps from being flattened into the final generation.
  • Assistant-Only Loss: Training loss is calculated exclusively on assistant tokens, so system and user tokens contribute nothing to the gradient. This can lead to more focused and efficient learning for response generation.
  • Curated Training Data: The model was trained on a refined dataset where CoT-less examples were removed, and Claude channel data was converted to the Qwen3 <think> format.
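The assistant-only loss described above is commonly implemented by masking the labels of every non-assistant token. The following is a minimal sketch of that idea, not the training code used for this model. The helper name `mask_non_assistant` and the toy token ids are hypothetical. The value -100 is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
# Sketch only: illustrates label masking for an assistant-only loss.
# The function name and the toy data are assumptions made for illustration.

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_non_assistant(token_ids, is_assistant):
    """Return labels where non-assistant positions are excluded from the loss."""
    if len(token_ids) != len(is_assistant):
        raise ValueError("token_ids and is_assistant must have the same length")
    return [
        tok if keep else IGNORE_INDEX
        for tok, keep in zip(token_ids, is_assistant)
    ]


# Toy sequence: three prompt tokens followed by three assistant tokens.
token_ids = [11, 12, 13, 21, 22, 23]
is_assistant = [False, False, False, True, True, True]
labels = mask_non_assistant(token_ids, is_assistant)
print(labels)
```

Only the positions that keep their original token id contribute to the loss, so the gradient comes entirely from the assistant's response, including its <think> block.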

Training Details

The model was trained for 789 steps on the unsloth/qwen3-32b-bnb-4bit base model, using LoRA with a rank of 64. The merged weights are in BF16 precision. The model supports a context length of 32768 tokens.
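To give a sense of what a rank of 64 means, the sketch below counts the trainable parameters LoRA adds to one weight matrix: two low-rank factors of shapes (d_out × r) and (r × d_in). The 4096 × 4096 matrix is a hypothetical size chosen for illustration. The model card does not state which modules were adapted or their dimensions.

```python
# Sketch only: parameter count of a LoRA adapter on a single linear layer.
# The layer size below is hypothetical and not taken from the model card.


def lora_params(d_in, d_out, rank):
    """LoRA adds A (rank x d_in) and B (d_out x rank) next to the frozen weight."""
    return rank * (d_in + d_out)


rank = 64                      # rank stated in the model card
d_in, d_out = 4096, 4096       # illustrative layer size (assumption)

adapter = lora_params(d_in, d_out, rank)
full = d_in * d_out
print(f"adapter parameters: {adapter:,}")
print(f"full matrix parameters: {full:,}")
print(f"trainable fraction: {adapter / full:.4f}")
```

For this illustrative layer the adapter trains roughly 3% of the parameters that full fine-tuning of the same matrix would update.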

VRAM Requirements

VRAM requirements are substantial: the merged BF16 weights need 80 GB or more, while quantized GGUF formats range from about 20 GB (Q4_K_M) to 40 GB or more (Q8_0).
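These figures can be sanity-checked with a back-of-the-envelope estimate of weight storage alone. The sketch below assumes a nominal 32 billion parameters, 16 bits per weight for BF16, 8.5 bits per weight for Q8_0, and roughly 4.85 bits per weight for Q4_K_M (an approximate figure). It ignores the KV cache, activations, and runtime overhead, which account for the gap between these estimates and the totals quoted above.

```python
# Sketch only: weight-only memory estimate. Bits-per-weight values for the
# GGUF formats are approximations, and real usage is higher because of the
# KV cache, activations, and runtime overhead.

PARAMS = 32_000_000_000  # nominal parameter count (assumption)

BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "Q8_0": 8.5,     # 32 int8 weights plus one fp16 scale per block
    "Q4_K_M": 4.85,  # approximate average for this mixed format
}


def weights_gb(params, bits_per_weight):
    """Size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


estimates = {name: weights_gb(PARAMS, bits) for name, bits in BITS_PER_WEIGHT.items()}
for name, gb in estimates.items():
    print(f"{name}: about {gb:.1f} GB for weights alone")
```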

Use Cases

This model is particularly well-suited for applications requiring explicit, step-by-step reasoning, where the preservation of thought processes is crucial for understanding or debugging model outputs.
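When inspecting or debugging outputs, it is often useful to separate the reasoning from the final answer. The following is a minimal sketch that assumes the generated text contains at most one <think>...</think> block. The function name `split_reasoning` and the sample string are hypothetical.

```python
# Sketch only: split a generation into its <think> reasoning and final answer.
# Assumes at most one <think>...</think> block in the text.
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer); reasoning is empty if no block is found."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


sample = "<think>\n2 + 2 is 4.\n</think>\n\nThe answer is 4."
reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

Keeping the two parts separate makes it possible to log or display the reasoning trace without mixing it into the text shown to end users.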