Viesar/gemma-3-4b-opus-reasoning-distill
Viesar/gemma-3-4b-opus-reasoning-distill is a 4.3 billion parameter Gemma 3-4B instruction-tuned model, fine-tuned by Viesar using QLoRA. This model is specifically distilled for reasoning tasks, leveraging the Opus-4.6-Reasoning dataset. It is designed for chat-style reasoning applications, despite showing reduced performance on traditional completion-style math benchmarks compared to its base model.
Loading preview...
Viesar/gemma-3-4b-opus-reasoning-distill Overview
This model is a QLoRA fine-tune of the google/gemma-3-4b-it base, developed by Viesar. It features 4.3 billion parameters and a 32768 token context length. The fine-tuning process utilized the Crownelius/Opus-4.6-Reasoning-3300x dataset, specifically filtered to approximately 1,900 examples, with a strong focus on mathematical reasoning (94% math content).
Key Characteristics & Performance
- Base Model:
google/gemma-3-4b-it - Fine-tuning Method: QLoRA (r=16, alpha=16)
- Training Data: Subset of Opus-4.6-Reasoning-3300x, emphasizing math problems.
- Transparent Benchmarking: The model's performance on
MATH-500andGSM8Kbenchmarks shows a decrease compared to the base Gemma 3 4B model. For instance, onMATH-500, it achieved 24.6% exact match compared to the base's 29.6%, and onGSM8K, 53.7% exact match versus 68.7%. - Reasoning Style: This performance drop is attributed to a format mismatch between the benchmarks (few-shot completion prompts) and the model's trained chat-template style (
<think>...</think>), as well as potential capability narrowing from specialized training. Qualitatively, the model retains its intended chat-style reasoning capabilities. - Inference Cost: Identical to the base model, with peak VRAM around 6.2 GB on an RTX 3050 8GB for 4-bit inference.
Use Cases
- Chat-style Reasoning Tasks: Best suited for interactive reasoning challenges where the model can utilize its trained thought process and chat template.
- Further Fine-tuning: The provided transformers format is suitable for additional fine-tuning or evaluation.
- Local Inference: A GGUF version is available for efficient local inference, including multimodal readiness.