Name: Viesar/gemma-3-4b-opus-reasoning-distill API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Viesar

Viesar/gemma-3-4b-opus-reasoning-distill Overview

This model is a QLoRA fine-tune of the google/gemma-3-4b-it base, developed by Viesar. It features 4.3 billion parameters and a 32768 token context length. The fine-tuning process utilized the Crownelius/Opus-4.6-Reasoning-3300x dataset, specifically filtered to approximately 1,900 examples, with a strong focus on mathematical reasoning (94% math content).

Key Characteristics & Performance

Base Model: google/gemma-3-4b-it
Fine-tuning Method: QLoRA (r=16, alpha=16)
Training Data: Subset of Opus-4.6-Reasoning-3300x, emphasizing math problems.
Transparent Benchmarking: The model's performance on MATH-500 and GSM8K benchmarks shows a decrease compared to the base Gemma 3 4B model. For instance, on MATH-500, it achieved 24.6% exact match compared to the base's 29.6%, and on GSM8K, 53.7% exact match versus 68.7%.
Reasoning Style: This performance drop is attributed to a format mismatch between the benchmarks (few-shot completion prompts) and the model's trained chat-template style (<think>...</think>), as well as potential capability narrowing from specialized training. Qualitatively, the model retains its intended chat-style reasoning capabilities.
Inference Cost: Identical to the base model, with peak VRAM around 6.2 GB on an RTX 3050 8GB for 4-bit inference.

Use Cases

Chat-style Reasoning Tasks: Best suited for interactive reasoning challenges where the model can utilize its trained thought process and chat template.
Further Fine-tuning: The provided transformers format is suitable for additional fine-tuning or evaluation.
Local Inference: A GGUF version is available for efficient local inference, including multimodal readiness.

Overview

Viesar/gemma-3-4b-opus-reasoning-distill Overview

Key Characteristics & Performance

Use Cases

Full Model Card (README)