nassimjp/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled
The nassimjp/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled model is a 7.9-billion-parameter language model fine-tuned from Google's Gemma 4 E4B. It specializes in structured, deliberate reasoning, learned from Chain-of-Thought samples distilled from Claude 4.6 Opus. The model targets complex problem-solving, including multi-step math, logic, and code decomposition, and plans its responses within <think> tags. It is optimized for tasks that require an explicit reasoning process rather than for general-purpose conversation.
Overview
nassimjp/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled is a 7.9-billion-parameter model fine-tuned from the google/gemma-4-E4B-it base model. It was trained on approximately 2,300 high-quality Chain-of-Thought (CoT) samples distilled from Claude 4.6 Opus. The goal is to give the compact Gemma 4 E4B model a more structured and deliberate reasoning style, encouraging it to "think" before generating a final answer.
Key Capabilities
- Structured Reasoning: The model learns to plan its responses within <think> tags, breaking down problems step-by-step.
- Problem Solving: Enhanced ability to tackle multi-step math and logic problems.
- Code Analysis: Proficient in code problem decomposition and debugging.
- Deliberate Responses: Prioritizes showing reasoning over raw speed, leading to more thoughtful outputs.
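Because the reasoning is emitted inside <think> tags, an application will usually want to separate it from the final answer. The following is a minimal sketch of one way to do that, not code from the model author. It assumes the block is closed with a matching </think> tag, which the card does not state explicitly; the helper name split_reasoning and the sample completion are hypothetical.

```python
import re

# Assumption: reasoning is wrapped as <think>...</think>. The card only
# mentions "<think> tags", so the closing tag is an inferred convention.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from a raw completion.

    If no <think> block is found, the reasoning is empty and the whole
    text is treated as the answer.
    """
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the first <think> block is the visible answer.
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer


# Hypothetical completion, written by hand for illustration.
sample = "<think>17 * 3 = 51, then 51 + 4 = 55.</think>\nThe result is 55."
reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

Keeping the two parts separate lets a caller log or display the reasoning while passing only the answer to downstream code.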
Training Details
The model was trained using SFT + QLoRA (4-bit) with Unsloth on the nohurry/Opus-4.6-Reasoning-3000x-filtered dataset, which comprises Claude 4.6 Opus reasoning trajectories covering math, logic, and coding. Training ran for 3 epochs with a maximum sequence length of 2048, and the loss was computed only on response tokens (prompt tokens were masked out).
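Response-only loss can be illustrated without any training framework. The sketch below is not the author's training code: it shows the common convention of setting prompt positions in the label sequence to -100, the default ignore_index of PyTorch's cross-entropy loss, so that only response tokens contribute to the loss. The function name mask_prompt_labels and the token IDs are hypothetical; the 2048 limit is the maximum sequence length stated above.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss
MAX_SEQ_LEN = 2048   # maximum sequence length stated in the card


def mask_prompt_labels(input_ids, prompt_len, max_len=MAX_SEQ_LEN):
    """Build (input_ids, labels) for response-only supervised fine-tuning.

    Positions belonging to the prompt get IGNORE_INDEX so they are skipped
    by the loss; response positions keep their token IDs as targets.
    Sequences longer than max_len are truncated on the right.
    """
    ids = list(input_ids)[:max_len]
    labels = [
        IGNORE_INDEX if i < prompt_len else tok
        for i, tok in enumerate(ids)
    ]
    return ids, labels


# Made-up token IDs, for illustration only.
prompt = [101, 7, 8, 9]
response = [42, 43, 2]
ids, labels = mask_prompt_labels(prompt + response, prompt_len=len(prompt))
print(labels)  # [-100, -100, -100, -100, 42, 43, 2]
```

In a setup like this the model still sees the prompt as context, but gradients come only from predicting the response, which here includes the reasoning trace.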
Limitations
- Text-only: Fine-tuning used text only; the base model's multimodal capabilities were not trained.
- Focused Scope: Due to the small, specialized dataset, it is a focused reasoning fine-tune, not a general-purpose upgrade.
- Hallucinations: Like all LLMs, it can still hallucinate, particularly on factual recall outside its training domain.
Good For
- Tasks requiring explicit, step-by-step reasoning.
- Applications where understanding the thought process is as important as the final answer.
- Complex analytical tasks in math, logic, and code.