nassimjp/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled

VISION
Concurrency Cost: 1
Model Size: 7.9B
Quant: FP8
Ctx Length: 32k
Tool Calling: Supported
Published: Apr 17, 2026
License: apache-2.0
Architecture: Transformer
Open Weights

The nassimjp/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled model is a 7.9-billion-parameter language model fine-tuned from Google's Gemma 4 E4B. It specializes in structured, deliberate reasoning, learned from Chain-of-Thought samples distilled from Claude 4.6 Opus. The model targets complex problem-solving, including multi-step math, logic, and code decomposition, and plans its responses within <think> tags. It is optimized for tasks that require an explicit reasoning process rather than for general-purpose conversation.


Overview

nassimjp/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled is a 7.9-billion-parameter model fine-tuned from the google/gemma-4-E4B-it base model. Its distinguishing feature is its training data: approximately 2,300 high-quality Chain-of-Thought (CoT) samples distilled from Claude 4.6 Opus. The goal is to give the compact Gemma 4 E4B model a more structured and deliberate reasoning style, so that it "thinks" before generating a final answer.

Key Capabilities

  • Structured Reasoning: The model learns to plan its responses within <think> tags, breaking down problems step-by-step.
  • Problem Solving: Enhanced ability to tackle multi-step math and logic problems.
  • Code Analysis: Proficient in code problem decomposition and debugging.
  • Deliberate Responses: Prioritizes showing reasoning over raw speed, leading to more thoughtful outputs.
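
Because the reasoning is emitted inside <think> tags, downstream code usually needs to separate it from the final answer. The following is a minimal sketch of one way to do that, assuming the completion contains at most one well-formed <think>...</think> block; the function name split_reasoning and the sample string are hypothetical and not part of the model's tooling.

```python
import re
from typing import Tuple

# Non-greedy match so only the first <think>...</think> block is captured.
# DOTALL lets the reasoning span multiple lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split a completion into (reasoning, answer).

    If no <think> block is found, the reasoning is returned as an
    empty string and the whole text is treated as the answer.
    """
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # The answer is everything outside the matched block.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


# Hypothetical completion, written by hand for illustration only.
sample = "<think>17 * 3 = 51, then 51 + 4 = 55.</think>\nThe result is 55."
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Keeping the reasoning as a separate value makes it easy to log it for inspection while showing only the answer to end users.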

Training Details

The model was trained with supervised fine-tuning (SFT) and 4-bit QLoRA using Unsloth, on the nohurry/Opus-4.6-Reasoning-3000x-filtered dataset. This dataset comprises Claude 4.6 Opus reasoning trajectories covering math, logic, and coding. Training ran for 3 epochs with a maximum sequence length of 2048, and the loss was computed only on response tokens, with prompt tokens masked out.
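
Response-only loss masking, as mentioned in the training details, can be illustrated without any training framework. The sketch below is not the author's training code: it assumes the common convention of setting prompt-token labels to -100, which is the default ignore_index of PyTorch's CrossEntropyLoss, and the helper name build_labels and the toy token IDs are hypothetical.

```python
from typing import List, Tuple

# Label value skipped by PyTorch's CrossEntropyLoss by default (ignore_index=-100).
IGNORE_INDEX = -100


def build_labels(
    prompt_ids: List[int],
    response_ids: List[int],
    max_len: int = 2048,
) -> Tuple[List[int], List[int]]:
    """Build (input_ids, labels) so that only response tokens contribute to the loss.

    Prompt positions get IGNORE_INDEX; response positions keep their token IDs.
    Both sequences are truncated to max_len tokens.
    """
    input_ids = (prompt_ids + response_ids)[:max_len]
    labels = ([IGNORE_INDEX] * len(prompt_ids) + response_ids)[:max_len]
    return input_ids, labels


# Toy token IDs, chosen only for illustration.
input_ids, labels = build_labels(prompt_ids=[1, 2, 3], response_ids=[4, 5])
print(input_ids)  # [1, 2, 3, 4, 5]
print(labels)     # [-100, -100, -100, 4, 5]
```

With labels built this way, the model is never penalized for failing to reproduce the prompt, so the gradient signal comes entirely from the reasoning trace and final answer.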

Limitations

  • Text-only: The base model's multimodal capabilities were not trained during fine-tuning.
  • Focused Scope: Due to the small, specialized dataset, it is a focused reasoning fine-tune, not a general-purpose upgrade.
  • Hallucinations: Like all LLMs, it can still hallucinate, particularly on factual recall outside its training domain.

Good For

  • Tasks requiring explicit, step-by-step reasoning.
  • Applications where understanding the thought process is as important as the final answer.
  • Complex analytical tasks in math, logic, and code.