ConicCat/Gemma-3-Fornax-V4-27B-QAT

Vision | Concurrency Cost: 2 | Model Size: 27B | Quant: FP8 | Ctx Length: 32k | License: gemma | Architecture: Transformer

ConicCat/Gemma-3-Fornax-V4-27B-QAT is a 27 billion parameter model based on the Gemma 3 architecture, distilled from the 05/28 update of Deepseek R1. It targets efficient, generalizable reasoning across a wide variety of tasks, moving beyond the typical coding and mathematical reasoning optimizations. It achieves this by fine-tuning on diverse, high-quality reasoning traces, which aims to prevent length overfitting and improve generalization.

Overview

ConicCat/Gemma-3-Fornax-V4-27B-QAT, or Gemma Fornax, is a 27 billion parameter model built on the Gemma 3 architecture. It is a distillation of the updated Deepseek R1 (05/28), with a primary focus on generalizable reasoning beyond specialized domains like coding and mathematics. Many open-source models over-specialize in those domains because their Chain-of-Thought (CoT) is trained with reward-driven methods such as GRPO; Gemma Fornax instead aims for broad applicability.

Key Capabilities & Differentiators

  • Generalizable Reasoning: Designed to generalize reasoning effectively across a wide array of tasks, moving past the limitations of models overly focused on coding and math.
  • Diverse Reasoning Traces: Utilizes a supervised fine-tuning (SFT) approach with a wide variety of high-quality, diverse reasoning traces from Deepseek R1 05/28.
  • Prevents Length Overfitting: Incorporates varying CoT length and explicit noise regularization during training to prevent the characteristic "waffling" or fixed-length reasoning often seen in GRPO-trained models.
  • Gemma 3 Base: Leverages the robust foundation of the Gemma 3 27B model line; a minimal loading sketch follows this list.
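
Because Fornax keeps the standard Gemma 3 checkpoint layout, it should load like any other Gemma 3 27B fine-tune. The snippet below is an illustrative sketch only: it assumes transformers-compatible weights and enough memory for bf16, and the prompt and generation parameters are placeholders.

```python
# Minimal loading sketch, assuming standard transformers-compatible
# Gemma 3 weights. Adjust dtype and device mapping to your hardware.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "ConicCat/Gemma-3-Fornax-V4-27B-QAT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 fits on your GPU(s)
    device_map="auto",           # requires the accelerate package
)

# Build a chat-formatted prompt and generate with the recommended temperature.
messages = [{"role": "user", "content": "Explain why the sky is blue."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```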

Recommended Settings

For optimal performance, the recommended inference settings are a temperature of 0.7 and an nsigma (top-nσ sampling) value of 1.
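
As a concrete illustration, here is how those settings might be passed to a local llama.cpp-style completion server. The endpoint URL and the exact field name for the nsigma sampler are assumptions that depend on your backend (recent llama.cpp builds expose it as top_n_sigma); check your server's sampler documentation before relying on them.

```python
# Hedged example: apply the recommended sampler settings via a local
# llama.cpp-style server. Field names vary by backend.
import requests

resp = requests.post(
    "http://localhost:8080/completion",  # assumption: local llama.cpp server
    json={
        "prompt": "Explain the difference between TCP and UDP.",
        "temperature": 0.7,   # recommended temperature
        "top_n_sigma": 1.0,   # recommended nsigma; field name is backend-specific
        "n_predict": 512,
    },
    timeout=300,
)
print(resp.json()["content"])
```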