Phonsiri/gemma-2-2b-SFT-Reasoning-full-Model

Text Generation · Model size: 2.6B · Quant: BF16 · Context length: 8k · Published: Mar 2, 2026 · License: gemma · Architecture: Transformer

Phonsiri/gemma-2-2b-SFT-Reasoning-full-Model is a supervised fine-tuned version of Google's Gemma-2-2b, trained to generate structured chain-of-thought reasoning for mathematical and logical problems. The model separates its step-by-step reasoning from the final answer using dedicated XML-like tags, making it suitable for applications that require transparent, verifiable problem-solving steps. It was fine-tuned with full-parameter training on a combination of reasoning and mathematical datasets.


Model Overview

This model, developed by Phonsiri Thabunsri and CYP777, is a supervised fine-tuned (SFT) version of Google's gemma-2-2b-it base model. Its primary purpose is to generate structured chain-of-thought reasoning for mathematical and logical problems, explicitly separating the reasoning process from the final answer.

Key Capabilities

  • Structured Reasoning: Produces detailed, step-by-step reasoning enclosed within <reasoning>...</reasoning> tags, followed by the concise final answer in <answer>...</answer> tags.
  • Foundation for Downstream Models: Serves as the base for further training, such as the Phonsiri/gemma-2-2b-GRPO-Reasoning-full model.
  • Full-Parameter Fine-tuning: Unlike fine-tunes that train only LoRA/PEFT adapter weights, this model updates all base-model parameters during training.
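
The tag format described above can be parsed with a few lines of standard-library Python. A minimal sketch follows; the sample completion string is illustrative, not actual model output:

```python
import re

def parse_reasoning_output(text: str) -> dict:
    """Extract the <reasoning> and <answer> sections from a completion.

    Returns a dict with 'reasoning' and 'answer' keys; a section that
    is missing from the text maps to None.
    """
    sections = {}
    for tag in ("reasoning", "answer"):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        sections[tag] = match.group(1).strip() if match else None
    return sections

# Illustrative completion in the model's expected format (not real output).
sample = (
    "<reasoning>2 + 2 groups two pairs; each pair sums to 2, "
    "so the total is 4.</reasoning>\n"
    "<answer>4</answer>"
)
parsed = parse_reasoning_output(sample)
print(parsed["answer"])  # → 4
```

Parsing with a non-greedy `re.DOTALL` match keeps multi-line reasoning intact while ignoring any text the model emits outside the tags.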

Training Details

The model was trained for 3 epochs with a learning rate of 2e-5 and a maximum sequence length of 8192. Training data included the nohurry/Opus-4.6-Reasoning-3000x-filtered dataset, alongside several local datasets containing Thai and general mathematical problems and solutions.
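
A run with these hyperparameters could be sketched with TRL's `SFTTrainer` as below. This is a configuration sketch, not the authors' actual script: only the base model, the public dataset name, and the hyperparameters (3 epochs, lr 2e-5, sequence length 8192) come from this card; the output path is a placeholder, and the local Thai datasets are not public, so they are omitted.

```python
# Hedged sketch of a full-parameter SFT run with the stated hyperparameters.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

model_id = "google/gemma-2-2b-it"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Public portion of the training mix; local Thai math datasets omitted.
dataset = load_dataset("nohurry/Opus-4.6-Reasoning-3000x-filtered", split="train")

config = SFTConfig(
    output_dir="gemma-2-2b-sft-reasoning",  # placeholder path
    num_train_epochs=3,
    learning_rate=2e-5,
    max_seq_length=8192,  # named max_length in newer TRL releases
    bf16=True,
)

trainer = SFTTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```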

Good for

  • Applications requiring transparent and verifiable problem-solving steps.
  • Educational tools that need to show how a solution is derived.
  • Developing agents that can explain their decision-making process in a structured format.
  • A base model for further reinforcement learning or fine-tuning on reasoning tasks.