Phonsiri/gemma-2-2b-SFT-Reasoning-full-Model
Phonsiri/gemma-2-2b-SFT-Reasoning-full-Model is a supervised fine-tuned version of Google's Gemma-2-2b, trained to generate structured chain-of-thought reasoning for mathematical and logical problems. The model explicitly separates its step-by-step reasoning from the final answer using dedicated XML-like tags, making it suitable for applications that require transparent problem-solving steps. It was fine-tuned with full parameter training on a combination of reasoning and mathematical datasets.
Model Overview
This model, developed by Phonsiri Thabunsri and CYP777, is a supervised fine-tuned (SFT) version of Google's gemma-2-2b-it base model. Its primary purpose is to generate structured chain-of-thought reasoning for mathematical and logical problems, explicitly separating the reasoning process from the final answer.
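For reference, a minimal inference sketch assuming the standard transformers API and the Gemma chat template; the prompt and generation settings below are illustrative, not a recommended configuration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Phonsiri/gemma-2-2b-SFT-Reasoning-full-Model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # requires accelerate

messages = [{"role": "user", "content": "What is 17 * 24? Show your reasoning."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# The completion should contain <reasoning>...</reasoning> followed by <answer>...</answer>.
output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```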
Key Capabilities
- Structured Reasoning: Produces detailed, step-by-step reasoning enclosed within `<reasoning>...</reasoning>` tags, followed by the concise final answer in `<answer>...</answer>` tags (see the parsing sketch after this list).
- Foundation for Downstream Models: Serves as the base for further training, such as `Phonsiri/gemma-2-2b-GRPO-Reasoning-fullmodel`.
- Full Parameter Fine-tuning: Unlike many models that use LoRA/PEFT adapters, this model underwent full parameter fine-tuning for comprehensive adaptation.
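Because the two sections are delimited by fixed tags, they can be separated with simple post-processing. A minimal sketch; the `split_reasoning` helper below is illustrative, not part of any published tooling:

```python
import re

def split_reasoning(text: str) -> tuple[str | None, str | None]:
    """Extract the <reasoning> and <answer> blocks from model output."""
    reasoning = re.search(r"<reasoning>(.*?)</reasoning>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        reasoning.group(1).strip() if reasoning else None,
        answer.group(1).strip() if answer else None,
    )

output = "<reasoning>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.</reasoning><answer>408</answer>"
print(split_reasoning(output))  # ('17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.', '408')
```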
Training Details
The model was trained for 3 epochs with a learning rate of 2e-5 and a maximum sequence length of 8192. Training data included the `nohurry/Opus-4.6-Reasoning-3000x-filtered` dataset, alongside several local datasets of Thai and general mathematical problems and solutions.
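The full training script is not published, so the following is a hedged sketch of how the stated hyperparameters could map onto TRL's SFTTrainer. The choice of TRL itself, the batch-size settings, and the dataset column format are assumptions, and the local Thai/math datasets mentioned above are not included:

```python
# A hypothetical reconstruction, not the authors' published script.
from datasets import load_dataset
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

base_id = "google/gemma-2-2b-it"  # base model named in the overview

# Only the public dataset cited in the card; assumes it exposes a
# "messages" (or "text") column in the format SFTTrainer expects.
dataset = load_dataset("nohurry/Opus-4.6-Reasoning-3000x-filtered", split="train")

model = AutoModelForCausalLM.from_pretrained(base_id)

config = SFTConfig(
    output_dir="gemma-2-2b-sft-reasoning",
    num_train_epochs=3,             # stated in the card
    learning_rate=2e-5,             # stated in the card
    max_seq_length=8192,            # stated in the card; named `max_length` in newer TRL
    per_device_train_batch_size=1,  # assumption: not stated in the card
    gradient_accumulation_steps=8,  # assumption: not stated in the card
)

# No peft_config is passed, so every parameter is updated, matching the
# full-parameter fine-tuning described in the capabilities list.
trainer = SFTTrainer(model=model, args=config, train_dataset=dataset)
trainer.train()
```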
Good for
- Applications requiring transparent and verifiable problem-solving steps.
- Educational tools that need to show how a solution is derived.
- Developing agents that can explain their decision-making process in a structured format.
- Serving as a base model for further reinforcement learning (such as GRPO) or fine-tuning on reasoning tasks.