Overview
KeeganC/gemma-3-1b-it-amr_thinking is a 1-billion-parameter instruction-tuned model built on the Gemma architecture, designed specifically for generating structured reasoning. Developed by KeeganC, the model was trained with Group Relative Policy Optimization (GRPO), a reinforcement-learning method that distinguishes it from standard fine-tuning approaches.
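The core idea in GRPO is to sample a group of completions per prompt and normalize each completion's reward against the group's own statistics, rather than against a learned value function. A minimal sketch of that group-relative advantage step (illustrative only, not the actual training code):

```python
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each completion's reward against its group's statistics.

    GRPO samples several completions for the same prompt and uses the
    group mean (and standard deviation) as the baseline, so no separate
    value model is needed.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled completions for one prompt, scored by a reward function:
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Completions scoring above the group mean get positive advantages and are reinforced; those below the mean are discouraged.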
Key Capabilities
- Structured Reasoning Output: The model is trained to produce output in a specific format, wrapping its step-by-step thought process in a <reasoning> tag and the final solution in an <answer> tag. This makes its decision-making process transparent.
- GRPO Training: Utilizes GRPO on top of a base model (chimbiwide/gemma-3-1b-it-thinking-32k-sft-base) that was first supervised fine-tuned (SFT). This training method aims to enhance its ability to generate coherent and logical reasoning traces.
- Extended Context Length: Features a 32,768-token context window, allowing it to process and reason over longer and more complex inputs.
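Downstream code can recover the two fields from a completion with a simple parse. A sketch assuming the trained format of one <reasoning> block followed by one <answer> block (the helper name here is illustrative, not part of the model card):

```python
import re

def parse_thinking_output(text):
    """Split a completion into its reasoning trace and final answer.

    Assumes the model's trained format: a <reasoning>...</reasoning>
    block followed by an <answer>...</answer> block.
    """
    reasoning = re.search(r"<reasoning>(.*?)</reasoning>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        reasoning.group(1).strip() if reasoning else None,
        answer.group(1).strip() if answer else None,
    )

completion = (
    "<reasoning>2 apples + 3 apples = 5 apples.</reasoning>"
    "<answer>5</answer>"
)
steps, final = parse_thinking_output(completion)
```

Returning None for a missing tag lets callers detect completions that drifted from the expected format.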
Use Cases
This model is particularly well-suited for applications where not just the answer, but also the methodology and thought process leading to that answer, are crucial. This includes tasks such as:
- Problem-solving requiring explicit logical steps.
- Educational tools that demonstrate how to arrive at solutions.
- Automated reasoning systems where transparency is key.
Training Details
The model was trained using the Tunix (JAX) framework on a single v6e-1 TPU, with LoRA rank 32 and LoRA alpha 64.0, a parameter-efficient fine-tuning setup that updates only small low-rank adapter matrices rather than the full weights.
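In LoRA, each adapted weight gets two small trainable matrices whose product is scaled by alpha/rank; with the reported rank 32 and alpha 64.0 the scaling factor is 2.0. A minimal numpy sketch of how the low-rank update folds into a frozen base weight (illustrative, not Tunix code; the matrix sizes are made up):

```python
import numpy as np

rank, alpha = 32, 64.0
scaling = alpha / rank  # 2.0 for this model's configuration

d_out, d_in = 64, 48
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))        # frozen base weight
A = rng.normal(size=(rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))               # trainable up-projection, zero-init

# Effective weight during (and after merging) LoRA fine-tuning:
W_eff = W + scaling * (B @ A)
```

Because B is conventionally zero-initialized, the adapter contributes nothing at the start of training, and only A and B are updated, which is what makes the run parameter-efficient.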