KeeganCarey/gemma-3-1b-it-amr_thinking
KeeganCarey/gemma-3-1b-it-amr_thinking is a 1-billion-parameter, Gemma-based, instruction-tuned model developed by KeeganCarey and trained with Group Relative Policy Optimization (GRPO) to generate structured reasoning traces. With a 32,768-token context length, it targets tasks that call for explicit step-by-step thinking before a final answer, and it emits the reasoning and the answer in a distinct, structured format.
Model Overview
KeeganCarey/gemma-3-1b-it-amr_thinking is a 1 billion parameter model built upon the Gemma architecture, specifically fine-tuned for generating structured reasoning. This model leverages Group Relative Policy Optimization (GRPO) to enhance its ability to produce explicit, step-by-step thought processes alongside its final answers.
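As a rough illustration of the "group relative" idea behind GRPO, the sketch below scores each sampled completion against the other completions drawn for the same prompt. It is not this model's training code: the reward values, the `eps` constant, and the use of the population standard deviation are assumptions made for the example, and real implementations differ in these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled from the same prompt.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above their group's average get a positive advantage and those below get a negative one, so no separate value model is needed to provide a baseline.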
Key Capabilities
- Structured Reasoning Output: Generates output in a distinct <reasoning>step-by-step thinking process</reasoning><answer>final answer</answer> format.
- GRPO Training: Utilizes Group Relative Policy Optimization for improved reasoning trace generation.
- Extended Context Window: Features a 32k-token (32,768) context length, allowing it to process longer inputs and more complex reasoning tasks.
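Downstream code will usually want to separate the reasoning from the answer. The following is a minimal sketch that assumes the model emits exactly the tag format listed above; `parse_response` and the sample string are hypothetical and not part of any published API for this model.

```python
import re

# Matches the <reasoning>...</reasoning><answer>...</answer> layout,
# tolerating whitespace between the two blocks and newlines inside them.
PATTERN = re.compile(
    r"<reasoning>(.*?)</reasoning>\s*<answer>(.*?)</answer>",
    re.DOTALL,
)


def parse_response(text):
    """Return the reasoning and answer parts, or None if the tags are missing."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return {
        "reasoning": match.group(1).strip(),
        "answer": match.group(2).strip(),
    }


# Hypothetical model output used only for illustration.
sample = "<reasoning>2 + 2 equals 4.</reasoning><answer>4</answer>"
parsed = parse_response(sample)
print(parsed)
```

Returning `None` on malformed output lets callers decide whether to retry generation or fall back to the raw text.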
Training Details
This model was trained using a combination of Supervised Fine-Tuning (SFT) and GRPO. The base model for this training was chimbiwide/gemma-3-1b-it-thinking-32k-sft-base. Training was conducted using the Tunix (JAX) framework on a v6e-1 TPU, with LoRA rank 32 and LoRA alpha 64.0.
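To put the LoRA settings in context: in the original LoRA formulation, the low-rank update is multiplied by alpha / rank. The short sketch below works that out for the values stated above. Whether Tunix applies exactly this scaling convention is an assumption here, and `lora_scale` is a hypothetical helper written for the example.

```python
# Values taken from the training details above.
LORA_RANK = 32
LORA_ALPHA = 64.0


def lora_scale(alpha, rank):
    """Scaling applied to the low-rank update in the standard LoRA formulation."""
    return alpha / rank


scale = lora_scale(LORA_ALPHA, LORA_RANK)
print(scale)
```

Under that convention, an alpha of twice the rank means the adapter update is weighted by a factor of 2 before being added to the frozen base weights.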
Ideal Use Cases
This model is particularly well-suited for applications where not just the answer, but also the explicit thought process leading to that answer, is crucial. This includes tasks like:
- Problem-solving requiring transparent steps.
- Educational tools that explain solutions.
- Debugging or diagnostic systems that outline reasoning.