KeeganCarey/gemma-3-1b-it-amr_thinking-2
KeeganCarey/gemma-3-1b-it-amr_thinking-2 is a 1 billion parameter Gemma-based instruction-tuned language model, fine-tuned using Group Relative Policy Optimization (GRPO) to generate structured reasoning traces. This model is specifically designed to output a step-by-step thinking process followed by a final answer, making it suitable for tasks requiring explicit reasoning. With a 32k token context length, it excels at complex problem-solving where intermediate thought processes are crucial.
Loading preview...
Overview
This model, KeeganCarey/gemma-3-1b-it-amr_thinking-2, is a 1 billion parameter instruction-tuned variant of the Gemma architecture, specifically optimized for generating structured reasoning. It was developed by KeeganCarey and fine-tuned using a novel method called Group Relative Policy Optimization (GRPO) on top of a Supervised Fine-Tuning (SFT) base model.
Key Capabilities
- Structured Reasoning Output: The model is engineered to produce a distinct
<reasoning>block detailing its thought process, followed by an<answer>block for the final result. This explicit output format is highly beneficial for transparency and debugging in AI applications. - Enhanced Problem Solving: By focusing on generating intermediate reasoning steps, the model aims to improve performance on tasks that require complex logical deduction or multi-step problem-solving.
- 32k Context Length: Built upon a base model with a 32,768 token context window, it can process and reason over significantly longer inputs compared to many other models in its size class.
Training Details
The model's training involved a combination of Supervised Fine-Tuning (SFT) and GRPO, utilizing a LoRA configuration with a rank of 32 and an alpha of 64.0. The training was conducted using the Tunix (JAX) framework on a v6e-1 TPU.
Good for
- Applications requiring explainable AI outputs.
- Tasks where the step-by-step thought process is as important as the final answer.
- Educational tools that demonstrate problem-solving methodologies.
- Automated systems needing to justify their decisions or conclusions.